Article in Enthusiastic International Journal of Applied Statistics and Data Science · April 2021
DOI: 10.20885/enthusiastic.vol1.iss1.art1
All content following this page was uploaded by Dewi Wulandari on 26 November 2021.
1. Introduction
In a linear model, the normality assumption should be met [3], and linear regression is no
exception. What if this assumption is not fulfilled? Several previous researchers have discussed
violations of the normality assumption. We take three of their statements as our basis:
1) non-normally distributed variables can distort relationships and significance tests [4];
2) violating the assumption can have detrimental effects on the results and future directions
of any analysis [5]; 3) violating the normality assumption may lead to the use of suboptimal
estimation, invalid inferential statements, and inaccurate predictions [6].
To satisfy the normality assumption for dependent (response) variables as stated in [7], we need
to ensure that, for every value of the independent variable, the corresponding dependent values
follow the normal distribution. This applies not only to response variables but also to residuals.
However, estimating a regression model using OLS requires the assumption of normally distributed
errors, not the assumption of normally distributed response or predictor variables [8]. Although we
have reservations about that statement, we have not yet studied it in depth, so in this paper we
demonstrate only the normality assumption for residuals or errors.
There are many methods for assessing the normality assumption, one of which is the skewness and
kurtosis test. Kim [9] stated that there is currently no gold standard method for assessing the
normality of data. The Shapiro-Wilk and Kolmogorov-Smirnov tests are regarded as unreliable for
large samples, while the skewness and kurtosis test may be relatively correct in both small and
large samples. The normality assumption is important not only in univariate models but also in
multivariate models. The assumptions for a multivariate regression analysis are similar to those
for univariate regression, but they are extended to a multivariate domain [10]. The multivariate
skewness and kurtosis proposed by Mardia are one such extension. Romeu and Ozturk [11] carried out
10 tests for multivariate normality using skewness and kurtosis, and the results showed that
Mardia’s skewness and kurtosis were the most stable and reliable. In this paper, we demonstrate
the procedure for assessing the normality assumption in multivariate regression using the data and
multivariate regression model of Khasanah [12].
2. Methods
This paper uses a literature study method: we collected references, examined the importance of
the normality assumption in regression analysis, examined the procedure of the skewness and
kurtosis test, applied it to a multivariate regression case, and showed how statistical software
can help with the numerical calculations.
Kurtosis is a measure of the peakedness of a distribution. Data with a high kurtosis value have
heavy tails, and such a distribution is called leptokurtic. A distribution with the opposite
condition is called platykurtic. Fig. 2 shows the peakedness of a distribution.
Kim [9] applied a z-test for univariate normality using skewness and kurtosis. The test
statistics are expressed in (7) and (8).
$$ z_S = \frac{\text{Skew value}}{SE_{\text{skewness}}} \tag{7} $$
$$ z_K = \frac{\text{Excess kurtosis}}{SE_{\text{excess kurtosis}}} \tag{8} $$
where $z_S$ is a z-score for skewness and $z_K$ is a z-score for kurtosis. Given observations
$(X_1, X_2, X_3, \ldots, X_n)$, the skew value is
$\left(\sum_{i=1}^{n}(X_i-\bar{X})^3/n\right)/s^3$, $SE_{\text{skewness}}$ is
$\sqrt{\frac{6n(n-1)}{(n-2)(n+1)(n+3)}}$, the excess kurtosis is
$\left(\sum_{i=1}^{n}(X_i-\bar{X})^4/n\right)/s^4 - 3$, and $SE_{\text{excess kurtosis}}$ is
$\sqrt{\frac{24n(n-1)^2}{(n-2)(n-3)(n+3)(n+5)}}$. According to Kim [9], the critical value for
rejecting the null hypothesis depends on the sample size:
1. For small samples (n < 50), if the absolute z-scores for both skewness and kurtosis are less
than 1.96, corresponding to an alpha level of 0.05, do not reject the null hypothesis. The
sample is then treated as normally distributed.
2. For medium-sized samples (50 ≤ n ≤ 300), reject the null hypothesis at an absolute z value
over 3.29, which corresponds to a two-sided alpha level of about 0.001, and conclude that the
sample is not normally distributed.
3. For sample sizes greater than 300, rely on the histogram and the absolute values of skewness
and kurtosis regardless of the z value. Either an absolute skew value greater than 2 or an absolute
kurtosis greater than 7 can be used as a reference value to determine substantial non-normality.
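The z-scores in (7) and (8) and the sample-size rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used by any particular statistics package. The function names `skew_kurtosis_z` and `rejects_normality` are hypothetical, s is assumed to be the sample standard deviation with an n - 1 denominator, the standard error of excess kurtosis is taken in its commonly used form with numerator 24n(n-1)^2, and the medium-sample rule is read as rejecting when the absolute z value is not below 3.29.

```python
import math

def skew_kurtosis_z(x):
    """Return (skew, excess_kurtosis, z_skew, z_kurt) for a sequence of numbers."""
    n = len(x)
    if n < 4:
        raise ValueError("at least 4 observations are needed for the standard errors")
    mean = sum(x) / n
    # Assumption: s is the sample standard deviation (n - 1 denominator).
    s2 = sum((v - mean) ** 2 for v in x) / (n - 1)
    if s2 == 0:
        raise ValueError("all observations are identical")
    skew = (sum((v - mean) ** 3 for v in x) / n) / s2 ** 1.5
    excess_kurt = (sum((v - mean) ** 4 for v in x) / n) / s2 ** 2 - 3
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    # Assumption: commonly used standard error of excess kurtosis.
    se_kurt = math.sqrt(
        24 * n * (n - 1) ** 2 / ((n - 2) * (n - 3) * (n + 3) * (n + 5))
    )
    return skew, excess_kurt, skew / se_skew, excess_kurt / se_kurt

def rejects_normality(n, skew, kurtosis, z_skew, z_kurt):
    """Apply the sample-size dependent rule; True means normality is rejected."""
    max_abs_z = max(abs(z_skew), abs(z_kurt))
    if n < 50:
        return max_abs_z >= 1.96
    if n <= 300:
        return max_abs_z >= 3.29
    # Large samples: ignore z and use the reference values 2 and 7.
    # The threshold is applied to whatever kurtosis value the caller supplies.
    return abs(skew) > 2 or abs(kurtosis) > 7

if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    skew, kurt, z_s, z_k = skew_kurtosis_z(data)
    print(skew, kurt, z_s, z_k)
    print("reject normality:", rejects_normality(len(data), skew, kurt, z_s, z_k))
```

For samples above 300 the rule does not use the z-scores at all, which is why the decision function takes the raw skewness and kurtosis as separate arguments.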
Fig. 2. Kurtosis (curves shown: leptokurtic, normal, and platykurtic)
The procedure for the multivariate skewness and kurtosis test is discussed in [13]. The null
hypothesis states that the sample comes from a normally distributed population. The recommended
significance level is 5%. In this section, we determine the values of the multivariate skewness
and kurtosis; the critical values and the decision are discussed in section 3.3.
In the case determined in section 3.1, we have bivariate residuals. The multivariate skewness is
expressed in (9), where $X_i^t = (X_{1i}, X_{2i}, \ldots, X_{pi})$, $i = 1, 2, \ldots, n$, are $n$
independent observations on $X$, $\bar{X}^t = (\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_p)$ denotes
the sample mean vector, and $S$ denotes the covariance matrix [14].
$$ b_{1,p} = \frac{1}{n^2} \sum_{i,j=1}^{n} \left\{ (X_i - \bar{X})^t S^{-1} (X_j - \bar{X}) \right\}^3 \tag{9} $$
Then in multivariate regression case, we express the multivariate skewness as in (10).
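$$ b_{1,2} = \frac{1}{n^2} \sum_{i,j=1}^{n} \left\{ (e_i - \bar{e})^t S^{-1} (e_j - \bar{e}) \right\}^3 \tag{10} $$
where $e_i^t = (e_{1i}, e_{2i})$, $i = 1, 2, \ldots, 148$, are the 148 residuals and
$\bar{e}^t = (\bar{e}_1, \bar{e}_2)$ denotes the sample mean vector of the residuals.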
Meanwhile, the multivariate kurtosis is expressed in (11).
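$$ b_{2,p} = \frac{1}{n} \sum_{i=1}^{n} \left\{ (X_i - \bar{X})^t S^{-1} (X_i - \bar{X}) \right\}^2 \tag{11} $$
Therefore, in the multivariate regression case, we express the kurtosis as presented in (12).
$$ b_{2,2} = \frac{1}{n} \sum_{i=1}^{n} \left\{ (e_i - \bar{e})^t S^{-1} (e_i - \bar{e}) \right\}^2 \tag{12} $$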
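To make the skewness and kurtosis sums concrete, the following is a minimal Python sketch for the bivariate case only. It is not the authors' implementation. The function name `mardia_bivariate` is hypothetical, the covariance matrix S is assumed to use the n denominator as in Mardia's definition, and the kurtosis is computed in Mardia's standard form, that is, the mean of the squared quadratic forms $(e_i - \bar{e})^t S^{-1} (e_i - \bar{e})$.

```python
def mardia_bivariate(residuals):
    """Return (b1, b2): Mardia's skewness and kurtosis for a list of (e1, e2) pairs."""
    n = len(residuals)
    if n < 3:
        raise ValueError("at least 3 observations are needed")
    mean1 = sum(e[0] for e in residuals) / n
    mean2 = sum(e[1] for e in residuals) / n
    centered = [(e[0] - mean1, e[1] - mean2) for e in residuals]

    # Assumption: covariance matrix with n (not n - 1) in the denominator.
    s11 = sum(a * a for a, _ in centered) / n
    s22 = sum(b * b for _, b in centered) / n
    s12 = sum(a * b for a, b in centered) / n
    det = s11 * s22 - s12 * s12
    if det <= 0:
        raise ValueError("covariance matrix is singular")

    # Explicit inverse of the 2x2 covariance matrix.
    i11, i22, i12 = s22 / det, s11 / det, -s12 / det

    def quad(u, v):
        """Compute u^t S^{-1} v for two centered observations."""
        return u[0] * (i11 * v[0] + i12 * v[1]) + u[1] * (i12 * v[0] + i22 * v[1])

    # Skewness: cube of the form over all n^2 ordered pairs (i, j).
    b1 = sum(quad(u, v) ** 3 for u in centered for v in centered) / n ** 2
    # Kurtosis: square of the form over the n observations.
    b2 = sum(quad(u, u) ** 2 for u in centered) / n
    return b1, b2

if __name__ == "__main__":
    # Four points on a square: a symmetric toy data set.
    print(mardia_bivariate([(1, 1), (1, -1), (-1, 1), (-1, -1)]))
```

The double loop over pairs in the skewness sum is what makes hand calculation grow quadratically with the number of observations.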
Zhang et al. [16] also developed an online calculator for Mardia’s skewness and kurtosis. The
calculator is freely accessible at https://webpower.psychstat.org/models/kurtosis/ and returns the
result directly. We selected our data file (several file types are accepted) and clicked the
‘Calculate’ button. The result of the calculation is presented in Fig. 3.
Based on Fig. 3, $b_{1,2} = 2.620028$ and $b_{2,2} = 11.528782$. From Mardia’s table in [17], the
critical values are $b_{1,2,0.05,148} = 0.4$, lower $b_{2,2,0.05,148} = 6.858$, and upper
$b_{2,2,0.05,148} = 9.3$. For skewness, the sample is from a multivariate normal distribution if
the statistic is less than the critical value, while for kurtosis, the sample is from a
multivariate normal distribution if the statistic lies between the lower and upper critical values.
Because the skewness value is greater than 0.4 and the kurtosis value is not in the range
[6.858, 9.3], the residuals in our case do not follow a multivariate normal distribution.
Besides using Mardia’s table, we can also use the p-values from the result in Fig. 3. The p-values
for skewness and kurtosis are both less than the significance level, so we reject the null
hypothesis. Based on the hypothesis stated in section 3.2, we conclude that the multivariate
normality assumption for the residuals in our case is not fulfilled.
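The two decision routes described above can be illustrated with a short Python sketch using the reported values. The table-based check follows the rule stated in the text. The p-value part uses the standard large-sample approximations for Mardia's statistics (n·b1/6 compared with a chi-square distribution with p(p+1)(p+2)/6 degrees of freedom, and a normal z-score for the kurtosis). Whether the online calculator uses exactly these approximations is an assumption, and the function names are hypothetical.

```python
import math

def consistent_with_mvn(b1, b2, skew_crit, kurt_lower, kurt_upper):
    """Table-based rule: True if both statistics are within the critical values."""
    return b1 < skew_crit and kurt_lower <= b2 <= kurt_upper

def mardia_asymptotic_pvalues(b1, b2, n, p=2):
    """Return (chi2_stat, p_skew, z_kurt, p_kurt) from large-sample approximations."""
    df = p * (p + 1) * (p + 2) // 6
    if df % 2 != 0:
        raise ValueError("this sketch only handles an even number of degrees of freedom")
    chi2_stat = n * b1 / 6
    # Closed-form chi-square upper tail probability for even df:
    # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    half = chi2_stat / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    p_skew = math.exp(-half) * total
    # Under normality the kurtosis has mean p(p+2) and variance 8p(p+2)/n.
    z_kurt = (b2 - p * (p + 2)) / math.sqrt(8 * p * (p + 2) / n)
    p_kurt = math.erfc(abs(z_kurt) / math.sqrt(2))  # two-sided normal p-value
    return chi2_stat, p_skew, z_kurt, p_kurt

if __name__ == "__main__":
    b1, b2, n = 2.620028, 11.528782, 148  # values reported in the text
    print("within critical values:", consistent_with_mvn(b1, b2, 0.4, 6.858, 9.3))
    chi2_stat, p_skew, z_kurt, p_kurt = mardia_asymptotic_pvalues(b1, b2, n)
    print("skewness p-value below 0.05:", p_skew < 0.05)
    print("kurtosis p-value below 0.05:", p_kurt < 0.05)
```

With the reported values, both routes lead to the same decision as in the text: the residuals are not consistent with a bivariate normal distribution.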
4. Conclusion
Based on the results and discussion, the multivariate skewness and kurtosis test is adaptable
enough to be applied in multivariate regression. The more observations we have, the harder it is
to determine the values of skewness and kurtosis by hand, because the matrix operations
(subtraction, addition, inversion, and multiplication) must be carried out for every combination
of 𝑖 and 𝑗. However, these difficulties are no longer a problem because we can use R, SPSS, and
SAS through macros, as well as the online calculator. With these tools, we did not need to compare
the results with another multivariate normality test. Future research on such a comparison is
expected to yield better results.
References
[1] A. Field, Discovering Statistics Using SPSS, London: Sage Publication, 2009.
[2] K.V. Mardia, “Measures of multivariate skewness and kurtosis with applications”, Biometrika, vol.57,
no.3, pp. 519-530, Dec 1970.
[3] E.C. Alexopoulos, “Introduction to multivariate regression analysis”, Hippokratia, vol.14, pp.23-28,
December 2010.
[4] J. Osborne and E. Waters, “Four assumptions of multiple regression that researchers should always test”,
Pract. Assess. Res. Eval., vol. 8, Jan 2002.
[5] D.S. Gregory and H.M. Jackson, “Logistic and linear regression assumptions: violation recognition and
control”, in Proc. Pharma. SAS Users Group (PharmaSUG), Jun. 16-19, Pennsylvania, 2019. p.21.
[6] K.R. Das and A.H.M.R. Imon, “A brief review of test for normality”, Am. J. Theor. Appl. Stat., vol. 5, no.
1, pp. 5-12. Jan. 2016.
[7] Budiyono, Statistika untuk Penelitian, Surakarta: UNS Press, 2009.
[8] M. Williams, C.A.G. Grajales, and D. Kurkiewicz, “Assumptions of multiple regression: correcting two
misconceptions”, Pract. Assess. Res. Eval., vol. 18, 2013.
[9] H.Y. Kim, “Statistical notes for clinical researchers: assessing normal distribution (2) using skewness and
kurtosis”, Restor. Dent. Endod., vol. 38, no. 1, pp. 52-54, Feb. 2013.
[10] C.D. Lin, “Conducting test in multivariate regression”, in Proc. SAS Global Forum. April 28-May 1,
Texas, p. 13.
[11] J.L. Romeu and A. Ozturk, “A comparative study of goodness-of-fit for multivariate normality”, J.
Multivar. Anal., vol. 46, pp. 309-334, Aug. 1993.
[12] U. Khasanah, “Penggunaan analisis regresi multivariat untuk memodelkan faktor-faktor yang
mempengaruhi hasil belajar”, B.S. thesis, Math. Educ., Universitas PGRI Semarang, Semarang, INA.
[13] A.C. Rencher, Multivariate Statistical Inference and Applications, Canada: John Wiley and Sons, Inc.
1998.
[14] K.V. Mardia, “Assessment of multinormality and robustness of Hotelling’s T2 test”, J. R. Stat. Soc. Series
C (Appl. Stats.),vol. 24, no. 2, pp. 163-171, 1975.
[15] M.K. Cain, Z. Zhang, and K.H. Yuan, “Univariate and multivariate skewness and kurtosis for measuring
normality: prevalence, influence and estimation”, Behav. Res. Methods, vol. 17, Oct 2016.
[16] Z.J. Zhang, K.H. Yuan, Y. Mai, M.K. Cain, H. Du, G. Jiang, H. Liu, A. Santoso, M. Yang, X. Wang, and D.
Mattew, “Univariate and Multivariate Skewness and Kurtosis Calculation”, Web power: Analysis Online.
Retrieved from https://webpower.psychstat.org/models/kurtosis/ on Feb 17, 2021.
[17] K.V. Mardia, “Application of some measure of multivariate skewness and kurtosis in testing normality
and robustness studies”, Sankhya: Indian J. Stat. Series B, vol. 36, no. 2, pp. 115-128, May 1974.