05 Diagnostic Test of CLRM 2
Multicollinearity
Model Specification Errors
Topics
Multicollinearity
Under-/overfitting a model
Functional Form
Errors of Measurement
07/04/15
MULTICOLLINEARITY
Nature of Multicollinearity
Multicollinearity: the explanatory variables are highly, but not perfectly, correlated with one another.
Perfect multicollinearity: one explanatory variable is an exact linear combination of the others. In that case $X'X$ is singular, so the OLS estimator $\hat{\beta} = (X'X)^{-1}X'y$ cannot be computed.
Sources of Multicollinearity
Detection of Multicollinearity
Scatterplots of the explanatory variables against one another
Condition number: the ratio of the maximum to the minimum eigenvalue of $X'X$; a large value indicates collinearity
VIF: if the VIF of a variable is higher than 10, that variable is said to be highly collinear
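The VIF rule above can be sketched numerically. A minimal implementation, assuming the standard definition $VIF_j = 1/(1 - R_j^2)$ where $R_j^2$ comes from regressing variable $j$ on the remaining regressors (all variable names here are illustrative):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no constant column).
    VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the
    other columns plus an intercept."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
x2 = rng.normal(size=200)
x3 = x2 + 0.1 * rng.normal(size=200)   # nearly collinear with x2
X = np.column_stack([x2, x3])
print(vif(X))                          # both VIFs far above 10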
Suppose the true model is
$y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_{1t}$   (1)
but we omit $X_{3t}$ and estimate
$y_t = \alpha_1 + \alpha_2 X_{2t} + u_{2t}$   (2)
so that the error of the misspecified model absorbs the omitted variable:
$u_{2t} = \beta_3 X_{3t} + u_{1t}$   (3)
Consequences
If the omitted variable is correlated with the included variable in the regression (the correlation coefficient between the two variables is nonzero), then the parameter estimates (intercept and slope coefficients) are biased as well as inconsistent.
$E(\hat{\alpha}_2) = \beta_2 + \beta_3 b_{32}$, where $b_{32} = \dfrac{\sum x_{2t} x_{3t}}{\sum x_{2t}^2}$ is the slope from regressing $x_{3t}$ on $x_{2t}$.
The variance in the misspecified model is $\mathrm{var}(\hat{\alpha}_2) = \dfrac{\sigma^2}{\sum x_{2t}^2}$, whereas in the true model $\mathrm{var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2t}^2 (1 - r_{2,3}^2)}$.
If the omitted variable is not correlated with the included variable in the regression (the correlation coefficient between the two variables is zero), the intercept is still biased.
The variance of the regression is incorrectly estimated.
The variances of the intercept and the other coefficients are biased.
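The bias formula $E(\hat{\alpha}_2) = \beta_2 + \beta_3 b_{32}$ can be checked with a small Monte Carlo sketch; the parameter values and seed below are illustrative, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(1)
T, beta2, beta3 = 500, 1.0, 2.0
x2 = rng.normal(size=T)
x3 = 0.5 * x2 + rng.normal(size=T)       # correlated with x2
b32 = (x2 @ x3) / (x2 @ x2)              # slope of x3 regressed on x2

est = []
for _ in range(2000):
    u = rng.normal(size=T)
    y = beta2 * x2 + beta3 * x3 + u      # true model (zero intercept for brevity)
    alpha2 = (x2 @ y) / (x2 @ x2)        # short regression: x3 omitted
    est.append(alpha2)

# The simulated mean of alpha2_hat matches beta2 + beta3 * b32
print(np.mean(est), beta2 + beta3 * b32)
```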
Consequences
Suppose the true model is
$y_t = \beta_1 + \beta_2 X_{2t} + u_{1t}$   (1)
but we include the irrelevant variable $X_{3t}$ and estimate
$y_t = \alpha_1 + \alpha_2 X_{2t} + \alpha_3 X_{3t} + u_{2t}$   (2)
The estimators remain unbiased and consistent, but they are inefficient:
$\mathrm{var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2t}^2}$ and $\mathrm{var}(\hat{\alpha}_2) = \dfrac{\sigma^2}{\sum x_{2t}^2 (1 - r_{2,3}^2)}$,
then
$\dfrac{\mathrm{var}(\hat{\alpha}_2)}{\mathrm{var}(\hat{\beta}_2)} = \dfrac{1}{1 - r_{2,3}^2} \ge 1$
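The variance-inflation ratio $1/(1 - r_{2,3}^2)$ can also be verified by simulation. A sketch with illustrative numbers (the irrelevant regressor $x_3$ is correlated with $x_2$ but absent from the true model):

```python
import numpy as np

rng = np.random.default_rng(2)
T, beta2, reps = 200, 1.0, 4000
x2 = rng.normal(size=T)
x3 = 0.8 * x2 + 0.6 * rng.normal(size=T)   # correlated but irrelevant
X_short = np.column_stack([np.ones(T), x2])
X_long = np.column_stack([np.ones(T), x2, x3])

short, long_ = [], []
for _ in range(reps):
    y = beta2 * x2 + rng.normal(size=T)    # true model excludes x3
    short.append(np.linalg.lstsq(X_short, y, rcond=None)[0][1])
    long_.append(np.linalg.lstsq(X_long, y, rcond=None)[0][1])

r23 = np.corrcoef(x2, x3)[0, 1]
# Empirical variance ratio versus the theoretical 1/(1 - r23^2)
print(np.var(long_) / np.var(short), 1 / (1 - r23 ** 2))
```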
Log-Linear: $\ln(y_t) = \beta_1 + \beta_2 x_{2t} + u_t$. A 1-unit increase in $x_{2t}$ causes a $100\beta_2\%$ increase in $y_t$.
Linear-Log: $y_t = \beta_1 + \beta_2 \ln(x_{2t}) + u_t$. A 1% increase in $x_{2t}$ causes a $0.01\beta_2$-unit increase in $y_t$.
Double Log: $\ln(y_t) = \beta_1 + \beta_2 \ln(x_{2t}) + u_t$. A 1% increase in $x_{2t}$ causes a $\beta_2\%$ increase in $y_t$.
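The double-log (elasticity) interpretation is easy to confirm numerically; the coefficient values below are made up for illustration:

```python
import numpy as np

# With ln(y) = b1 + b2*ln(x), a 1% rise in x should raise y by about b2 percent.
b1, b2 = 0.5, 0.7
x = 10.0
y0 = np.exp(b1 + b2 * np.log(x))
y1 = np.exp(b1 + b2 * np.log(x * 1.01))   # x increased by 1%
pct_change = 100 * (y1 / y0 - 1)
print(pct_change)                          # close to b2 = 0.7 (percent)
```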
Ramsey's RESET test for functional form:
1. Regress $y_t = \beta_1 + \beta_2 X_{2t} + u_t$ and obtain the fitted values $\hat{y}_t$.
2. Run the auxiliary regression of the residuals on powers of the fitted values: $\hat{u}_t = \alpha_1 + \alpha_2 \hat{y}_t^2 + \alpha_3 \hat{y}_t^3 + \dots + \alpha_{p-1} \hat{y}_t^p + v_t$.
3. Obtain $R^2$ from this regression. The test statistic is given by $TR^2$ and is distributed as a $\chi^2(p-1)$.
So if the value of the test statistic is greater than the critical value of $\chi^2(p-1)$, then reject the null hypothesis that the functional form is correct.
Some nonlinear models can be linearised by taking logarithms: $y_t = A x_t^{\beta} e^{u_t} \Rightarrow \ln y_t = \alpha + \beta \ln x_t + u_t$, where $\alpha = \ln A$.
We can test the implicit assumption that the parameters are constant over the whole sample using parameter stability tests. The idea is essentially to split the data into sub-periods, estimate up to three models (one for each of the sub-parts and one for all the data), and then compare the RSS of the models.
A problem with the Chow test is that we need enough data to run the regression on both sub-samples, i.e. T1 >> k, T2 >> k.
An alternative formulation is the predictive failure test.
What we do with the predictive failure test is estimate the regression over a long sub-period (i.e. most of the data), then predict values for the other period and compare the two.
Test statistic $= \dfrac{RSS - RSS_1}{RSS_1} \times \dfrac{T_1 - k}{T_2}$
where $T_2$ = number of observations we are attempting to predict. The test statistic will follow an $F(T_2, T_1 - k)$ distribution.
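The statistic above can be computed directly. A sketch with a simulated structural break in the last $T_2$ observations (the break size, seed, and sample sizes are invented for illustration):

```python
import numpy as np
from scipy import stats

def predictive_failure(y, X, T2):
    T, k = X.shape
    T1 = T - T2
    # RSS1: regression over the long sub-period only
    b1, *_ = np.linalg.lstsq(X[:T1], y[:T1], rcond=None)
    rss1 = np.sum((y[:T1] - X[:T1] @ b1) ** 2)
    # RSS: regression over the whole sample
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ b) ** 2)
    stat = (rss - rss1) / rss1 * (T1 - k) / T2
    return stat, stats.f.ppf(0.95, T2, T1 - k)

rng = np.random.default_rng(4)
T, T2 = 120, 20
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = 1 + 2 * X[:, 1] + rng.normal(size=T)
y[-T2:] += 5.0                      # structural break in the prediction period
stat, crit = predictive_failure(y, X, T2)
print(stat > crit)                  # the break is detected
```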
Example: the computed test statistic is 0.164.
[Figure: Value of Series ($y_t$) plotted over the Sample Period]
Skewness and kurtosis are the (standardised) third and fourth moments of a
distribution.
Bera and Jarque formalise this by testing the residuals for normality: they test whether the coefficient of skewness and the coefficient of excess kurtosis are jointly zero.
$W = T\left[\dfrac{b_1^2}{6} + \dfrac{(b_2 - 3)^2}{24}\right] \sim \chi^2(2)$
where $b_1$ is the coefficient of skewness and $b_2$ is the coefficient of kurtosis.
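The Bera-Jarque statistic is simple to compute from the standardised third and fourth moments; the sketch below (illustrative simulated residuals) cross-checks the hand computation against SciPy's built-in `jarque_bera`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
u = rng.normal(size=1000)                 # simulated residuals under the null
T = len(u)
b1 = stats.skew(u)                        # coefficient of skewness
b2 = stats.kurtosis(u, fisher=False)      # coefficient of kurtosis (not excess)
W = T * (b1 ** 2 / 6 + (b2 - 3) ** 2 / 24)
print(W, stats.jarque_bera(u)[0])         # the two values coincide
```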
Central Limit Theorem: if $Y_1, \dots, Y_N$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, then
$Z_N = \dfrac{\bar{Y} - \mu}{\sigma / \sqrt{N}}$
has a probability distribution that converges to the standard normal $N(0, 1)$ as $N \to \infty$.
Equivalently, $\bar{Y}$ is approximately distributed as $N(\mu, \sigma^2/N)$ for large $N$.
Example of CLT
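As one possible illustration of the CLT (the choice of Uniform(0, 1) draws here is an assumption, not the lecture's original example), the standardised sample mean is approximately standard normal even though the underlying draws are far from normal:

```python
import numpy as np

rng = np.random.default_rng(6)
N, reps = 100, 20000
mu, sigma = 0.5, np.sqrt(1 / 12)          # mean and sd of Uniform(0, 1)
Y = rng.uniform(size=(reps, N))
Z = (Y.mean(axis=1) - mu) / (sigma / np.sqrt(N))
print(Z.mean(), Z.std())                   # approximately 0 and 1
```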
The first step is to form a large model with many candidate variables on the right-hand side
This is known as a GUM (generalised unrestricted model)
At this stage, we want to make sure that the model satisfies all of the
assumptions of the CLRM
If the assumptions are violated, we need to take appropriate actions to
remedy this, e.g.
- taking logs
- adding lags
- dummy variables
We need to do this before testing hypotheses
Once we have a model which satisfies the assumptions, it could still be very large, with many lags and independent variables
Questions?