Econometrics
Chapter 11: Specification Error Analysis
The specification of a linear regression model consists of a formulation of the regression relationship and of statements or assumptions concerning the explanatory variables and disturbances. If any of these is violated, e.g., an incorrect functional form or an incorrect introduction of the disturbance term into the model, then a specification error occurs. In a narrower sense, specification error refers to the choice of explanatory variables.
The complete regression analysis depends on the explanatory variables present in the model. It is assumed in regression analysis that only the correct and important explanatory variables appear in the model. In practice, after ensuring the correct functional form of the model, the analyst usually has a pool of explanatory variables that possibly influence the process or experiment. Generally, not all such candidate variables are used in the regression modelling; rather, a subset of explanatory variables is chosen from this pool.
While choosing a subset of explanatory variables, there are two possible options:
1. In order to make the model as realistic as possible, the analyst may include as many explanatory variables as possible.
2. In order to make the model as simple as possible, one may include only a small number of explanatory variables.
Now we discuss the statistical consequences arising from both situations.
Partition the explanatory variables and the regression coefficients as
\[
\underset{n\times k}{X} = \big(\underset{n\times r}{X_1},\ \underset{n\times(k-r)}{X_2}\big),
\qquad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix},
\]
where $\beta_1$ is $r\times 1$ and $\beta_2$ is $(k-r)\times 1$. The model $y = X\beta + \varepsilon$, $E(\varepsilon)=0$, $V(\varepsilon)=\sigma^2 I$ can then be expressed as
\[
y = X_1\beta_1 + X_2\beta_2 + \varepsilon,
\]
which is called the full model or true model.
After dropping the $(k-r)$ explanatory variables in $X_2$, the new model is
\[
y = X_1\beta_1 + \delta,
\]
which is called the misspecified model or false model.
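The contrast between the two models can be made concrete with a small simulation. The following is a minimal sketch (the sample size, the split of regressors, and all coefficient values are arbitrary illustrative choices, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(0)

# n observations; k = 4 regressors split as r = 2 retained (X1) and k - r = 2 dropped (X2).
n, r = 200, 2
X1 = rng.normal(size=(n, r))           # explanatory variables kept in both models
X2 = rng.normal(size=(n, 2))           # relevant variables that the false model omits
beta1 = np.array([1.0, -2.0])
beta2 = np.array([0.5, 1.5])
sigma = 1.0

eps = rng.normal(scale=sigma, size=n)
y = X1 @ beta1 + X2 @ beta2 + eps      # data generated by the full (true) model

# Least-squares fits of the full model and of the misspecified (false) model
b_full, *_ = np.linalg.lstsq(np.hstack([X1, X2]), y, rcond=None)
b_false, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("beta1 estimated from the full model :", b_full[:r])
print("beta1 estimated from the false model:", b_false)  # contaminated unless X1'X2 = 0
```

Unless $X_1'X_2 = 0$, the false-model estimate of $\beta_1$ absorbs part of the effect of the omitted $X_2$.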
Fitting the false model by least squares gives $b_1 = (X_1'X_1)^{-1}X_1'y$ and the residual-based estimator $s^2 = y'H_1y/(n-r)$ of $\sigma^2$, where $H_1 = I - X_1(X_1'X_1)^{-1}X_1'$. Since $H_1X_1 = 0$, substituting the true model for $y$ gives
\[
y'H_1y = (X_1\beta_1 + X_2\beta_2 + \varepsilon)'H_1(X_1\beta_1 + X_2\beta_2 + \varepsilon)
= \beta_2'X_2'H_1X_2\beta_2 + \beta_2'X_2'H_1\varepsilon + \varepsilon'H_1X_2\beta_2 + \varepsilon'H_1\varepsilon .
\]
Taking expectations, and using $E(\varepsilon'H_1\varepsilon) = \sigma^2\,\mathrm{tr}(H_1) = (n-r)\sigma^2$ together with $E(\varepsilon)=0$,
\[
E(s^2) = \frac{1}{n-r}\big[E(\beta_2'X_2'H_1X_2\beta_2) + 0 + 0 + E(\varepsilon'H_1\varepsilon)\big]
= \frac{1}{n-r}\big[\beta_2'X_2'H_1X_2\beta_2 + (n-r)\sigma^2\big]
= \sigma^2 + \frac{1}{n-r}\,\beta_2'X_2'H_1X_2\beta_2 .
\]
Thus $s^2$ is a biased estimator of $\sigma^2$, and it provides an overestimate of $\sigma^2$. Note that even if $X_1'X_2 = 0$, $s^2$ still gives an overestimate of $\sigma^2$. So the statistical inferences based on it will be faulty: the $t$-tests and confidence regions will be invalid in this case.
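The overestimation can be verified by Monte Carlo. Below is a minimal sketch (the design matrices, coefficients, and replication count are illustrative assumptions) comparing the simulated mean of $s^2$ with the expression just derived:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 50, 2
X1 = rng.normal(size=(n, r))
X2 = rng.normal(size=(n, 2))
beta1 = np.array([1.0, -2.0])
beta2 = np.array([0.5, 1.5])
sigma2 = 1.0

# Residual-maker matrix of the false model: H1 = I - X1 (X1'X1)^{-1} X1'
H1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T

s2 = []
for _ in range(5000):
    y = X1 @ beta1 + X2 @ beta2 + rng.normal(scale=np.sqrt(sigma2), size=n)
    s2.append(y @ H1 @ y / (n - r))    # s^2 from the misspecified fit

print("Monte Carlo mean of s^2:", np.mean(s2))
# Theoretical value: sigma^2 + beta2' X2' H1 X2 beta2 / (n - r)
print("derived E(s^2)         :", sigma2 + (beta2 @ X2.T @ H1 @ X2 @ beta2) / (n - r))
```

Both numbers lie well above $\sigma^2 = 1$, and they agree with each other, as the derivation predicts.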
If the response is to be predicted at $x' = (x_1', x_2')$, then using the full model the predicted value is $\hat{y} = x'b$ with $b = (X'X)^{-1}X'y$; it is unbiased, with $Var(\hat{y}) = \sigma^2 x'(X'X)^{-1}x$. Using the misspecified model, the predicted value is $\hat{y}_1 = x_1'b_1$. Since $E(b_1) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2$, $\hat{y}_1$ is a biased predictor of $y$; it is unbiased when $X_1'X_2 = 0$. The MSE of the predictor is
\[
MSE(\hat{y}_1) = Var(\hat{y}_1) + \big[E(\hat{y}_1) - E(y)\big]^2,
\]
i.e., variance plus squared bias. Also
\[
Var(\hat{y}) \ge MSE(\hat{y}_1),
\]
so the predictor from the misspecified model, although biased, can have a smaller mean squared error than the full-model predictor.
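A simulation can illustrate this bias-variance trade-off. The sketch below (all design choices, including the deliberately small $\beta_2$, are illustrative assumptions; with a large $\beta_2$ the squared bias can dominate instead) estimates $Var(\hat{y})$ and $MSE(\hat{y}_1)$ at a fixed point $x$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 50, 2
X1 = rng.normal(size=(n, r))
X2 = rng.normal(size=(n, 2))
X = np.hstack([X1, X2])
beta = np.array([1.0, -2.0, 0.1, 0.2])   # small beta2: the regime favouring the short model
sigma = 1.0
x = rng.normal(size=4)                    # fixed point at which the response is predicted
mu = x @ beta                             # E(y) at x

yhat, yhat1 = [], []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    yhat.append(x @ np.linalg.lstsq(X, y, rcond=None)[0])        # full-model predictor
    yhat1.append(x[:r] @ np.linalg.lstsq(X1, y, rcond=None)[0])  # misspecified predictor

print("Var(yhat)  :", np.var(yhat))                          # unbiased but higher variance
print("MSE(yhat1) :", np.mean((np.array(yhat1) - mu) ** 2))  # biased, yet often smaller
```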
Now consider the opposite situation, in which irrelevant explanatory variables are added to the model. Suppose the true model is
\[
y = X\beta + \delta, \qquad E(\delta) = 0, \quad V(\delta) = \sigma^2 I,
\]
but the fitted (false) model includes an additional set of irrelevant variables $Z$:
\[
y = X\beta + Z\gamma + \varepsilon .
\]
The least-squares estimator of $\beta$ in the false model is $b_F = (X'H_ZX)^{-1}X'H_Zy$, where $H_Z = I - Z(Z'Z)^{-1}Z'$. Since the true $\gamma = 0$, we have $E(b_F) = \beta$, so $b_F$ is unbiased even when some irrelevant variables are added to the model.
Its covariance matrix is
\[
V(b_F) = E\big[(b_F - \beta)(b_F - \beta)'\big]
= \sigma^2 (X'H_ZX)^{-1} X'H_Z\, I\, H_Z X (X'H_ZX)^{-1}
= \sigma^2 (X'H_ZX)^{-1},
\]
using the idempotency of $H_Z$.
Under the true model, the least-squares estimator is $b_T = (X'X)^{-1}X'y$, with $E(b_T) = \beta$ and $V(b_T) = \sigma^2(X'X)^{-1}$. To compare the two covariance matrices, we use the following result.
Result: If $A$ and $B$ are two positive definite matrices, then $A - B$ is at least positive semi-definite if $B^{-1} - A^{-1}$ is at least positive semi-definite.
Let
\[
A = (X'H_ZX)^{-1}, \qquad B = (X'X)^{-1}.
\]
Then
\[
B^{-1} - A^{-1} = X'X - X'H_ZX
= X'X - X'X + X'Z(Z'Z)^{-1}Z'X
= X'Z(Z'Z)^{-1}Z'X,
\]
which is at least positive semi-definite. Hence $V(b_F) - V(b_T) = \sigma^2(A - B)$ is at least positive semi-definite, which implies that the efficiency declines unless $X'Z = 0$. If $X'Z = 0$, i.e., $X$ and $Z$ are orthogonal, then both estimators are equally efficient.
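This efficiency loss is easy to check numerically. A minimal sketch (random $X$ and $Z$ with arbitrary dimensions and $\sigma^2 = 1$, all illustrative assumptions) confirms that $V(b_F) - V(b_T)$ is positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = rng.normal(size=(n, 2))    # relevant regressors
Z = rng.normal(size=(n, 2))    # irrelevant regressors (true gamma = 0)
sigma2 = 1.0

HZ = np.eye(n) - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
V_bF = sigma2 * np.linalg.inv(X.T @ HZ @ X)   # V(bF) = sigma^2 (X'HZ X)^{-1}
V_bT = sigma2 * np.linalg.inv(X.T @ X)        # V(bT) = sigma^2 (X'X)^{-1}

# Eigenvalues of V(bF) - V(bT) are nonnegative (up to rounding): the difference is psd
print(np.linalg.eigvalsh(V_bF - V_bT))
```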
The residual sum of squares under the false model is
\[
SS_{res} = e_F'e_F,
\]
where
\[
e_F = y - Xb_F - Zc_F,
\qquad
b_F = (X'H_ZX)^{-1}X'H_Zy,
\]
and
\[
c_F = (Z'Z)^{-1}Z'y - (Z'Z)^{-1}Z'Xb_F
= (Z'Z)^{-1}Z'(y - Xb_F)
= (Z'Z)^{-1}Z'\big[I - X(X'H_ZX)^{-1}X'H_Z\big]y
= (Z'Z)^{-1}Z'H_{XZ}y,
\]
with
\[
H_Z = I - Z(Z'Z)^{-1}Z',
\qquad
H_{XZ} = I - X(X'H_ZX)^{-1}X'H_Z,
\qquad
H_{XZ}^2 = H_{XZ} \ \text{(idempotent)}.
\]
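These matrix identities can be checked numerically. The sketch below (random data of arbitrary dimensions, an illustrative assumption) verifies the closed form for $c_F$ and the idempotency of $H_{XZ}$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
X = rng.normal(size=(n, 2))
Z = rng.normal(size=(n, 2))
y = X @ np.array([1.0, -1.0]) + rng.normal(size=n)

HZ = np.eye(n) - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
HXZ = np.eye(n) - X @ np.linalg.inv(X.T @ HZ @ X) @ X.T @ HZ

bF = np.linalg.inv(X.T @ HZ @ X) @ X.T @ HZ @ y
cF = np.linalg.inv(Z.T @ Z) @ Z.T @ (y - X @ bF)

# cF equals (Z'Z)^{-1} Z' H_{XZ} y, and H_{XZ} is idempotent
print(np.allclose(cF, np.linalg.inv(Z.T @ Z) @ Z.T @ HXZ @ y))  # True
print(np.allclose(HXZ @ HXZ, HXZ))                               # True
```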