Heteroskedasticity and Autocorrelation Consistent Standard Errors
y_t = \beta_1 + \beta_2 y_{t-1} + u_t, \qquad u_t \sim \mathrm{IID}(0, \sigma^2).
In this simple model, even if we assume that y_{t-1} and u_t are uncorrelated, the OLS estimator is still biased. That is because condition 4 is not satisfied: y_{t-1} depends on u_{t-1}, u_{t-2}, and so on. Assumption 4 does not hold for regressions with lagged dependent variables, and models with time-series data in general are likely to violate it.
1.2 When is the OLS estimator consistent?
For the OLS estimator to be consistent, a much weaker condition is needed:

E(u_t \mid X_t) = 0.    (5)

This condition is much weaker since it only requires that the mean of the current error term not depend on the current regressors. Even a model with a lagged dependent variable can easily satisfy it. Condition (5) is called the predeterminedness condition; we say that the regressors are predetermined.
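A short simulation makes the biased-but-consistent distinction concrete. The R sketch below fits the AR(1) model above by OLS at several sample sizes; the parameter values (beta1 = 1, beta2 = 0.5), error distribution, and sample sizes are illustrative assumptions, not taken from the text. The average bias of the slope estimate is clearly visible in small samples and shrinks toward zero as the sample grows.

# Simulation sketch: OLS with a lagged dependent variable is biased in
# small samples but consistent. All parameter values are illustrative.
set.seed(42)
simulate_ar1_bias <- function(n_obs, beta1 = 1, beta2 = 0.5, reps = 2000) {
  estimates <- replicate(reps, {
    y <- numeric(n_obs)
    y[1] <- beta1 / (1 - beta2)              # start at the unconditional mean
    u <- rnorm(n_obs)                        # IID(0, 1) errors
    for (t in 2:n_obs) y[t] <- beta1 + beta2 * y[t - 1] + u[t]
    coef(lm(y[-1] ~ y[-n_obs]))[2]           # OLS slope on lagged y
  })
  mean(estimates) - beta2                    # average bias of the slope estimate
}
sapply(c(25, 100, 400), simulate_ar1_bias)   # bias shrinks toward 0 as n grows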
2 Estimation of Variance
If the OLS estimator is unbiased (that is, if X is exogenous), and the data are generated by the linear regression model

y = X\beta + u, \qquad E(uu' \mid X) = \Omega = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2),

then the variance of the OLS estimator takes the sandwich form

\mathrm{Var}(\hat\beta) = (X'X)^{-1} X'\Omega X (X'X)^{-1}.
There are n distinct \sigma_t^2 to estimate, so the problem seems hopeless: as n goes to infinity, the number of parameters to estimate also goes to infinity. White (1980) shows that we do not need to do that: all we need is a consistent estimator of X'\Omega X, which is k \times k and symmetric, so it has only \frac{1}{2}(k^2 + k) distinct elements. The White estimator replaces the unknown \sigma_t^2 with \hat u_t^2, the squared OLS residuals. This provides a consistent estimator of the variance matrix of the OLS coefficient vector and is particularly useful since it does not require any specific assumptions about the form of the heteroskedasticity. This type of estimator is also called a heteroskedasticity-consistent covariance matrix estimator.
\widehat{\mathrm{Var}}(\hat\beta) = \frac{1}{n}\left[\frac{1}{n}(X'X)\right]^{-1}\left[\frac{1}{n}X'\hat\Omega X\right]\left[\frac{1}{n}(X'X)\right]^{-1}    (9)

\hat\Omega = \mathrm{diag}(\hat u_1^2, \hat u_2^2, \cdots, \hat u_n^2)    (10)
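As a check on (9)-(10), here is a minimal R sketch that computes the White estimator by hand, using the algebraically equivalent form (X'X)^{-1} X'\hat\Omega X (X'X)^{-1}, and compares it with sandwich::vcovHC(..., type = "HC0"), which implements the same formula without small-sample corrections. The heteroskedastic data-generating process is an illustrative assumption.

# White (heteroskedasticity-consistent) covariance matrix, two ways.
# Requires the 'sandwich' package; the DGP below is illustrative.
library(sandwich)
set.seed(1)
n <- 200
x <- rnorm(n)
u <- rnorm(n, sd = abs(x))        # heteroskedastic errors: Var(u_t) depends on x_t
y <- 1 + 2 * x + u
fit  <- lm(y ~ x)
X    <- model.matrix(fit)
uhat <- resid(fit)
# Equations (9)-(10): replace the unknown sigma_t^2 with squared residuals
Omega_hat <- diag(uhat^2)
V_white   <- solve(t(X) %*% X) %*% (t(X) %*% Omega_hat %*% X) %*% solve(t(X) %*% X)
all.equal(V_white, vcovHC(fit, type = "HC0"), check.attributes = FALSE)  # TRUE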
2. The long-run variance of the sample mean satisfies

S = \lim_{T \to \infty} T \cdot E\left[(\bar y_T - \mu)(\bar y_T - \mu)'\right] = \sum_{v=-\infty}^{\infty} \Gamma_v.
The first one says that, for a covariance-stationary vector process, the law of large numbers still holds. The second one is used to calculate the standard error.
If the data were generated by a vector MA(q) process, then

S = \sum_{v=-q}^{q} \Gamma_v.
A natural estimate is

\hat S = \hat\Gamma_0 + \sum_{v=1}^{q} (\hat\Gamma_v + \hat\Gamma_v'),

where

\hat\Gamma_v = \frac{1}{T} \sum_{t=v+1}^{T} (y_t - \bar y)(y_{t-v} - \bar y)'.
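For a scalar series, \hat\Gamma_v is just the sample autocovariance at lag v and \hat\Gamma_v' = \hat\Gamma_v. A minimal R sketch, assuming a scalar MA(2) process generated purely for illustration:

# Estimate the long-run variance S of a scalar MA(q) series by summing
# sample autocovariances: S-hat = Gamma0-hat + sum_{v=1}^q 2 * Gammav-hat.
# The MA(2) coefficients below are illustrative assumptions.
set.seed(7)
nobs <- 5000
q <- 2
y <- as.numeric(arima.sim(model = list(ma = c(0.5, 0.25)), n = nobs))
gamma_hat <- function(v) {
  ybar <- mean(y)
  sum((y[(v + 1):nobs] - ybar) * (y[1:(nobs - v)] - ybar)) / nobs
}
S_hat <- gamma_hat(0) + 2 * sum(sapply(1:q, gamma_hat))
S_hat  # should be near (1 + 0.5 + 0.25)^2 * sigma^2 = 3.0625 for this MA(2)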
Now consider the linear regression model

y_t = x_t'\beta + u_t,

and write the OLS estimator as

b_T - \beta = \left[\frac{1}{T}\sum_{t=1}^{T} x_t x_t'\right]^{-1}\left[\frac{1}{T}\sum_{t=1}^{T} x_t u_t\right].

The first term converges in probability to some constant matrix Q^{-1}. The second term is the sample mean of the vector x_t u_t, to which the results above apply.
Under general conditions,

\sqrt{T}\,(b_T - \beta) \xrightarrow{L} N(0,\; Q^{-1} S Q^{-1}),
where S can be estimated by

\hat S_T = \hat\Gamma_{0,T} + \sum_{v=1}^{q} \left(1 - \frac{v}{q+1}\right)\left(\hat\Gamma_{v,T} + \hat\Gamma_{v,T}'\right),
where

\hat\Gamma_{v,T} = \frac{1}{T} \sum_{t=v+1}^{T} x_t \hat u_{t,T} \hat u_{t-v,T} x_{t-v}'.
Putting everything together, the Newey-West covariance matrix estimator is

\hat\Sigma_{NW} = \left[\sum_{t=1}^{T} x_t x_t'\right]^{-1}\left[\sum_{t=1}^{T} \hat u_t^2 x_t x_t' + \sum_{v=1}^{q}\left(1 - \frac{v}{q+1}\right)\sum_{t=v+1}^{T}\left(x_t \hat u_{t,T} \hat u_{t-v,T} x_{t-v}' + x_{t-v} \hat u_{t-v,T} \hat u_{t,T} x_t'\right)\right]\left[\sum_{t=1}^{T} x_t x_t'\right]^{-1}.
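The sketch below implements \hat\Sigma_{NW} directly from this formula in R and checks it against sandwich::NeweyWest. Prewhitening and the degrees-of-freedom adjustment are switched off so that both computations use the same formula; the AR(1)-error data-generating process and the lag choice q = 4 are illustrative assumptions.

# Newey-West covariance matrix computed from the formula above, then
# checked against sandwich::NeweyWest with matching options.
library(sandwich)
set.seed(2)
nobs <- 300
q <- 4
x <- rnorm(nobs)
u <- as.numeric(arima.sim(model = list(ar = 0.5), n = nobs))  # autocorrelated errors
y <- 1 + 2 * x + u
fit  <- lm(y ~ x)
X    <- model.matrix(fit)
uhat <- resid(fit)
meat <- t(X) %*% diag(uhat^2) %*% X             # v = 0 term: sum of uhat_t^2 x_t x_t'
for (v in 1:q) {
  w  <- 1 - v / (q + 1)                         # Bartlett kernel weight
  Gv <- t(X[(v + 1):nobs, ] * uhat[(v + 1):nobs]) %*%
        (X[1:(nobs - v), ] * uhat[1:(nobs - v)])  # sum of x_t uhat_t uhat_{t-v} x_{t-v}'
  meat <- meat + w * (Gv + t(Gv))               # add the lag-v term and its transpose
}
XtXi <- solve(t(X) %*% X)
Sigma_NW <- XtXi %*% meat %*% XtXi
all.equal(Sigma_NW, NeweyWest(fit, lag = q, prewhite = FALSE, adjust = FALSE),
          check.attributes = FALSE)             # TRUE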
2.2.3 Implementation
Stata implements the Newey-West estimator for time-series data through newey (and the user-written newey2). For panel data, the user-written xtivreg2 implements the Newey-West (1994) estimator with automatic bandwidth selection.
R has a package called sandwich which implements various HAC estimators, including Newey-West (for example, via its NeweyWest() and vcovHAC() functions).
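A typical call, assuming fit is an lm object such as the ones fitted above and an illustrative lag choice:

# Coefficient table with Newey-West standard errors; 'fit' is an lm object
# from earlier and lag = 4 is an illustrative choice.
library(sandwich)
library(lmtest)
coeftest(fit, vcov. = NeweyWest(fit, lag = 4, prewhite = FALSE))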