
Heteroskedasticity and Autocorrelation Consistent Standard Errors


Xiang Ao
January 27, 2009

1 Properties of the OLS estimator


1.1 When is the OLS estimator unbiased?
In a linear model
\[ y = X\beta + u, \tag{1} \]
the OLS estimator is
\[ \hat{\beta} = (X'X)^{-1} X' y. \]
Since $y = X\beta + u$,
\[ \hat{\beta} = \beta + (X'X)^{-1} X' u. \tag{2} \]


Conditioning on $X$, this gives
\[ E(\hat{\beta} \mid X) = \beta + (X'X)^{-1} X' E(u \mid X). \tag{3} \]
The condition that makes the OLS estimator unbiased is
\[ E(u \mid X) = 0, \tag{4} \]
that is, the explanatory variables that form the columns of $X$ are exogenous. This condition is weaker than requiring $u$ and $X$ to be fully independent.
With cross-sectional data this assumption is plausible. With time series data, however, it becomes strong: it requires that the entire series of $X$ be unrelated to the error term, which is hard to satisfy. The OLS estimator is biased if condition (4) is not satisfied.
For example, suppose we have the model
\[ y_t = \beta_1 + \beta_2 y_{t-1} + u_t, \qquad u_t \sim \mathrm{IID}(0, \sigma^2). \]
In this simple model, even if we assume that $y_{t-1}$ and $u_t$ are uncorrelated, the OLS estimator is still biased. That is because condition (4) is not satisfied: $y_{t-1}$ depends on $u_{t-1}$, $u_{t-2}$, and so on. Condition (4) therefore fails for any regression with lagged dependent variables, and models with time series data are likely to violate it.

1.2 When is the OLS estimator consistent?
For the OLS estimator to be consistent, a much weaker condition suffices:
\[ E(u_t \mid X_t) = 0. \tag{5} \]
This condition is much weaker because it only requires that the mean of the current error term not depend on the current regressors. Even a model with a lagged dependent variable can easily satisfy it. Condition (5) is called the predeterminedness condition; we say the regressors are predetermined.
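To make the bias-versus-consistency distinction concrete, here is a minimal simulation sketch in R for the AR(1) model above (the parameter values and sample sizes are arbitrary choices for illustration): the OLS estimate of $\beta_2$ is biased in small samples, but the bias shrinks as $T$ grows, as consistency under predeterminedness predicts.

```r
# Simulate y_t = beta1 + beta2 * y_{t-1} + u_t and estimate beta2 by OLS.
# Condition (4) fails (finite-sample bias), but condition (5) holds
# (consistency), so the bias should shrink as T grows.
set.seed(42)
beta1 <- 0; beta2 <- 0.5
sim_ar1 <- function(T) {
  y <- numeric(T)
  for (t in 2:T) y[t] <- beta1 + beta2 * y[t - 1] + rnorm(1)
  coef(lm(y[-1] ~ y[-T]))[2]  # OLS estimate of beta2
}
for (T in c(25, 100, 1000)) {
  est <- replicate(2000, sim_ar1(T))
  cat("T =", T, " mean OLS estimate of beta2:", round(mean(est), 3), "\n")
}
# The mean estimate is below 0.5 for small T and approaches 0.5 as T grows.
```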

2 Estimation of Variance
Suppose the OLS estimator is unbiased (that is, $X$ is exogenous) and the data are generated by the linear regression model
\[ y = X\beta + u, \qquad E(u \mid X) = 0, \qquad E(uu') = \Omega, \tag{6} \]
that is, the error terms are not assumed to be independently and identically distributed. Then the following holds:

\[
\begin{aligned}
\mathrm{Var}(\hat{\beta}) &= E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'] \\
&= (X'X)^{-1} X' E(uu') X (X'X)^{-1} \\
&= (X'X)^{-1} X' \Omega X (X'X)^{-1}.
\end{aligned} \tag{7}
\]
In the IID case, $\Omega = \sigma^2 I$, and
\[ \mathrm{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}. \]
In more general situations, the error terms are not IID.

2.1 White's estimator

In the case of heteroskedastic errors, $\Omega$ is the diagonal error variance-covariance matrix whose $t$th diagonal element is $\sigma_t^2$, with all off-diagonal elements zero.
If we knew the $\sigma_t^2$, we could compute this "sandwich covariance matrix" directly. But we don't.
\[ \mathrm{Var}(\hat{\beta}) = \frac{1}{n}\left[\frac{1}{n} X'X\right]^{-1} \left[\frac{1}{n} X'\Omega X\right] \left[\frac{1}{n} X'X\right]^{-1}. \]
Let $y_t$ denote the $t$th observation on the dependent variable, and $x_t' = [1 \; x_{2t} \; \cdots \; x_{kt}]$ denote the $t$th row of the $X$ matrix. Then
\[ X'\Omega X = \sum_{t=1}^{n} \sigma_t^2 x_t x_t'. \tag{8} \]

With $n$ distinct $\sigma_t^2$ to estimate, the problem seems hopeless: as $n$ goes to infinity, the number of parameters to estimate also goes to infinity. White (1980) showed that we don't need to do that: all we need is a consistent estimator of $X'\Omega X$, which is $k \times k$ and symmetric, so it has $\frac{1}{2}(k^2 + k)$ distinct elements. The White estimator replaces the unknown $\sigma_t^2$ with $\hat{u}_t^2$, the squared OLS residuals. This provides a consistent estimator of the variance matrix of the OLS coefficient vector and is particularly useful because it requires no specific assumptions about the form of the heteroskedasticity. This type of estimator is also called a heteroskedasticity-consistent covariance matrix estimator.

\[ \widehat{\mathrm{Var}}(\hat{\beta}) = \frac{1}{n}\left[\frac{1}{n} X'X\right]^{-1} \left[\frac{1}{n} X'\hat{\Omega} X\right] \left[\frac{1}{n} X'X\right]^{-1}, \tag{9} \]
\[ \hat{\Omega} = \mathrm{diag}(\hat{u}_1^2, \hat{u}_2^2, \cdots, \hat{u}_n^2). \tag{10} \]
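In R, White's estimator is available in the sandwich package: vcovHC() with type = "HC0" implements exactly (9)-(10). The sketch below, on made-up data, checks it against a direct computation of the sandwich formula.

```r
library(sandwich)
library(lmtest)

# Made-up heteroskedastic data, for illustration only.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = abs(x))  # error variance depends on x
fit <- lm(y ~ x)

# White's estimator: HC0 replaces sigma_t^2 with squared OLS residuals.
V_white <- vcovHC(fit, type = "HC0")

# Direct computation of (X'X)^{-1} [X' Omega_hat X] (X'X)^{-1}.
X <- model.matrix(fit)
u2 <- residuals(fit)^2
bread <- solve(crossprod(X))
V_manual <- bread %*% (t(X) %*% (u2 * X)) %*% bread
all.equal(V_white, V_manual)  # TRUE

coeftest(fit, vcov = V_white)  # t-tests with White standard errors
```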

2.2 Newey-West estimator

White's estimator deals with heteroskedasticity of unknown form (a diagonal $\Omega$). When we also have serial correlation of unknown form (a non-diagonal $\Omega$), we can estimate the variance-covariance matrix with a heteroskedasticity and autocorrelation consistent, or HAC, estimator. The Newey-West estimator is the most popular HAC estimator. It is not as straightforward to illustrate as White's estimator, but I'll try to summarize.

2.2.1 Consistent Estimation of the Variance of the Sample Mean

Given a time series data set, suppose we are interested in estimating the mean vector (suppose we have more than one variable) and its variance. With IID data we can apply the central limit theorem: the sample mean is a consistent estimator of the population mean, it is asymptotically normal, and its variance can be estimated relatively easily. With time series data, however, autocorrelation usually exists, and we may be concerned that the CLT does not apply.
Fortunately, as proved in Hamilton (1994), if $y_t$ is a covariance-stationary vector process (meaning its mean and autocovariances do not depend on time), then the sample mean satisfies:
1. \[ \bar{y}_T \to \mu, \]
2. \[ S = \lim_{T \to \infty} T \cdot E[(\bar{y}_T - \mu)(\bar{y}_T - \mu)'] = \sum_{v=-\infty}^{\infty} \Gamma_v, \]
where $\Gamma_v$ is the variance-covariance matrix between $y_t$ and $y_{t-v}$.

The first result says that the law of large numbers still holds for a covariance-stationary vector process. The second is used to calculate the standard error.
If the data were generated by a vector MA(q) process, then
\[ S = \sum_{v=-q}^{q} \Gamma_v. \]
A natural estimate is
\[ \hat{S} = \hat{\Gamma}_0 + \sum_{v=1}^{q} (\hat{\Gamma}_v + \hat{\Gamma}_v'), \]
where
\[ \hat{\Gamma}_v = \frac{1}{T} \sum_{t=v+1}^{T} (y_t - \bar{y})(y_{t-v} - \bar{y})'. \]

This gives a consistent estimate of $S$; however, it is sometimes not positive semidefinite.
Newey and West (1987) suggested weighting the terms:
\[ \hat{S} = \hat{\Gamma}_0 + \sum_{v=1}^{q} \left(1 - \frac{v}{q+1}\right)(\hat{\Gamma}_v + \hat{\Gamma}_v'), \]
where $q$ is the order of the MA(q) process.
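For a univariate series, this weighted estimate takes only a few lines of R. The sketch below is a minimal illustration (the function name and the MA(2) example are arbitrary choices), using the $1/T$ convention for the autocovariances as above; the variance of the sample mean is then approximately $\hat{S}/T$.

```r
# Newey-West weighted estimate of S (the long-run variance) for a
# univariate series; Var(ybar) is approximately S / T.
nw_long_run_var <- function(y, q) {
  T <- length(y)
  yc <- y - mean(y)
  gamma_hat <- function(v) sum(yc[(v + 1):T] * yc[1:(T - v)]) / T
  S <- gamma_hat(0)
  for (v in 1:q) S <- S + 2 * (1 - v / (q + 1)) * gamma_hat(v)
  S
}

set.seed(2)
y <- as.numeric(arima.sim(list(ma = c(0.6, 0.3)), n = 500))  # MA(2) data
nw_long_run_var(y, q = 2) / length(y)  # estimated variance of the mean
```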

2.2.2 Newey-West estimator for linear regressions

Consider a linear regression model:
\[ y_t = x_t'\beta + u_t. \]
Let $b_T$ denote the OLS estimator. Then
\[ \sqrt{T}(b_T - \beta) = \left[\frac{1}{T}\sum_{t=1}^{T} x_t x_t'\right]^{-1} \left[\frac{1}{\sqrt{T}}\sum_{t=1}^{T} x_t u_t\right]. \]
The first term converges in probability to a constant matrix, say $Q^{-1}$. The second term is $\sqrt{T}$ times the sample mean of the vector $x_t u_t$, so the results of the previous subsection apply.
Under general conditions,
\[ \sqrt{T}(b_T - \beta) \to_L N(0, Q^{-1} S Q^{-1}), \]
where $S$ can be estimated by
\[ \hat{S}_T = \hat{\Gamma}_{0,T} + \sum_{v=1}^{q} \left(1 - \frac{v}{q+1}\right)(\hat{\Gamma}_{v,T} + \hat{\Gamma}_{v,T}'), \]
where
\[ \hat{\Gamma}_{v,T} = \frac{1}{T} \sum_{t=v+1}^{T} x_t \hat{u}_{t,T} \hat{u}_{t-v,T} x_{t-v}', \]
and $\hat{u}_{t,T}$ is the OLS residual for observation $t$ in a sample of size $T$.


Overall, the variance of $b_T$ is approximated by
\[ \hat{\Sigma}_{NW} = \left[\sum_{t=1}^{T} x_t x_t'\right]^{-1} \left[\sum_{t=1}^{T} \hat{u}_t^2 x_t x_t' + \sum_{v=1}^{q} \left(1 - \frac{v}{q+1}\right) \sum_{t=v+1}^{T} \left(x_t \hat{u}_{t,T} \hat{u}_{t-v,T} x_{t-v}' + x_{t-v} \hat{u}_{t-v,T} \hat{u}_{t,T} x_t'\right)\right] \left[\sum_{t=1}^{T} x_t x_t'\right]^{-1}. \]

This estimator obviously depends on the selection of $q$, the lag length beyond which we are willing to assume that the autocorrelation between $x_t u_t$ and $x_{t-v} u_{t-v}$ is essentially zero. A rule of thumb for the selection of $q$ is $0.75 \cdot T^{1/3}$. Newey and West (1994) suggested a way to select the bandwidth $q$ automatically; we omit that discussion here. Both Stata and R now implement the Newey-West (1994) estimator, with no need to specify $q$.
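As an illustration, here is a sketch in R using the sandwich package (the simulated data and lag choice are arbitrary): NeweyWest() with an explicit lag corresponds to the fixed-bandwidth estimator above, while its default applies the automatic bandwidth selection of Newey and West (1994).

```r
library(sandwich)
library(lmtest)

# Made-up regression with AR(1) errors, for illustration only.
set.seed(3)
T <- 400
x <- rnorm(T)
u <- as.numeric(arima.sim(list(ar = 0.5), n = T))
y <- 1 + 2 * x + u
fit <- lm(y ~ x)

# Fixed bandwidth: rule of thumb q = 0.75 * T^(1/3).
q <- floor(0.75 * T^(1/3))
V_fixed <- NeweyWest(fit, lag = q, prewhite = FALSE)

# Automatic bandwidth selection (Newey-West 1994) is the default.
V_auto <- NeweyWest(fit)

coeftest(fit, vcov = V_fixed)  # HAC t-tests, fixed q
coeftest(fit, vcov = V_auto)   # HAC t-tests, automatic bandwidth
```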

2.2.3 Implementation
Stata has newey (and the user-written newey2) for time-series data. For panel data, the user-written xtivreg2 implements the Newey-West (1994) estimator with automatic bandwidth selection.
R has a package called sandwich that implements various HAC estimators, including Newey-West, as illustrated above.
