Undergraduate Econometrics, 2 Edition-Chapter 11: Slide 11.1
Heteroskedasticity
11.1 The Nature of Heteroskedasticity
In Chapter 3 we introduced the linear model
$$y = \beta_1 + \beta_2 x \tag{11.1.1}$$
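For observation $t$, with the random error term $e_t$ included, the model is
$$y_t = \beta_1 + \beta_2 x_t + e_t \tag{11.1.2}$$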
We assumed the $e_t$ were uncorrelated random error terms with mean zero and constant
variance $\sigma^2$. That is,
$$E(e_t) = 0, \qquad \mathrm{var}(e_t) = \sigma^2, \qquad \mathrm{cov}(e_i, e_j) = 0 \tag{11.1.3}$$
Including the standard errors for $b_1$ and $b_2$, the estimated mean function was
$$\hat{y}_t = 40.768 + 0.1283\,x_t \tag{11.1.4}$$
(22.139) (0.0305)
A graph of this estimated function, along with all the observed expenditure-income
points $(y_t, x_t)$, appears in Figure 11.1.
Notice that, as income ($x_t$) grows, the observed data points $(y_t, x_t)$ have a tendency to
deviate more and more from the estimated mean function.
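The least squares residuals, defined by
$$\hat{e}_t = y_t - b_1 - b_2 x_t \tag{11.1.5}$$
are estimates of the unobservable errors
$$e_t = y_t - \beta_1 - \beta_2 x_t \tag{11.1.6}$$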
The information in Figure 11.1 suggests that the unobservable errors also increase in
absolute value as income ($x_t$) increases.
Is this type of behavior consistent with the assumptions of our model?
The parameter that controls the spread of $y_t$ around the mean function, and measures the
uncertainty in the regression model, is the variance $\sigma^2$.
If the scatter of $y_t$ around the mean function increases as $x_t$ increases, then the
uncertainty about $y_t$ increases as $x_t$ increases, and we have evidence to suggest that
the variance is not constant.
Thus, we are questioning the constant variance assumption
$$\mathrm{var}(y_t) = \mathrm{var}(e_t) = \sigma^2 \tag{11.1.7}$$
The most general way to relax this assumption is to add a subscript $t$ to $\sigma^2$, recognizing
that the variance can be different for different observations. We then have
$$\mathrm{var}(y_t) = \mathrm{var}(e_t) = \sigma_t^2 \tag{11.1.8}$$
In this case, when the variances for all observations are not the same, we say that
heteroskedasticity exists. Alternatively, we say the random variable $y_t$ and the random
error $e_t$ are heteroskedastic.
Conversely, if (11.1.7) holds we say that homoskedasticity exists, and $y_t$ and $e_t$ are
homoskedastic.
The heteroskedastic assumption is illustrated in Figure 11.2.
[Figure 11.2 here]
The existence of different variances, or heteroskedasticity, is often encountered when
using cross-sectional data.
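A small simulation can make the definition concrete. The sketch below is only an illustration: it assumes one particular variance pattern, var(e_t) = sigma2 * x_t, with made-up values (sigma2 = 4, x equal to 1 or 9), and it does not use the food expenditure data. The function name `simulate_errors` is hypothetical.

```python
import random
import statistics

def simulate_errors(x_values, sigma2, seed):
    """Draw one error e_t ~ N(0, sigma2 * x_t) for each x_t (assumed variance pattern)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, (sigma2 * x) ** 0.5) for x in x_values]

# Hypothetical design: 2000 observations with x = 1 and 2000 with x = 9.
e_low = simulate_errors([1.0] * 2000, sigma2=4.0, seed=1)
e_high = simulate_errors([9.0] * 2000, sigma2=4.0, seed=2)

# Under the assumed pattern the error variances are 4 and 36, so the
# sample variances should differ by a factor of roughly 9.
var_low = statistics.pvariance(e_low)
var_high = statistics.pvariance(e_high)
ratio = var_high / var_low
print(round(var_low, 2), round(var_high, 2), round(ratio, 2))
```

With homoskedastic errors the two sample variances would be close to each other; here the spread is visibly larger where x is larger.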
11.2 The Consequences of Heteroskedasticity for the Least Squares Estimator
If we have a linear regression model with heteroskedasticity and we use the least
squares estimator to estimate the unknown coefficients, then:
1. The least squares estimator is still a linear and unbiased estimator, but it is no longer
best. It is no longer B.L.U.E.
2. The standard errors usually computed for the least squares estimator are incorrect.
Confidence intervals and hypothesis tests that use these standard errors may be
misleading.
Consider the model
$$y_t = \beta_1 + \beta_2 x_t + e_t \tag{11.2.1}$$
where
$$E(e_t) = 0, \qquad \mathrm{var}(e_t) = \sigma_t^2, \qquad \mathrm{cov}(e_i, e_j) = 0 \quad (i \neq j) \tag{11.2.2}$$
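The least squares estimator for $\beta_2$ can be written as
$$b_2 = \beta_2 + \sum w_t e_t \tag{11.2.3}$$
where
$$w_t = \frac{x_t - \bar{x}}{\sum (x_t - \bar{x})^2}$$
Because $E(e_t) = 0$, the least squares estimator remains unbiased under heteroskedasticity:
$$E(b_2) = \beta_2 + \sum w_t E(e_t) = \beta_2 \tag{11.2.4}$$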
The next result is that the least squares estimator is no longer best. The way we tackle
this question is to derive an alternative estimator which is the best linear unbiased
estimator. This new estimator is considered in Sections 11.3 and 11.5.
To show that the usual formulas for the least squares standard errors are incorrect
under heteroskedasticity, we return to the derivation of $\mathrm{var}(b_2)$ in (4.2.11). From that
equation, and using (11.2.2), we have
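$$\mathrm{var}(b_2) = \sum w_t^2\,\mathrm{var}(e_t) = \sum w_t^2 \sigma_t^2 = \frac{\sum (x_t - \bar{x})^2 \sigma_t^2}{\left[\sum (x_t - \bar{x})^2\right]^2} \tag{11.2.5}$$
When the variances are all equal, $\sigma_t^2 = \sigma^2$, this expression reduces to the usual formula
$$\mathrm{var}(b_2) = \frac{\sigma^2}{\sum (x_t - \bar{x})^2} \tag{11.2.6}$$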
Note that standard computer software for least squares regression will compute the
estimated variance for $b_2$ based on (11.2.6), unless told otherwise.
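To see how far the two variance formulas can drift apart, the following sketch evaluates both for a made-up set of regressor values. It takes (11.2.5) to be the sum of w_t^2 * sigma_t^2 with w_t = (x_t - x_bar) / sum((x_t - x_bar)^2), and (11.2.6) to be sigma^2 / sum((x_t - x_bar)^2). The data, the assumed pattern var(e_t) = 2 * x_t, and the function names are all hypothetical.

```python
def var_b2_hetero(x, sigma2_t):
    """Variance of b2 when var(e_t) = sigma2_t[t]: sum of w_t^2 * sigma_t^2."""
    x_bar = sum(x) / len(x)
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    return sum(((xt - x_bar) / sxx) ** 2 * s2 for xt, s2 in zip(x, sigma2_t))

def var_b2_usual(x, sigma2):
    """Usual formula sigma^2 / sum (x_t - x_bar)^2, valid under constant variance."""
    x_bar = sum(x) / len(x)
    return sigma2 / sum((xt - x_bar) ** 2 for xt in x)

x = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical regressor values

# Constant variance: the two formulas agree.
const_general = var_b2_hetero(x, [2.0] * len(x))
const_usual = var_b2_usual(x, 2.0)

# Assumed heteroskedasticity var(e_t) = 2 * x_t: the formulas disagree,
# even when the usual formula is fed the average error variance.
hetero_general = var_b2_hetero(x, [2.0 * xt for xt in x])
avg_sigma2 = sum(2.0 * xt for xt in x) / len(x)
hetero_usual = var_b2_usual(x, avg_sigma2)

print(const_general, const_usual)
print(hetero_general, hetero_usual)
```

In this made-up example the usual formula understates the variance of b2, because the observation that gets the most weight (the one farthest from the mean of x) is also the one with the largest error variance.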
11.2.1 White's Approximate Estimator for the Variance of the Least Squares Estimator
Halbert White, an econometrician, has suggested an estimator for the variances and
covariances of the least squares coefficient estimators when heteroskedasticity exists.
In the context of the simple regression model, his estimator for $\mathrm{var}(b_2)$ is obtained by
replacing $\sigma_t^2$ by the squares of the least squares residuals, $\hat{e}_t^2$, in (11.2.5).
Large variances are likely to lead to large values of the squared residuals.
Because the squared residuals are used to approximate the variances, White's estimator
is strictly appropriate only in large samples.
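The calculation can be sketched in a few lines. The code below fits a simple regression, then computes White's estimate of var(b2) as the sum of (x_t - x_bar)^2 * ehat_t^2 divided by [sum of (x_t - x_bar)^2]^2, alongside the usual estimate. The five data points are invented for illustration and the function names are hypothetical; statistical packages may apply small-sample adjustments, so their output need not match this basic form exactly.

```python
def ols_simple(x, y):
    """Least squares intercept b1, slope b2 and residuals for y = b1 + b2 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    b2 = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    resid = [yt - b1 - b2 * xt for xt, yt in zip(x, y)]
    return b1, b2, resid

def white_var_b2(x, resid):
    """White's estimate: sigma_t^2 in the variance formula replaced by squared residuals."""
    x_bar = sum(x) / len(x)
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    return sum((xt - x_bar) ** 2 * e ** 2 for xt, e in zip(x, resid)) / sxx ** 2

def usual_var_b2(x, resid):
    """Usual estimate: sigma_hat^2 / sum (x_t - x_bar)^2 with sigma_hat^2 = SSE / (T - 2)."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    sigma2_hat = sum(e ** 2 for e in resid) / (n - 2)
    return sigma2_hat / sxx

# Invented data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 9.0]
b1, b2, resid = ols_simple(x, y)
white = white_var_b2(x, resid)
usual = usual_var_b2(x, resid)
print(b1, b2, white, usual)
```

With only five observations this is purely a check of the arithmetic; as the text notes, White's estimator is strictly appropriate only in large samples.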
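If we apply White's estimator to the food expenditure-income data, we obtain
$$\widehat{\mathrm{var}}(b_1) = 561.89, \qquad \widehat{\mathrm{var}}(b_2) = 0.0014569$$
Taking square roots gives the White standard errors. Reporting them beneath the estimated
equation, together with the incorrect standard errors from (11.1.4), gives
$$\hat{y}_t = 40.768 + 0.1283\,x_t$$
(23.704) (0.0382) (White)
(22.139) (0.0305) (incorrect)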
In this case, ignoring heteroskedasticity and using incorrect standard errors tends to
overstate the precision of estimation; we tend to get confidence intervals that are
narrower than they should be.
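We can construct two corresponding 95% confidence intervals for $\beta_2$.
White: $b_2 \pm t_c\,\mathrm{se}(b_2) = 0.1283 \pm 2.024(0.0382) = [0.051, 0.206]$
Incorrect: $b_2 \pm t_c\,\mathrm{se}(b_2) = 0.1283 \pm 2.024(0.0305) = [0.067, 0.190]$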
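Consider again the model
$$y_t = \beta_1 + \beta_2 x_t + e_t \tag{11.3.1}$$
$$E(e_t) = 0, \qquad \mathrm{var}(e_t) = \sigma_t^2, \qquad \mathrm{cov}(e_i, e_j) = 0 \quad (i \neq j)$$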
By itself, the assumption $\mathrm{var}(e_t) = \sigma_t^2$ is not adequate for developing a better procedure
for estimating $\beta_1$ and $\beta_2$.
We overcome this problem by making a further assumption about the $\sigma_t^2$. Our earlier
inspection of the least squares residuals suggested that the error variance increases as
income increases. A reasonable model for such a variance relationship is
$$\mathrm{var}(e_t) = \sigma_t^2 = \sigma^2 x_t \tag{11.3.2}$$
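To make use of this assumption, divide both sides of the model by $\sqrt{x_t}$:
$$\frac{y_t}{\sqrt{x_t}} = \beta_1 \frac{1}{\sqrt{x_t}} + \beta_2 \frac{x_t}{\sqrt{x_t}} + \frac{e_t}{\sqrt{x_t}} \tag{11.3.3}$$
Defining the transformed variables
$$y_t^* = \frac{y_t}{\sqrt{x_t}}, \qquad x_{t1}^* = \frac{1}{\sqrt{x_t}}, \qquad x_{t2}^* = \frac{x_t}{\sqrt{x_t}}, \qquad e_t^* = \frac{e_t}{\sqrt{x_t}} \tag{11.3.4}$$
the model can be rewritten as
$$y_t^* = \beta_1 x_{t1}^* + \beta_2 x_{t2}^* + e_t^* \tag{11.3.5}$$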
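The beauty of this transformed model is that the new transformed error term $e_t^*$ is
homoskedastic. The proof of this result is:
$$\mathrm{var}(e_t^*) = \mathrm{var}\!\left(\frac{e_t}{\sqrt{x_t}}\right) = \frac{1}{x_t}\,\mathrm{var}(e_t) = \frac{1}{x_t}\,\sigma^2 x_t = \sigma^2 \tag{11.3.6}$$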
The transformed error term will retain the properties $E(e_t^*) = 0$ and zero correlation
between different observations, $\mathrm{cov}(e_i^*, e_j^*) = 0$ for $i \neq j$.
As a consequence, we can apply least squares to the transformed variables $y_t^*$, $x_{t1}^*$ and
$x_{t2}^*$ to obtain the best linear unbiased estimator for $\beta_1$ and $\beta_2$.
The transformed model is linear in the unknown parameters $\beta_1$ and $\beta_2$. These are the
original parameters that we are interested in estimating.
The transformed model satisfies the conditions of the Gauss-Markov Theorem, and the
least squares estimators defined in terms of the transformed variables are B.L.U.E.
The estimator obtained in this way is called a generalized least squares estimator.
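The transform-then-regress recipe can be sketched as follows, under the assumption var(e_t) = sigma^2 * x_t with every x_t positive. Each variable is divided by the square root of x_t, and least squares is then applied to the two transformed regressors with no extra constant column. The function name `gls_proportional` and the test data are hypothetical.

```python
def gls_proportional(x, y):
    """GLS for y_t = beta1 + beta2 * x_t + e_t, assuming var(e_t) = sigma^2 * x_t, x_t > 0."""
    # Transformed variables: everything is divided by sqrt(x_t).
    y_s = [yt / xt ** 0.5 for xt, yt in zip(x, y)]
    x1_s = [1.0 / xt ** 0.5 for xt in x]   # replaces the usual column of ones
    x2_s = [xt / xt ** 0.5 for xt in x]

    # Least squares on (x1_s, x2_s) WITHOUT an added constant:
    # solve the 2 x 2 normal equations directly.
    a11 = sum(u * u for u in x1_s)
    a12 = sum(u * v for u, v in zip(x1_s, x2_s))
    a22 = sum(v * v for v in x2_s)
    c1 = sum(u * w for u, w in zip(x1_s, y_s))
    c2 = sum(v * w for v, w in zip(x2_s, y_s))
    det = a11 * a22 - a12 * a12
    beta1 = (a22 * c1 - a12 * c2) / det
    beta2 = (a11 * c2 - a12 * c1) / det
    return beta1, beta2

# Sanity check on made-up, error-free data generated from y = 3 + 2x:
# any sensible estimator should recover the intercept 3 and slope 2.
x = [1.0, 4.0, 9.0, 16.0]
y = [3.0 + 2.0 * xt for xt in x]
beta1_hat, beta2_hat = gls_proportional(x, y)
print(beta1_hat, beta2_hat)
```

The coefficients returned are estimates of the original intercept and slope, even though the regression itself is run on transformed variables.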
One way of viewing the generalized least squares estimator is as a weighted least
squares estimator. Recall that the least squares estimator is given by those values of $\beta_1$ and $\beta_2$
that minimize the sum of squared errors. In this case, we are minimizing the sum of
squared transformed errors that are given by
$$\sum_{t=1}^{T} e_t^{*2} = \sum_{t=1}^{T} \frac{e_t^2}{x_t}$$
The errors are weighted by the reciprocal of $\sqrt{x_t}$. When $x_t$ is small, the data contain more
information about the regression function and the observations are weighted heavily.
When $x_t$ is large, the data contain less information and the observations are weighted
lightly. In this way we take advantage of the heteroskedasticity to improve parameter
estimation.
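The weighting interpretation can also be checked numerically. The sketch below, on invented data, minimizes the weighted sum of squares directly by solving the weighted normal equations with weights 1 / x_t, and then confirms that moving either coefficient away from the solution increases the criterion. The function names `weighted_sse` and `wls_proportional` are hypothetical.

```python
def weighted_sse(beta1, beta2, x, y):
    """Sum of squared errors, each divided by x_t (squared transformed errors)."""
    return sum((yt - beta1 - beta2 * xt) ** 2 / xt for xt, yt in zip(x, y))

def wls_proportional(x, y):
    """Minimize weighted_sse by solving the weighted normal equations, weights 1 / x_t."""
    w = [1.0 / xt for xt in x]
    sw = sum(w)
    swx = sum(wt * xt for wt, xt in zip(w, x))
    swxx = sum(wt * xt * xt for wt, xt in zip(w, x))
    swy = sum(wt * yt for wt, yt in zip(w, y))
    swxy = sum(wt * xt * yt for wt, xt, yt in zip(w, x, y))
    det = sw * swxx - swx ** 2
    beta1 = (swxx * swy - swx * swxy) / det
    beta2 = (sw * swxy - swx * swy) / det
    return beta1, beta2

# Invented data, for illustration only.
x = [1.0, 2.0, 4.0, 8.0]
y = [2.0, 3.0, 7.0, 9.0]
b1_w, b2_w = wls_proportional(x, y)
best = weighted_sse(b1_w, b2_w, x, y)

# Perturbing either coefficient raises the weighted sum of squares.
worse = [weighted_sse(b1_w + d1, b2_w + d2, x, y)
         for d1, d2 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]]
print(best, min(worse))
```

Because the weights are 1 / x_t, a given error at a small x_t contributes more to the criterion than the same error at a large x_t.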
Remark: In the transformed model $x_{t1}^* \neq 1$. That is, the variable associated
with the intercept parameter is no longer equal to 1. Since least squares
software usually automatically inserts a 1 for the intercept, when dealing
with transformed variables you will need to learn how to turn this option
off. If you use a weighted or generalized least squares option on your
software, the computer will do both the transforming and the estimating. In
this case suppressing the constant will not be necessary.
(11.3.7)
(17.986)(0.0270)
It is important to recognize that the interpretations for $\beta_1$ and $\beta_2$ are the same in the
transformed model in (11.3.5) as they are in the untransformed model in (11.3.1).
The standard errors in (11.3.8), namely $\mathrm{se}(\hat{\beta}_1) = 17.986$ and $\mathrm{se}(\hat{\beta}_2) = 0.0270$, are both
lower than their least squares counterparts that were calculated from White's estimator,
namely $\mathrm{se}(b_1) = 23.704$ and $\mathrm{se}(b_2) = 0.0382$. Since generalized least squares is a better
estimation procedure than least squares, we do expect the generalized least squares
standard errors to be lower.
The smaller standard errors have the advantage of producing narrower, more
informative confidence intervals. For example, using the generalized least squares
results, a 95% confidence interval for $\beta_2$ is given by
$$\hat{\beta}_2 \pm t_c\,\mathrm{se}(\hat{\beta}_2) = 0.1410 \pm 2.024(0.0270) = [0.086, 0.196]$$
The least squares confidence interval computed using White's standard errors was [0.051,
0.206].
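The interval arithmetic is easy to reproduce. The snippet below uses only numbers quoted in the text (the estimates 0.1410 and 0.1283, the standard errors 0.0270 and 0.0382, and the critical value 2.024); the helper name `conf_interval` is hypothetical.

```python
def conf_interval(estimate, se, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * se."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width

T_CRIT = 2.024  # critical value quoted in the text

gls = conf_interval(0.1410, 0.0270, T_CRIT)    # generalized least squares
white = conf_interval(0.1283, 0.0382, T_CRIT)  # least squares with White's standard error

print([round(v, 3) for v in gls])    # [0.086, 0.196]
print([round(v, 3) for v in white])  # [0.051, 0.206]
print(round(gls[1] - gls[0], 3), round(white[1] - white[0], 3))  # interval widths
```

The generalized least squares interval is the narrower of the two, which is the sense in which it is more informative.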
11.4.2
1. Divide the sample such that the observations with potentially high variances are in one
subsample and those with potentially low variances are in the other subsample.
2. Compute estimated error variances $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ for each of the subsamples. Let $\hat{\sigma}_1^2$ be the
estimate from the subsample with potentially large variances and let $\hat{\sigma}_2^2$ be the estimate
from the subsample with potentially small variances. If a null hypothesis of equal
variances is not true, we expect $\hat{\sigma}_1^2 / \hat{\sigma}_2^2$ to be large.
3. Compute $GQ = \hat{\sigma}_1^2 / \hat{\sigma}_2^2$ and reject the null hypothesis of equal variances if $GQ > F_c$,
where $F_c$ is a critical value from the F-distribution with $(T_1 - K)$ and $(T_2 - K)$ degrees
of freedom. The values $T_1$ and $T_2$ are the numbers of observations in each of the
subsamples; if the sample is split exactly in half, $T_1 = T_2 = T/2$.
Applying this test procedure to the household food expenditure model, we set up the
hypotheses
$$H_0: \sigma_t^2 = \sigma^2 \qquad H_1: \sigma_t^2 = \sigma^2 x_t \tag{11.4.1}$$
After ordering the data according to decreasing values of $x_t$, and using a partition of 20
observations in each subset of data, we find $\hat{\sigma}_1^2 = 2285.9$ and $\hat{\sigma}_2^2 = 682.46$. Hence, the
value of the Goldfeld-Quandt statistic is
$$GQ = \frac{2285.9}{682.46} = 3.35$$
The 5 percent critical value for (18, 18) degrees of freedom is $F_c = 2.22$. Thus, because
$GQ = 3.35 > F_c = 2.22$, we reject $H_0$ and conclude that heteroskedasticity does exist;
the error variance does depend on the level of income.
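The test steps can be sketched in code for the simple regression case (K = 2). The function below orders the observations by decreasing x, splits them in half, estimates the error variance in each half as SSE / (T_i - 2), and returns the ratio. The eight data points are invented so that the high-x half is clearly noisier; the last lines simply redo the arithmetic with the two variance estimates and the critical value quoted in the text. The function names are hypothetical.

```python
def sigma2_hat(x, y, k=2):
    """Estimated error variance SSE / (T - k) from a simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    b2 = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    sse = sum((yt - b1 - b2 * xt) ** 2 for xt, yt in zip(x, y))
    return sse / (n - k)

def goldfeld_quandt(x, y):
    """Variance ratio: high-x half over low-x half, after sorting by decreasing x."""
    pairs = sorted(zip(x, y), key=lambda p: p[0], reverse=True)
    half = len(pairs) // 2
    high = pairs[:half]                 # potentially large variances
    low = pairs[len(pairs) - half:]     # potentially small variances
    s1 = sigma2_hat([p[0] for p in high], [p[1] for p in high])
    s2 = sigma2_hat([p[0] for p in low], [p[1] for p in low])
    return s1 / s2

# Invented data: the noise is 20 times larger for the four largest x values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noise = [0.1, -0.1, 0.1, -0.1, 2.0, -2.0, 2.0, -2.0]
y = [xt + e for xt, e in zip(x, noise)]
gq_demo = goldfeld_quandt(x, y)

# Arithmetic from the text: variance estimates 2285.9 and 682.46, F_c = 2.22.
gq_text = 2285.9 / 682.46
reject = gq_text > 2.22
print(round(gq_demo, 2), round(gq_text, 2), reject)
```

The critical value is taken from the text rather than computed, so the sketch needs no statistical library.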
(11.5.1)
The data we have available from the Australian wheat growing district consist of 26
years of aggregate time-series data on quantity supplied and price.
Because there is no obvious index of production technology, some kind of proxy needs
to be used for this variable. We use a simple linear time-trend, a variable that takes the
value 1 in year 1, 2 in year 2, and so on, up to 26 in year 26.
An obvious weather variable is also unavailable; thus, in our statistical model, weather
effects will form part of the random error term. Using these considerations, we specify
the linear supply function
$$q_t = \beta_1 + \beta_2 p_t + \beta_3 t + e_t, \qquad t = 1, 2, \ldots, 26 \tag{11.5.2}$$
Since the weather effect is a major component of the random error term $e_t$, we can
model the reduced weather effect of the last 13 years by assuming the error variance in
those years is different from the error variance in the first 13 years. Thus, we assume
that
$$\begin{aligned} E(e_t) &= 0 \\ \mathrm{var}(e_t) &= \sigma_1^2, \qquad t = 1, \ldots, 13 \\ \mathrm{var}(e_t) &= \sigma_2^2, \qquad t = 14, \ldots, 26 \end{aligned} \tag{11.5.3}$$
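Written separately for the two subperiods, the model is
$$\begin{aligned} q_t &= \beta_1 + \beta_2 p_t + \beta_3 t + e_t, & \mathrm{var}(e_t) &= \sigma_1^2, & t &= 1, \ldots, 13 \\ q_t &= \beta_1 + \beta_2 p_t + \beta_3 t + e_t, & \mathrm{var}(e_t) &= \sigma_2^2, & t &= 14, \ldots, 26 \end{aligned} \tag{11.5.4}$$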
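Dividing each variable by $\sigma_1$ for the first 13 observations and by $\sigma_2$ for the last 13
observations yields
$$\begin{aligned} \frac{q_t}{\sigma_1} &= \beta_1 \frac{1}{\sigma_1} + \beta_2 \frac{p_t}{\sigma_1} + \beta_3 \frac{t}{\sigma_1} + \frac{e_t}{\sigma_1}, & t &= 1, \ldots, 13 \\ \frac{q_t}{\sigma_2} &= \beta_1 \frac{1}{\sigma_2} + \beta_2 \frac{p_t}{\sigma_2} + \beta_3 \frac{t}{\sigma_2} + \frac{e_t}{\sigma_2}, & t &= 14, \ldots, 26 \end{aligned} \tag{11.5.5}$$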
This transformation yields transformed error terms that have the same variance for all
observations. Specifically, the transformed error variances are all equal to one because
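$$\begin{aligned} \mathrm{var}\!\left(\frac{e_t}{\sigma_1}\right) &= \frac{1}{\sigma_1^2}\,\mathrm{var}(e_t) = \frac{\sigma_1^2}{\sigma_1^2} = 1, & t &= 1, \ldots, 13 \\ \mathrm{var}\!\left(\frac{e_t}{\sigma_2}\right) &= \frac{1}{\sigma_2^2}\,\mathrm{var}(e_t) = \frac{\sigma_2^2}{\sigma_2^2} = 1, & t &= 14, \ldots, 26 \end{aligned}$$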
Providing $\sigma_1$ and $\sigma_2$ are known, the transformed model in (11.5.5) provides a set of
new transformed variables to which we can apply the least squares principle to obtain
the best linear unbiased estimator for $(\beta_1, \beta_2, \beta_3)$.
The transformed variables are
$$\frac{q_t}{\sigma_i}, \qquad \frac{1}{\sigma_i}, \qquad \frac{p_t}{\sigma_i}, \qquad \frac{t}{\sigma_i} \tag{11.5.6}$$
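The same transform-then-regress idea can be sketched for the two-variance supply model. The code below takes the two standard deviations as known inputs, divides each observation's variables by the appropriate one, and applies least squares to the three transformed regressors with no added constant. Everything numeric here is made up for a sanity check (the price series, the coefficients 10, 2 and 0.5, and the values 25.0 and 7.6 for the standard deviations), and the function names are hypothetical.

```python
def solve(a, b):
    """Solve the small linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def gls_two_variances(q, p, sigma1, sigma2, split=13):
    """GLS for q_t = b1 + b2 * p_t + b3 * t + e_t with error s.d. sigma1 for
    t <= split and sigma2 afterwards (both treated as known)."""
    rows, ys = [], []
    for i, (qt, pt) in enumerate(zip(q, p)):
        t = i + 1
        s = sigma1 if t <= split else sigma2
        rows.append([1.0 / s, pt / s, t / s])   # transformed regressors
        ys.append(qt / s)                       # transformed dependent variable
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yv for r, yv in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

# Made-up, error-free data for 26 periods: q_t = 10 + 2 * p_t + 0.5 * t.
p = [((3 * t) % 7) + 1.0 for t in range(1, 27)]
q = [10.0 + 2.0 * pt + 0.5 * t for t, pt in zip(range(1, 27), p)]
coef = gls_two_variances(q, p, sigma1=25.0, sigma2=7.6)
print([round(c, 6) for c in coef])
```

In practice the two variances are not known and have to be estimated first; this sketch only covers the step in which they are taken as given.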
11.5.3
and
$$\hat{\sigma}_2^2 = 57.76 \tag{R11.7}$$
$$\hat{q}_t = 138.1 + 21.72\,p_t + 3.283\,t \tag{R11.8}$$
(12.7) (8.81) (0.812)
11.5.4
To use a residual plot to check whether the wheat-supply error variance has decreased
over time, it is sensible to plot the least-squares residuals against time. See Figure 11.3.
The dramatic drop in the variation of the residuals after year 13 supports our belief that
the variance has decreased.
For the Goldfeld-Quandt test the sample is already split into two natural subsamples.
Thus, we set up the hypotheses
$$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_1: \sigma_2^2 < \sigma_1^2 \tag{11.5.9}$$
$T_1 = T_2 = 13$ and $K = 3$; thus, if $H_0$ is true, the computed value $GQ = 11.11$ is an observed
value from an F-distribution with (10, 10) degrees of freedom. The corresponding 5 percent critical
value is $F_c = 2.98$.
Since $GQ = 11.11 > F_c = 2.98$, we reject $H_0$ and conclude that the observed difference
between $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ could not reasonably be attributable to chance. There is evidence to
suggest the new varieties have reduced the variance in the supply of wheat.