Undergraduate Econometrics, 2nd Edition, Chapter 11

Chapter 11
Heteroskedasticity
11.1 The Nature of Heteroskedasticity
In Chapter 3 we introduced the linear model
$$y = \beta_1 + \beta_2 x \tag{11.1.1}$$
to explain household expenditure on food (y) as a function of household income (x).

We begin this section by asking whether a function such as $y = \beta_1 + \beta_2 x$ is better at
explaining expenditure on food for low-income households than it is for high-income
households.
Income is less important as an explanatory variable for the food expenditure of high-income families. It is harder to guess their food expenditure.
This type of effect can be captured by a statistical model that exhibits heteroskedasticity.
Recall the model
$$y_t = \beta_1 + \beta_2 x_t + e_t \tag{11.1.2}$$
We assumed the $e_t$ were uncorrelated random error terms with mean zero and constant
variance $\sigma^2$. That is,
$$E(e_t) = 0, \qquad \mathrm{var}(e_t) = \sigma^2, \qquad \mathrm{cov}(e_i, e_j) = 0 \tag{11.1.3}$$
Including the standard errors for $b_1$ and $b_2$, the estimated mean function was
$$\hat{y}_t = \underset{(22.139)}{40.768} + \underset{(0.0305)}{0.1283}\,x_t \tag{11.1.4}$$
A graph of this estimated function, along with all the observed expenditure-income
points $(y_t, x_t)$, appears in Figure 11.1.
Notice that, as income ($x_t$) grows, the observed data points $(y_t, x_t)$ have a tendency to
deviate more and more from the estimated mean function.
The least squares residuals, defined by
$$\hat{e}_t = y_t - b_1 - b_2 x_t \tag{11.1.5}$$
increase in absolute value as income grows.

[Figure 11.1 here]
The observable least squares residuals ($\hat{e}_t$) are proxies for the unobservable errors ($e_t$)
that are given by
$$e_t = y_t - \beta_1 - \beta_2 x_t \tag{11.1.6}$$
The information in Figure 11.1 suggests that the unobservable errors also increase in
absolute value as income ($x_t$) increases.
Is this type of behavior consistent with the assumptions of our model?
The parameter that controls the spread of $y_t$ around the mean function, and measures the
uncertainty in the regression model, is the variance $\sigma^2$.
If the scatter of $y_t$ around the mean function increases as $x_t$ increases, then the
uncertainty about $y_t$ increases as $x_t$ increases, and we have evidence to suggest that
the variance is not constant.
Thus, we are questioning the constant variance assumption
$$\mathrm{var}(y_t) = \mathrm{var}(e_t) = \sigma^2 \tag{11.1.7}$$
The most general way to relax this assumption is to add a subscript $t$ to $\sigma^2$, recognizing
that the variance can be different for different observations. We then have
$$\mathrm{var}(y_t) = \mathrm{var}(e_t) = \sigma_t^2 \tag{11.1.8}$$
In this case, when the variances for all observations are not the same, we say that
heteroskedasticity exists. Alternatively, we say the random variable $y_t$ and the random
error $e_t$ are heteroskedastic.
Conversely, if (11.1.7) holds we say that homoskedasticity exists, and $y_t$ and $e_t$ are
homoskedastic.
The heteroskedastic assumption is illustrated in Figure 11.2.
[Figure 11.2 here]
The existence of different variances, or heteroskedasticity, is often encountered when
using cross-sectional data.
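The idea of a variance that changes across observations can be illustrated with a small simulation. The following is a minimal sketch, not part of the original slides: it assumes, purely for illustration, that $\sigma_t^2$ is proportional to $x_t$, and the parameter values, sample size and income range are all hypothetical.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical values, chosen only for illustration
beta1, beta2, sigma2 = 40.0, 0.13, 4.0
T = 2000

x = [random.uniform(100.0, 1000.0) for _ in range(T)]
# Heteroskedastic errors: the standard deviation of e_t is sqrt(sigma2 * x_t)
e = [random.gauss(0.0, math.sqrt(sigma2 * xt)) for xt in x]
y = [beta1 + beta2 * xt + et for xt, et in zip(x, e)]

# Compare the spread of the errors for low-x and high-x observations
median_x = statistics.median(x)
e_low = [et for xt, et in zip(x, e) if xt <= median_x]
e_high = [et for xt, et in zip(x, e) if xt > median_x]

var_low = statistics.pvariance(e_low)
var_high = statistics.pvariance(e_high)
print("error variance, low-x half: ", round(var_low, 1))
print("error variance, high-x half:", round(var_high, 1))
```

Under this assumed variance function the errors attached to large $x_t$ are visibly more spread out, which is the pattern described for Figure 11.1.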
11.2 The Consequences of Heteroskedasticity for the Least Squares Estimator
If we have a linear regression model with heteroskedasticity and we use the least
squares estimator to estimate the unknown coefficients, then:
1. The least squares estimator is still a linear and unbiased estimator, but it is no longer
best. It is no longer B.L.U.E.
2. The standard errors usually computed for the least squares estimator are incorrect.
Confidence intervals and hypothesis tests that use these standard errors may be
misleading.
Consider the model
$$y_t = \beta_1 + \beta_2 x_t + e_t \tag{11.2.1}$$
where
$$E(e_t) = 0, \qquad \mathrm{var}(e_t) = \sigma_t^2, \qquad \mathrm{cov}(e_i, e_j) = 0 \quad (i \ne j)$$
In Chapter 4, equation 4.2.1, we wrote the least squares estimator for $\beta_2$ as
$$b_2 = \beta_2 + \sum w_t e_t \tag{11.2.2}$$
where
$$w_t = \frac{x_t - \bar{x}}{\sum (x_t - \bar{x})^2}$$
The first property that we establish is that of unbiasedness.
$$E(b_2) = E(\beta_2) + E\left(\sum w_t e_t\right) = \beta_2 + \sum w_t E(e_t) = \beta_2 \tag{11.2.4}$$
The next result is that the least squares estimator is no longer best. The way we tackle
this question is to derive an alternative estimator which is the best linear unbiased
estimator. This new estimator is considered in Sections 11.3 and 11.5.
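To show that the usual formulas for the least squares standard errors are incorrect
under heteroskedasticity, we return to the derivation of var($b_2$) in (4.2.11). From that
equation, and using (11.2.2), we have
$$\begin{aligned}
\mathrm{var}(b_2) &= \mathrm{var}(\beta_2) + \mathrm{var}\left(\sum w_t e_t\right) \\
&= \mathrm{var}\left(\sum w_t e_t\right) \\
&= \sum w_t^2\,\mathrm{var}(e_t) + \sum\sum_{i \ne j} w_i w_j\,\mathrm{cov}(e_i, e_j) \\
&= \sum w_t^2 \sigma_t^2 \\
&= \frac{\sum \left[(x_t - \bar{x})^2 \sigma_t^2\right]}{\left[\sum (x_t - \bar{x})^2\right]^2}
\end{aligned} \tag{11.2.5}$$
Note from the last line in (11.2.5) that
$$\mathrm{var}(b_2) \ne \frac{\sigma^2}{\sum (x_t - \bar{x})^2} \tag{11.2.6}$$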
Note that standard computer software for least squares regression will compute the
estimated variance for b2 based on (11.2.6), unless told otherwise.
11.2.1 White's Approximate Estimator for the Variance of the Least Squares Estimator
Halbert White, an econometrician, has suggested an estimator for the variances and
covariances of the least squares coefficient estimators when heteroskedasticity exists.
In the context of the simple regression model, his estimator for var($b_2$) is obtained by
replacing $\sigma_t^2$ by the squares of the least squares residuals, $\hat{e}_t^2$, in (11.2.5).
Large variances are likely to lead to large values of the squared residuals.
Because the squared residuals are used to approximate the variances, White's estimator
is strictly appropriate only in large samples.
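The two variance formulas can be compared directly in code. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the "usual" estimate uses the right-hand side of (11.2.6) with $\sigma^2$ replaced by $\hat{\sigma}^2 = \sum \hat{e}_t^2/(T-2)$, and the White estimate uses the last line of (11.2.5) with $\sigma_t^2$ replaced by $\hat{e}_t^2$. The three data points are made up.

```python
def ols_simple(x, y):
    """Least squares estimates (b1, b2) and residuals for y_t = b1 + b2*x_t + e_t."""
    T = len(x)
    xbar = sum(x) / T
    ybar = sum(y) / T
    sxx = sum((xt - xbar) ** 2 for xt in x)
    b2 = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    resid = [yt - b1 - b2 * xt for xt, yt in zip(x, y)]
    return b1, b2, resid


def var_b2_usual(x, resid):
    """Usual estimate: sigma-hat^2 / sum (x_t - xbar)^2, valid under homoskedasticity."""
    T = len(x)
    xbar = sum(x) / T
    sxx = sum((xt - xbar) ** 2 for xt in x)
    sigma2_hat = sum(r * r for r in resid) / (T - 2)
    return sigma2_hat / sxx


def var_b2_white(x, resid):
    """White estimate: sum[(x_t - xbar)^2 * ehat_t^2] / [sum (x_t - xbar)^2]^2."""
    T = len(x)
    xbar = sum(x) / T
    sxx = sum((xt - xbar) ** 2 for xt in x)
    num = sum((xt - xbar) ** 2 * r * r for xt, r in zip(x, resid))
    return num / sxx ** 2


# Tiny made-up data set, used only to exercise the functions
x_demo = [0.0, 1.0, 2.0]
y_demo = [0.0, 1.0, 4.0]
b1_demo, b2_demo, resid_demo = ols_simple(x_demo, y_demo)
print(b2_demo, var_b2_usual(x_demo, resid_demo), var_b2_white(x_demo, resid_demo))
```

With only three observations the numbers have no statistical meaning; the point is only that the two formulas weight the squared residuals differently and so generally give different answers.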
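If we apply White's estimator to the food expenditure-income data, we obtain
$$\widehat{\mathrm{var}}(b_1) = 561.89, \qquad \widehat{\mathrm{var}}(b_2) = 0.0014569$$
We could write our estimated equation as
$$\begin{array}{llll}
\hat{y}_t = & 40.768 & +\; 0.1283\,x_t & \\
 & (23.704) & \phantom{+\;}(0.0382) & \text{(White)} \\
 & (22.139) & \phantom{+\;}(0.0305) & \text{(incorrect)}
\end{array}$$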
In this case, ignoring heteroskedasticity and using incorrect standard errors tends to
overstate the precision of estimation; we tend to get confidence intervals that are
narrower than they should be.
We can construct two corresponding 95% confidence intervals for $\beta_2$.
$$\text{White:} \quad b_2 \pm t_c\,\mathrm{se}(b_2) = 0.1283 \pm 2.024(0.0382) = [0.051, 0.206]$$
$$\text{Incorrect:} \quad b_2 \pm t_c\,\mathrm{se}(b_2) = 0.1283 \pm 2.024(0.0305) = [0.067, 0.190]$$
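The interval arithmetic can be checked in a few lines. This is a small sketch using only the numbers reported above; the helper name `conf_interval` is hypothetical.

```python
def conf_interval(estimate, se, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * se."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width


b2_hat, t_c = 0.1283, 2.024  # estimate and critical value quoted in the text

white = conf_interval(b2_hat, 0.0382, t_c)      # White standard error
incorrect = conf_interval(b2_hat, 0.0305, t_c)  # usual (incorrect) standard error

print("White:    ", [round(v, 3) for v in white])
print("Incorrect:", [round(v, 3) for v in incorrect])
print("White interval is wider:", (white[1] - white[0]) > (incorrect[1] - incorrect[0]))
```

The wider White interval reflects the point made above: the usual standard errors overstate the precision of estimation in this example.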
11.3 Proportional Heteroskedasticity

Return to the example where weekly food expenditure ($y_t$) is related to weekly income
($x_t$) through the equation
$$y_t = \beta_1 + \beta_2 x_t + e_t \tag{11.3.1}$$
We make the following assumptions:
$$E(e_t) = 0, \qquad \mathrm{var}(e_t) = \sigma_t^2, \qquad \mathrm{cov}(e_i, e_j) = 0 \quad (i \ne j)$$
By itself, the assumption $\mathrm{var}(e_t) = \sigma_t^2$ is not adequate for developing a better procedure
for estimating $\beta_1$ and $\beta_2$.
We overcome this problem by making a further assumption about the $\sigma_t^2$. Our earlier
inspection of the least squares residuals suggested that the error variance increases as
income increases. A reasonable model for such a variance relationship is
$$\mathrm{var}(e_t) = \sigma_t^2 = \sigma^2 x_t \tag{11.3.2}$$
The assumption of heteroskedastic errors in (11.3.2) is a reasonable one for the
expenditure model.
Under heteroskedasticity the least squares estimator is not the best linear unbiased
estimator. One way of overcoming this dilemma is to change or transform our
statistical model into one with homoskedastic errors. Leaving the basic structure of the
model intact, it is possible to turn the heteroskedastic error model into a homoskedastic
error model. Once this transformation has been carried out, application of least squares
to the transformed model gives a best linear unbiased estimator.
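Begin by dividing both sides of the original equation in (11.3.1) by $\sqrt{x_t}$:
$$\frac{y_t}{\sqrt{x_t}} = \beta_1 \frac{1}{\sqrt{x_t}} + \beta_2 \frac{x_t}{\sqrt{x_t}} + \frac{e_t}{\sqrt{x_t}} \tag{11.3.3}$$
Define the transformed variables
$$y_t^* = \frac{y_t}{\sqrt{x_t}}, \qquad x_{t1}^* = \frac{1}{\sqrt{x_t}}, \qquad x_{t2}^* = \frac{x_t}{\sqrt{x_t}}, \qquad e_t^* = \frac{e_t}{\sqrt{x_t}} \tag{11.3.4}$$
(11.3.3) can be rewritten as
$$y_t^* = \beta_1 x_{t1}^* + \beta_2 x_{t2}^* + e_t^* \tag{11.3.5}$$
The beauty of this transformed model is that the new transformed error term $e_t^*$ is
homoskedastic. The proof of this result is:
$$\mathrm{var}(e_t^*) = \mathrm{var}\left(\frac{e_t}{\sqrt{x_t}}\right) = \frac{1}{x_t}\,\mathrm{var}(e_t) = \frac{1}{x_t}\,\sigma^2 x_t = \sigma^2 \tag{11.3.6}$$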
The transformed error term will retain the properties $E(e_t^*) = 0$ and zero correlation
between different observations, $\mathrm{cov}(e_i^*, e_j^*) = 0$ for $i \ne j$.
As a consequence, we can apply least squares to the transformed variables, $y_t^*$, $x_{t1}^*$ and
$x_{t2}^*$, to obtain the best linear unbiased estimator for $\beta_1$ and $\beta_2$.
The transformed model is linear in the unknown parameters $\beta_1$ and $\beta_2$. These are the
original parameters that we are interested in estimating.
The transformed model satisfies the conditions of the Gauss-Markov Theorem, and the
least squares estimators defined in terms of the transformed variables are B.L.U.E.
The estimator obtained in this way is called a generalized least squares estimator.
One way of viewing the generalized least squares estimator is as a weighted least
squares estimator. Recall that the least squares estimator consists of those values of $\beta_1$ and $\beta_2$
that minimize the sum of squared errors. In this case, we are minimizing the sum of
squared transformed errors that are given by
$$\sum_{t=1}^{T} e_t^{*2} = \sum_{t=1}^{T} \frac{e_t^2}{x_t}$$
The squared errors are weighted by the reciprocal of $x_t$. When $x_t$ is small, the data contain more
information about the regression function and the observations are weighted heavily.
When $x_t$ is large, the data contain less information and the observations are weighted
lightly. In this way we take advantage of the heteroskedasticity to improve parameter
estimation.
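The transformation in (11.3.3) to (11.3.5) can be sketched in plain Python. This is an illustrative sketch, assuming $\mathrm{var}(e_t) = \sigma^2 x_t$ as in (11.3.2) and $x_t > 0$; the function name `gls_proportional` and the four data points are hypothetical. Note that the transformed model is fitted without an automatic intercept, because the variable attached to $\beta_1$ is $1/\sqrt{x_t}$ rather than 1.

```python
import math


def gls_proportional(x, y):
    """GLS for y_t = b1 + b2*x_t + e_t under the assumption var(e_t) = sigma^2 * x_t.

    Builds the transformed variables of (11.3.4) and applies least squares
    to them with no added constant term.
    """
    y_star = [yt / math.sqrt(xt) for xt, yt in zip(x, y)]
    x1_star = [1.0 / math.sqrt(xt) for xt in x]   # multiplies beta1
    x2_star = [xt / math.sqrt(xt) for xt in x]    # multiplies beta2

    # Normal equations of the two-regressor model without an intercept
    a11 = sum(u * u for u in x1_star)
    a12 = sum(u * v for u, v in zip(x1_star, x2_star))
    a22 = sum(v * v for v in x2_star)
    c1 = sum(u * w for u, w in zip(x1_star, y_star))
    c2 = sum(v * w for v, w in zip(x2_star, y_star))

    det = a11 * a22 - a12 * a12
    b1 = (c1 * a22 - a12 * c2) / det
    b2 = (a11 * c2 - a12 * c1) / det
    return b1, b2


# Made-up data lying exactly on y = 2 + 3x, so the estimates should be 2 and 3
x_demo = [1.0, 4.0, 9.0, 16.0]
y_demo = [2.0 + 3.0 * xt for xt in x_demo]
b1_gls, b2_gls = gls_proportional(x_demo, y_demo)
print(b1_gls, b2_gls)
```

Because the coefficients on the transformed variables are the original $\beta_1$ and $\beta_2$, the returned values are interpreted exactly as in the untransformed model.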
Remark: In the transformed model $x_{t1}^* \ne 1$. That is, the variable associated
with the intercept parameter is no longer equal to 1. Since least squares
software usually automatically inserts a 1 for the intercept, when dealing
with transformed variables you will need to learn how to turn this option
off. If you use a weighted or generalized least squares option on your
software, the computer will do both the transforming and the estimating. In
this case suppressing the constant will not be necessary.
Applying the generalized (weighted) least squares procedure to our household
expenditure data yields the following estimates:
$$\hat{y}_t = \underset{(17.986)}{31.924} + \underset{(0.0270)}{0.1410}\,x_t \tag{11.3.7}$$
It is important to recognize that the interpretations for $\beta_1$ and $\beta_2$ are the same in the
transformed model in (11.3.5) as they are in the untransformed model in (11.3.1).
The standard errors in (11.3.7), namely $\mathrm{se}(\hat{\beta}_1) = 17.986$ and $\mathrm{se}(\hat{\beta}_2) = 0.0270$, are both
lower than their least squares counterparts that were calculated from White's estimator,
namely $\mathrm{se}(b_1) = 23.704$ and $\mathrm{se}(b_2) = 0.0382$. Since generalized least squares is a better
estimation procedure than least squares, we do expect the generalized least squares
standard errors to be lower.
Remark: Remember that standard errors are square roots of estimated
variances; in a single sample the relative magnitudes of variances may not
always be reflected by their corresponding variance estimates. Thus, lower
standard errors do not always mean better estimation.
The smaller standard errors have the advantage of producing narrower, more
informative confidence intervals. For example, using the generalized least squares
results, a 95% confidence interval for $\beta_2$ is given by
$$\hat{\beta}_2 \pm t_c\,\mathrm{se}(\hat{\beta}_2) = 0.1410 \pm 2.024(0.0270) = [0.086, 0.196]$$
The least squares confidence interval computed using White's standard errors was [0.051,
0.206].
11.4 Detecting Heteroskedasticity

11.4.1 Residual Plots
One way of investigating the existence of heteroskedasticity is to estimate your model
using least squares and to plot the least squares residuals.
If the errors are homoskedastic, there should be no patterns of any sort in the residuals.
If the errors are heteroskedastic, they may tend to exhibit greater variation in some
systematic way.
11.4.2 The Goldfeld-Quandt Test
A formal test for heteroskedasticity is the Goldfeld-Quandt test. It involves the
following steps:
1. Split the sample into two approximately equal subsamples. If heteroskedasticity exists,
some observations will have large variances and others will have small variances.
Divide the sample such that the observations with potentially high variances are in one
subsample and those with potentially low variances are in the other subsample.
2. Compute estimated error variances $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ for each of the subsamples. Let $\hat{\sigma}_1^2$ be the
estimate from the subsample with potentially large variances and let $\hat{\sigma}_2^2$ be the estimate
from the subsample with potentially small variances. If a null hypothesis of equal
variances is not true, we expect $\hat{\sigma}_1^2 / \hat{\sigma}_2^2$ to be large.
3. Compute $GQ = \hat{\sigma}_1^2 / \hat{\sigma}_2^2$ and reject the null hypothesis of equal variances if $GQ > F_c$,
where $F_c$ is a critical value from the F-distribution with $(T_1 - K)$ and $(T_2 - K)$ degrees
of freedom. The values $T_1$ and $T_2$ are the numbers of observations in each of the
subsamples; if the sample is split exactly in half, $T_1 = T_2 = T/2$.
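The three steps above can be sketched for a simple regression ($K = 2$) as follows. This is a minimal illustration, not a library routine: the function names are hypothetical, the sample is ordered by decreasing $x_t$ on the assumption that the variance increases with $x_t$, and the critical value $F_c$ is left to be looked up in an F table because the Python standard library has no F-distribution.

```python
def sse_simple(pairs):
    """Sum of squared least squares residuals for y = b1 + b2*x on (x, y) pairs."""
    n = len(pairs)
    xbar = sum(p[0] for p in pairs) / n
    ybar = sum(p[1] for p in pairs) / n
    sxx = sum((xv - xbar) ** 2 for xv, _ in pairs)
    b2 = sum((xv - xbar) * (yv - ybar) for xv, yv in pairs) / sxx
    b1 = ybar - b2 * xbar
    return sum((yv - b1 - b2 * xv) ** 2 for xv, yv in pairs)


def goldfeld_quandt(x, y, K=2):
    """Return (GQ, (df1, df2)) after ordering the data by decreasing x.

    Step 1: split the ordered sample into two halves (a middle observation
            is dropped if T is odd).
    Step 2: estimate the error variance in each half as SSE / (T_i - K).
    Step 3: form the ratio, high-variance half over low-variance half.
    """
    ordered = sorted(zip(x, y), key=lambda p: p[0], reverse=True)
    half = len(ordered) // 2
    high_x, low_x = ordered[:half], ordered[-half:]
    var_1 = sse_simple(high_x) / (half - K)  # potentially large variance
    var_2 = sse_simple(low_x) / (half - K)   # potentially small variance
    return var_1 / var_2, (half - K, half - K)


# Made-up data: the high-x half repeats the low-x pattern with y scaled by 10,
# so its residuals are 10 times larger and the variance ratio is 100.
x_demo = [1, 2, 3, 4, 11, 12, 13, 14]
y_demo = [1, 3, 2, 4, 10, 30, 20, 40]
gq_demo, df_demo = goldfeld_quandt(x_demo, y_demo)
print(gq_demo, df_demo)
```

The null hypothesis of equal variances would then be rejected if the returned statistic exceeds the tabulated $F_c$ for the returned degrees of freedom.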
Applying this test procedure to the household food expenditure model, we set up the
hypotheses
$$H_0: \sigma_t^2 = \sigma^2 \qquad H_1: \sigma_t^2 = \sigma^2 x_t \tag{11.4.1}$$
After ordering the data according to decreasing values of $x_t$, and using a partition of 20
observations in each subset of data, we find $\hat{\sigma}_1^2 = 2285.9$ and $\hat{\sigma}_2^2 = 682.46$. Hence, the
value of the Goldfeld-Quandt statistic is
$$GQ = \frac{2285.9}{682.46} = 3.35$$
The 5 percent critical value for (18, 18) degrees of freedom is $F_c = 2.22$. Thus, because
$GQ = 3.35 > F_c = 2.22$, we reject $H_0$ and conclude that heteroskedasticity does exist;
the error variance does depend on the level of income.
REMARK: The above test is a one-sided test because
the alternative hypothesis suggested which sample
partition will have the larger variance. If we suspect that
two sample partitions could have different variances,
but we do not know which variance is potentially larger,
then a two-sided test is more appropriate.
11.5 A Sample With a Heteroskedastic Partition

11.5.1 Economic Model
Consider modeling the supply of wheat in a particular wheat growing area in Australia.
In the supply function the quantity of wheat supplied will typically depend upon the
production technology of the firm, on the price of wheat or expectations about the
price of wheat, and on weather conditions.
We can depict this supply function as
$$\text{Quantity} = f(\text{Price}, \text{Technology}, \text{Weather}) \tag{11.5.1}$$
The data we have available from the Australian wheat growing district consist of 26
years of aggregate time-series data on quantity supplied and price.
Because there is no obvious index of production technology, some kind of proxy needs
to be used for this variable. We use a simple linear time-trend, a variable that takes the
value 1 in year 1, 2 in year 2, and so on, up to 26 in year 26.
An obvious weather variable is also unavailable; thus, in our statistical model, weather
effects will form part of the random error term. Using these considerations, we specify
the linear supply function
$$q_t = \beta_1 + \beta_2 p_t + \beta_3 t + e_t, \qquad t = 1, 2, \ldots, 26 \tag{11.5.2}$$
where
$q_t$ is the quantity of wheat produced in year $t$,
$p_t$ is the price of wheat guaranteed for year $t$,
$t = 1, 2, \ldots, 26$ is a trend variable introduced to capture changes in production
technology, and
$e_t$ is a random error term that includes, among other things, the influence of
weather.
To complete the econometric model in (11.5.2) some statistical assumptions for the
random error term et are needed.
In this case, however, we have additional information that makes an alternative
assumption more realistic. After the 13th year, new wheat varieties whose yields are
less susceptible to variations in weather conditions were introduced. These new
varieties do not have an average yield that is higher than that of the old varieties, but
the variance of their yields is lower because yield is less dependent on weather
conditions.
Since the weather effect is a major component of the random error term $e_t$, we can
model the reduced weather effect of the last 13 years by assuming the error variance in
those years is different from the error variance in the first 13 years. Thus, we assume
that
$$\begin{aligned}
E(e_t) &= 0 \\
\mathrm{var}(e_t) &= \sigma_1^2, & t &= 1, \ldots, 13 \\
\mathrm{var}(e_t) &= \sigma_2^2, & t &= 14, \ldots, 26
\end{aligned} \tag{11.5.3}$$
From the above argument, we expect that $\sigma_2^2 < \sigma_1^2$.

11.5.2 Generalized Least Squares Through Model Transformation
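Write the model corresponding to the two subsets of observations as
$$\begin{aligned}
q_t &= \beta_1 + \beta_2 p_t + \beta_3 t + e_t, & \mathrm{var}(e_t) &= \sigma_1^2, & t &= 1, \ldots, 13 \\
q_t &= \beta_1 + \beta_2 p_t + \beta_3 t + e_t, & \mathrm{var}(e_t) &= \sigma_2^2, & t &= 14, \ldots, 26
\end{aligned} \tag{11.5.4}$$
Dividing each variable by $\sigma_1$ for the first 13 observations and by $\sigma_2$ for the last 13
observations yields
$$\begin{aligned}
\frac{q_t}{\sigma_1} &= \beta_1 \frac{1}{\sigma_1} + \beta_2 \frac{p_t}{\sigma_1} + \beta_3 \frac{t}{\sigma_1} + \frac{e_t}{\sigma_1}, & t &= 1, \ldots, 13 \\
\frac{q_t}{\sigma_2} &= \beta_1 \frac{1}{\sigma_2} + \beta_2 \frac{p_t}{\sigma_2} + \beta_3 \frac{t}{\sigma_2} + \frac{e_t}{\sigma_2}, & t &= 14, \ldots, 26
\end{aligned} \tag{11.5.5}$$
This transformation yields transformed error terms that have the same variance for all
observations. Specifically, the transformed error variances are all equal to one because
$$\begin{aligned}
\mathrm{var}\left(\frac{e_t}{\sigma_1}\right) &= \frac{1}{\sigma_1^2}\,\mathrm{var}(e_t) = \frac{\sigma_1^2}{\sigma_1^2} = 1, & t &= 1, \ldots, 13 \\
\mathrm{var}\left(\frac{e_t}{\sigma_2}\right) &= \frac{1}{\sigma_2^2}\,\mathrm{var}(e_t) = \frac{\sigma_2^2}{\sigma_2^2} = 1, & t &= 14, \ldots, 26
\end{aligned}$$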
Providing $\sigma_1$ and $\sigma_2$ are known, the transformed model in (11.5.5) provides a set of
new transformed variables to which we can apply the least squares principle to obtain
the best linear unbiased estimator for $(\beta_1, \beta_2, \beta_3)$.
The transformed variables are
$$\frac{q_t}{\sigma_i}, \qquad \frac{1}{\sigma_i}, \qquad \frac{p_t}{\sigma_i}, \qquad \frac{t}{\sigma_i} \tag{11.5.6}$$
where $\sigma_i$ is either $\sigma_1$ or $\sigma_2$, depending on which half of the observations is being
considered.
As before, the complete process of transforming variables, then applying least
squares to the transformed variables, is called generalized least squares.
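The transformation in (11.5.5) and (11.5.6) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: $\sigma_1$ and $\sigma_2$ are treated as known, the function names and the six made-up observations are hypothetical, and least squares is solved through the normal equations with a small Gaussian elimination routine rather than a statistics package.

```python
def solve(a, b):
    """Solve the square linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def least_squares(rows, y):
    """Least squares with NO automatic intercept; each row holds the regressors."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def gls_two_regimes(q, p, sigma1, sigma2, split):
    """GLS for q_t = b1 + b2*p_t + b3*t + e_t, error s.d. sigma1 up to `split`, sigma2 after."""
    rows, q_star = [], []
    for idx, (qt, pt) in enumerate(zip(q, p)):
        t = idx + 1
        s = sigma1 if t <= split else sigma2
        rows.append([1.0 / s, pt / s, t / s])  # transformed variables of (11.5.6)
        q_star.append(qt / s)
    return least_squares(rows, q_star)


# Made-up data lying exactly on q = 1 + 2p + 3t, so GLS should return (1, 2, 3)
p_demo = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
q_demo = [1.0 + 2.0 * pt + 3.0 * (i + 1) for i, pt in enumerate(p_demo)]
b_demo = gls_two_regimes(q_demo, p_demo, sigma1=5.0, sigma2=2.0, split=3)
print(b_demo)
```

In practice the standard deviations are unknown and would be replaced by estimates from the two subsamples; the sketch only shows the mechanics of the transformation.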
11.5.3 Implementing Generalized Least Squares
The transformed variables in (11.5.6) depend on the unknown variance parameters $\sigma_1^2$
and $\sigma_2^2$. Thus, as they stand, the transformed variables cannot be calculated.
To overcome this difficulty, we use estimates of $\sigma_1^2$ and $\sigma_2^2$ and transform the variables
as if the estimates were the true variances.
It makes sense to split the sample into two, applying least squares to the first half to
estimate $\sigma_1^2$ and applying least squares to the second half to estimate $\sigma_2^2$. Substituting
these estimates for the true values causes no difficulties in large samples.
For the wheat supply example we obtain
$$\hat{\sigma}_1^2 = 641.64, \qquad \hat{\sigma}_2^2 = 57.76 \tag{R11.7}$$
Using these estimates to calculate observations on the transformed variables in (11.5.6),
and then applying least squares to the complete sample defined in (11.5.5) yields the
estimated equation:
$$\hat{q}_t = \underset{(12.7)}{138.1} + \underset{(8.81)}{21.72}\,p_t + \underset{(0.812)}{3.283}\,t \tag{R11.8}$$
Remark: A word of warning about calculation of the standard errors is
necessary. As demonstrated below (11.5.5), the transformed errors in
(11.5.5) have a variance equal to one. However, when you transform your
variables using $\hat{\sigma}_1$ and $\hat{\sigma}_2$, and apply least squares to the transformed
variables for the complete sample, your computer program will
automatically estimate a variance for the transformed errors. This estimate
will not be exactly equal to one. The standard errors in (R11.8) were
calculated by forcing the computer to use one as the variance of the
transformed errors. Most software packages will have options that let you
do this, but it is not crucial if your package does not; the variance estimate
will usually be close to one anyway.
11.5.4 Testing the Variance Assumption
To use a residual plot to check whether the wheat-supply error variance has decreased
over time, it is sensible to plot the least-squares residuals against time. See Figure 11.3.
The dramatic drop in the variation of the residuals after year 13 supports our belief that
the variance has decreased.
For the Goldfeld-Quandt test the sample is already split into two natural subsamples.
Thus, we set up the hypotheses
$$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_1: \sigma_2^2 < \sigma_1^2 \tag{11.5.9}$$
The computed value of the Goldfeld-Quandt statistic is
$$GQ = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} = \frac{641.64}{57.76} = 11.11$$
$T_1 = T_2 = 13$ and $K = 3$; thus, if $H_0$ is true, 11.11 is an observed value from an F-distribution with (10, 10) degrees of freedom. The corresponding 5 percent critical
value is $F_c = 2.98$.
Since $GQ = 11.11 > F_c = 2.98$, we reject $H_0$ and conclude that the observed difference
between $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ could not reasonably be attributable to chance. There is evidence to
suggest the new varieties have reduced the variance in the supply of wheat.
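The arithmetic of this test can be reproduced from the reported variance estimates. The snippet below is a small check using only numbers quoted in the text; the helper name `gq_from_estimates` is hypothetical and the critical value is the one given above, not computed.

```python
def gq_from_estimates(var_large, var_small, T1, T2, K):
    """Goldfeld-Quandt statistic and its degrees of freedom from two variance estimates."""
    return var_large / var_small, (T1 - K, T2 - K)


gq_wheat, df_wheat = gq_from_estimates(641.64, 57.76, T1=13, T2=13, K=3)
F_c = 2.98  # 5 percent critical value quoted in the text for (10, 10) degrees of freedom

print("GQ =", round(gq_wheat, 2), "with df", df_wheat)
print("reject H0 of equal variances:", gq_wheat > F_c)
```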