Chapter 14


The Nature of Simultaneous Equations Systems
In a typical econometric equation:
$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \epsilon_t$


a simultaneous system is one in which Y has an effect on at least one of the Xs in addition to the effect that the Xs have on Y
The jargon here includes feedback effects, dual causality, and X and Y being jointly determined
Such systems are usually modeled by distinguishing between variables
that are simultaneously determined (the Ys, called endogenous
variables) and those that are not (the Xs, called exogenous variables):
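$Y_{1t} = \alpha_0 + \alpha_1 Y_{2t} + \alpha_2 X_{1t} + \alpha_3 X_{2t} + \epsilon_{1t}$ (14.2)
$Y_{2t} = \beta_0 + \beta_1 Y_{1t} + \beta_2 X_{3t} + \beta_3 X_{2t} + \epsilon_{2t}$ (14.3)
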
Equations 14.2 and 14.3 are examples of structural equations
Structural equations characterize the underlying
economic theory behind each endogenous variable by
expressing it in terms of both endogenous and
exogenous variables
For example, Equations 14.2 and 14.3 could be a
demand and a supply equation, respectively

The term predetermined variable includes all
exogenous variables and lagged endogenous variables
Predetermined implies that exogenous and lagged endogenous
variables are determined outside the system of specified
equations or prior to the current period

The main problem with simultaneous systems is that they violate Classical Assumption III (the error term and each explanatory variable should be uncorrelated)

Reduced-Form Equations
An alternative way of expressing a simultaneous equations
system is through the use of reduced-form equations
Reduced-form equations express a particular
endogenous variable solely in terms of an error term and all
the predetermined (exogenous plus lagged endogenous)
variables in the simultaneous system

The reduced-form equations for the structural
Equations 14.2 and 14.3 would thus be:
$Y_{1t} = \pi_0 + \pi_1 X_{1t} + \pi_2 X_{2t} + \pi_3 X_{3t} + v_{1t}$
$Y_{2t} = \pi_4 + \pi_5 X_{1t} + \pi_6 X_{2t} + \pi_7 X_{3t} + v_{2t}$

where the $v$s are stochastic error terms and the $\pi$s are called reduced-form coefficients
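
A worked derivation may help show where the reduced-form coefficients come from. The sketch below substitutes Equation 14.3 into Equation 14.2 and solves for $Y_{1t}$. It assumes the structural coefficients of 14.2 are written $\alpha$ and those of 14.3 are written $\beta$ (the symbol names are a notational assumption) and that $\alpha_1\beta_1 \neq 1$.

```latex
\begin{aligned}
Y_{1t} &= \alpha_0 + \alpha_1\left(\beta_0 + \beta_1 Y_{1t} + \beta_2 X_{3t} + \beta_3 X_{2t} + \epsilon_{2t}\right)
          + \alpha_2 X_{1t} + \alpha_3 X_{2t} + \epsilon_{1t} \\
(1 - \alpha_1\beta_1)\,Y_{1t} &= (\alpha_0 + \alpha_1\beta_0) + \alpha_2 X_{1t}
          + (\alpha_3 + \alpha_1\beta_3) X_{2t} + \alpha_1\beta_2 X_{3t}
          + (\epsilon_{1t} + \alpha_1\epsilon_{2t}) \\
\pi_0 &= \frac{\alpha_0 + \alpha_1\beta_0}{1 - \alpha_1\beta_1}, \quad
\pi_1 = \frac{\alpha_2}{1 - \alpha_1\beta_1}, \quad
\pi_2 = \frac{\alpha_3 + \alpha_1\beta_3}{1 - \alpha_1\beta_1}, \quad
\pi_3 = \frac{\alpha_1\beta_2}{1 - \alpha_1\beta_1}, \quad
v_{1t} = \frac{\epsilon_{1t} + \alpha_1\epsilon_{2t}}{1 - \alpha_1\beta_1}
\end{aligned}
```

Because $v_{1t}$ contains both structural errors, $Y_{1t}$ is correlated with $\epsilon_{2t}$ whenever $\alpha_1 \neq 0$, which is the source of the problem when $Y_{1t}$ is used as a regressor in Equation 14.3.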

There are at least three reasons for using reduced-form equations:
1. Since the reduced-form equations have no inherent simultaneity, they
do not violate Classical Assumption III
Therefore, they can be estimated with OLS without encountering the
problems discussed in this chapter

2. The interpretation of the reduced-form coefficients as impact multipliers means that they have economic meaning and useful applications of their own
3. Reduced-form equations play a crucial role in Two-Stage Least
Squares, the estimation technique most frequently used for
simultaneous equations (discussed in Section 14.3)

The Bias of Ordinary Least Squares (OLS)
Simultaneity bias refers to the fact that in a simultaneous system, the expected values of the OLS-estimated structural coefficients are not equal to the true $\beta$s, that is:
$E(\hat{\beta}) \neq \beta$
The reason for this is that the two error terms of Equation 14.2
and 14.3 are correlated with the endogenous variables when they
appear as explanatory variables
As an example of how the application of OLS to simultaneous
equations estimation causes bias, a Monte Carlo experiment was
conducted for a supply and demand model
As Figure 14.2 illustrates, the sampling distributions differed greatly
from the true distributions defined in the Monte Carlo experiment
Figure 14.2 Sampling Distributions Showing Simultaneity Bias of OLS Estimates
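
A Monte Carlo experiment of this kind can be sketched in a few lines. The design below is a hypothetical two-equation system, not the textbook's experiment: the slopes `ALPHA_1` and `BETA_1`, the unit coefficients on the exogenous variables, the sample size, and the number of replications are all assumptions chosen for illustration.

```python
import numpy as np

ALPHA_1 = -1.0   # hypothetical true slope on y2 in the first structural equation
BETA_1 = 1.0     # hypothetical true slope on y1 in the second structural equation

def draw_sample(n, rng):
    """Simulate y1 = ALPHA_1*y2 + x1 + e1 and y2 = BETA_1*y1 + x3 + e2."""
    x1, x3, e1, e2 = rng.normal(size=(4, n))
    # Solve the two structural equations jointly (the reduced form for y1).
    y1 = (x1 + ALPHA_1 * x3 + e1 + ALPHA_1 * e2) / (1.0 - ALPHA_1 * BETA_1)
    y2 = BETA_1 * y1 + x3 + e2
    return y1, y2, x1

def ols_slope_on_y2(y1, y2, x1):
    """OLS of y1 on an intercept, y2 and x1; return the coefficient on y2."""
    X = np.column_stack([np.ones_like(y2), y2, x1])
    coef, *_ = np.linalg.lstsq(X, y1, rcond=None)
    return coef[1]

rng = np.random.default_rng(0)
estimates = np.array(
    [ols_slope_on_y2(*draw_sample(100, rng)) for _ in range(2000)]
)
mean_ols = estimates.mean()
print(f"true alpha_1 = {ALPHA_1}, mean OLS estimate = {mean_ols:.3f}")
```

In this design the OLS estimates are centered well away from the true slope, because `y2` depends on `e1` through the feedback from `y1`.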

What Is Two-Stage Least Squares?

Two-Stage Least Squares (2SLS) helps mitigate
simultaneity bias in simultaneous equation systems
2SLS requires a variable that is:
1. a good proxy for the endogenous variable
2. uncorrelated with the error term
Such a variable is called an instrumental variable

2SLS essentially consists of the following two steps:

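Stage 1: Run OLS on the reduced-form equations for each of the endogenous variables that appear as explanatory variables in the structural equations in the system
That is, estimate (using OLS):
$\hat{Y}_{1t} = \hat{\pi}_0 + \hat{\pi}_1 X_{1t} + \hat{\pi}_2 X_{2t} + \hat{\pi}_3 X_{3t}$
$\hat{Y}_{2t} = \hat{\pi}_4 + \hat{\pi}_5 X_{1t} + \hat{\pi}_6 X_{2t} + \hat{\pi}_7 X_{3t}$
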
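Stage 2: Substitute the $\hat{Y}$s from the reduced form for the Ys that appear on the right side (only) of the structural equations, and then estimate these revised structural equations with OLS
That is, estimate (using OLS):
$Y_{1t} = \alpha_0 + \alpha_1 \hat{Y}_{2t} + \alpha_2 X_{1t} + \alpha_3 X_{2t} + u_{1t}$
$Y_{2t} = \beta_0 + \beta_1 \hat{Y}_{1t} + \beta_2 X_{3t} + \beta_3 X_{2t} + u_{2t}$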
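
The two stages can be sketched by hand with NumPy. This is a minimal illustration on simulated data, assuming a simplified version of the system with one exogenous variable per equation (`x1` in the first equation, `x3` in the second) and hypothetical true slopes; the helper `ols` is defined here and is not part of any library.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 5000
alpha_1, beta_1 = -1.0, 1.0            # hypothetical true structural slopes
x1, x3, e1, e2 = rng.normal(size=(4, n))
# Structural system: y1 = alpha_1*y2 + x1 + e1 ; y2 = beta_1*y1 + x3 + e2
y1 = (x1 + alpha_1 * x3 + e1 + alpha_1 * e2) / (1.0 - alpha_1 * beta_1)
y2 = beta_1 * y1 + x3 + e2
const = np.ones(n)

# Stage 1: reduced form for y2 on all predetermined variables.
predetermined = np.column_stack([const, x1, x3])
y2_hat = predetermined @ ols(predetermined, y2)

# Stage 2: replace y2 with its fitted value on the right side only.
alpha_1_2sls = ols(np.column_stack([const, y2_hat, x1]), y1)[1]

# For comparison: OLS applied directly to the structural equation.
alpha_1_ols = ols(np.column_stack([const, y2, x1]), y1)[1]

print(f"true: {alpha_1}, 2SLS: {alpha_1_2sls:.3f}, OLS: {alpha_1_ols:.3f}")
```

The standard errors reported by a plain second-stage OLS run are not the correct 2SLS standard errors, so in practice a dedicated 2SLS routine is preferable to this manual version.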


The Properties of Two-Stage Least Squares
1. 2SLS estimates are still biased in small samples, but they are consistent in large samples (they get closer to the true $\beta$s as N increases)

2. Bias in 2SLS for small samples typically is of the opposite sign of the bias in OLS
3. If the fit of the reduced-form equation is poor, then 2SLS will not rid
the equation of bias even in a large sample
4. 2SLS estimates have increased variances and standard errors
relative to OLS

Note, however, that Two-Stage Least Squares cannot be applied to an equation unless that equation is identified
We therefore now turn to the issue of identification
What Is the Identification Problem?

Identification is a precondition for the application of 2SLS to equations
in simultaneous systems
A structural equation is identified only when enough of the system's
predetermined variables are omitted from the equation in question to
allow that equation to be distinguished from all the others in the system
Note that one equation in a simultaneous system might be identified and
another might not

Most simultaneous systems are fairly complicated, so econometricians need a general method by which to determine whether equations are identified
The method typically used is the order condition of identification, to which we now turn
The Order Condition of Identification

The order condition is a systematic method of determining whether a particular equation in a simultaneous system has the potential to be identified
If an equation can meet the order condition, then it is
almost always identified
We thus say that the order condition is a necessary but not
sufficient condition of identification

A necessary condition for an equation to be identified is that the number of predetermined (exogenous plus lagged endogenous) variables in the system be greater than or equal to the number of slope coefficients in the equation of interest

Or, in equation form, a structural equation meets the order condition if:
# predetermined variables (in the simultaneous system) ≥ # slope coefficients (in the equation)
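
As a small worked check, the order condition can be applied to Equations 14.2 and 14.3. The function name `meets_order_condition` and the variable labels below are illustrative choices, and the regressor lists assume the structural equations as written earlier (Y2, X1, X2 in Equation 14.2; Y1, X3, X2 in Equation 14.3).

```python
def meets_order_condition(n_predetermined_in_system: int,
                          n_slopes_in_equation: int) -> bool:
    """Order condition: # predetermined variables >= # slope coefficients."""
    return n_predetermined_in_system >= n_slopes_in_equation

# Predetermined variables in the whole system (no lagged endogenous ones here).
predetermined = {"X1", "X2", "X3"}

# Right-hand-side variables of each structural equation (one slope each).
eq_14_2_regressors = ["Y2", "X1", "X2"]
eq_14_3_regressors = ["Y1", "X3", "X2"]

eq_14_2_ok = meets_order_condition(len(predetermined), len(eq_14_2_regressors))
eq_14_3_ok = meets_order_condition(len(predetermined), len(eq_14_3_regressors))

print("Equation 14.2 meets the order condition:", eq_14_2_ok)
print("Equation 14.3 meets the order condition:", eq_14_3_ok)
```

Both equations have three slope coefficients and the system has three predetermined variables, so each meets the condition with equality. Since the condition is necessary but not sufficient, this shows only that the equations have the potential to be identified.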
The General IV Regression Model

So far we have considered IV regression with a single endogenous regressor (X) and a single instrument (Z).
We need to extend this to:
multiple endogenous regressors ($X_1, \dots, X_k$)
multiple included exogenous variables ($W_1, \dots, W_r$) or control variables, which need to be included for the usual omitted variable (OV) reason
multiple instrumental variables ($Z_1, \dots, Z_m$). More (relevant) instruments can produce a smaller variance of TSLS: the $R^2$ of the first stage increases, so you have more variation in $\hat{X}$.
New terminology: identification & overidentification

In general, a parameter is said to be identified if different
values of the parameter produce different distributions of
the data.
In IV regression, whether the coefficients are identified
depends on the relation between the number of instruments
(m) and the number of endogenous regressors (k)
Intuitively, if there are fewer instruments than endogenous regressors, we can't estimate $\beta_1, \dots, \beta_k$
For example, suppose k = 1 but m = 0 (no instruments)!

The coefficients $\beta_1, \dots, \beta_k$ are said to be:
exactly identified if m = k.
There are just enough instruments to estimate $\beta_1, \dots, \beta_k$.
overidentified if m > k.
There are more than enough instruments to estimate $\beta_1, \dots, \beta_k$. If so, you can test whether the instruments are valid (a test of the overidentifying restrictions); we'll return to this later
underidentified if m < k.
There are too few instruments to estimate $\beta_1, \dots, \beta_k$. If so, you need to get more instruments!

The General IV Regression Model: Summary of Jargon
$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \dots + \beta_{k+r} W_{ri} + u_i$

$Y_i$ is the dependent variable
$X_{1i}, \dots, X_{ki}$ are the endogenous regressors (potentially correlated with $u_i$)
$W_{1i}, \dots, W_{ri}$ are the included exogenous regressors (uncorrelated with $u_i$) or control variables (included so that $Z_i$ is uncorrelated with $u_i$, once the Ws are included)
$\beta_0, \beta_1, \dots, \beta_{k+r}$ are the unknown regression coefficients
$Z_{1i}, \dots, Z_{mi}$ are the m instrumental variables (the excluded exogenous variables)
The coefficients are overidentified if m > k; exactly identified if m = k; and underidentified if m < k.
TSLS with a Single Endogenous Regressor
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i$

m instruments: $Z_{1i}, \dots, Z_{mi}$

First stage
Regress $X_1$ on all the exogenous regressors: regress $X_1$ on $W_1, \dots, W_r$, $Z_1, \dots, Z_m$, and an intercept, by OLS
Compute predicted values $\hat{X}_{1i}$, $i = 1, \dots, n$
Second stage
Regress Y on $\hat{X}_1$, $W_1, \dots, W_r$, and an intercept, by OLS
The coefficients from this second stage regression are the TSLS estimators, but the SEs it reports are wrong
To get correct SEs, do this in a single step in your regression software
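
The sketch below illustrates why the second-stage SEs are wrong: the second-stage regression computes residuals with $\hat{X}$, whereas the TSLS residuals should use the actual X. The data-generating process, coefficient values, and helper names are assumptions for illustration, and the SE formula shown is the homoskedasticity-only version rather than a robust one.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def homoskedastic_se(resid, X):
    """Homoskedasticity-only standard errors given residuals and regressors."""
    sigma2 = resid @ resid / (len(resid) - X.shape[1])
    return np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

rng = np.random.default_rng(2)
n = 2000
w = rng.normal(size=n)                    # included exogenous regressor W
z = rng.normal(size=(n, 2))               # two instruments Z1, Z2
u = rng.normal(size=n)
v = 0.8 * u + 0.6 * rng.normal(size=n)    # correlated with u: x is endogenous
x = 1.0 + z @ np.array([0.5, 0.5]) + 0.3 * w + v
y = 2.0 + 1.5 * x - 1.0 * w + u           # hypothetical true beta_1 = 1.5
const = np.ones(n)

# First stage: regress x on the intercept, W and the instruments.
first_stage = np.column_stack([const, w, z])
x_hat = first_stage @ ols(first_stage, x)

# Second stage: regress y on the intercept, x_hat and W.
X_second = np.column_stack([const, x_hat, w])
beta_tsls = ols(X_second, y)

# Naive residuals use x_hat; the correct TSLS residuals use the actual x.
X_actual = np.column_stack([const, x, w])
se_naive = homoskedastic_se(y - X_second @ beta_tsls, X_second)
se_correct = homoskedastic_se(y - X_actual @ beta_tsls, X_second)

print(f"beta_1 (TSLS) = {beta_tsls[1]:.3f}")
print(f"naive SE = {se_naive[1]:.4f}, corrected SE = {se_correct[1]:.4f}")
```

The coefficient estimates are identical in both cases; only the residual variance, and therefore the SE, changes.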

Checking Assumption #1: Instrument Relevance

We will focus on a single included endogenous regressor:

$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i$

First stage regression:

$X_i = \pi_0 + \pi_1 Z_{1i} + \dots + \pi_m Z_{mi} + \pi_{m+1} W_{1i} + \dots + \pi_{m+r} W_{ri} + v_i$

The instruments are relevant if at least one of $\pi_1, \dots, \pi_m$ is nonzero.

The instruments are said to be weak if all the $\pi_1, \dots, \pi_m$ are either zero or nearly zero.
Weak instruments explain very little of the variation in X, beyond that explained by the Ws
What are the consequences of weak instruments?
If instruments are weak, the sampling distribution of TSLS and its t-statistic are not (at all) normal, even with n large.
Consider the simplest case:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
$X_i = \pi_0 + \pi_1 Z_i + v_i$
The IV estimator is $\hat{\beta}_1^{TSLS} = s_{YZ} / s_{XZ}$
If cov(X,Z) is zero or small, then $s_{XZ}$ will be small: with weak instruments, the denominator is nearly zero.
If so, the sampling distribution of $\hat{\beta}_1^{TSLS}$ (and its t-statistic) is not well approximated by its large-n normal approximation
Measuring the Strength of Instruments in Practice: The First-Stage F-statistic
The first stage regression (one X):
Regress X on $Z_1, \dots, Z_m$, $W_1, \dots, W_r$.
Totally irrelevant instruments: all the coefficients on $Z_1, \dots, Z_m$ are zero.
The first-stage F-statistic tests the hypothesis that $Z_1, \dots, Z_m$ do not enter the first stage regression.
Weak instruments imply a small first stage F-statistic.


Checking for Weak Instruments with a Single X
Compute the first-stage F-statistic.
Rule-of-thumb: If the first stage F-statistic is less than 10, then the set of instruments is weak.
If so, the TSLS estimator will be biased, and statistical inferences (standard errors, hypothesis tests, confidence intervals) can be misleading.
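
The first-stage F-statistic can be computed by comparing the first-stage regression with and without the instruments. The sketch below uses the homoskedasticity-only F formula on simulated data; the function names, coefficient values, and sample size are assumptions for illustration.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from OLS of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid @ resid

def first_stage_f(x, Z, W):
    """Homoskedasticity-only F-statistic for H0: all coefficients on Z are zero."""
    n, m = Z.shape
    const = np.ones((n, 1))
    restricted = np.column_stack([const, W])        # instruments excluded
    unrestricted = np.column_stack([const, W, Z])   # instruments included
    ssr_r, ssr_u = ssr(restricted, x), ssr(unrestricted, x)
    dof = n - unrestricted.shape[1]
    return ((ssr_r - ssr_u) / m) / (ssr_u / dof)

rng = np.random.default_rng(3)
n = 500
W = rng.normal(size=(n, 1))
Z = rng.normal(size=(n, 2))
v = rng.normal(size=n)

# Hypothetical first stages: one with strong and one with nearly irrelevant instruments.
x_strong = Z @ np.array([0.5, 0.5]) + 0.3 * W[:, 0] + v
x_weak = Z @ np.array([0.02, 0.02]) + 0.3 * W[:, 0] + v

f_strong = first_stage_f(x_strong, Z, W)
f_weak = first_stage_f(x_weak, Z, W)
print(f"first-stage F (strong) = {f_strong:.1f}")
print(f"first-stage F (weak)   = {f_weak:.1f}")
```

Each value would then be compared with the rule-of-thumb threshold of 10.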
Why compare the first-stage F to 10?
Simply rejecting the null hypothesis that the coefficients on the Zs are zero isn't enough; you need substantial predictive content for the normal approximation to be a good one.
Comparing the first-stage F to 10 tests for whether the bias of TSLS, relative to OLS, is less than 10%.
If F is smaller than 10, the relative bias exceeds 10%; that is, TSLS can have substantial bias.
What to do if you have weak instruments?
Get better instruments (often easier said than done!)
If you have many instruments, some are probably weaker than others and it's a good idea to drop the weaker ones (dropping an irrelevant instrument will increase the first-stage F)
If you only have a few instruments, and all are weak, then you need to do some IV analysis other than TSLS
Separate the problem of estimation of $\beta_1$ and construction of confidence intervals
This seems odd, but if TSLS isn't normally distributed, it makes sense (right?)
Checking Assumption #2: Instrument Exogeneity
Instrument exogeneity: All the instruments are uncorrelated with the error term: corr($Z_{1i}$, $u_i$) = 0, …, corr($Z_{mi}$, $u_i$) = 0
If the instruments are correlated with the error term, the first stage of TSLS cannot isolate a component of X that is uncorrelated with the error term, so $\hat{X}$ is correlated with u and TSLS is inconsistent.
If there are more instruments than endogenous regressors,
it is possible to test partially for instrument exogeneity.

Testing Overidentifying Restrictions

Consider the simplest case:

$Y_i = \beta_0 + \beta_1 X_i + u_i$,

Suppose there are two valid instruments: Z1i, Z2i

Then you could compute two separate TSLS estimates.

Intuitively, if these 2 TSLS estimates are very different from each other,
then something must be wrong: one or the other (or both) of the
instruments must be invalid.
The J-test of overidentifying restrictions makes this comparison in a
statistically precise way.
This can only be done if #Zs > #Xs (overidentified).

The J-test of Overidentifying Restrictions

Suppose #instruments = m > # Xs = k (overidentified)

$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \dots + \beta_{k+r} W_{ri} + u_i$

The J-test is the Anderson-Rubin test, using the TSLS estimator instead of the hypothesized value $\beta_{1,0}$. The recipe:

First estimate the equation of interest using TSLS and all m instruments; compute the predicted values $\hat{Y}_i$, using the actual Xs (not the $\hat{X}$s used to estimate the second stage)
Compute the residuals $\hat{u}_i = Y_i - \hat{Y}_i$
Regress $\hat{u}_i$ against $Z_{1i}, \dots, Z_{mi}$, $W_{1i}, \dots, W_{ri}$
Compute the F-statistic testing the hypothesis that the coefficients on $Z_{1i}, \dots, Z_{mi}$ are all zero
The J-statistic is J = mF
Distribution of the J-statistic
Under the null hypothesis that all the instruments are exogenous, J has a chi-squared distribution with m − k degrees of freedom
If m = k, J = 0 (does this make sense?)

If some instruments are exogenous and others are endogenous, the J statistic will be large, and the null hypothesis that all instruments are exogenous will be rejected.
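
The J-test recipe can be sketched for the simplest overidentified case: one endogenous X, two instruments, and no Ws, so m − k = 1. The function `j_statistic`, the simulated data, and the coefficient values are assumptions for illustration; the F-statistic is the homoskedasticity-only version, and 3.84 is the 5% critical value of the chi-squared distribution with one degree of freedom.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def j_statistic(y, x, Z):
    """J = m*F for a single endogenous x, instruments Z (n x m), and no Ws."""
    n, m = Z.shape
    const = np.ones(n)
    exog = np.column_stack([const, Z])
    # TSLS: first stage, then second stage on the fitted values.
    x_hat = exog @ ols(exog, x)
    beta = ols(np.column_stack([const, x_hat]), y)
    # Residuals use the actual x, not x_hat.
    u_hat = y - np.column_stack([const, x]) @ beta
    # F-test that the coefficients on Z are all zero in a regression of u_hat on Z.
    ssr_restricted = np.sum((u_hat - u_hat.mean()) ** 2)
    resid = u_hat - exog @ ols(exog, u_hat)
    ssr_unrestricted = resid @ resid
    f_stat = ((ssr_restricted - ssr_unrestricted) / m) / (ssr_unrestricted / (n - m - 1))
    return m * f_stat

rng = np.random.default_rng(4)
n = 2000
Z = rng.normal(size=(n, 2))
e = rng.normal(size=n)
v = 0.8 * e + 0.6 * rng.normal(size=n)
x = Z @ np.array([1.0, 1.0]) + v

y_valid = 1.0 + 2.0 * x + e                      # both instruments exogenous
y_invalid = 1.0 + 2.0 * x + e + 0.5 * Z[:, 1]    # Z2 enters the error term

j_valid = j_statistic(y_valid, x, Z)
j_invalid = j_statistic(y_invalid, x, Z)
CRITICAL_5PCT_DF1 = 3.84
print(f"J (valid instruments)   = {j_valid:.2f}")
print(f"J (one invalid)         = {j_invalid:.2f}")
print(f"5% critical value, df=1 = {CRITICAL_5PCT_DF1}")
```

A rejection says that at least one instrument is endogenous but does not say which one.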
Checking Instrument Validity: Summary

This summary considers the case of a single X. The two requirements for valid instruments are:
1. Relevance
At least one instrument must enter the population counterpart of the first stage regression.
If instruments are weak, then the TSLS estimator is biased and the t-statistic has a non-normal distribution
To check for weak instruments with a single included endogenous regressor, check the first-stage F
If F > 10, the instruments are strong: use TSLS
If F < 10, the instruments are weak: take some action.
2. Exogeneity

All the instruments must be uncorrelated with the error term: corr($Z_{1i}$, $u_i$) = 0, …, corr($Z_{mi}$, $u_i$) = 0

We can partially test for exogeneity: if m > 1, we can test the null hypothesis that all the instruments are exogenous, against the alternative that as many as m − 1 are endogenous (correlated with u)
The test is the J-test, which is constructed using the TSLS residuals

If the J-test rejects, then at least some of your instruments are endogenous, so you must make a difficult decision and jettison some (or all) of your instruments.
Figure 14.1 Supply and Demand Simultaneous Equations
Figure 14.3 A Shifting Supply Curve
Figure 14.4 When Both Curves Shift
Table 14.1a Data for a Small Macromodel
Table 14.1b Data for a Small Macromodel

Key Terms from Chapter 14

Endogenous variable
Predetermined variable
Structural equation
Reduced-form equation
Simultaneity bias
Two-Stage Least Squares

Order condition for identification
Example #1: Effect of Studying on Grades
What is the effect on grades of studying for an additional hour per day?
Y = GPA
X = study time (hours per day)

Data: grades and study hours of college freshmen.
Would you expect the OLS estimator of $\beta_1$ (the effect on GPA of studying an extra hour per day) to be unbiased? Why or why not?
Studying on grades

Stinebrickner, Ralph and Stinebrickner, Todd R. (2008) "The Causal Effect of Studying on
Academic Performance," The B.E. Journal of Economic Analysis & Policy: Vol. 8: Iss. 1
(Frontiers), Article 14.

n = 210 freshmen at Berea College (Kentucky) in 2001

Y = first-semester GPA
X = average study hours per day (time use survey)
Roommates were randomly assigned
Z = 1 if roommate brought video game, = 0 otherwise
Do you think $Z_i$ (whether a roommate brought a video game) is a valid instrument?
Is it relevant (correlated with X)?
Is it exogenous (uncorrelated with u)?
$X_i = \pi_0 + \pi_1 Z_i + v_i$
$Y_i = \gamma_0 + \gamma_1 Z_i + w_i$
Y = GPA (4 point scale)
X = time spent studying (hours per day)
Z = 1 if roommate brought video game, = 0 otherwise

Stinebrickner and Stinebrickner's findings

$\hat{\pi}_1 = -.668$ and $\hat{\gamma}_1 = -.241$
$\hat{\beta}_1^{IV} = \hat{\gamma}_1 / \hat{\pi}_1 = 0.360$
What are the units? Do these estimates make sense in a real-world way?
(Note: They actually ran the regressions including additional regressors; more on this later.)
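
The ratio form of the IV estimate follows from substituting the first-stage equation into the equation of interest. The derivation below is a sketch under the assumption that the causal model is $Y_i = \beta_0 + \beta_1 X_i + u_i$ and that the two regressions on Z are written with coefficients $\pi$ (for X) and $\gamma$ (for Y); the symbol names are a notational assumption.

```latex
\begin{aligned}
Y_i &= \beta_0 + \beta_1\left(\pi_0 + \pi_1 Z_i + v_i\right) + u_i
     = (\beta_0 + \beta_1\pi_0) + \beta_1\pi_1 Z_i + (u_i + \beta_1 v_i) \\
\gamma_1 &= \beta_1\pi_1 \quad\Longrightarrow\quad \beta_1 = \frac{\gamma_1}{\pi_1} \\
\hat{\beta}_1^{IV} &= \frac{\hat{\gamma}_1}{\hat{\pi}_1} = \frac{-0.241}{-0.668} \approx 0.36
\end{aligned}
```

Here $\hat{\pi}_1$ is measured in hours of study per day and $\hat{\gamma}_1$ in GPA points, so the ratio is in GPA points per additional hour of study per day. The small difference from the reported 0.360 is consistent with rounding of the two inputs.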
Example #3: Test scores and

class size
The California test score/class size regressions still
could have Omitted Variable bias (e.g. parental
In principle, this bias can be eliminated by IV
regression (TSLS).
IV regression requires a valid instrument, that is, an
instrument that is:
relevant: corr($Z_i$, $STR_i$) ≠ 0
exogenous: corr($Z_i$, $u_i$) = 0
Here is a (hypothetical) instrument:
some districts, randomly hit by an earthquake, double up classrooms
$Z_i = Quake_i$ = 1 if hit by quake, = 0 otherwise
Do the two conditions for a valid instrument hold?
The earthquake makes it as if the districts were in a random
assignment experiment. Thus, the variation in STR arising from
the earthquake is exogenous.
The first stage of TSLS regresses STR against Quake, thereby
isolating the part of STR that is exogenous (the part that is as
if randomly assigned)
Application to the Demand for Cigarettes

Why are we interested in knowing the elasticity of demand for cigarettes?
Theory of optimal taxation. The optimal tax rate is inversely related to the price elasticity: the smaller the elasticity, the less quantity is affected by a given percentage tax, so the smaller is the change in consumption and deadweight loss.
Externalities of smoking: a role for government intervention to discourage smoking
health effects of second-hand smoke? (non-monetary)
monetary externalities
Panel data set

Annual cigarette consumption, average prices paid by end consumer (including tax), personal income, and tax rates (cigarette-specific and general statewide sales tax rates)
48 continental US states, 1985-1995

Estimation strategy
We need to use IV estimation methods to handle the simultaneous causality bias that arises from the interaction of supply and demand.
State binary indicators = W variables (control variables) which control for unobserved state-level characteristics that affect the demand for cigarettes and the tax rate, as long as those characteristics don't vary over time.
Fixed-effects model of cigarette demand

$\ln(Q_{it}) = \alpha_i + \beta_1 \ln(P_{it}) + \beta_2 \ln(Income_{it}) + u_{it}$

i = 1, …, 48, t = 1985, 1986, …, 1995
corr($\ln(P_{it})$, $u_{it}$) is plausibly nonzero because of supply/demand interactions
$\alpha_i$ reflects unobserved omitted factors that vary across states but not over time, e.g. attitude towards smoking
Estimation strategy:
Use panel data regression methods to eliminate $\alpha_i$
Use TSLS to handle simultaneous causality bias
Use T = 2 with 1985 to 1995 changes (changes method): look at long-term response, not short-term dynamics (short- v. long-run elasticities)
The changes method (when T=2)
One way to model long-term effects is to consider 10-year changes, between 1985 and 1995
Rewrite the regression in changes form:
$\ln(Q_{i95}) - \ln(Q_{i85}) = \beta_1[\ln(P_{i95}) - \ln(P_{i85})] + \beta_2[\ln(Income_{i95}) - \ln(Income_{i85})] + (u_{i95} - u_{i85})$
Create 10-year change variables, for example:
10-year change in log price = $\ln(P_{i1995}) - \ln(P_{i1985})$
Then estimate the demand elasticity by TSLS using 10-year changes in the instrumental variables
This is equivalent to using the original data and including the state binary indicators (W variables) in the regression
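
The equivalence between the changes regression and the regression with state binary indicators can be checked numerically when T = 2. The sketch below does this for plain OLS with a single regressor on simulated data (the TSLS case is not shown); the variable names, the true slope of -1.2, and the noise scale are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n_states = 48
alpha = rng.normal(size=n_states)                       # state fixed effects
# Two periods per state; x is correlated with the fixed effect.
x = rng.normal(size=(n_states, 2)) + alpha[:, None]
u = rng.normal(scale=0.1, size=(n_states, 2))
y = alpha[:, None] - 1.2 * x + u                        # hypothetical slope -1.2

# Changes method: regress the change in y on the change in x (no intercept,
# matching the changes-form equation, in which the fixed effect drops out).
dx = x[:, 1] - x[:, 0]
dy = y[:, 1] - y[:, 0]
b_changes = (dx @ dy) / (dx @ dx)

# Binary-indicator method: stack the data and include one dummy per state.
x_long = x.reshape(-1)                                  # ordered state by state
y_long = y.reshape(-1)
state_dummies = np.kron(np.eye(n_states), np.ones((2, 1)))
X = np.column_stack([x_long, state_dummies])
coef, *_ = np.linalg.lstsq(X, y_long, rcond=None)
b_dummies = coef[0]

print(f"changes estimate: {b_changes:.6f}")
print(f"dummies estimate: {b_dummies:.6f}")
```

With two periods, demeaning each state's data leaves plus or minus half the change, which is why the two slope estimates coincide.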
