Econometrics II-1-1
DEPARTMENT OF ECONOMICS
CHAPTER ONE
1. REGRESSION ANALYSIS WITH QUALITATIVE INFORMATION: BINARY/ DUMMY
VARIABLES
1.1 Describing Qualitative Information
1.2. Dummy as Independent Variables
Econometrics II Lecture Note Economics Department For 2ND Year Students Page 1
RIFT VALLEY UNIVERSITY NEKEMTE CAMPUS-DEPARTMENT OF
ECONOMICS A.Y 2020/2012
1.2.2 Regression on one quantitative variable and one qualitative variable with more than two
classes
1.4 Regression on one quantitative variable and two qualitative variables
1.5 Testing for structural stability of regression models
1.3. Dummy as Dependent Variable
1.3.1. The Linear Probability Model (LPM)
1.3.2. The Logit Model
1.3.3. The Probit Model
CHAPTER TWO
2. INTRODUCTION TO BASIC REGRESSION ANALYSIS WITH TIME SERIES
2.1. The Nature of Time Series Data
2.2. Stationary and Non-stationary stochastic processes
CHAPTER THREE
3. INTRODUCTION TO SIMULTANEOUS EQUATION MODELS
3.1. The Nature of Simultaneous Equation Model
3.2. Simultaneity bias
3.3. The Order and Rank Condition of identification problem
CHAPTER FOUR
4. INTRODUCTION TO PANEL DATA ANALYSIS
4.1. Introduction
4.2. Estimation of panel data regression model: the fixed effect approach
4.3. Estimation of panel data regression model: random effect estimation
WORK SHEET
CHAPTER ONE
In regression analysis, a dummy variable (also known as an indicator variable) is one that
takes the value 0 or 1 to indicate the absence or presence of some categorical effect that may be
expected to shift the outcome. Such variables are qualitative: by their nature they are not
measured on a numerical scale. Dummy variables are "proxy" variables, or numeric stand-ins, for
qualitative facts in a regression model. In regression analysis, the dependent variable can also be
influenced by qualitative variables (gender, religion, geographic region, etc.). Such variables
can be converted into quantitative form by constructing artificial values.
In regression analysis the dependent variable is frequently influenced not only by variables that
can be readily quantified on some well-defined scale (e.g., income, output, prices, costs, height,
and temperature), but also by variables that are essentially qualitative in nature (e.g., sex, race,
color, religion, nationality, wars, earthquakes, strikes, political upheavals, and changes in
government economic policy).
For example, holding all other factors constant, female professors are found to earn less
than their male counterparts, and nonwhites are found to earn less than whites.
This pattern may result from sex or racial discrimination, but whatever the reason, qualitative
variables such as sex and race do influence the dependent variable and clearly should be
included among the explanatory variables. Since such qualitative variables usually indicate the
presence or absence of a “quality” or an attribute, such as male or female, black or white, or
Christian or Muslim, one method of “quantifying” such attributes is by constructing artificial
variables that take on values of 1 or 0, 0 indicating the absence of an attribute and 1 indicating
the presence (or possession) of that attribute. For example, 1 may indicate that a person is a
male, and 0 may designate a female; or 1 may indicate that a person is a college graduate, and 0
that he is not, and so on. Variables that assume such 0 and 1 values are called dummy variables.
Alternative names are indicator variables, binary variables, categorical variables, and
dichotomous variables.
Dummy variables can be used in regression models just as easily as quantitative variables.
Example:
$Y_i = \alpha + \beta D_i + u_i$ ------------------------------------------ (1.1)
Where: $Y_i$ = annual salary of a college professor
$D_i$ = 1 if male college professor
      = 0 otherwise (i.e., female professor)
Note that (1.1) is like the two-variable regression models encountered previously, except that
instead of a quantitative X variable we have a dummy variable D (hereafter, we shall designate
all dummy variables by the letter D).
Assuming that the disturbances satisfy the usual assumptions of the classical linear regression
model (i.e., the mean of the error term is zero), we obtain from (1.1):
Mean salary of female college professor: $E(Y_i \mid D_i = 0) = \alpha$ ….………………….(1.2)
Mean salary of male college professor: $E(Y_i \mid D_i = 1) = \alpha + \beta$
That is, the intercept term $\alpha$ gives the mean salary of female college professors and the slope
coefficient $\beta$ tells by how much the mean salary of a male college professor differs from the
mean salary of his female counterpart, $\alpha + \beta$ reflecting the mean salary of the male college
professor.
A test of the null hypothesis that there is no sex discrimination ($H_0: \beta = 0$) can easily be made
by running the regression in the usual manner and finding out whether, on the basis of the t test, the
estimated $\beta$ is statistically significant.
Consider the following hypothetical data on starting salaries of college teachers by sex:
Starting salary (Y)    Sex (1 = male, 0 = female)
22,000                 1
19,000                 0
18,000                 0
21,700                 1
18,500                 0
21,000                 1
20,500                 1
17,000                 0
17,500                 0
21,200                 1
The estimated mean salary of females is = 90,000/5 = 18,000
The estimated mean salary of males is = 106,400/5 = 21,280
Therefore the results of regression analysis are presented as follows:
$\hat{Y}_i = 18{,}000 + 3{,}280 D_i$
       (0.32)    (0.44)
t =    (57.74)   (7.439)
$R^2 = 0.8737$
The above results show that the estimated mean salary of female college instructors is birr
18,000 and that of male college instructors is birr 21,280 (= 18,000 + 3,280).
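The intercept and slope reported above can be checked directly from the ten observations. The following is a minimal sketch in plain Python; the helper name `ols_simple` is an illustrative choice, not something defined in these notes.

```python
# Starting salaries and the sex dummy (1 = male, 0 = female) from the table above.
salary = [22000, 19000, 18000, 21700, 18500, 21000, 20500, 17000, 17500, 21200]
male = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


alpha_hat, beta_hat = ols_simple(male, salary)

# Group means computed directly, for comparison with the regression output.
female_mean = sum(y for y, d in zip(salary, male) if d == 0) / male.count(0)
male_mean = sum(y for y, d in zip(salary, male) if d == 1) / male.count(1)

print("intercept:", alpha_hat, "slope:", beta_hat)
print("female mean:", female_mean, "male mean:", male_mean)
```

With a single dummy regressor, the intercept equals the mean of the base group (females) and the slope equals the difference between the two group means.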
1.2.1 Regression on one quantitative variable and one qualitative variable with two classes
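The discussion that follows refers to equations (1.3), (1.4) and (1.5). One specification consistent with that discussion, and with the form of (1.6) and (1.7) later in the chapter, is sketched below. It is an assumption rather than a quotation: $Y_i$ is taken to be annual salary, $X_i$ years of teaching experience, and $D_i = 1$ for a male professor, and the assignment of (1.4) to females and (1.5) to males is likewise assumed.

```latex
\begin{align}
Y_i &= \alpha_1 + \alpha_2 D_i + \beta X_i + u_i \tag{1.3} \\
E(Y_i \mid X_i, D_i = 0) &= \alpha_1 + \beta X_i \tag{1.4} \\
E(Y_i \mid X_i, D_i = 1) &= (\alpha_1 + \alpha_2) + \beta X_i \tag{1.5}
\end{align}
```

Under this form the two groups share the slope $\beta$ and differ only in the intercept, by $\alpha_2$.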
If the assumption of common slopes is valid, a test of the hypothesis that the two regressions
(1.4) and (1.5) have the same intercept (i.e., there is no sex discrimination) can be made easily by
running the regression (1.3) and noting the statistical significance of the estimated $\alpha_2$ on the
basis of the traditional t test. If the t test shows that $\hat{\alpha}_2$ is statistically significant, we reject the
null hypothesis that the male and female college professors’ levels of mean annual salary are the
same.
Before proceeding further, note the following features of the dummy variable regression model
considered previously.
1. To distinguish the two categories, male and female, we have introduced only one dummy
variable $D_i$. For if $D_i = 1$ always denotes a male, when $D_i = 0$ we know that it is a
female, since there are only two possible outcomes. Hence, one dummy variable suffices
to distinguish two categories. The general rule is this: if a qualitative variable has ‘m’
categories, introduce only ‘m-1’ dummy variables (in a model with an intercept). If this
rule is not followed, we shall fall into what might be called the dummy variable trap,
that is, the situation of perfect multicollinearity.
2. The assignment of 1 and 0 values to two categories, such as male and female, is arbitrary
in the sense that in our example we could have assigned D = 1 for female and D = 0 for
male.
3. The group, category, or classification that is assigned the value of 0 is often referred to as
the base, benchmark, control, comparison, reference, or omitted category. It is the base
in the sense that comparisons are made with that category.
4. The coefficient $\alpha_2$ attached to the dummy variable D can be called the differential
intercept coefficient because it tells by how much the value of the intercept term of the
category that receives the value of 1 differs from the intercept coefficient of the base
category.
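The dummy variable trap in feature 1 can be seen numerically. The sketch below reuses the sex dummy from the starting-salary example and builds a second, redundant dummy for females. With an intercept plus both dummies, the cross-product matrix X'X has a zero determinant, so the OLS normal equations have no unique solution. The helper names `gram` and `det` are illustrative.

```python
male = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]
female = [1 - d for d in male]      # redundant second dummy: 1 = female
const = [1] * len(male)             # intercept column


def gram(columns):
    """Return X'X for a design matrix supplied as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]


def det(m):
    """Determinant by cofactor expansion (adequate for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, val in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * val * det(minor)
    return total


# m = 2 categories with 2 dummies plus an intercept: male + female = const.
det_trap = det(gram([const, male, female]))
# m - 1 = 1 dummy plus an intercept: no exact linear dependence.
det_ok = det(gram([const, male]))

print("det with both dummies:", det_trap)
print("det with one dummy:", det_ok)
```

Dropping either one of the two dummies (or the intercept) removes the exact linear dependence.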
1.2.2 Regression on one quantitative variable and one qualitative variable with more than
two classes
Suppose that, on the basis of the cross-sectional data, we want to regress the annual expenditure
on health care by an individual on the income and education of the individual. Since the variable
education is qualitative in nature, suppose we consider three mutually exclusive levels of
education: primary school, high school, and college. Now, unlike the previous case, we have
more than two categories of the qualitative variable education. Therefore, following the rule that
the number of dummies be one less than the number of categories of the variable, we should
introduce two dummies to take care of the three levels of education. Assuming that the three
educational groups have a common slope but different intercepts in the regression of annual
expenditure on health care on annual income, we can use the following model:
$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i$ -------------------------- (1.6)
Where: $Y_i$ = annual expenditure on health care
$X_i$ = annual income
$D_2$ = 1 if high school education and 0 otherwise
$D_3$ = 1 if college education and 0 otherwise
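As a sketch of how the two education dummies are built and used, the following Python maps each of the three levels to the pair (D2, D3) and evaluates the mean of Y from a model with an intercept, two differential intercepts and a common income slope. The coefficient values are hypothetical and chosen only for illustration; the function names are not from the notes.

```python
def education_dummies(level):
    """Map an education level to (D2, D3); primary school is the base category."""
    levels = ("primary", "high school", "college")
    if level not in levels:
        raise ValueError(f"unknown education level: {level!r}")
    d2 = 1 if level == "high school" else 0
    d3 = 1 if level == "college" else 0
    return d2, d3


def mean_health_spending(level, income, alpha1, alpha2, alpha3, beta):
    """Mean of Y for a given education level and income, assuming E(u) = 0."""
    d2, d3 = education_dummies(level)
    return alpha1 + alpha2 * d2 + alpha3 * d3 + beta * income


# Hypothetical coefficients (not estimates from any data set).
coef = dict(alpha1=100.0, alpha2=20.0, alpha3=50.0, beta=0.5)

for level in ("primary", "high school", "college"):
    print(level, education_dummies(level), mean_health_spending(level, 1000.0, **coef))
```

Three categories are represented by two dummies; the base category is the one for which both dummies are zero.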
Note that in the preceding assignment of the dummy variables we are arbitrarily treating the
“primary school education” category as the base category. Therefore, the intercept $\alpha_1$ will
reflect the intercept for this category. The differential intercepts $\alpha_2$ and $\alpha_3$ tell by how much
the intercepts of the other two categories differ from the intercept of the base category, which
can be checked as follows. Assuming $E(u_i) = 0$, we obtain from (1.6):
$E(Y_i \mid D_2 = 0, D_3 = 0, X_i) = \alpha_1 + \beta X_i$
$E(Y_i \mid D_2 = 1, D_3 = 0, X_i) = (\alpha_1 + \alpha_2) + \beta X_i$
$E(Y_i \mid D_2 = 0, D_3 = 1, X_i) = (\alpha_1 + \alpha_3) + \beta X_i$
which are, respectively, the mean health care expenditure functions for the three levels of
education, namely, primary school, high school, and college. Geometrically, the situation is
one of three parallel lines with different intercepts.
The technique of dummy variable can be easily extended to handle more than one qualitative
variable. Let us revert to the college professors’ salary regression (1.3), but now assume that in
addition to years of teaching experience and sex the skin color of the teacher is also an important
determinant of salary. For simplicity, assume that color has two categories: black and white.
We can now write (1.3) as:
$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i$ ------------------------------------------- (1.7)
Where: $Y_i$ = annual salary
$X_i$ = years of teaching experience
$D_2$ = 1 if male and 0 otherwise
$D_3$ = 1 if white and 0 otherwise
Notice that each of the two qualitative variables sex and color has two categories and hence
needs one dummy variable for each. Also the omitted, or base, category now is “black female
professor.”
Assuming $E(u_i) = 0$, we can obtain the following regressions from (1.7):
Mean salary for black female professor: $E(Y_i \mid D_2 = 0, D_3 = 0, X_i) = \alpha_1 + \beta X_i$
Mean salary for black male professor: $E(Y_i \mid D_2 = 1, D_3 = 0, X_i) = (\alpha_1 + \alpha_2) + \beta X_i$
Mean salary for white female professor: $E(Y_i \mid D_2 = 0, D_3 = 1, X_i) = (\alpha_1 + \alpha_3) + \beta X_i$
Mean salary for white male professor: $E(Y_i \mid D_2 = 1, D_3 = 1, X_i) = (\alpha_1 + \alpha_2 + \alpha_3) + \beta X_i$
If $\alpha_3$ is statistically significant, it will mean that color does affect a professor’s salary.
Similarly, if $\alpha_2$ is statistically significant, it will mean that sex also affects a professor’s
salary. If both these differential intercepts are statistically significant, it would mean that sex
as well as color is an important determinant of professors’ salaries.
From the preceding discussion it follows that we can extend our model to include more than one
quantitative variable and more than two qualitative variables. The only precaution to be taken is
that the number of dummies for each qualitative variable should be one less than the number of
categories of that variable.
Until now, in the models considered in this chapter we assumed that the qualitative variables
affect the intercept but not the slope coefficient of the various subgroup regressions. But what if
the slopes are also different? If the slopes are in fact different, testing for differences in the
intercepts may be of little practical significance. Therefore, we need to develop a general
methodology to find out whether two (or more) regressions are different, where the difference
may be in the intercepts or the slopes or both.
The additive model (1.8) implicitly assumes that the differential effect of the sex dummy $D_2$ is
constant across the two levels of education and that the differential effect of the education
dummy $D_3$ is also constant across the two sexes. That is, if, say, the mean expenditure on
clothing is higher for females than males, this is so whether they are college graduates or not.
Likewise, if, say, college graduates on average spend more on clothing than non-college
graduates, this is so whether they are females or males.
A female college graduate may spend more on clothing than a male graduate. In other words,
there may be interaction between the two qualitative variables $D_2$ and $D_3$, and therefore their
effect on mean Y may not be simply additive as in (1.8) but multiplicative as well, as in the
following model:
$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 (D_{2i} D_{3i}) + \beta X_i + u_i$ ----------------- (1.9)
This shows that the mean clothing expenditure of graduate females differs (by $\alpha_4$) from
the mean clothing expenditure of females or college graduates. If $\alpha_2$, $\alpha_3$, and $\alpha_4$ are all positive,
the average clothing expenditure of females is higher (than the base category, which here is male
non graduate), but it is much more so if the females also happen to be graduates. Similarly, the
average expenditure on clothing by a college graduate tends to be higher than the base category
but much more so if the graduate happens to be a female. This shows how the interaction
dummy modifies the effect of the two attributes considered individually. Whether the coefficient
of the interaction dummy is statistically significant can be tested by the usual t test. If it turns out
to be significant, the simultaneous presence of the two attributes will attenuate or reinforce the
individual effects of these attributes. Needless to say, omitting a significant interaction term
incorrectly will lead to a specification bias.
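The role of the interaction dummy can be illustrated with a short Python sketch. It evaluates the mean of Y for the four sex-by-education groups in a model with two dummies, their product, and a common slope on X. All coefficient values are hypothetical, chosen only to show that the interaction coefficient is exactly the part of the joint effect not explained by adding the two individual effects.

```python
def mean_clothing_spending(female, graduate, income, a1, a2, a3, a4, b):
    """Mean of Y with an interaction dummy, assuming E(u) = 0.

    female and graduate are 0/1 dummies; a4 multiplies their product.
    """
    return a1 + a2 * female + a3 * graduate + a4 * (female * graduate) + b * income


# Hypothetical coefficients, for illustration only.
coef = dict(a1=100.0, a2=30.0, a3=20.0, a4=15.0, b=0.5)
income = 1000.0

base = mean_clothing_spending(0, 0, income, **coef)         # male non-graduate
female_only = mean_clothing_spending(1, 0, income, **coef)  # female non-graduate
grad_only = mean_clothing_spending(0, 1, income, **coef)    # male graduate
both = mean_clothing_spending(1, 1, income, **coef)         # female graduate

# Joint effect minus the sum of the two individual effects.
extra = (both - base) - ((female_only - base) + (grad_only - base))
print(base, female_only, grad_only, both, extra)
```

If the interaction coefficient were zero, `extra` would be zero and the model would collapse to the purely additive form.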
1.3. Dummy as Dependent Variable
Here the dependent variable is qualitative. Suppose we want to study the labor-force
participation of adult males as a function of the unemployment rate, average wage rate, family
income, education, etc. A person either is in the labor force or not. Hence, the dependent
variable, labor-force participation, can take only two values: 1 if the person is in the labor force
and 0 if he or she is not. We can consider another example. A family may or may not own a
house. If it owns a house, it takes a value 1 and 0 if it does not.
In this situation a Qualitative Response Model (QRM) is needed. QRMs are models in which
the dependent variable is a discrete outcome. There are two broad categories of QRM:
A. Binomial Model: The choice is between two alternatives
B. Multinomial models: The choice is between more than two alternatives
Example: Y = 1, occupation is farming
= 2, occupation is carpentry
= 3, occupation is fishing
Binary variables: are variables that have two categories and are often used to indicate that an
event has occurred or that some characteristic is present.
Example: - Decision to participate in the labor force/or not to participate
Multinomial variables: These variables occur when there are more than two possible outcomes.
The types of binomial models are:
1. The linear probability model
2. The logit model
3. The probit model
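Consider the home-ownership example, where $Y_i$ = 1 if family i owns a house and 0 otherwise,
and $X_i$ is family income:
$Y_i = \beta_0 + \beta_1 X_i + U_i$ ……………………………(1)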
The above model expresses the dichotomous $Y_i$ as a linear function of the explanatory variable
$X_i$. Such models are called linear probability models (LPM) since $E(Y_i \mid X_i)$, the
conditional expectation of $Y_i$ given $X_i$, can be interpreted as the conditional probability that the
event will occur given $X_i$; that is, $\Pr(Y_i = 1 \mid X_i)$. Thus, in the preceding case, $E(Y_i \mid X_i)$ gives the
probability of a family owning a house when its income is the given amount $X_i$. The
justification of the name LPM can be seen as follows.
Assuming $E(U_i) = 0$, as usual (to obtain unbiased estimators), we obtain
$E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i$ …………………………………….(2)
Now, letting $P_i$ = probability that $Y_i = 1$ (that is, that the event occurs) and $1 - P_i$ = probability
that $Y_i = 0$ (that is, that the event does not occur), the variable $Y_i$ has the following distribution:
Y_i      Probability
0        1 - P_i
1        P_i
Total    1
Therefore, by the definition of mathematical expectation,
$E(Y_i) = 0(1 - P_i) + 1(P_i) = P_i$ ……………………………………(3)
Comparing (2) with (3), we can equate
$E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i = P_i$ ……………………………………(4)
Since the probability $P_i$ must lie between 0 and 1, we have the restriction $0 \le E(Y_i \mid X_i) \le 1$; that is,
the conditional expectation, or conditional probability, must lie between 0 and 1.
The LPM is logically not a very attractive model because it assumes that $P_i = E(Y = 1 \mid X)$
increases linearly with X, that is, the marginal or incremental effect of X remains constant throughout.
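A practical consequence of this linearity is that fitted "probabilities" from an LPM are not guaranteed to stay between 0 and 1. The sketch below fits an LPM by OLS to a small hypothetical data set (eight families, with income in arbitrary units) and lists the fitted values that fall outside the unit interval. The data and the helper `ols_simple` are illustrative assumptions, not taken from the notes.

```python
# Hypothetical data: income (arbitrary units) and home ownership (1 = owns a house).
income = [1, 2, 3, 4, 5, 6, 7, 8]
owns = [0, 0, 0, 0, 1, 1, 1, 1]


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


b0, b1 = ols_simple(income, owns)
fitted = [b0 + b1 * x for x in income]

# Fitted values that cannot be interpreted as probabilities.
outside = [(x, round(p, 3)) for x, p in zip(income, fitted) if p < 0 or p > 1]
print("slope:", round(b1, 4))
print("fitted values outside [0, 1]:", outside)
```

The slope is the same at every income level, which is the constant marginal effect criticized above.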
Geometrically, the model we want would look something like Fig. 7.1 below.
[Figure 7.1: S-shaped CDF curve plotted against X, rising from 0 towards 1 as X goes from −∞ to +∞]
The above S-shaped curve closely resembles the cumulative distribution function (CDF)
of a random variable. (Note that the CDF of a random variable X is simply the probability that it
takes a value less than or equal to $x_0$, where $x_0$ is some specified numerical value of X. In short,
F(X), the CDF of X, is $F(x_0) = P(X \le x_0)$. Please refer to your Statistics for Economists text.)
Therefore, one can easily use the CDF to model regressions where the response variable is
dichotomous, taking 0-1 values.
The CDFs commonly chosen to represent the 0-1 response models are:
a) the logistic, which gives rise to the logit model
b) the normal, which gives rise to the probit (or normit) model
Now let us see how one can estimate and interpret the logit model.
Recall that the LPM was (for home ownership)
$P_i = E(Y = 1 \mid X_i) = \beta_0 + \beta_1 X_i$
Now consider the following representation of home ownership:
$P_i = \frac{1}{1 + e^{-Z_i}}$ where $Z_i = \beta_0 + \beta_1 X_i$
This equation represents what is known as the (cumulative) logistic distribution function. The
above equation is nonlinear in both X and the β’s, which means we cannot use the familiar
OLS procedure to estimate the parameters. It can, however, be linearized as follows.
$1 - P_i = \frac{1}{1 + e^{Z_i}}$
$\frac{P_i}{1 - P_i} = \frac{1 + e^{Z_i}}{1 + e^{-Z_i}} = e^{Z_i}$
Now $P_i/(1 - P_i)$ is simply the odds ratio in favor of owning a house: the ratio of the probability that a
family will own a house to the probability that it will not own a house.
Taking the natural log of the odds ratio we obtain
$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = Z_i = \beta_0 + \beta_1 X_i$
L (the log of the odds ratio) is linear in X as well as in the β’s (the parameters). L is called the logit, and
hence the name logit model is given to it. The interpretation of the logit model is as follows:
$\beta_1$: The slope measures the change in L for a unit change in X.
$\beta_0$: The intercept tells the value of the log-odds in favor of owning a house if income is
zero. Like most interpretations of intercepts, this interpretation may not have any physical
meaning.
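The link between the logistic probability, the odds, and the logit can be verified numerically. In the Python sketch below, the values of the intercept `b0` and slope `b1` are hypothetical; the point is only that the log of the odds recovers Z exactly and that a one-unit change in X shifts the log-odds by the slope.

```python
import math


def logistic(z):
    """Cumulative logistic distribution function: P = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log of the odds ratio: L = ln(P / (1 - P))."""
    return math.log(p / (1.0 - p))


b0, b1 = -4.0, 0.5   # hypothetical intercept and slope

for x in (4.0, 8.0, 12.0):
    z = b0 + b1 * x
    p = logistic(z)
    odds = p / (1.0 - p)
    print(f"X={x:5.1f}  Z={z:5.2f}  P={p:.4f}  odds={odds:.4f}  logit={logit(p):.4f}")

# Change in the log-odds for a one-unit change in X equals the slope b1.
delta_logit = logit(logistic(b0 + b1 * 9.0)) - logit(logistic(b0 + b1 * 8.0))
```

Note that the probability itself does not change by a constant amount per unit of X; only the log-odds does.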
Now for estimation purposes, let us write the logit model as
$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_0 + \beta_1 X_i + U_i$
The estimating model that emerges from the normal CDF is popularly known as the probit model.
Here the observed dependent variable Y takes on one of the values 0 and 1 using the following
criteria. Define a latent variable Y* such that
$Y_i^* = X_i\beta + \varepsilon_i$ (with a single regressor, $X_i\beta = \beta_0 + \beta_1 X_i$)
$Y_i = 1$ if $Y_i^* > 0$
$Y_i = 0$ if $Y_i^* \le 0$
The latent variable Y* is continuous ($-\infty < Y^* < \infty$). It generates the observed binary variable Y.
An observed variable, Y can be observed in two states:
i) if an event occurs it takes a value of 1
ii) if an event does not occur it takes a value of 0
The latent variable is assumed to be a linear function of the observed X’s through the structural
model.
- In the probit model, it is assumed that $\mathrm{Var}(\varepsilon_i \mid X_i) = 1$.
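Summary
- logit function
$P(Y = 1 \mid X) = \frac{e^{X_i\beta}}{1 + e^{X_i\beta}} = \frac{1}{1 + e^{-X_i\beta}}$
- Probit function
$P(Y = 1 \mid X) = \Phi(X_i\beta)$, where $\Phi$ is the standard normal CDF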
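The two response functions in the summary can be compared side by side. The sketch below computes the standard normal CDF from the error function in Python's `math` module and evaluates both models at the same index value Z. The coefficients are hypothetical, and the same values are used for both models purely for illustration (in practice logit and probit estimates are on different scales).

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    """Cumulative logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def probit_prob(x, b0, b1):
    """P(Y = 1 | X) under the probit model with index b0 + b1 * x."""
    return normal_cdf(b0 + b1 * x)


def logit_prob(x, b0, b1):
    """P(Y = 1 | X) under the logit model with index b0 + b1 * x."""
    return logistic_cdf(b0 + b1 * x)


b0, b1 = -2.0, 0.25   # hypothetical coefficients

for x in (0.0, 4.0, 8.0, 12.0, 16.0):
    print(f"X={x:5.1f}  probit={probit_prob(x, b0, b1):.4f}  logit={logit_prob(x, b0, b1):.4f}")
```

Both functions are S-shaped and bounded between 0 and 1, which is the property the LPM lacks.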
CHAPTER TWO
Time-series data are a set of observations on a quantitative variable collected over time, at
discrete intervals.
Examples include the annual price of wheat in the United States and the daily price of General
Electric stock shares. Macroeconomic data are usually reported in monthly, quarterly, or
annual terms.
Financial data, such as stock prices, can be recorded daily, or at even higher frequencies.
The key feature of time-series data is that the same economic quantity is recorded at a regular time
interval. A time series data set consists of observations on a variable or several variables over
time. In time series analysis, we analyze the past behavior of a variable in order to predict its
future behavior.
Some Time Series Terms
• Stationary - a time series variable exhibiting no significant upward or downward trend over time.
• Nonstationary - a time series variable exhibiting a significant upward or downward trend over time.
• Seasonal Data - a time series variable exhibiting a repeating pattern at regular intervals over time.
• Univariate time-series analysis- analysis of single sequence of data describing the behavior of one
variable in terms of its own past values.
[Figure: graph of a stationary time series]
On the other hand, time-series observations on a given economic unit are observed over a number of
time periods.
A second distinguishing feature of time-series data is its natural ordering according to time. With
cross-section data there is no particular ordering of the observations that is better or more natural
than another. To show the dynamic nature of relationships:
Given that the effects of changes in variables are not always instantaneous, we need to ask how to
model the dynamic nature of relationships. One option is a dynamic model with lagged values of both the
dependent and explanatory variables, such as
$Y_t = f(Y_{t-1}, X_t, X_{t-1}, X_{t-2})$ -------------------------------(3.1)
Such models are called autoregressive distributed lag (ARDL) models, with “autoregressive”
meaning a regression of $Y_t$ on its own lag or lags.
Examples: Consider the following:
$Y_t = 1.2Y_{t-1} + \varepsilon_t$ --------------------------------- the AR(1) model
or $Y_t = \delta + 1.2Y_{t-1} + \varepsilon_t$ ----------------------- the AR(1) model with a constant term δ
$Y_t = 1.2Y_{t-1} - 0.32Y_{t-2} + \varepsilon_t$ -------------------------- the AR(2) model
or $Y_t = \delta + 1.2Y_{t-1} - 0.32Y_{t-2} + \varepsilon_t$ -------------------- the AR(2) model with a constant term δ
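To make the recursion behind these examples concrete, the sketch below generates an autoregressive series from given coefficients, shocks and starting values. The function name `simulate_ar` and the starting values are illustrative assumptions. The shocks are set to zero here so that the output shows only the deterministic part of the recursion.

```python
def simulate_ar(coefs, shocks, initial, delta=0.0):
    """Generate Y_t = delta + coefs[0]*Y_{t-1} + coefs[1]*Y_{t-2} + ... + shock_t.

    `initial` holds the p starting values in time order (most recent last).
    Returns the generated values only, one per shock.
    """
    p = len(coefs)
    if len(initial) != p:
        raise ValueError("need exactly one starting value per lag")
    history = list(initial)
    path = []
    for e in shocks:
        lags = history[-p:][::-1]          # Y_{t-1}, Y_{t-2}, ...
        y = delta + sum(c * l for c, l in zip(coefs, lags)) + e
        path.append(y)
        history.append(y)
    return path


# AR(1) with coefficient 1.2 and AR(2) with coefficients 1.2 and -0.32.
ar1 = simulate_ar([1.2], shocks=[0.0, 0.0, 0.0], initial=[1.0])
ar2 = simulate_ar([1.2, -0.32], shocks=[0.0, 0.0], initial=[1.0, 1.0])
print(ar1)
print(ar2)
```

Passing random draws (for example from `random.gauss`) as `shocks` turns the same function into a simulator for the stochastic models above.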
The main purpose of time-series analysis is to study the dynamics or temporal structure of the
data.
A collection of random variables $Y_t$ ordered in time is called a stochastic process or random
process. There are two classes of stochastic process:
Stationary stochastic process: gives rise to a stationary time series.
Non-stationary stochastic process: gives rise to a non-stationary time series.
Stationary Stochastic Processes
A stochastic process is said to be stationary if its mean and variance are constant over time (they do not
depend on time). Moreover, the value of the covariance between two time periods depends only
on the lag between the two periods and not on the actual time at which the covariance is
computed. A non-stationary time series will have a time-varying mean or a time-varying
variance or both.
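The three conditions in this definition can be written compactly. In the statement below, the symbols $\mu$, $\sigma^2$ and $\gamma_k$ (the covariance at lag k) are notation introduced here for illustration:

```latex
\begin{align}
E(Y_t) &= \mu \\
\operatorname{Var}(Y_t) &= E(Y_t - \mu)^2 = \sigma^2 \\
\gamma_k &= E\left[(Y_t - \mu)(Y_{t+k} - \mu)\right]
\end{align}
```

None of $\mu$, $\sigma^2$ or $\gamma_k$ carries a time subscript: shifting the origin from $t$ to $t + m$ leaves all three unchanged, and $\gamma_0$ is simply the variance.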
Non-stationary Stochastic Processes
In practical research one often encounters non-stationary time series. The classic example is the
Random Walk Model (RWM). We distinguish two types of random walks: the random walk
without drift and the random walk with drift.
The series $Y_t$ is said to be a random walk without drift if
$Y_t = Y_{t-1} + u_t$
Where: $u_t$ is a white noise error term (error term with mean 0 and variance $\sigma^2$).
This model says that the value of Y at time t (i.e., $Y_t$) is equal to its value at time (t−1) plus a
random shock ($u_t$). It is an AR(1) model, because $Y_t$ is regressed on itself lagged one period.
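By successive substitution, starting from an initial value $Y_0$:
$Y_t = Y_0 + \sum u_t$
$E(Y_t) = E(Y_0 + \sum u_t) = Y_0$ (Why?)
$\mathrm{Var}(Y_t) = t\sigma^2$
The mean is constant, but the variance grows with t, which violates a condition of stationarity.
In short, the RWM without drift is a non-stationary stochastic process.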
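The non-stationarity of the random walk can be illustrated by simulation. For a random walk without drift that starts at a fixed value and has white-noise shocks with variance σ², the variance of Y at time t is tσ². The sketch below simulates many independent random walks with standard normal shocks and compares the cross-sectional variance at two dates. The seed, number of paths and path length are arbitrary choices.

```python
import random


def random_walk(n_steps, y0=0.0, sigma=1.0, rng=random):
    """Simulate Y_t = Y_{t-1} + u_t with u_t ~ N(0, sigma^2), starting from y0."""
    y = y0
    path = []
    for _ in range(n_steps):
        y += rng.gauss(0.0, sigma)
        path.append(y)
    return path


def variance(values):
    """Sample variance with the n - 1 divisor."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


rng = random.Random(12345)                      # fixed seed for reproducibility
paths = [random_walk(100, rng=rng) for _ in range(2000)]

var_t10 = variance([p[9] for p in paths])       # across paths at t = 10
var_t100 = variance([p[99] for p in paths])     # across paths at t = 100
mean_t100 = sum(p[99] for p in paths) / len(paths)

print("variance at t=10:", round(var_t10, 2))
print("variance at t=100:", round(var_t100, 2))
print("mean at t=100:", round(mean_t100, 2))
```

The mean across paths stays close to the starting value, while the variance at t = 100 is roughly ten times the variance at t = 10.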
Random walk with drift:
$Y_t = \delta + Y_{t-1} + u_t$
Where: δ is known as the drift parameter.
Why do we call it drift? Because if we write the preceding equation as
$Y_t - Y_{t-1} = \Delta Y_t = \delta + u_t$
the model shows that $Y_t$ drifts upward or downward, depending on whether δ is positive
or negative. Note that the RWM with drift is also an AR(1) model. Therefore, in general we can conclude
that the Random Walk Model (with or without drift) is a non-stationary stochastic process.
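The random walk model is an example of what is known as a unit root process. Let us write the
RWM as:
$Y_t = \rho Y_{t-1} + u_t$, $-1 \le \rho \le 1$
If $\rho = 1$, this is the RWM without drift, and $Y_t$ is said to have a unit root.
In practical research, it is important to find out whether a time series has a unit root (that is, whether it
is non-stationary). Note that the term unit root process is used here as a synonym for non-stationary process.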
CHAPTER THREE
In the previous chapters, we focused exclusively on the problems and estimation of
single-equation regression models. In such models, a dependent variable is expressed as a
linear function of one or more explanatory variables. The cause-and-effect relationship in
such models between the dependent and independent variables is unidirectional. That is,
the explanatory variables are the cause and the dependent variable is the effect. But there
are situations where such one-way or unidirectional causation is not meaningful. This
occurs if, for instance, Y (the dependent variable) is not only a function of the X’s
(explanatory variables) but all or some of the X’s are, in turn, determined by Y. There is,
therefore, a two-way flow of influence between Y and (some of) the X’s, which in turn makes the
distinction between dependent and independent variables doubtful. Under such
circumstances, we need to consider more than one regression equation, one for each
interdependent variable, to understand the multi-directional flow of influence among the
variables. This is precisely what is done in simultaneous equation models.
Some examples of SEMs in Economics
1. Demand-Supply Model
2. Keynesian Model of Income Determination
3. Wage–Price Models
4. The IS and LM Model of Macroeconomics
A system describing the joint dependence of variables is called a simultaneous equations model.
The number of equations in such models is equal to the number of jointly dependent or
endogenous variables involved in the phenomenon under analysis. Unlike the single equation
models, in simultaneous equation models it is not usually possible to estimate a single equation
of the model without taking into account the information provided by the other equations of the
system.
If one applies OLS to estimate the parameters of each equation disregarding other equations of
the model, the estimates so obtained are not only biased but also inconsistent; i.e. even if the
sample size increases indefinitely, the estimators do not converge to their true values. The bias
arising from application of such procedure of estimation which treats each equation of the
simultaneous equations model as though it were a single model is known as simultaneity bias or
simultaneous equation bias. To avoid this bias we will use other methods of estimation, such as:
What happens to the parameters of the relationship if we estimate by applying OLS to each
equation without taking into account the information provided by the other equations in the
system? One of the crucial assumptions of OLS is that the explanatory variables and the
disturbance term are independent, i.e., the explanatory variables are truly exogenous. Symbolically:
$E[X_i U_i] = 0$. As a result, the linear model could be interpreted as describing the conditional
expectation of the dependent variable (Y) given a set of explanatory variables. In
simultaneous equation models, such independence of explanatory variables and disturbance
term is violated, i.e., $E[X_i U_i] \ne 0$. When this assumption is violated, the OLS estimator is biased and
inconsistent.
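Consider the structural model
$Y = \alpha_0 + \alpha_1 X + U$
$X = \beta_0 + \beta_1 Y + \beta_2 Z + V$
where X and Y are endogenous variables and Z is an exogenous variable. The reduced form of
X of the above model is obtained by substituting Y in the equation of X.
$X = \beta_0 + \beta_1(\alpha_0 + \alpha_1 X + U) + \beta_2 Z + V$
$X = \frac{\beta_0 + \alpha_0\beta_1}{1 - \alpha_1\beta_1} + \frac{\beta_2}{1 - \alpha_1\beta_1} Z + \frac{\beta_1 U + V}{1 - \alpha_1\beta_1}$ ------ (11)
Applying OLS to the first equation of the above structural model will result in a biased estimator
because $\mathrm{cov}(X_i, U_i) = E(X_i U_i) \ne 0$: the reduced form shows that X depends on U.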
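Simultaneity bias can be made visible with a small simulation. The sketch below assumes a two-equation system of the form Y = a0 + a1·X + U and X = b0 + b1·Y + b2·Z + V, with Z exogenous and U, V, Z drawn as independent standard normals. All parameter values are hypothetical. It compares the OLS slope of Y on X with a simple instrumental-variable estimate that uses Z as the instrument. The IV estimator is brought in here only as a point of comparison; it is not a method described in this section.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


rng = random.Random(2024)          # fixed seed for reproducibility

a0, a1 = 1.0, 0.5                  # hypothetical: Y = a0 + a1*X + U
b0, b1, b2 = 2.0, 0.5, 1.0         # hypothetical: X = b0 + b1*Y + b2*Z + V

X, Y, Z = [], [], []
for _ in range(20000):
    z = rng.gauss(0.0, 1.0)
    u = rng.gauss(0.0, 1.0)
    v = rng.gauss(0.0, 1.0)
    # Reduced form for X: solve the two structural equations jointly.
    x = (b0 + a0 * b1 + b2 * z + b1 * u + v) / (1.0 - a1 * b1)
    y = a0 + a1 * x + u
    X.append(x)
    Y.append(y)
    Z.append(z)

ols_slope = cov(X, Y) / cov(X, X)   # ignores the correlation between X and U
iv_slope = cov(Z, Y) / cov(Z, X)    # uses Z, which is uncorrelated with U

print("true a1:", a1)
print("OLS slope:", round(ols_slope, 3))
print("IV slope:", round(iv_slope, 3))
```

Because X is positively correlated with U in this setup, the OLS slope settles well above the true value even with 20,000 observations, which is what inconsistency means in practice.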
Predetermined variables include current and lagged exogenous variables as well as lagged
endogenous variables. For instance, $X_t$ and $X_{t-1}$ depict the current and lagged exogenous
variables and $Y_{t-1}$ depicts the lagged endogenous variable. This is on the assumption that the X’s
symbolize the exogenous variables and the Y’s symbolize the endogenous variables. Thus, $X_t$,
$X_{t-1}$ and $Y_{t-1}$ are regarded as predetermined (exogenous) variables.
Since the exogenous variables are predetermined, they are supposed to be independent of the
error terms in the model.
Consider the demand and supply functions.
$Q^d = \alpha_0 + \alpha_1 P + \alpha_2 Y + U_1$ ------ (14)
$Q^s = \beta_0 + \beta_1 P + \beta_2 R + U_2$ ------ (15)
$C = \alpha + \beta Y + U$ ---------------------------------------------------- (16)
$Y = C + Z$ ---------------------------------------------------- (17)
Substituting (17) into (16):
$C = \alpha + \beta(C + Z) + U$
$C - \beta C = \alpha + \beta Z + U$
$C(1 - \beta) = \alpha + \beta Z + U$
$C = \frac{\alpha}{1 - \beta} + \frac{\beta}{1 - \beta} Z + \frac{U}{1 - \beta}$ ---------------------------------- (18)
Equations (18) and (19) are called the reduced form of the structural model above, where (19)
is obtained by substituting (18) into $Y = C + Z$:
$Y = \frac{\alpha}{1 - \beta} + \frac{1}{1 - \beta} Z + \frac{U}{1 - \beta}$ ---------------------------------- (19)
Parameters of the reduced form measure the total effect (direct and indirect) of a change in
exogenous variables on the endogenous variable. For instance, in the above reduced form
equation (18), $\frac{\beta}{1 - \beta}$ measures the total effect of a unit change in the non-consumption
expenditure on consumption. This total effect is $\beta$, the direct effect, times $\frac{1}{1 - \beta}$, the indirect
effect. The reduced form equations can be obtained in two ways:
1) To express the endogenous variables directly as a function of the predetermined
variables.
2) To solve the structural system of endogenous variables in terms of the predetermined
variables, the structural parameters, and the disturbance terms.
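As a worked illustration of the second approach, the sketch below takes a simple income-determination system of the form C = α + βY + U and Y = C + Z (consumption C, income Y, non-consumption expenditure Z) and returns the reduced-form coefficients implied by given structural parameters. The function name and the numerical values of α and β are hypothetical.

```python
def keynesian_reduced_form(alpha, beta):
    """Reduced-form coefficients for C = alpha + beta*Y + U and Y = C + Z.

    Returns, for each endogenous variable, the constant term and the
    coefficient on Z (the total effect of a unit change in Z).
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    k = 1.0 / (1.0 - beta)               # the multiplier 1 / (1 - beta)
    return {
        "C": {"const": alpha * k, "Z": beta * k},
        "Y": {"const": alpha * k, "Z": k},
    }


rf = keynesian_reduced_form(alpha=50.0, beta=0.8)   # hypothetical values

# With the disturbance set to zero, the reduced form must satisfy Y = C + Z.
z = 100.0
c_value = rf["C"]["const"] + rf["C"]["Z"] * z
y_value = rf["Y"]["const"] + rf["Y"]["Z"] * z
print(rf)
print("C:", c_value, "Y:", y_value)
```

With β = 0.8, a one-unit rise in Z raises consumption by about 4 and income by about 5: the direct effect β on consumption is scaled up by the multiplier 1/(1 − β).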
Consider the following simple model for a closed economy.
$C_t = a_1 Y_t + U_1$ --------------------------------------------------------- (i)
$I_t = b_1 Y_t + b_2 Y_{t-1} + U_2$ ----------------------------------------------- (ii)
$Y_t = C_t + I_t + G_t$ ------------------------------------------------------- (iii)
This model has three equations in three endogenous variables (Ct, It and Yt) and two
predetermined variables (Gt and Yt-1).
To obtain the reduced form of this model, we may use two methods (the direct method and the
method of solving the structural model).
Direct Method: Express the three endogenous variables (Ct, It and Yt) as functions of the two
predetermined variables (Gt and Yt-1) directly, using π’s as the parameters of the reduced form
model.
Note: π11, π12, π21, π22, π31 and π32 are reduced form parameters.
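The second method, solving the structural model, can be sketched for equations (i) to (iii). Substituting (i) and (ii) into (iii) gives Yt = (Gt + b2·Yt-1 + U1 + U2)/(1 − a1 − b1), and the other two reduced form equations follow by substituting this back. The function name `reduced_form`, the numerical values of a1, b1 and b2, and the ordering of the coefficients (first on Gt, then on Yt-1) are assumptions for illustration only.

```python
from fractions import Fraction


def reduced_form(a1, b1, b2):
    """Reduced-form coefficients of C_t, I_t and Y_t on (G_t, Y_{t-1}).

    Structural model:  C = a1*Y + U1,  I = b1*Y + b2*Y_lag + U2,  Y = C + I + G.
    Returns a dict mapping each endogenous variable to a pair
    (coefficient on G_t, coefficient on Y_{t-1}).
    """
    denom = 1 - a1 - b1
    if denom == 0:
        raise ValueError("1 - a1 - b1 must be non-zero for a solution to exist")
    y_on_g = 1 / denom        # effect of G_t on Y_t
    y_on_lag = b2 / denom     # effect of Y_{t-1} on Y_t
    return {
        "C": (a1 * y_on_g, a1 * y_on_lag),
        "I": (b1 * y_on_g, b1 * y_on_lag + b2),
        "Y": (y_on_g, y_on_lag),
    }


# Hypothetical structural coefficients (exact fractions avoid rounding noise).
pi = reduced_form(a1=Fraction(3, 5), b1=Fraction(1, 5), b2=Fraction(1, 10))
for name, (on_g, on_lag) in pi.items():
    print(f"{name}_t = {on_g} * G_t + {on_lag} * Y_(t-1) + v")
```

A quick consistency check is that the reduced form must satisfy the identity (iii): the coefficients of C and I on Gt plus 1 equal the coefficient of Y on Gt, and the coefficients of C and I on Yt-1 sum to the coefficient of Y on Yt-1.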
iii) Substitute the estimates of the structural coefficients into the system of parameters’
relations to find the estimates of the reduced form coefficients.
Recursive models
A model is called recursive if its structural equations can be ordered in such a way that the first
equation includes only the predetermined variables in the right hand side; the second equation
contains predetermined variables and the first endogenous variable (of the first equation) in the
right hand side and so on. The special feature of recursive model is that its equations may be
estimated, one at a time, by OLS without simultaneous equations bias.
OLS is not applicable if there is interdependence between the explanatory variables and the error
term. In simultaneous equation models, the endogenous variables may depend on the error
terms of the model; hence the OLS technique is not appropriate for estimating an equation in a
simultaneous equations model.
In the above illustration, as usual, the X’s and Y’s are exogenous and endogenous variables
respectively. The disturbance terms satisfy the following assumption:
$E(U_1 U_2) = E(U_1 U_3) = E(U_2 U_3) = 0$
The above assumption is the most crucial assumption that defines the recursive model. If it
does not hold, the above system is no longer recursive and OLS is no longer valid. The first
equation of the above system contains only exogenous variables on the right hand side. Since,
by assumption, the exogenous variables are independent of $U_1$, the first equation satisfies the
critical assumption of the OLS procedure. Hence OLS can be applied straightforwardly to this
equation.
Consider the second equation. It contains the endogenous variable $Y_1$ as one of the explanatory
variables along with the non-stochastic X’s. OLS can be applied to this equation only if it can be
shown that $Y_1$ and $U_2$ are independent of each other. This is true because $U_1$, which affects $Y_1$, is
by assumption uncorrelated with $U_2$.
In the first equation, there are only exogenous variables, which are assumed to be independent of
$U_1$. In the second equation, the causal relation between $Y_1$ and $Y_2$ runs in one direction only. Also, $Y_1$ is
independent of $U_2$ and can be treated just like an exogenous variable. Similarly, since $Y_2$ is
independent of $U_3$, OLS can be applied to the third equation. Thus, we can rewrite the above
equations as follows:
$Y_1 = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + U_1$
$\alpha_1 Y_1 + Y_2 = \beta_4 + \beta_5 X_4 + U_2$
$\alpha_2 Y_2 + Y_3 = \beta_6 + \beta_7 X_5 + U_3$
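The role of the zero-covariance assumption on the disturbances can be illustrated with a small simulation. This is a sketch under assumed values, not part of the original note: it uses a two-equation recursive system, Y1 = 1 + 0.5X + U1 and Y2 = 2 + 0.7Y1 + U2, and compares the OLS slope of the second equation when U1 and U2 are uncorrelated (rho = 0) with the case where they are correlated (rho = 0.8). All coefficients and the function names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


def second_equation_slope(rho, n=50_000, seed=2024):
    """Simulate Y1 = 1 + 0.5*X + U1 and Y2 = 2 + 0.7*Y1 + U2,
    with U2 = rho*U1 + e, and return the OLS slope of Y2 on Y1."""
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u1 = rng.gauss(0.0, 1.0)
        u2 = rho * u1 + rng.gauss(0.0, 1.0)  # rho = 0 gives the recursive case
        first = 1.0 + 0.5 * x + u1
        y1.append(first)
        y2.append(2.0 + 0.7 * first + u2)
    return ols_slope(y1, y2)


slope_recursive = second_equation_slope(rho=0.0)
slope_correlated = second_equation_slope(rho=0.8)
print(f"true coefficient on Y1      = 0.7")
print(f"OLS, uncorrelated errors    = {slope_recursive:.3f}")
print(f"OLS, correlated errors      = {slope_correlated:.3f}")
```

With uncorrelated disturbances the equation-by-equation OLS slope stays close to 0.7; once U1 and U2 are correlated, Y1 is no longer independent of U2 and the slope is pushed well away from the true value.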
In applying the identification rules, we should either ignore the constant term or, if we want to
retain it, include in the set of variables a dummy variable (say $X_0$) which always
takes the value 1. Let’s ignore the constant intercept. There are two formal rules for
identification: the order condition and the rank condition.
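$(K - M) \geq (G - 1)$
[number of excluded variables] ≥ [total number of equations − 1]
where G is the total number of equations, K is the total number of variables in the model and M is
the number of variables included in the particular equation.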
Examples: State the identifiability status of each of the following equations using the order condition stated
above.
1) Suppose a system contains 10 equations with 15 variables, ten endogenous and five exogenous, and consider an
equation containing 11 variables. For this equation we have
$G = 10$, $K = 15$, $M = 11$
Order condition:
$(K - M) \geq (G - 1)$
$(15 - 11) < (10 - 1)$; that is, the order condition is not satisfied and thus the equation is not identified.
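2) Suppose a system contains 10 equations with 15 variables, ten endogenous and five exogenous, and consider an
equation containing 5 variables. For this equation we have
$G = 10$, $K = 15$, $M = 5$
$(15 - 5) > (10 - 1)$; that is, the order condition is satisfied with strict inequality, so the equation is
over-identified provided the rank condition also holds.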
The order condition for identification is necessary for a relation to be identified, but it is not
sufficient, that is, it may be fulfilled in any particular equation and yet the relation may not be
identified.
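The two examples above can be reproduced with a small helper. This is a sketch only: the function name `order_condition` and the status labels are chosen here for illustration, and since the order condition is necessary but not sufficient, a result other than "under-identified" still has to be confirmed with the rank condition.

```python
def order_condition(G, K, M):
    """Classify an equation by the order condition (K - M) >= (G - 1).

    G: total number of equations (endogenous variables) in the system
    K: total number of variables (endogenous and exogenous) in the system
    M: number of variables appearing in the equation being examined
    """
    excluded = K - M   # variables left out of this equation
    required = G - 1
    if excluded < required:
        return "under-identified"
    if excluded == required:
        return "exactly identified (if the rank condition holds)"
    return "over-identified (if the rank condition holds)"


# Example 1: 10 equations, 15 variables, equation with 11 variables.
print(order_condition(G=10, K=15, M=11))
# Example 2: 10 equations, 15 variables, equation with 5 variables.
print(order_condition(G=10, K=15, M=5))
```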
CHAPTER FOUR
5.1. Introduction
Panel data combine cross-section and time-series data. In panel data the
same cross-sectional unit (industry, firm, country) is surveyed over time, so we have data
which are pooled over space as well as time.
Reasons for using Panel Data
• Panel data can take explicit account of individual-specific heterogeneity (“individual”
here means related to the micro unit).
• By combining data in two dimensions, panel data give more data variation, less
collinearity and more degrees of freedom.
• Panel data are better suited than cross-sectional data for studying the dynamics of change,
for example transition behaviour such as company bankruptcy or merger.
Autocorrelation
Although autocorrelation in panel data differs from autocorrelation in the usual OLS models, a version
of the Durbin-Watson test can be used in the usual way (EViews reports this). To remedy autocorrelation we
can use the usual methods, such as the Error Correction Model. ‘Dynamic models’, which basically
involve adding a lagged dependent variable, are also often used. Recently, adjusting the standard
errors has become popular; the most common method is termed
‘Newey-West’ adjusted standard errors.
Heteroskedasticity
Given that there is a cross-section component to panel data, there will always be a potential for
heteroskedasticity. Although there are various tests for heteroskedasticity, as with autocorrelation
there is a tendency to automatically use adjusted standard errors, which keep inference valid
without removing the heteroskedasticity itself.
With heteroskedasticity, it is usually White’s adjusted standard errors that are used.
1. Example: the data consist of 20 countries over 10 years of annual data, giving 200
observations in all (N = 20, T = 10, N × T = 200). This produces the following result, where stock prices are
regressed against expenditure on research (r):
2. The results are interpreted in the usual way; however, you would need to decide
whether to use fixed or random effects in this model.
Panel or longitudinal data sets consist of repeated observations for the same units, firms,
individuals or other economic agents. Typically the observations are at different points in time.
Let Yit denote the outcome for unit i in period t, and Xit a vector of explanatory variables. The
index i denotes the unit and runs from 1 to N, and the index t denotes time and runs from 1 to T.
There are two types of panel data:
• Balanced panel data: every sampling unit is observed in every time period.
E.g. year1 = year2 = year3 = 300 households
• Unbalanced panel data: the number of observations may differ across units and across
points in time.
E.g. year1 = 300, year2 = 295, year3 = 270
– Households move to other places, or members/household heads die
– Firms go out of business
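Whether a data set in long format is balanced, and whether each unit appears at most once per period, can be checked before any estimation. The sketch below is illustrative only: the function name `describe_panel`, the household ids and the years are hypothetical, and the identifier names `hid` and `year` are used simply as labels for the unit and period columns.

```python
from collections import Counter


def describe_panel(rows):
    """Summarise a long-format panel given as (hid, year) pairs.

    Raises ValueError if any (hid, year) pair occurs more than once;
    otherwise reports the number of units, the number of periods and
    whether every unit is observed in every period (balanced).
    """
    rows = list(rows)
    duplicates = [key for key, count in Counter(rows).items() if count > 1]
    if duplicates:
        raise ValueError(f"duplicate (hid, year) pairs: {duplicates}")
    years = {year for _, year in rows}
    obs_per_unit = Counter(hid for hid, _ in rows)
    balanced = all(count == len(years) for count in obs_per_unit.values())
    return {"units": len(obs_per_unit), "periods": len(years), "balanced": balanced}


# Hypothetical data: 3 households observed over 3 years.
balanced_rows = [(hid, year) for hid in (1, 2, 3) for year in (2018, 2019, 2020)]
# Household 3 drops out in the last year, so the panel becomes unbalanced.
unbalanced_rows = [row for row in balanced_rows if row != (3, 2020)]

print(describe_panel(balanced_rows))
print(describe_panel(unbalanced_rows))
```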
Panel data in Stata
• Use the data file ‘Epanel’.
• Check in what format it is presented: make sure that it is in long format as
opposed to wide format.
• Make sure that the data has two identifiers: the entity id (hid) and the panel period (year).
• Make sure that the entity id is unique within a panel period.
• Declare to Stata that your data is panel data using this command: xtset hid year
The key issue with panel data is that Yit (outcome in period t) and Yis (outcome in
period s) tend to be correlated even conditional on the covariates Xit and Xis.
Let us look at this in a linear model:
$Y_{it} = X_{it}'\beta + c_i + u_{it}$
• What is Mr. C (the term $c_i$)?
It is called the unobserved individual effect.
– It is unobserved, e.g. genetic make-up
– It is individual-specific
– It is time-invariant: it stays the same over time
– It is random.
It creates the correlation between Yit and Yis even with the error term uncorrelated over time
and units.
5.2. Estimation of Panel Data Regression Model: The Fixed Effect Approach
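The following assumptions are used in panel data estimation, where $c_i$ is the unobserved
individual effect and $u_{it}$ is the error term:
• Assumption 1 (Strict exogeneity): $E(u_{it} \mid X_{i1}, \dots, X_{iT}, c_i) = 0$
• Assumption 2 (Uncorrelated effects): $E(c_i \mid X_{i1}, \dots, X_{iT}) = E(c_i)$
Fixed effects estimation requires only Assumption 1, because it removes $c_i$ from the equation;
random effects estimation requires both.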
Fixed effects estimation is also known as:
• the covariance model
• the within estimator
• the individual dummy variable model
• the least squares dummy variable (LSDV) model.
• Each entity has its own individual characteristics that may or may not influence the
predictor variables.
Fixed effects remove the effect of those time-invariant characteristics from the predictor
variables so we can assess the predictors’ net effect. Each entity is different; therefore the
entity’s error term and the constant (which captures individual characteristics) should not be
correlated with those of the others. If the error terms are correlated, then FE is not suitable since
inferences may not be correct, and you need to model that relationship (probably using random
effects); this is the main rationale for the Hausman test (presented later on in this document).
Another important assumption of the FE model is that those time-invariant characteristics are
unique to the individual and should not be correlated with other individual characteristics.
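How the within (fixed effects) estimator removes a time-invariant individual effect can be illustrated by simulation. The sketch below is not the note's own procedure: it assumes a single regressor, a true slope of 1.5, and an individual effect that is deliberately correlated with the regressor; the function names and all numerical values are hypothetical.

```python
import random


def slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


def demean_by_unit(values, units):
    """Subtract each unit's own time average (the 'within' transformation)."""
    totals, counts = {}, {}
    for v, i in zip(values, units):
        totals[i] = totals.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - totals[i] / counts[i] for v, i in zip(values, units)]


BETA = 1.5              # assumed true slope
N_UNITS, T = 1000, 5    # assumed panel dimensions

rng = random.Random(7)
units, x, y = [], [], []
for i in range(N_UNITS):
    c_i = rng.gauss(0.0, 1.0)                      # unobserved individual effect
    for _ in range(T):
        x_it = c_i + rng.gauss(0.0, 1.0)           # regressor correlated with c_i
        y_it = BETA * x_it + c_i + rng.gauss(0.0, 1.0)
        units.append(i)
        x.append(x_it)
        y.append(y_it)

beta_pooled = slope(x, y)  # ignores c_i, so it is biased here
beta_within = slope(demean_by_unit(x, units), demean_by_unit(y, units))

print(f"true beta   = {BETA}")
print(f"pooled OLS  = {beta_pooled:.3f}")
print(f"within (FE) = {beta_within:.3f}")
```

Because the individual effect is constant over time, subtracting unit means eliminates it, so the within estimate stays close to the assumed slope while pooled OLS absorbs part of the individual effect into the coefficient.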
5.3. Estimation of Panel Data Regression Model: Random Effect Estimation
The rationale behind the random
effects model is that, unlike the fixed effects model, the variation across entities is assumed to be
random and uncorrelated with the predictor or independent variables included in the model.
This is a very strong assumption to make in empirical analysis. If
you have reason to believe that differences across entities have some influence on your
dependent variable, then you should use random effects. An advantage of random effects is that
you can include time-invariant variables (e.g. gender). In the fixed effects model these variables
are absorbed by the intercept. To decide between fixed or random effects you can run a Hausman
test where the null hypothesis is that the preferred model is random effects versus the alternative
the fixed effects (see Greene, 2008, chapter 9). It basically tests whether the unique errors
(unobserved individual characteristics) are correlated with the regressors; the null hypothesis is that they are not.
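For reference, the Hausman test statistic is usually written as follows. The notation is an assumption introduced here: $\hat{\beta}_{FE}$ and $\hat{\beta}_{RE}$ denote the fixed and random effects estimates of the coefficients on the time-varying regressors, and $k$ is the number of those coefficients.

```latex
H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'
    \left[ \widehat{\operatorname{Var}}(\hat{\beta}_{FE})
         - \widehat{\operatorname{Var}}(\hat{\beta}_{RE}) \right]^{-1}
    (\hat{\beta}_{FE} - \hat{\beta}_{RE})
    \;\sim\; \chi^2_k \quad \text{(asymptotically, under } H_0\text{)}
```

A large value of H (small p-value) is evidence against the null hypothesis, which points towards the fixed effects model; a small value gives no reason to reject random effects.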
Conclusion:
• Panel data methods estimate models on data that are both time series and cross-sectional.
• They have both advantages and disadvantages compared with OLS estimation.
• They can be combined with many different techniques, such as tests for stationarity.
WORK SHEET
Part I: choose the best answer from the available alternatives.
____1. If the artificial values 0 and 1 are given to qualitative variables, we call them:
A. Binary variables B. Categorical variables C. Dichotomous variables D. All
____2. Income, output, price, cost, etc. are expressed as ____________ variables by their
nature.
A. Qualitative B. Quantitative C. Parameters D. Regression
____3. The simultaneous equation model that may be estimated by OLS without simultaneity bias is:
A. Structural model B. Reduced form C. Recursive model D. Exogenous variable
____4. The reduced form of a structural equation may be expressed in terms of:
A. Endogenous variables B. Exogenous variables C. Random variables D. All
____5. Exogenous variables are also called:
A. Predetermined B. Lagged C. Direct effect D. Current effect
____6. Dummy variables are also called:
A. Binary variables B. Categorical variables C. Dichotomous variables D. All
____7. Solving the structural system of endogenous variables in terms of the predetermined variables
is:
A. Solving the structural model method B. Direct method C. OLS method D. Parameters
____8. An independent variable is also called:
A. Explanatory variable B. Explained variable C. Dependent variable D. Regressand
____9. OLS is not applicable if there is interdependence between the explanatory variables and
the error term. A. True B. False
____10. In classical linear regression model assumption the mean value of error term is;
A. Zero B. constant C. non-negative D. different from zero
Part II: Write ‘True’ or ‘False’ for the following questions.
____1. In dummy regression analysis, the dependent variable is not only influenced by
quantitative independent variable.
____2. In classical linear regression model assumption the mean value of error term is different
from zero
____3. In simultaneous equation OLS is not applicable since the mean of error term is different
from zero
____4. Simultaneous equation is unidirectional since there is no multi-flow of influences among
variables
____5. Endogenous variables are the variables that are determined by economic model.
Part III: Workout
1. Assume that the following simple simultaneous equation model is given.
$Y = \alpha_0 + \alpha_1 X + U$
$X = \beta_0 + \beta_1 Y + \beta_2 Z + V$
Required: (a) Find the reduced form of X.
(b) Find the reduced form of Y.
2. The following represents a simplified model of the economy:
$C_t = \alpha_0 + \alpha_1 Y_t + \alpha_2 C_{t-1} + \mu_{1t}$ (consumption)
$I_t = \beta_0 + \beta_1 r_t + \beta_2 I_{t-1} + \mu_{2t}$ (investment)
$r_t = \gamma_0 + \gamma_1 Y_t + \gamma_2 M_t + \mu_{3t}$ (money market)
$Y_t = C_t + I_t + G_t$ (income identity)
where C = consumption, Y = income, r = rate of interest,
M = money supply, I = investment and G = government
expenditure.
21,200 1
18,000 0
21,700 1
18,500 0
21,000 1
20,500 1
17,000 0
17,500 0
Required: a. Find the mean salary of black instructors and of white instructors.
b. Calculate the slope of the regression.
c. Fit the regression and report the result.
4. In studying the effect of a number of qualitative attributes on the prices charged for
movie admissions in a large metropolitan area for the period 2007-2010, Galma Ada
Oromo obtained the following regression for the year 2007:
$\hat{Y} = 4.13 + 5.77 D_1 + 8.12 D_2 - 7.68 D_3 - 1.13 D_4 + 27.09 D_5 + 31.46 \log X_1 + 0.81 X_2$ + 3 other dummy variables
Standard errors: (2.04) (2.67) (2.51) (1.78) (3.58) (13.78) (0.17)
$R^2 = 0.961$
Where:
$D_1$ = Theater location: 1 if suburban, 0 if city center
$D_2$ = Theater age: 1 if less than 10 years since construction or major renovation, 0 otherwise
$D_3$ = Type of theater: 1 if outdoor, 0 if indoor
$D_4$ = Parking: 1 if provided, 0 otherwise
$D_5$ = Screening policy: 1 if first run, 0 otherwise
$X_1$ = Average percentage unused seating capacity per showing
$Y$ = Adult evening admission price, cents
The figures in parentheses are standard errors.
Required:
a. Comment on the results.