Econometrics II-1-1

RIFT VALLEY UNIVERSITY NEKEMTE CAMPUS-DEPARTMENT OF
ECONOMICS A.Y 2020/2012
RIFT VALLEY UNIVERSITY NEKEMTE CAMPUS
FACULTY OF BUSINESS AND ECONOMICS
DEPARTMENT OF ECONOMICS
COURSE TITLE: - Econometrics II

CREDIT HOUR: - 4hrs
COURSE CODE: - (Econ -2062)
CHAPTER ONE
1. REGRESSION ANALYSIS WITH QUALITATIVE INFORMATION: BINARY/ DUMMY
VARIABLES
1.1 Describing Qualitative Information
1.2. Dummy as Independent Variables
Econometrics II Lecture Note Economics Department For 2ND Year Students Page 1
1.2.2 Regression on one quantitative variable and one qualitative variable with more than two
classes
1.4 Regression on one quantitative variable and two qualitative variables
1.5 Testing for structural stability of regression models
1.3. Dummy as Dependent Variable
1.3.1. The Linear Probability Model (LPM)
1.3.2. The Logit Model
1.3.3. The Probit Model
CHAPTER TWO
2. INTRODUCTION TO BASIC REGRESSION ANALYSIS WITH TIME SERIES
2.1. The Nature of Time Series Data
2.2. Stationary and Non-stationary stochastic processes
CHAPTER THREE
3. INTRODUCTION TO SIMULTANEOUS EQUATION MODELS
3.1. The Nature of Simultaneous Equation Model
3.2. Simultaneity bias
3.3 The Order and Rank Condition of identification problem
CHAPTER FOUR
4. INTRODUCTION TO PANEL DATA ANALYSIS
4.1. Introduction
4.2. Estimation of panel data Regression model The fixed Effect Approach
4.3. Estimation of panel data Regression model Random effect estimation
 WORK SHEET
CHAPTER ONE
1. REGRESSION ANALYSIS WITH QUALITATIVE INFORMATION:
BINARY/ DUMMY VARIABLES
1.1 Describing Qualitative Information
In regression analysis, a dummy variable (also known as an indicator variable) is one that
takes the value 0 or 1 to indicate the absence or presence of some categorical effect that may be
expected to shift the outcome. Such variables represent attributes that are not quantified, or not
measurable, by their nature. Dummy variables are "proxy" variables, or numeric stand-ins, for
qualitative facts in a regression model. The dependent variable in a regression can also be
influenced by qualitative variables (gender, religion, geographic region, etc.). Such variables can
be converted into quantitative form by constructing artificial values.
1.2. Dummy as Independent Variables
In regression analysis the dependent variable is frequently influenced not only by variables that
can be readily quantified on some well-defined scale (e.g., income, output, prices, costs, height,
and temperature), but also by variables that are essentially qualitative in nature (e.g., sex, race,
color, religion, nationality, wars, earthquakes, strikes, political upheavals, and changes in
government economic policy).
For example, holding all other factors constant, female professors are found to earn less
than their male counterparts, and nonwhites are found to earn less than whites.
This pattern may result from sex or racial discrimination, but whatever the reason, qualitative
variables such as sex and race do influence the dependent variable and clearly should be
included among the explanatory variables. Since such qualitative variables usually indicate the
presence or absence of a “quality” or an attribute, such as male or female, black or white, or
Christian or Muslim, one method of “quantifying” such attributes is by constructing artificial
variables that take on values of 1 or 0, 0 indicating the absence of an attribute and 1 indicating
the presence (or possession) of that attribute. For example, 1 may indicate that a person is a
male, and 0 may designate a female; or 1 may indicate that a person is a college graduate, and 0
that he is not, and so on. Variables that assume such 0 and 1 values are called dummy variables.
Alternative names are indicator variables, binary variables, categorical variables, and
dichotomous variables.
Dummy variables can be used in regression models just as easily as quantitative variables.
Example:
$$Y_i = \alpha + \beta D_i + u_i \tag{1.1}$$
Where: $Y_i$ = annual salary of a college professor
$D_i = 1$ if male college professor
$D_i = 0$ otherwise (i.e., female professor)
Note that (1.1) is like the two variable regression models encountered previously except that
instead of a quantitative X variable we have a dummy variable D (hereafter, we shall designate
all dummy variables by the letter D).
Assuming that the disturbance satisfies the usual assumptions of the classical linear regression
model (in particular, that the error term has mean zero), we obtain from (1.1):
Mean salary of female college professor: $E(Y_i \mid D_i = 0) = \alpha$ ................ (1.2)
Mean salary of male college professor: $E(Y_i \mid D_i = 1) = \alpha + \beta$
That is, the intercept term $\alpha$ gives the mean salary of female college professors and the slope
coefficient $\beta$ tells by how much the mean salary of a male college professor differs from the
mean salary of his female counterpart, $\alpha + \beta$ reflecting the mean salary of the male college
professor.
A test of the null hypothesis that there is no sex discrimination ($H_0: \beta = 0$) can be easily made
by running the regression in the usual manner and finding out whether, on the basis of the t test, the
estimated $\beta$ is statistically significant.
Consider the following hypothetical data on starting salaries of college teachers by sex:
Starting salary (Y)    Sex (1 = male, 0 = female)
22,000                 1
19,000                 0
18,000                 0
21,700                 1
18,500                 0
21,000                 1
20,500                 1
17,000                 0
17,500                 0
21,200                 1
The estimated mean salary of females is (19,000 + 18,000 + 18,500 + 17,000 + 17,500)/5 = 18,000.
The estimated mean salary of males is (22,000 + 21,700 + 21,000 + 20,500 + 21,200)/5 = 21,280.
Therefore the results of the regression analysis are presented as follows:
$\hat{Y}_i = 18{,}000 + 3{,}280 D_i$
se = (0.32) (0.44) (in thousands of birr)
t = (57.74) (7.439)
$R^2 = 0.8737$
The above results show that the estimated mean salary of a female college instructor is birr
18,000 ($= \hat{\alpha}$) and that of a male instructor is birr 21,280 ($= \hat{\alpha} + \hat{\beta}$).
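The regression on the salary table can be checked with a few lines of code. The following is a minimal sketch in plain Python (the helper name `ols_simple` is an illustrative choice, not part of the lecture note); it applies the usual two-variable least-squares formulas and compares the result with the group means.

```python
# Starting salaries (birr) and sex dummy (1 = male, 0 = female) from the table.
salary = [22000, 19000, 18000, 21700, 18500, 21000, 20500, 17000, 17500, 21200]
male = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

def ols_simple(y, x):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

alpha_hat, beta_hat = ols_simple(salary, male)

# Group means computed directly, for comparison with the regression output.
mean_female = sum(y for y, d in zip(salary, male) if d == 0) / male.count(0)
mean_male = sum(y for y, d in zip(salary, male) if d == 1) / male.count(1)

print("intercept:", alpha_hat, "differential:", beta_hat)
print("female mean:", mean_female, "male mean:", mean_male)
```

The intercept coincides with the female mean and the intercept plus the dummy coefficient coincides with the male mean, which is exactly the interpretation given for (1.1).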
1.2.1 Regression on one quantitative variable and one qualitative variable with two classes
Consider the model:
$$Y_i = \alpha_1 + \alpha_2 D_i + \beta X_i + u_i \tag{1.3}$$
Where: $Y_i$ = annual salary of a college professor
$X_i$ = years of teaching experience
$D_i = 1$ if male
$D_i = 0$ otherwise
Model (1.3) contains one quantitative variable (years of teaching experience) and one
qualitative variable (sex) that has two classes, namely male and female. What is the meaning of
this equation? Assuming, as usual, that $E(u_i) = 0$, we see that
Mean salary of female college professor: $E(Y_i \mid X_i, D_i = 0) = \alpha_1 + \beta X_i$ --------- (1.4)
Mean salary of male college professor: $E(Y_i \mid X_i, D_i = 1) = (\alpha_1 + \alpha_2) + \beta X_i$ ------ (1.5)
Geometrically, we have the situation shown in fig. 1.1 (for illustration, it is assumed that $\alpha_1 > 0$).
Model (1.3) postulates that the male and female college professors' salary functions in relation to
the years of teaching experience have the same slope ($\beta$) but different intercepts. In other words,
it is assumed that the level of the male professor's mean salary is different from that of the
female professor's mean salary (by $\alpha_2$) but the rate of change in the mean annual salary by years
of experience is the same for both sexes.
If the assumption of common slopes is valid, a test of the hypothesis that the two regressions
(1.4) and (1.5) have the same intercept (i.e., there is no sex discrimination) can be made easily by
running the regression (1.3) and noting the statistical significance of the estimated $\alpha_2$ on the
basis of the traditional t test. If the t test shows that $\hat{\alpha}_2$ is statistically significant, we reject the
null hypothesis that the male and female college professors' levels of mean annual salary are the
same.
Before proceeding further, note the following features of the dummy variable regression model
considered previously.
1. To distinguish the two categories, male and female, we have introduced only one dummy
variable $D_i$. For if $D_i = 1$ always denotes a male, when $D_i = 0$ we know that it is a
female since there are only two possible outcomes. Hence, one dummy variable suffices
to distinguish two categories. The general rule is this: If a qualitative variable has 'm'
categories, introduce only 'm-1' dummy variables. If this rule is not followed, we shall
fall into what might be called the dummy variable trap, that is, the situation of perfect
multicollinearity.
2. The assignment of 1 and 0 values to two categories, such as male and female, is arbitrary
in the sense that in our example we could have assigned D = 1 for female and D = 0 for
male.
3. The group, category, or classification that is assigned the value of 0 is often referred to as
the base, benchmark, control, comparison, reference, or omitted category. It is the base
in the sense that comparisons are made with that category.
4. The coefficient $\alpha_2$ attached to the dummy variable D can be called the differential
intercept coefficient because it tells by how much the value of the intercept term of the
category that receives the value of 1 differs from the intercept coefficient of the base
category.
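The dummy variable trap in point 1 can be made concrete. The sketch below (plain Python; the helper names `cross_products` and `det` are illustrative assumptions) builds the cross-product matrix X'X for the ten professors, first with an intercept plus both a male and a female dummy, then with an intercept plus one dummy only. A zero determinant means X'X cannot be inverted, so the OLS estimates are not defined.

```python
male = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]
female = [1 - d for d in male]        # the complementary dummy
const = [1] * len(male)               # intercept column

def cross_products(columns):
    """Return X'X for a design matrix given as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, value in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(minor)
    return total

# m dummies plus an intercept: const = male + female in every row.
det_trap = det(cross_products([const, male, female]))
# m - 1 dummies plus an intercept: no exact linear dependence.
det_ok = det(cross_products([const, male]))

print("two dummies + intercept:", det_trap)
print("one dummy + intercept:", det_ok)
```

With both dummies included, the intercept column is the exact sum of the two dummy columns, which is the perfect multicollinearity the rule of 'm-1' dummies is designed to avoid.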
1.2.2 Regression on one quantitative variable and one qualitative variable with more than
two classes
Suppose that, on the basis of the cross-sectional data, we want to regress the annual expenditure
on health care by an individual on the income and education of the individual. Since the variable
education is qualitative in nature, suppose we consider three mutually exclusive levels of
education: primary school, high school, and college. Now, unlike the previous case, we have
more than two categories of the qualitative variable education. Therefore, following the rule that
the number of dummies be one less than the number of categories of the variable, we should
introduce two dummies to take care of the three levels of education. Assuming that the three
educational groups have a common slope but different intercepts in the regression of annual
expenditure on health care on annual income, we can use the following model:
$$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i \tag{1.6}$$
Where: $Y_i$ = annual expenditure on health care
$X_i$ = annual income
$D_2 = 1$ if high school education and 0 otherwise
$D_3 = 1$ if college education and 0 otherwise
Note that in the preceding assignment of the dummy variables we are arbitrarily treating the
"primary school education" category as the base category. Therefore, the intercept $\alpha_1$ will
reflect the intercept for this category. The differential intercepts $\alpha_2$ and $\alpha_3$ tell by how much
the intercepts of the other two categories differ from the intercept of the base category, which
can be readily checked as follows: Assuming $E(u_i) = 0$, we obtain from equation (1.6)
$$E(Y_i \mid D_2 = 0, D_3 = 0, X_i) = \alpha_1 + \beta X_i$$
$$E(Y_i \mid D_2 = 1, D_3 = 0, X_i) = (\alpha_1 + \alpha_2) + \beta X_i$$
$$E(Y_i \mid D_2 = 0, D_3 = 1, X_i) = (\alpha_1 + \alpha_3) + \beta X_i$$
which are, respectively, the mean health care expenditure functions for the three levels of
education, namely, primary school, high school, and college. Geometrically, the situation is
shown in fig 1.2 (for illustrative purposes it is assumed that $\alpha_3 > \alpha_2$).
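As an illustration of how a three-category variable is coded with two dummies, the sketch below maps each education level to $(D_2, D_3)$ and evaluates the conditional mean in (1.6). The function names and all coefficient values are hypothetical, chosen only to make the arithmetic visible.

```python
def education_dummies(level):
    """Return (D2, D3) for an education level; primary school is the base (0, 0)."""
    levels = ("primary", "high school", "college")
    if level not in levels:
        raise ValueError(f"unknown education level: {level!r}")
    d2 = 1 if level == "high school" else 0
    d3 = 1 if level == "college" else 0
    return d2, d3

def mean_health_spending(level, income, a1, a2, a3, b):
    """Conditional mean from equation (1.6): a1 + a2*D2 + a3*D3 + b*X."""
    d2, d3 = education_dummies(level)
    return a1 + a2 * d2 + a3 * d3 + b * income

# Hypothetical coefficients, for illustration only (not estimates from data).
coef = dict(a1=200.0, a2=50.0, a3=120.0, b=0.05)
income = 10_000.0

for level in ("primary", "high school", "college"):
    print(level, education_dummies(level), mean_health_spending(level, income, **coef))
```

At any given income, the gap between a category and the base category equals that category's differential intercept, while the slope on income is common to all three groups.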
1.4 Regression on one quantitative variable and two qualitative variables
The technique of dummy variable can be easily extended to handle more than one qualitative
variable. Let us revert to the college professors’ salary regression (1.3), but now assume that in
addition to years of teaching experience and sex the skin color of the teacher is also an important
determinant of salary. For simplicity, assume that color has two categories: black and white.
We can now write (1.3) as:
$$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i \tag{1.7}$$
Where: $Y_i$ = annual salary
$X_i$ = years of teaching experience
$D_2 = 1$ if male and 0 otherwise
$D_3 = 1$ if white and 0 otherwise
Notice that each of the two qualitative variables sex and color has two categories and hence
needs one dummy variable for each. Also the omitted, or base, category now is “black female
professor.”
Assuming $E(u_i) = 0$, we can obtain the following regressions from (1.7):
Mean salary for black female professor: $E(Y_i \mid D_2 = 0, D_3 = 0, X_i) = \alpha_1 + \beta X_i$
Mean salary for black male professor: $E(Y_i \mid D_2 = 1, D_3 = 0, X_i) = (\alpha_1 + \alpha_2) + \beta X_i$
Mean salary for white female professor: $E(Y_i \mid D_2 = 0, D_3 = 1, X_i) = (\alpha_1 + \alpha_3) + \beta X_i$
Mean salary for white male professor: $E(Y_i \mid D_2 = 1, D_3 = 1, X_i) = (\alpha_1 + \alpha_2 + \alpha_3) + \beta X_i$

Once again, it is assumed that the preceding regressions differ only in the intercept coefficient
but not in the slope coefficient $\beta$.
An OLS estimation of (1.7) will enable us to test a variety of hypotheses. Thus, if $\alpha_3$ is
statistically significant, it will mean that color does affect a professor's salary. Similarly, if $\alpha_2$ is
statistically significant, it will mean that sex also affects a professor's salary. If both these
differential intercepts are statistically significant, it would mean that sex as well as color is an
important determinant of professors' salaries.
From the preceding discussion it follows that we can extend our model to include more than one
quantitative variable and more than two qualitative variables. The only precaution to be taken is
that the number of dummies for each qualitative variable should be one less than the number of
categories of that variable.
1.5 Testing for structural stability of regression models
Until now, in the models considered in this chapter we assumed that the qualitative variables
affect the intercept but not the slope coefficient of the various subgroup regressions. But what if
the slopes are also different? If the slopes are in fact different, testing for differences in the
intercepts may be of little practical significance. Therefore, we need to develop a general
methodology to find out whether two (or more) regressions are different, where the difference
may be in the intercepts or the slopes or both.
Interaction effects: Consider the following model:
$$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i \tag{1.8}$$
Where: $Y_i$ = annual expenditure on clothing
$X_i$ = income
$D_2 = 1$ if female and 0 if male
$D_3 = 1$ if college graduate and 0 otherwise
Implicit in this model is the assumption that the differential effect of the sex dummy $D_2$ is
constant across the two levels of education and the differential effect of the education dummy
$D_3$ is also constant across the two sexes. That is, if, say, the mean expenditure on clothing is
higher for females than males, this is so whether they are college graduates or not.
Likewise, if, say, college graduates on average spend more on clothing than non-graduates,
this is so whether they are female or male.
In many applications this assumption may not hold. A female college graduate may spend more
on clothing than a male graduate. In other words, there may be interaction between the two
qualitative variables $D_2$ and $D_3$, and therefore their effect on mean Y may not be simply
additive as in (1.8) but multiplicative as well, as in the following model:
$$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 (D_{2i} D_{3i}) + \beta X_i + u_i \tag{1.9}$$
From (1.9) we obtain:
$$E(Y_i \mid D_2 = 1, D_3 = 1, X_i) = (\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4) + \beta X_i \tag{1.10}$$
which is the mean clothing expenditure of graduate females, where:
$\alpha_2$ = differential effect of being a female
$\alpha_3$ = differential effect of being a college graduate
$\alpha_4$ = differential effect of being a female graduate
This shows that the mean clothing expenditure of graduate females is different (by $\alpha_4$) from
the mean clothing expenditure of females or college graduates. If $\alpha_2$, $\alpha_3$, and $\alpha_4$ are all positive,
the average clothing expenditure of females is higher (than the base category, which here is male
non-graduate), but it is much more so if the females also happen to be graduates. Similarly, the
average expenditure on clothing by a college graduate tends to be higher than the base category
but much more so if the graduate happens to be a female. This shows how the interaction
dummy modifies the effect of the two attributes considered individually. Whether the coefficient
of the interaction dummy is statistically significant can be tested by the usual t test. If it turns out
to be significant, the simultaneous presence of the two attributes will attenuate or reinforce the
individual effects of these attributes. Needless to say, incorrectly omitting a significant interaction
term will lead to a specification bias.
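One way to see what the interaction coefficient measures is to compute the four group means implied by (1.9) and compare the female differential among graduates with the female differential among non-graduates. The sketch below does this in plain Python; the function name and all coefficient values are hypothetical.

```python
def mean_clothing(d2, d3, income, a1, a2, a3, a4, b):
    """Conditional mean from equation (1.9), including the interaction D2*D3."""
    return a1 + a2 * d2 + a3 * d3 + a4 * d2 * d3 + b * income

# Hypothetical coefficients, for illustration only (not estimates from data).
coef = dict(a1=100.0, a2=30.0, a3=40.0, a4=25.0, b=0.02)
income = 5_000.0

# Mean expenditure for each (female, graduate) combination at the same income.
cell = {(d2, d3): mean_clothing(d2, d3, income, **coef)
        for d2 in (0, 1) for d3 in (0, 1)}

# Effect of being female among non-graduates and among graduates.
female_effect_nongrad = cell[(1, 0)] - cell[(0, 0)]
female_effect_grad = cell[(1, 1)] - cell[(0, 1)]

# The gap between the two effects is the interaction coefficient a4.
interaction = female_effect_grad - female_effect_nongrad
print(female_effect_nongrad, female_effect_grad, interaction)
```

If the interaction coefficient were zero, the two female differentials would be equal, which is the additive case of model (1.8).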
1.3. Dummy as Dependent Variable
Here the dependent variable is qualitative. Suppose we want to study the labor-force
participation of adult males as a function of the unemployment rate, average wage rate, family
income, education, etc. A person either is in the labor force or not. Hence, the dependent
variable, labor-force participation, can take only two values: 1 if the person is in the labor force
and 0 if he or she is not. We can consider another example. A family may or may not own a
house. If it owns a house, it takes a value 1 and 0 if it does not.
In this situation a Qualitative Response Model (QRM) is appropriate. QRMs are models in which
the dependent variable is a discrete outcome. There are two broad categories of QRM:
A. Binomial models: the choice is between two alternatives
B. Multinomial models: the choice is between more than two alternatives
Example: Y = 1, occupation is farming
= 2, occupation is carpentry
= 3, occupation is fishing
Binary variables: variables that have two categories and are often used to indicate that an
event has occurred or that some characteristic is present.
Example: - Decision to participate in the labor force or not to participate
Multinomial variables: These variables occur when there are multiple outcomes.
The types of binomial models are:
1. Linear probability models
2. The logit model
3. The probit model
1.3.1. The Linear Probability Model (LPM)

The linear probability model is the regression model applied to a binary dependent variable. To
fix ideas, consider the following simple model:
$$Y_i = \beta_0 + \beta_1 X_i + U_i \tag{1}$$
Where: $X_i$ = family income
$Y_i$ = 1 if the family owns a house
$Y_i$ = 0 if the family does not own a house
$U_i$ is the disturbance term
The independent variable Xi can be discrete or continuous variable. The model can be extended
to include other additional explanatory variables.
The above model expresses the dichotomous $Y_i$ as a linear function of the explanatory variable
$X_i$. Such models are called linear probability models (LPM) since $E(Y_i \mid X_i)$, the
conditional expectation of $Y_i$ given $X_i$, can be interpreted as the conditional probability that the
event will occur given $X_i$; that is, $\Pr(Y_i = 1 \mid X_i)$. Thus, in the preceding case, $E(Y_i \mid X_i)$ gives the
probability of a family owning a house when its income is the given amount $X_i$. The
justification of the name LPM can be seen as follows.
Assuming $E(U_i) = 0$, as usual (to obtain unbiased estimators), we obtain
$$E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i \tag{2}$$
Now, letting $P_i$ = probability that $Y_i = 1$ (that is, that the event occurs) and $1 - P_i$ = probability
that $Y_i = 0$ (that is, that the event does not occur), the variable $Y_i$ has the following distribution:
$Y_i$      Probability
0          $1 - P_i$
1          $P_i$
Total      1
Therefore, by the definition of mathematical expectation, we obtain
$$E(Y_i) = 0(1 - P_i) + 1(P_i) = P_i \tag{3}$$
Now, comparing (2) with (3), we can equate
$$E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i = P_i \tag{4}$$
Since the probability $P_i$ must lie between 0 and 1, we have the restriction $0 \le E(Y_i \mid X_i) \le 1$, that is,
the conditional expectation, or conditional probability, must lie between 0 and 1.
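The restriction is not enforced by OLS itself. As a minimal sketch, the code below fits the LPM by ordinary least squares to a small hypothetical data set (the income and ownership values are invented for illustration) and lists the fitted "probabilities" that fall outside the 0-1 range.

```python
def ols_simple(y, x):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Hypothetical data: family income (thousands of birr) and ownership (1 = owns a house).
income = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
owns = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

b0, b1 = ols_simple(owns, income)
fitted = [b0 + b1 * x for x in income]      # estimated P(Y = 1 | X)

# Fitted values below 0 or above 1 cannot be read as probabilities.
outside = [(x, round(p, 3)) for x, p in zip(income, fitted) if p < 0 or p > 1]
print("b0 =", round(b0, 4), "b1 =", round(b1, 4))
print("fitted values outside [0, 1]:", outside)
```

In this example the lowest and highest incomes receive fitted values below 0 and above 1, which motivates models that keep the estimated probability inside the unit interval.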
1.3.2. The Logit Model

We have seen that the LPM has many problems, such as non-normality of $U_i$, heteroscedasticity of
$U_i$, the possibility of $\hat{Y}_i$ lying outside the 0-1 range, and generally lower $R^2$ values. But these
problems are surmountable. The fundamental problem with the LPM is that it is not logically a
very attractive model because it assumes that $P_i = E(Y = 1 \mid X)$ increases linearly with X, that is,
the marginal or incremental effect of X remains constant throughout.
Geometrically, the model we want would look something like the figure below.
[Figure: A Cumulative Distribution Function (CDF). The curve rises from 0 to 1 as X increases.]
The above S-shaped curve closely resembles the cumulative distribution function (CDF)
of a random variable. (Note that the CDF of a random variable X is simply the probability that it
takes a value less than or equal to $x_0$, where $x_0$ is some specified numerical value of X. In short,
F(X), the CDF of X, is $F(x_0) = P(X \le x_0)$. Please refer to your Statistics for Economists text.)
Therefore, one can easily use the CDF to model regressions where the response variable is
dichotomous, taking 0-1 values.
The CDFs commonly chosen to represent the 0-1 response models are.
a) the logistic – which gives rise to the logit model (used to solve the problems logically)
b) the normal – which gives rise to the probit (or normit) model
Now let us see how one can estimate and interpret the logit model.
Recall that the LPM was (for home ownership)
$$P_i = E(Y = 1 \mid X_i) = \beta_0 + \beta_1 X_i$$
Where X is income and Y = 1 means the family owns a house.

Now consider the following representation of home ownership:
$$P_i = E(Y = 1 \mid X_i) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_i)}}$$
$$P_i = \frac{1}{1 + e^{-Z_i}}, \quad \text{where } Z_i = \beta_0 + \beta_1 X_i$$
This equation represents what is known as the (cumulative) logistic distribution function. The
above equation is nonlinear in both X and the $\beta$'s, which means we cannot use the familiar
OLS procedure to estimate the parameters directly. It can, however, be linearized as follows.
$$1 - P_i = \frac{1}{1 + e^{Z_i}}$$
$$\frac{P_i}{1 - P_i} = \frac{1 + e^{Z_i}}{1 + e^{-Z_i}} = e^{Z_i}$$
Now $P_i / (1 - P_i)$ is simply the odds ratio in favor of owning a house: the ratio of the probability that a
family will own a house to the probability that it will not own a house.
Taking the natural log of the odds ratio we obtain
$$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = Z_i = \beta_0 + \beta_1 X_i$$
L (the log of the odds ratio) is linear in X as well as in the $\beta$'s (the parameters). L is called the logit,
hence the name logit model. The interpretation of the logit model is as follows:
$\beta_1$ – The slope measures the change in L for a unit change in X.
$\beta_0$ – The intercept tells the value of the log-odds in favor of owning a house if income is
zero. Like most interpretations of intercepts, this interpretation may not have any physical
meaning.
Now for estimation purposes, let us write the logit model as
$$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_0 + \beta_1 X_i + U_i$$
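The relation between the logistic function, the odds and the logit can be verified numerically. The sketch below uses only the Python standard library; the parameter values `b0` and `b1` are hypothetical. It also prints $\beta_1 P_i (1 - P_i)$, the derivative of $P_i$ with respect to $X_i$, to show that the marginal effect of X is not constant as it is in the LPM.

```python
import math

def logistic(z):
    """Cumulative logistic function: P = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Logit: natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Hypothetical parameter values, for illustration only.
b0, b1 = -4.0, 0.5

for x in (2, 6, 8, 10, 14):
    z = b0 + b1 * x
    p = logistic(z)
    marginal = b1 * p * (1.0 - p)    # dP/dX at this value of X
    # The log-odds column reproduces z, i.e. it is linear in x.
    print(x, round(p, 4), round(log_odds(p), 4), round(marginal, 4))
```

The probability stays strictly between 0 and 1 for every X, the log-odds moves by `b1` for each unit change in X, and the marginal effect on the probability is largest where P is close to 0.5.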
1.3.3. The Probit Model
The estimating model that emerges from the normal CDF is popularly known as the probit model.
Here the observed dependent variable Y takes on one of the values 0 and 1 using the following
criteria. Define a latent variable $Y^*$ such that
$$Y_i^* = X_i'\beta + \varepsilon_i$$
$$Y_i = \begin{cases} 1 & \text{if } Y_i^* > 0 \\ 0 & \text{if } Y_i^* \le 0 \end{cases}$$
The latent variable $Y^*$ is continuous ($-\infty < Y^* < \infty$). It generates the observed binary variable Y.
An observed variable Y can be observed in two states:
i) if an event occurs it takes a value of 1
ii) if an event does not occur it takes a value of 0
The latent variable is assumed to be a linear function of the observed X's through the structural
model.
- In the probit model, it is assumed that $Var(\varepsilon_i \mid X_i) = 1$.
- In the logit model, it is assumed that $Var(\varepsilon_i \mid X_i) = \pi^2 / 3$.
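Summary
- Logit function
$$P(Y = 1 \mid X) = \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_i)}}$$
- Probit function
$$P(Y = 1 \mid X) = \Phi(\beta_0 + \beta_1 X_i) = 1 - \Phi(-\beta_0 - \beta_1 X_i)$$
Where: $\Phi(\cdot)$ is the standard normal cumulative distribution function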
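The two link functions can be compared directly. The sketch below evaluates the standard normal CDF through `math.erf` and the logistic CDF at a few index values. Because the logistic error has variance $\pi^2/3$ while the probit error has variance 1, the logistic index is multiplied by $\pi/\sqrt{3}$ here so that both are on a unit-variance scale; this rescaling is an illustrative choice, not something prescribed in the note.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Standard logistic CDF, 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Standard deviation of the standard logistic distribution: pi / sqrt(3).
scale = math.pi / math.sqrt(3.0)

for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    probit_p = normal_cdf(z)
    logit_p = logistic_cdf(scale * z)   # logistic rescaled to unit variance
    print(z, round(probit_p, 4), round(logit_p, 4))
```

Both functions are S-shaped, equal 0.5 at an index of zero, and give broadly similar probabilities once the difference in error variance is taken into account; the logistic has somewhat heavier tails.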
CHAPTER TWO
2. INTRODUCTION TO BASIC REGRESSION ANALYSIS WITH TIME SERIES

ECONOMETRICS
2.1. The Nature of Time Series Data
Time-series data are a set of observations on a quantitative variable collected over discrete
intervals of time.
• Examples include the annual price of wheat in the United States and the daily price of General
Electric stock shares. Macroeconomic data are usually reported in monthly, quarterly, or
annual terms.
• Financial data, such as stock prices, can be recorded daily, or at even higher frequencies.
The key feature of time-series data is that the same economic quantity is recorded at regular time
intervals. A time-series data set consists of observations on one or several variables over
time. In time-series analysis, we analyze the past behavior of a variable in order to predict its
future behavior.
Some Time Series Terms
• Stationary - a time series variable exhibiting no significant upward or downward trend over time.
• Nonstationary - a time series variable exhibiting a significant upward or downward trend over time.
• Seasonal Data - a time series variable exhibiting a repeating pattern at regular intervals over time.
• Univariate time-series analysis- analysis of single sequence of data describing the behavior of one
variable in terms of its own past values.
[Figure: a stationary time series]
[Figure: a non-stationary time series (upward trend)]
Time Series Models:
It is important to distinguish between cross-section data (data on a number of economic units at a
particular point in time) and time-series data (data collected over time on one particular economic
unit). When we say "economic units" we could be referring to individuals, households, firms,
geographical regions, countries, or some other entity on which data are collected.
Cross-sectional observations are collected at a given point in time; time-series observations on a
given economic unit, on the other hand, are observed over a number of time periods.
A second distinguishing feature of time-series data is its natural ordering according to time. With
cross-section data there is no particular ordering of the observations that is better or more natural
than another.
The dynamic nature of relationships: Given that the effects of changes in variables are not always
instantaneous, we need to ask how to model the dynamic nature of relationships. One option is a
dynamic model with lagged values of both the dependent and explanatory variables, such as
$$Y_t = f(Y_{t-1}, X_t, X_{t-1}, X_{t-2}) \tag{3.1}$$
Such models are called autoregressive distributed lag (ARDL) models, with "autoregressive"
meaning a regression of $Y_t$ on its own lag or lags.
Examples: Consider the following:
$Y_t = 1.2Y_{t-1} + \varepsilon_t$ : the AR(1) model
or $Y_t = \delta + 1.2Y_{t-1} + \varepsilon_t$ : the AR(1) model (with intercept)
$Y_t = 1.2Y_{t-1} - 0.32Y_{t-2} + \varepsilon_t$ : the AR(2) model
or $Y_t = \delta + 1.2Y_{t-1} - 0.32Y_{t-2} + \varepsilon_t$ : the AR(2) model (with intercept)
$Y_t = \delta + \theta_1 Y_{t-1} + \theta_2 Y_{t-2} + \varepsilon_t$ : the AR(2) model
$Y_t = \delta + \theta_1 Y_{t-1} + \theta_2 Y_{t-2} + \theta_3 Y_{t-3} + \varepsilon_t$ : the AR(3) model
. . .
$Y_t = \delta + \theta_1 Y_{t-1} + \theta_2 Y_{t-2} + \cdots + \theta_p Y_{t-p} + \varepsilon_t$ : the AR(p) model
Additional autoregressive models:
$U_t = \rho U_{t-1} + \varepsilon_t$ : first-order autoregressive, or
$Y_t = \rho_1 Y_{t-1} + \rho_2 Y_{t-2} + \varepsilon_t$ : second-order autoregressive
Analysis of several sets of data (variables) for the same sequence of time periods is called
multivariate time-series analysis.
An example is the analysis of the relationships among the price level, money supply and GDP on
the basis of, say, quarterly or annual data.
The main purpose of time-series analysis is to study the dynamics or temporal structure of the
data.
2.2. Stationary and Non-stationary stochastic processes
A collection of random variables $Y_t$ ordered in time is called a stochastic process or random
process. There are two different classes of stochastic process:
• A stationary stochastic process gives rise to a stationary time series.
• A non-stationary stochastic process gives rise to a non-stationary time series.
Stationary Stochastic Processes
A stochastic process is said to be stationary if its mean and variance are constant over time (they
do not depend on time) and the value of the covariance between two time periods depends only
on the lag between the two time periods and not on the actual time at which the covariance is
computed. A non-stationary time series will have a time-varying mean or a time-varying
variance or both.
Non-stationary Stochastic Processes
In practical research one often encounters non-stationary time series. The classic example is the
Random Walk Model (RWM). We distinguish two types of random walks:
• Random walk model without drift (with no intercept term)
• Random walk model with drift (constant term is present).
Random Walk without Drift
The series $Y_t$ is said to be a random walk without drift if
$$Y_t = Y_{t-1} + u_t$$
Where: $u_t$ is a white noise error term (error term with mean 0 and variance $\sigma^2$).
This model says that the value of Y at time period t (i.e., $Y_t$) is equal to its value at time (t−1) plus a
random shock ($u_t$). It is an AR(1) model, because $Y_t$ is regressed on itself lagged one period.
We can write the above model as:
$$Y_1 = Y_0 + u_1$$
$$Y_2 = Y_1 + u_2 = Y_0 + u_1 + u_2$$
$$Y_3 = Y_2 + u_3 = Y_0 + u_1 + u_2 + u_3$$
In general, if the process started at some time 0 with a value of $Y_0$, we have
$$Y_t = Y_0 + \sum_{s=1}^{t} u_s$$
$$E(Y_t) = E\left(Y_0 + \sum_{s=1}^{t} u_s\right) = Y_0$$
because each shock has mean zero. Since white-noise shocks are also serially uncorrelated with
common variance $\sigma^2$, $Var(Y_t) = t\sigma^2$, which increases with t.
In short, the RWM without drift is a non-stationary stochastic process.
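A small simulation makes the non-stationarity visible. The sketch below (standard library only; the number of paths, the horizon, the seed and the function names are arbitrary illustrative choices) generates many independent random walks without drift starting from $Y_0 = 0$ and reports the cross-path mean and variance of $Y_t$ at a few dates.

```python
import random

def simulate_random_walk(steps, y0=0.0, sigma=1.0, rng=random):
    """Return [Y_1, ..., Y_steps] for Y_t = Y_{t-1} + u_t with u_t ~ N(0, sigma^2)."""
    path, y = [], y0
    for _ in range(steps):
        y += rng.gauss(0.0, sigma)   # add one white-noise shock
        path.append(y)
    return path

rng = random.Random(0)               # fixed seed so the run is repeatable
steps, n_paths = 40, 5000
paths = [simulate_random_walk(steps, rng=rng) for _ in range(n_paths)]

def mean_and_variance_at(t):
    """Cross-path sample mean and variance of Y_t (t counted from 1)."""
    values = [p[t - 1] for p in paths]
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v

for t in (10, 20, 40):
    m, v = mean_and_variance_at(t)
    print(t, round(m, 3), round(v, 3))   # mean stays near Y_0, variance near t * sigma^2
```

The sample mean stays close to the starting value while the sample variance grows roughly in proportion to t, so the variance is not constant over time.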
Random Walk with Drift (with intercept)

Let us modify the above RWM as follows:
$$Y_t = \delta + Y_{t-1} + u_t$$
Where: δ is known as the drift parameter.
Why do we call it drift? Because if we write the preceding equation as
$$Y_t - Y_{t-1} = \Delta Y_t = \delta + u_t$$
the model shows that $Y_t$ drifts upward or downward, depending on whether δ is positive
or negative. Note that the RWM with drift is also an AR model. Therefore, in general we can conclude
that the Random Walk Model (with or without drift) is a non-stationary stochastic process.
2.3. The Unit Root Stochastic Process
The random walk model is an example of what is known as a unit root process. Let us write the
RWM as:
$$Y_t = \rho Y_{t-1} + u_t, \qquad -1 \le \rho \le 1$$
• If ρ = 1, the model becomes a RWM (that is, a RWM without drift).
• If ρ is in fact equal to 1, we face what is known as the unit root problem, that is, a
situation of non-stationarity, because in this case the variance of $Y_t$ is not constant over time.
The name unit root is due to the fact that ρ = 1. Thus, the terms non-stationarity, random walk,
and unit root can be treated as synonymous.
• If, however, |ρ| < 1, that is, if the absolute value of ρ is less than one, then the time series $Y_t$ is
stationary in the sense we have defined it.
In practical research, it is important to find out whether a time series possesses a unit root (that is,
whether it is non-stationary). Note that the term unit root process is used interchangeably with
non-stationary process.
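The role of ρ can be illustrated without simulation. For $Y_t = \rho Y_{t-1} + u_t$ started from a fixed $Y_0$ with uncorrelated shocks of variance $\sigma^2$, repeated substitution gives $Var(Y_t) = \sigma^2 (1 + \rho^2 + \rho^4 + \cdots + \rho^{2(t-1)})$. The sketch below evaluates this sum for a few values of ρ; the function name is an illustrative assumption.

```python
def ar1_variance(rho, t, sigma2=1.0):
    """Var(Y_t) for Y_t = rho*Y_{t-1} + u_t started from a fixed Y_0."""
    return sigma2 * sum(rho ** (2 * j) for j in range(t))

for rho in (0.5, 0.9, 1.0):
    variances = [round(ar1_variance(rho, t), 3) for t in (1, 10, 100, 1000)]
    print(rho, variances)
```

For |ρ| < 1 the variance settles at $\sigma^2 / (1 - \rho^2)$, whereas for ρ = 1 it equals $t\sigma^2$ and keeps growing, which is the unit root case.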
CHAPTER THREE
3. INTRODUCTION TO SIMULTANEOUS EQUATION MODELS
3.1. The Nature of Simultaneous Equation Model
In the previous chapters, we have focused exclusively on the problems and estimation of
single equation regression models. In such models, a dependent
variable is expressed as a linear function of one or more explanatory variables. The cause-and-
effect relationship in such models between the dependent and independent variables is
unidirectional. That is, the explanatory variables are the cause and the dependent variable is
the effect. But there are situations where such one-way or unidirectional causation in the function
is not meaningful. This occurs if, for instance, Y (dependent variable) is not only a function of the X's
(explanatory variables) but also all or some of the X's are, in turn, determined by Y. There is,
therefore, a two-way flow of influence between Y and (some of) the X's, which in turn makes the
distinction between dependent and independent variables a little doubtful. Under such
circumstances, we need to consider more than one regression equation, one for each
interdependent variable, to understand the multi-flow of influence among the variables. This is
precisely what is done in simultaneous equation models.
Some examples of SEMs in Economics
1. Demand-Supply Model
2. Keynesian Model of Income Determination
3. Wage–Price Models
4. The IS and LM Model of Macroeconomics
A system describing the joint dependence of variables is called a simultaneous equations model.
The number of equations in such a model is equal to the number of jointly dependent (endogenous)
variables involved in the phenomenon under analysis. Unlike in single-equation models, in
simultaneous equation models it is usually not possible to estimate a single equation of the model
without taking into account the information provided by the other equations of the system.
3.2. Simultaneity bias
If one applies OLS to estimate the parameters of each equation while disregarding the other equations
of the model, the estimates so obtained are not only biased but also inconsistent; i.e. even if the
sample size increases indefinitely, the estimators do not converge to their true values. The bias
arising from an estimation procedure that treats each equation of the simultaneous equations model
as though it were a single-equation model is known as simultaneity bias or simultaneous equation
bias. To avoid this bias we use other methods of estimation, such as:
• Indirect Least Squares (ILS),
• Two-Stage Least Squares (2SLS),
• Three-Stage Least Squares (3SLS),
• Maximum Likelihood methods, and
• the method of Instrumental Variables (IV).
What happens to the parameter estimates if we apply OLS to each equation without taking into
account the information provided by the other equations in the system? One of the crucial
assumptions of OLS is that the explanatory variables and the disturbance term are independent,
i.e. the explanatory variables are truly exogenous. Symbolically: E[XiUi] = 0. As a result, the
linear model can be interpreted as describing the conditional expectation of the dependent
variable (Y) given a set of explanatory variables. In simultaneous equation models, this
independence of explanatory variables and disturbance term is violated, i.e. E[XiUi] ≠ 0. When
this assumption is violated, the OLS estimator is biased and inconsistent.
Simultaneity bias of OLS estimators: Two-way causation in a relationship leads to violation of this
important assumption of the linear regression model, because a variable can be the dependent
variable in one equation and also an explanatory variable in other equations of the
simultaneous-equation model. In this case E[XiUi] may be different from zero.
To show simultaneity bias, let’s consider the following simple simultaneous equation model:
$$Y = \alpha_0 + \alpha_1 X + U$$
$$X = \beta_0 + \beta_1 Y + \beta_2 Z + V \qquad (10)$$
Suppose that the following assumptions hold:
$$E(U) = 0, \quad E(V) = 0$$
$$E(U^2) = \sigma_u^2, \quad E(V^2) = \sigma_v^2$$
$$E(U_i U_j) = 0, \quad E(V_i V_j) = 0 \ (i \neq j), \quad \text{and also } E(U_i V_i) = 0$$
where X and Y are endogenous variables and Z is an exogenous variable. The reduced form of X is
obtained by substituting the equation for Y into the equation for X:
$$X = \beta_0 + \beta_1(\alpha_0 + \alpha_1 X + U) + \beta_2 Z + V$$
$$X = \frac{\beta_0 + \alpha_0\beta_1}{1 - \alpha_1\beta_1} + \frac{\beta_2}{1 - \alpha_1\beta_1} Z + \frac{\beta_1 U + V}{1 - \alpha_1\beta_1} \qquad (11)$$
Applying OLS to the first equation of the above structural model will result in a biased estimator
because, by (11), X depends on U. Under the stated assumptions,
$\operatorname{cov}(X_i, U_i) = E(X_i U_i) = \beta_1 \sigma_u^2 / (1 - \alpha_1\beta_1) \neq 0$ whenever $\beta_1 \neq 0$.
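The size of this bias can be illustrated with a small simulation. The sketch below is only an
illustration under assumed values: the parameter values, the standard normal draws for Z, U and V,
and the function name simulate_ols_slope are all hypothetical and not part of the model above. It
generates X from its reduced form, generates Y from the first structural equation, and then
regresses Y on X by OLS.

```python
import random


def simulate_ols_slope(n=20000, alpha0=1.0, alpha1=0.5,
                       beta0=2.0, beta1=0.8, beta2=1.0, seed=0):
    """Simulate the two-equation system and return the OLS slope of Y on X."""
    rng = random.Random(seed)
    denom = 1.0 - alpha1 * beta1
    xs, ys = [], []
    for _ in range(n):
        z, u, v = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        # Reduced form of X: a function of Z, U and V only
        x = (beta0 + alpha0 * beta1 + beta2 * z + beta1 * u + v) / denom
        # First structural equation: Y depends on X and U
        y = alpha0 + alpha1 * x + u
        xs.append(x)
        ys.append(y)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


slope = simulate_ols_slope()
print(f"true slope = 0.5, OLS slope = {slope:.3f}")
```

With these assumed values the OLS slope converges to the true slope plus cov(X, U)/var(X), which is
about 0.68 rather than 0.5, and enlarging the sample does not remove the gap.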
The Definitions of Some Concepts

• Endogenous and exogenous variables
In simultaneous equation models, variables are classified as endogenous or exogenous.
Endogenous variables are variables that are determined by the economic model (within the
system). Exogenous variables are those determined outside the system and are also called
predetermined. Predetermined variables fall into two categories, both of which are treated as
exogenous: current and lagged exogenous variables, and lagged endogenous variables. For
instance, $X_t$ and $X_{t-1}$ denote current and lagged exogenous variables, and $Y_{t-1}$ denotes a
lagged endogenous variable, on the assumption that the X’s symbolize the exogenous variables and
the Y’s the endogenous variables. Thus $X_t$, $X_{t-1}$ and $Y_{t-1}$ are regarded as predetermined
(exogenous) variables.
Since the exogenous variables are predetermined, they are assumed to be independent of the
error terms in the model.
Consider the demand and supply functions:
$$Q^d = \alpha_0 + \alpha_1 P + \alpha_2 Y + U_1 \qquad (14)$$
$$Q^s = \beta_0 + \beta_1 P + \beta_2 R + U_2 \qquad (15)$$
where Q = quantity, Y = income, P = price, R = rainfall, and $U_1$ and $U_2$ are error terms.
Here P and Q are endogenous variables and Y and R are exogenous variables.
• Structural models
A structural model describes the complete structure of the relationships among the economic
variables. The structural equations of the model may be expressed in terms of endogenous
variables, exogenous variables and disturbances (random variables). The parameters of a structural
model express the direct effect of each explanatory variable on the dependent variable. Variables
that do not appear explicitly in a given equation may still have an indirect effect, which is taken
into account by the simultaneous solution of the system. For instance, a change in consumption
affects investment indirectly, even though investment is not considered in the consumption function.
The effect of consumption on investment cannot be measured directly by any structural
parameter, but is measured indirectly by considering the system as a whole.
Example: The following simple Keynesian model of income determination can be considered as
a structural model:
$$C = \alpha + \beta Y + U \qquad (16)$$
$$Y = C + Z \qquad (17)$$
for $\alpha > 0$ and $0 < \beta < 1$, where:
C = consumption expenditure
Z = non-consumption expenditure
Y = national income
C and Y are endogenous variables while Z is an exogenous variable.
• Reduced form of the model
The reduced form of a structural model is the model in which the endogenous variables are
expressed as functions of the predetermined variables and the error terms only.
Illustration: Find the reduced form of the above structural model. Since C and Y are
endogenous variables and Z is the only exogenous variable, we have to express C and Y in
terms of Z. To do this, substitute Y = C + Z into equation (16):
$$C = \alpha + \beta(C + Z) + U$$
$$C = \alpha + \beta C + \beta Z + U$$
$$C - \beta C = \alpha + \beta Z + U$$
$$C(1 - \beta) = \alpha + \beta Z + U$$
$$C = \frac{\alpha}{1-\beta} + \frac{\beta}{1-\beta} Z + \frac{U}{1-\beta} \qquad (18)$$
Substituting (18) into (17), we get:
$$Y = \frac{\alpha}{1-\beta} + \frac{1}{1-\beta} Z + \frac{U}{1-\beta} \qquad (19)$$
Equations (18) and (19) are called the reduced form of the above structural model. We can
write this more formally as:
Structural form equations:
$$C = \alpha + \beta Y + U$$
$$Y = C + Z$$
Reduced form equations:
$$C = \frac{\alpha}{1-\beta} + \frac{\beta}{1-\beta} Z + \frac{U}{1-\beta}$$
$$Y = \frac{\alpha}{1-\beta} + \frac{1}{1-\beta} Z + \frac{U}{1-\beta}$$
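The equivalence of the two forms can be checked numerically. The sketch below is a hedged
illustration: the function names solve_structural and reduced_form and the numerical values are
hypothetical. The first function solves the two structural equations by repeated substitution
(which converges because 0 < β < 1), and the second evaluates equations (18) and (19) directly.

```python
def solve_structural(alpha, beta, z, u):
    """Solve C = alpha + beta*Y + u and Y = C + z by repeated substitution."""
    c = 0.0
    for _ in range(500):
        y = c + z                   # income identity (17)
        c = alpha + beta * y + u    # consumption function (16)
    return c, c + z


def reduced_form(alpha, beta, z, u):
    """Evaluate the reduced-form equations (18) and (19) directly."""
    c = alpha / (1 - beta) + beta / (1 - beta) * z + u / (1 - beta)
    y = alpha / (1 - beta) + 1 / (1 - beta) * z + u / (1 - beta)
    return c, y


# Hypothetical values: alpha = 10, beta = 0.8, Z = 50, U = 0
print(solve_structural(10, 0.8, 50, 0))
print(reduced_form(10, 0.8, 50, 0))
```

Both routes give C = 250 and Y = 300 (up to rounding) for these values. Raising Z by one unit raises
C by β/(1 − β) = 4 and Y by 1/(1 − β) = 5.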
Parameters of the reduced form measure the total effect (direct and indirect) of a change in the
exogenous variables on the endogenous variables. For instance, in the reduced form equation (18),
$\beta/(1-\beta)$ measures the total effect of a unit change in non-consumption expenditure on
consumption. This total effect is $\beta$, the direct effect, times $1/(1-\beta)$, the indirect
effect. The reduced form equations can be obtained in two ways:
1) To express the endogenous variables directly as a function of the predetermined
variables.
2) To solve the structural system of endogenous variables in terms of the predetermined
variables, the structural parameters, and the disturbance terms.
Consider the following simple model for a closed economy.
Ct = a1Yt + U1 --------------------------------------------------------- (i)
It = b1Yt + b2Yt-1 + U2----------------------------------------------- (ii)
Yt = Ct +It + Gt------------------------------------------------------- (iii)
This model has three equations in three endogenous variables (Ct, It and Yt) and two
predetermined variables (Gt and Yt-1).
To obtain the reduced form of this model, we may use two methods (the direct method and the
method of solving the structural model).
Direct method: Express the three endogenous variables (Ct, It and Yt) as functions of the two
predetermined variables (Gt and Yt-1) directly, using π’s as the parameters of the reduced form
model, as follows:
Ct = π11Yt-1 + π12Gt + V1 ------------------------------------(iv)
It = π21Yt-1 + π22Gt + V2 -------------------------------------(v)
Yt = π31Yt-1 + π32Gt + V3 ------------------------------------(vi)
Note: π11, π12, π21, π22, π31 and π32 are reduced form parameters.
Solving the structural model: By solving the structural system for the endogenous variables in
terms of the predetermined variables, structural parameters and disturbances, the expressions for
the reduced form parameters can be obtained easily. For instance, the third structural equation
(iii) can be expressed in reduced form as follows:
Yt = b2/(1-a1-b1) Yt-1 + 1/(1-a1-b1) Gt + (U1 + U2)/(1-a1-b1).
This equation is obtained by simply substituting structural equations (i) and (ii) into (iii). From
this expression:
π31 = b2/(1-a1-b1)
π32 = 1/(1-a1-b1)
Exercise
a) Determine the reduced form equations for the structural equations (i) and (ii).
b) Indicate the expressions for π11, π12, π21 and π22 from (a) above.
How to estimate the reduced form parameters?
The estimates of the reduced form coefficients (π’s) may be obtained in two ways.
1) Direct estimation of the reduced form coefficients by applying OLS to the reduced form equations.
2) Indirect estimation of the reduced form coefficients:
Steps:
i) Solve the system for the endogenous variables so that each equation contains only
predetermined explanatory variables. In this way we obtain the system of
parameters’ relations (relations between the π’s and the structural parameters).
ii) Obtain the estimates of the structural parameters by any appropriate econometric
method.
iii) Substitute the estimates of the structural coefficients into the system of parameters’
relations to find the estimates of the reduced form coefficients.
• Recursive models
A model is called recursive if its structural equations can be ordered in such a way that the first
equation includes only predetermined variables on the right-hand side; the second equation
contains predetermined variables and the first endogenous variable (of the first equation) on the
right-hand side; and so on. The special feature of a recursive model is that its equations may be
estimated, one at a time, by OLS without simultaneous equations bias.
OLS is not applicable if there is interdependence between the explanatory variables and the error
term. In simultaneous equation models, the endogenous variables may depend on the error
terms of the model; hence the OLS technique is in general not appropriate for estimating an
equation in a simultaneous equations model.
However, in a special type of simultaneous equations model called a recursive, triangular or
causal model, the use of OLS is appropriate. Consider the following three-equation system to
understand the nature of such models:
$$Y_1 = \beta_{10} + \gamma_{11} X_1 + \gamma_{12} X_2 + U_1$$
$$Y_2 = \beta_{20} + \beta_{21} Y_1 + \gamma_{21} X_1 + \gamma_{22} X_2 + U_2$$
$$Y_3 = \beta_{30} + \beta_{31} Y_1 + \beta_{32} Y_2 + \gamma_{31} X_1 + \gamma_{32} X_2 + U_3$$
In the above illustration, as usual, the X’s and Y’s are exogenous and endogenous variables
respectively. The disturbance terms satisfy the following assumption:
$$E(U_1 U_2) = E(U_1 U_3) = E(U_2 U_3) = 0$$
This is the most crucial assumption defining the recursive model. If it does not hold, the above
system is no longer recursive and OLS is no longer valid. The first equation of the above system
contains only exogenous variables on the right-hand side. Since, by assumption, the exogenous
variables are independent of $U_1$, the first equation satisfies the critical assumption of the OLS
procedure. Hence OLS can be applied straightforwardly to this equation.
Consider the second equation. It contains the endogenous variable $Y_1$ as one of the explanatory
variables along with the non-stochastic X’s. OLS can be applied to this equation only if it can be
shown that $Y_1$ and $U_2$ are independent of each other. This is true because $U_1$, which affects
$Y_1$, is by assumption uncorrelated with $U_2$, i.e. $E(U_1 U_2) = 0$. $Y_1$ acts as a predetermined
variable in so far as $Y_2$ is concerned. Hence OLS can be applied to this equation. A similar
argument extends to the third equation because $Y_1$ and $Y_2$ are independent of $U_3$. In this way,
in the recursive system OLS can be applied to each equation separately.
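Why the zero-correlation assumption matters can be seen in a small simulation. The sketch below is
illustrative only: it uses a simplified two-equation recursive system (Y1 = X + U1 and
Y2 = b21·Y1 + U2) with hypothetical parameter values and function names, and compares the OLS
slope in the second equation when the errors are uncorrelated and when they are correlated.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs (with intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def second_equation_slope(error_corr, n=20000, b21=0.7, seed=1):
    """OLS slope of Y2 on Y1 when corr(U1, U2) = error_corr."""
    rng = random.Random(seed)
    y1s, y2s = [], []
    for _ in range(n):
        x, u1, e = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        # Build U2 with unit variance and the requested correlation with U1
        u2 = error_corr * u1 + (1 - error_corr ** 2) ** 0.5 * e
        y1 = x + u1          # first equation: only an exogenous regressor
        y2 = b21 * y1 + u2   # second equation: Y1 appears as a regressor
        y1s.append(y1)
        y2s.append(y2)
    return ols_slope(y1s, y2s)


print(round(second_equation_slope(0.0), 3))  # uncorrelated errors
print(round(second_equation_slope(0.8), 3))  # correlated errors
```

With uncorrelated errors the estimate is close to the assumed value 0.7. With a correlation of 0.8
it converges instead to 0.7 + 0.8/2 = 1.1, so equation-by-equation OLS is no longer valid.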
Let us build a hypothetical recursive model for an agricultural commodity, say wheat. The
production of wheat, $Y_1$, may be assumed to depend on exogenous factors: $X_2$ = climatic
conditions and $X_3$ = last season’s price. The retail price, $Y_2$, may be assumed to be a function
of the production level $Y_1$ and the exogenous factor $X_4$ = disposable income. Finally, the price
obtained by the producer, $Y_3$, can be expressed in terms of the retail price $Y_2$ and the exogenous
factor $X_5$ = the cost of marketing the product.
The relevant equations of the model may be written as follows:
$$Y_1 = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + U_1$$
$$Y_2 = \beta_4 + \alpha_1 Y_1 + \beta_5 X_4 + U_2$$
$$Y_3 = \beta_6 + \alpha_2 Y_2 + \beta_7 X_5 + U_3$$
In the first equation, there are only exogenous variables on the right-hand side, and they are
assumed to be independent of $U_1$. In the second equation, the causal relation between $Y_1$ and
$Y_2$ runs in one direction only. Also, $Y_1$ is independent of $U_2$ and can be treated just like an
exogenous variable. Similarly, since $Y_2$ is independent of $U_3$, OLS can be applied to the third
equation. Thus, we can rewrite the above equations as follows:
$$Y_1 = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + U_1$$
$$-\alpha_1 Y_1 + Y_2 = \beta_4 + \beta_5 X_4 + U_2$$
$$-\alpha_2 Y_2 + Y_3 = \beta_6 + \beta_7 X_5 + U_3$$
We can again rewrite this in matrix form as follows:
$$\begin{bmatrix} 1 & 0 & 0 \\ -\alpha_1 & 1 & 0 \\ 0 & -\alpha_2 & 1 \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix} = \begin{bmatrix} \beta_1 & \beta_2 & \beta_3 & 0 & 0 \\ \beta_4 & 0 & 0 & \beta_5 & 0 \\ \beta_6 & 0 & 0 & 0 & \beta_7 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \\ X_5 \end{bmatrix} + \begin{bmatrix} U_1 \\ U_2 \\ U_3 \end{bmatrix}$$
Here the first matrix is the coefficient matrix of the endogenous variables and the second is the
coefficient matrix of the exogenous variables, with $X_1 = 1$ carrying the intercept terms.
The coefficient matrix of the endogenous variables is thus a triangular one; hence recursive models
are also called triangular models.
3.3. The Order and Rank Conditions of the Identification Problem

In simultaneous equation models, the problem of identification is a problem of model
formulation; it does not concern the estimation of the model. The estimation of the model
depends upon the empirical data and the form of the model. If the model is not in the proper
statistical form, it may turn out that the parameters cannot be uniquely estimated even though
adequate and relevant data are available. In the language of econometrics, a model is said to be
identified only when it is in a unique statistical form that enables us to obtain unique estimates
of its parameters from the sample data.
There are three possible situations of identification:
• Exactly identified
• Over-identified
• Under-identified
By observing the correspondence between the reduced form and structural form coefficients, it is
possible to determine whether a given equation is exactly identified, over-identified or
under-identified, as follows:
• An equation is exactly identified if there is a one-to-one correspondence between the
reduced form coefficients and the structural form coefficients. We get a unique solution
in this case.
• If the number of reduced form coefficients exceeds the number of structural form
coefficients, we have over-identification: more than sufficient information is available,
so the solution is not unique.
• If the number of reduced form coefficients is less than the number of structural form
coefficients, we have under-identification: sufficient information is not available, so no
solution can be found.
Formal Rules (Conditions) for Identification
In applying the identification rules, we should either ignore the constant term or, if we want to
retain it, include in the set of variables a dummy variable (say X0) which always takes the
value 1. Let’s ignore the constant intercept. There are two formal rules for identification:
i) the order condition, and
ii) the rank condition.
Here we shall discuss the order condition.
1. The order condition for identification
This condition is based on a counting rule for the variables included in and excluded from the
particular equation. It is a necessary but not sufficient condition for the identification of an
equation. The order condition may be stated as follows.
For an equation to be identified, the total number of variables (endogenous and exogenous)
excluded from it must be equal to or greater than the number of endogenous variables in the
model less one. Given that in a complete model the number of endogenous variables is equal to
the number of equations of the model, the order condition for identification is sometimes stated
in the following equivalent form.
Let G be the number of endogenous variables in the system and let k be the total number of
variables (both endogenous and exogenous) missing from the equation under consideration. Then if:
a) k = G-1, the equation is exactly identified
b) k > G-1, the equation is over-identified
c) k < G-1, the equation is under-identified
where k = K − M, with K the total number of variables in the model (endogenous and predetermined)
and M the number of variables, endogenous and exogenous, included in the particular equation.
The order condition for identification may then be expressed symbolically as:
$$K - M \geq G - 1$$
that is, (number of excluded variables) ≥ (total number of equations − 1).
Examples: State the identifiability status of each of the following equations using the order
condition stated above.
1) A system contains 10 equations with 15 variables, ten endogenous and five exogenous, and the
equation under consideration contains 11 variables. For this equation we have
G = 10, K = 15, M = 11.
Order condition: (K − M) ≥ (G − 1)
(15 − 11) < (10 − 1); that is, the order condition is not satisfied and thus the equation is not
identified (it is under-identified).
2) A system contains 10 equations with 15 variables, ten endogenous and five exogenous, and the
equation under consideration contains 5 variables. Here G = 10, K = 15, M = 5, so
(15 − 5) > (10 − 1); the order condition is satisfied and, by rule (b), the equation is
over-identified.
The order condition is necessary for a relation to be identified, but it is not sufficient; that is,
it may be fulfilled for a particular equation and yet the relation may not be identified.
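The counting rule lends itself to a short helper. The following Python sketch is one possible way to
apply it; the function name order_condition is hypothetical, and the result reflects the order
condition only, so the rank condition would still have to be checked.

```python
def order_condition(G, K, M):
    """Classify an equation by the order condition.

    G: number of endogenous variables (equations) in the system
    K: total number of variables in the system
    M: number of variables included in the equation under consideration
    """
    excluded = K - M  # variables missing from the equation
    if excluded < G - 1:
        return "under-identified"
    if excluded == G - 1:
        return "exactly identified"
    return "over-identified"


print(order_condition(10, 15, 11))  # example 1 above
print(order_condition(10, 15, 5))   # example 2 above
print(order_condition(2, 4, 3))     # demand equation in (14)-(15): Q, P, Y out of Q, P, Y, R
```

For the demand-supply model of equations (14) and (15), each equation excludes exactly one of the
four variables, and G − 1 = 1, so each equation satisfies the order condition with equality.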
CHAPTER FOUR
4. INTRODUCTION TO PANEL DATA ANALYSIS
4.1. Introduction
Panel data combine cross-section and time-series data. In panel data the same cross-sectional
unit (industry, firm, country) is surveyed over time, so we have data pooled over space as well
as time.
Reasons for using panel data
• Panel data can take explicit account of individual-specific heterogeneity (“individual”
here refers to the micro unit).
• By combining data in two dimensions, panel data give more variation, less
collinearity and more degrees of freedom.
• Panel data are better suited than cross-sectional data for studying the dynamics of change.
For example, they are well suited to understanding transition behaviour such as
company bankruptcy or merger.
Autocorrelation
Although autocorrelation in panel models differs from autocorrelation in the usual OLS models, a
version of the Durbin-Watson test can be used in the usual way (EViews reports this). To remedy
autocorrelation we can use the usual methods, such as the Error Correction Model. ‘Dynamic
models’, which basically involve adding a lagged dependent variable, are also often used.
Recently, adjusting the standard errors has become popular; the most common method is the
‘Newey-West’ adjusted standard errors.
Heteroskedasticity
Given that there is a cross-section component to panel data, there is always a potential for
heteroskedasticity. Although there are various tests for heteroskedasticity, as with autocorrelation
there is a tendency to automatically use adjusted standard errors, which keep inference valid in
the presence of the problem. With heteroskedasticity, it is usually White’s adjusted standard
errors that are used.
1. Example: the data consist of 20 countries over 10 years of annual data, giving 200
observations in all (N = 20, T = 10, N × T = 200). Stock prices are regressed against
expenditure on research (r).
2. Example: the results are interpreted in the usual way; however, you would need to decide
whether to use fixed or random effects in this model.
Panel or longitudinal data sets consist of repeated observations for the same units: firms,
individuals or other economic agents. Typically the observations are at different points in time.
Let Yit denote the outcome for unit i in period t, and Xit a vector of explanatory variables. The
index i denotes the unit and runs from 1 to N, and the index t denotes time and runs from 1 to T.
There are two types of panel data:
Balanced panel data: each sampling unit is observed in every time period.
E.g. year1 = year2 = year3 = 300 households
Unbalanced panel data: the number of observations may differ across units and points in time.
E.g. year1 = 300, year2 = 295, year3 = 270
– Households move to other places, or members/household heads die
– Firms go out of business
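Whether a data set is balanced can be checked by counting observations. The sketch below is a
minimal illustration in Python; the function name panel_summary and the toy data are hypothetical.
It takes one (unit, period) pair per observation and reports whether every unit appears in every
period, together with the number of units observed in each period.

```python
from collections import defaultdict


def panel_summary(rows):
    """Return (is_balanced, observations per period) for (unit_id, period) pairs."""
    periods_by_unit = defaultdict(set)
    units_by_period = defaultdict(set)
    for unit, period in rows:
        periods_by_unit[unit].add(period)
        units_by_period[period].add(unit)
    all_periods = set(units_by_period)
    # Balanced: every unit is observed in every period that occurs in the data
    balanced = all(p == all_periods for p in periods_by_unit.values())
    counts = {period: len(units) for period, units in sorted(units_by_period.items())}
    return balanced, counts


# Hypothetical toy panel: household 3 drops out after year 1
rows = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1)]
print(panel_summary(rows))
```

In this toy panel three households are observed in year 1 and two in year 2, so the panel is
unbalanced.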
Panel data in Stata
• Use the data file ‘Epanel’.
• Check the format in which it is presented: make sure that it is in long format as
opposed to wide format.
• Make sure that the data have two identifiers: the entity id (hid) and the panel period (year).
• Make sure that the entity id is unique within a panel period.
• Declare to Stata that your data are panel data using this command: xtset hid year
The key issue with panel data is that Yit (the outcome in period t) and Yis (the outcome in
period s) tend to be correlated even conditional on the covariates Xit and Xis.
Let us look at this in a linear model:
$$Y_{it} = X_{it}\beta + c_i + u_{it}$$
What is $c_i$? It is called the unobserved individual effect.
– It is unobserved, e.g. genetic make-up
– It is individual-specific
– It is time-invariant: it stays the same over time
– It is random.
It creates the correlation between Yit and Yis even when the error term is uncorrelated over time
and across units.
4.2. Estimation of Panel Data Regression Model: The Fixed Effect Approach
Use fixed effects provided the following assumptions are fulfilled:
• Assumption 1: Strict exogeneity
• Assumption 2: Uncorrelated effects
Fixed effects estimation is also known as:
• the covariance model
• the within estimator
• the individual dummy variable model
• the least squares dummy variable (LSDV) model.
Each entity has its own individual characteristics that may or may not influence the predictor
variables. Fixed effects remove the effect of those time-invariant characteristics from the
predictor variables so that we can assess the predictors’ net effect. Each entity is different;
therefore the entity’s error term and the constant (which captures individual characteristics)
should not be correlated with those of the others. If the error terms are correlated, then FE is not
suitable since inferences may not be correct, and you need to model that relationship (probably
using random effects); this is the main rationale for the Hausman test (presented later on in this
document). Another important assumption of the FE model is that those time-invariant
characteristics are unique to the individual and should not be correlated with other individual
characteristics.
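How the within transformation removes a time-invariant effect can be sketched with simulated data.
The example below is an illustration under assumed values, not an analysis of real data: the
data-generating process, the true slope of 2 and all function names are hypothetical. Each unit has
an unobserved effect that is correlated with the regressor, so pooled OLS is biased, while
subtracting each unit's time averages before running OLS recovers the slope.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs (with intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_panel(n_units=500, n_periods=5, beta=2.0, seed=2):
    """Return a list of (unit, x, y) with an individual effect correlated with x."""
    rng = random.Random(seed)
    panel = []
    for i in range(n_units):
        c = rng.gauss(0, 1)                  # unobserved, time-invariant individual effect
        for _ in range(n_periods):
            x = c + rng.gauss(0, 1)          # regressor correlated with the effect
            y = beta * x + c + rng.gauss(0, 1)
            panel.append((i, x, y))
    return panel


def within_transform(panel):
    """Subtract each unit's time averages from its x and y values."""
    sums = {}
    for unit, x, y in panel:
        sx, sy, k = sums.get(unit, (0.0, 0.0, 0))
        sums[unit] = (sx + x, sy + y, k + 1)
    xs = [x - sums[u][0] / sums[u][2] for u, x, _ in panel]
    ys = [y - sums[u][1] / sums[u][2] for u, _, y in panel]
    return xs, ys


panel = simulate_panel()
pooled = ols_slope([x for _, x, _ in panel], [y for _, _, y in panel])
x_dm, y_dm = within_transform(panel)
within = ols_slope(x_dm, y_dm)
print(f"pooled OLS: {pooled:.2f}, within estimator: {within:.2f}, true slope: 2.00")
```

Under this design pooled OLS converges to about 2.5, because the individual effect is left in the
error term, whereas the within estimate stays close to 2.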
4.3. Estimation of Panel Data Regression Model: Random Effect Estimation
The rationale behind the random effects model is that, unlike in the fixed effects model, the
variation across entities is assumed to be random and uncorrelated with the predictor or
independent variables included in the model. This is a very strong assumption to make in
empirical analysis. If you have reason to believe that differences across entities have some
influence on your dependent variable, then you should use random effects. An advantage of
random effects is that you can include time-invariant variables (e.g. gender). In the fixed effects
model these variables are absorbed by the intercept. To decide between fixed and random effects
you can run a Hausman test, where the null hypothesis is that the preferred model is random
effects versus the alternative of fixed effects (see Greene, 2008, chapter 9). It basically tests
whether the unique errors (unobserved individual characteristics) are correlated with the
regressors.
Conclusion:
• Panel data estimation is a method for analysing data that are both time series and
cross-sectional.
• It has both advantages and disadvantages compared with OLS estimation.
• Many different techniques, such as tests for stationarity, can be applied to it.
WORK SHEET
Part I: Choose the best answer from the available alternatives.
____1. If the artificial values 0 and 1 are given to qualitative variables, we call them:
A. Binary variables B. Categorical variables C. Dichotomous variables D. All
____2. Income, output, price, cost, etc. are expressed as ____________ variables by their
nature.
A. Qualitative B. Quantitative C. Parameters D. Regression
____3. The simultaneous equation model that may be estimated by OLS without simultaneity bias is:
A. Structural model B. Reduced form C. Recursive model D. Exogenous variable
____4. The structural equations of the model may be expressed in terms of:
A. Endogenous variables B. Exogenous variables C. Random variables D. All
____5. Exogenous variables are also called:
A. Predetermined B. Lagged C. Direct effect D. Current effect
____6. Dummy variables are also called:
A. Binary variables B. Categorical variables C. Dichotomous variables D. All
____7. Solving the structural system of endogenous variables in terms of the predetermined
variables is:
A. Solving structural method B. Direct method C. OLS method D. Parameters
____8. An independent variable is also called:
A. Explanatory variable B. Explained variable C. Dependent variable D. Regressand
____9. OLS is not applicable if there is interdependence between the explanatory variables and
the error term. A. True B. False
____10. Under the classical linear regression model assumptions, the mean value of the error term is:
A. Zero B. Constant C. Non-negative D. Different from zero
Part II: Write ‘True’ or ‘False’ for the following statements.
____1. In dummy regression analysis, the dependent variable is influenced not only by
quantitative independent variables.
____2. Under the classical linear regression model assumptions, the mean value of the error term
is different from zero.
____3. In simultaneous equation models, OLS is not applicable because the mean of the error
term is different from zero.
____4. A simultaneous equation model is unidirectional since there is no multi-directional flow
of influence among the variables.
____5. Endogenous variables are the variables that are determined by the economic model.
Part III: Workout
1. Assume that the following simple simultaneous equation model is given:
$$Y = \alpha_0 + \alpha_1 X + U$$
$$X = \beta_0 + \beta_1 Y + \beta_2 Z + V$$
Required: (a) Find the reduced form of X.
(b) Find the reduced form of Y.
2. The following represents a simplified model of the economy:
$$C_t = \alpha_0 + \alpha_1 Y_t + \alpha_2 C_{t-1} + \mu_{1t} \quad \text{(consumption)}$$
$$I_t = \beta_0 + \beta_1 r_t + \beta_2 I_{t-1} + \mu_{2t} \quad \text{(investment)}$$
$$r_t = \gamma_0 + \gamma_1 Y_t + \gamma_2 M_t + \mu_{3t} \quad \text{(money market)}$$
$$Y_t = C_t + I_t + G_t \quad \text{(income identity)}$$
where C = consumption, Y = income, r = rate of interest, M = money supply, I = investment and
G = government expenditure.
Required: Which are the endogenous and exogenous variables?

3. Assume that the following are hypothetical data on black and white instructors at
Harvard University.
Starting salary (Y)    Race (D: 1 = white, 0 = black)
21,200                 1
18,000                 0
21,700                 1
18,500                 0
21,000                 1
20,500                 1
17,000                 0
17,500                 0
Required: a. Find the mean salary of black instructors and of white instructors.
b. Calculate the slope of the regression.
c. Fit the regression and report the result.
4. In studying the effect of a number of qualitative attributes on the prices charged for
movie admissions in a large metropolitan area for the period 2007-2010, Galma Ada
Oromo obtained the following regression for the year 2007:
$$\hat{Y} = 4.13 + 5.77 D_1 + 8.12 D_2 - 7.68 D_3 - 1.13 D_4 + 27.09 D_5 + 31.46 \log X_1 + 0.81 X_2 + \text{3 other dummy variables}$$
Standard errors: (2.04) (2.67) (2.51) (1.78) (3.58) (13.78) (0.17)
$$R^2 = 0.961$$
where:
D1 = Theater location: 1 if suburban, 0 if city center
D2 = Theater age: 1 if less than 10 years since construction or major renovation, 0 otherwise
D3 = Type of theater: 1 if outdoor, 0 if indoor
D4 = Parking: 1 if provided, 0 otherwise
D5 = Screening policy: 1 if first run, 0 otherwise
X1 = Average percentage unused seating capacity per showing
X2 = Average film rental, cents per ticket charged by the distributor
Y = Adult evening admission price, cents
and the figures in parentheses are standard errors, listed in order for D1, D2, D3, D4, D5,
log X1 and X2.
Required:
a. Comment on the results.
b. How would you rationalize the introduction of the variable X1?
c. How would you explain the negative value of the coefficient of D4?
