ECON3049 Lecture Notes 1

The Simple Regression Model
ECON3049 ECONOMETRICS I
Dr. Andre Haughton
2021
Revision of Summation Operator
• If a is a constant, then
$\sum_{i=1}^{n} a = na$
• If X takes n values $x_1, \dots, x_n$ and a is a constant, then
$\sum_{i=1}^{n} a x_i = a \sum_{i=1}^{n} x_i$
• If a and b are two constants and x and y are two variables, then
$\sum_{i=1}^{n} (a x_i + b y_i) = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} y_i$
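The three summation rules can be checked numerically. The sketch below is only an illustration: the data in `x` and `y` and the constants `a` and `b` are made-up values, and exact fractions are used so the equalities hold without rounding error.

```python
from fractions import Fraction

# Hypothetical illustrative data: any two finite lists of equal length would do.
x = [Fraction(2), Fraction(5), Fraction(-1), Fraction(7)]
y = [Fraction(3), Fraction(0), Fraction(4), Fraction(1)]
a, b = Fraction(3), Fraction(-2)
n = len(x)

# Rule 1: summing a constant n times gives n * a.
rule1_lhs = sum(a for _ in range(n))
rule1_rhs = n * a

# Rule 2: a constant factor can be pulled out of the sum.
rule2_lhs = sum(a * xi for xi in x)
rule2_rhs = a * sum(x)

# Rule 3: the sum of a linear combination is the linear combination of the sums.
rule3_lhs = sum(a * xi + b * yi for xi, yi in zip(x, y))
rule3_rhs = a * sum(x) + b * sum(y)

print(rule1_lhs == rule1_rhs, rule2_lhs == rule2_rhs, rule3_lhs == rule3_rhs)
```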
Revision of Expectation Operator
The Mean of a Random Variable

– The expected value of a random variable X is the average value of the
random variable in an infinite number of repetitions of the experiment
(repeated samples); it is denoted E[X]. The expected value of a
random variable is frequently described as the population mean.
– Expected Value Rules
• If c is a constant, $E[c] = c$
• If c is a constant and X is a random variable, then
$E[cX] = cE[X]$
• If a and c are constants, then
$E[a + cX] = a + cE[X]$
Revision of Expectation Operator
Expected Value Rules (cont'd)
– If X and Y are random variables, then
$E[X + Y] = E[X] + E[Y]$
– also $E[aX + bY] = aE[X] + bE[Y]$
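As a small check of the expected value rules, the sketch below computes expectations exactly for an assumed example that is not part of the lecture: X and Y are two independent fair six-sided dice, and `a`, `b`, `c` are arbitrary constants. Independence is only a convenience for writing down the joint distribution; the sum rule itself does not require it.

```python
from fractions import Fraction
from itertools import product

# Assumed example: X and Y are two independent fair six-sided dice.
faces = range(1, 7)
p = Fraction(1, 6)

def expect(g):
    """E[g(X, Y)] over the joint distribution of the two dice."""
    return sum(g(xv, yv) * p * p for xv, yv in product(faces, faces))

a, b, c = Fraction(2), Fraction(-3), Fraction(5)

e_x = expect(lambda xv, yv: xv)  # E[X]
e_y = expect(lambda xv, yv: yv)  # E[Y]

const_rule = expect(lambda xv, yv: c) == c                   # E[c] = c
scale_rule = expect(lambda xv, yv: c * xv) == c * e_x        # E[cX] = cE[X]
shift_rule = expect(lambda xv, yv: a + c * xv) == a + c * e_x  # E[a + cX] = a + cE[X]
sum_rule = expect(lambda xv, yv: a * xv + b * yv) == a * e_x + b * e_y

print(e_x, const_rule, scale_rule, shift_rule, sum_rule)
```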
Revision of Variance/Covariance
– The Variance of a Weighted Sum of Random Variables
• If X, Y, and Z are random variables and a, b, and c are constants, then
$\mathrm{var}(aX + bY + cZ) = a^2\,\mathrm{var}(X) + b^2\,\mathrm{var}(Y) + c^2\,\mathrm{var}(Z) + 2ab\,\mathrm{cov}(X, Y) + 2ac\,\mathrm{cov}(X, Z) + 2bc\,\mathrm{cov}(Y, Z)$
• If X, Y, and Z are independent, or uncorrelated, random variables, then the covariance terms are zero and:
$\mathrm{var}(aX + bY + cZ) = a^2\,\mathrm{var}(X) + b^2\,\mathrm{var}(Y) + c^2\,\mathrm{var}(Z)$
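The variance formula for a weighted sum can be verified on a small finite population. This is a sketch under stated assumptions: the rows of `rows` are invented numbers, each treated as an equally likely outcome (x, y, z), and `mean`, `cov` and `var` are helper names introduced here for illustration.

```python
from fractions import Fraction

# Hypothetical data: each row (x, y, z) is treated as an equally likely outcome.
rows = [(1, 2, 0), (2, 1, 3), (4, 3, 1), (5, 6, 2), (3, 3, 4)]
a, b, c = Fraction(2), Fraction(-1), Fraction(3)

def mean(values):
    """Population mean, computed exactly with fractions."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def cov(u, v):
    """Population covariance: E[(U - E[U]) (V - E[V])]."""
    mu, mv = mean(u), mean(v)
    return mean((ui - mu) * (vi - mv) for ui, vi in zip(u, v))

def var(u):
    """Population variance, i.e. the covariance of a variable with itself."""
    return cov(u, u)

xs, ys, zs = (list(col) for col in zip(*rows))
w = [a * xi + b * yi + c * zi for xi, yi, zi in rows]  # the weighted sum

lhs = var(w)
rhs = (a**2 * var(xs) + b**2 * var(ys) + c**2 * var(zs)
       + 2 * a * b * cov(xs, ys) + 2 * a * c * cov(xs, zs) + 2 * b * c * cov(ys, zs))

print(lhs == rhs)
```

Because the columns in this made-up data are correlated, dropping the covariance terms would not reproduce `lhs`; the shorter formula applies only when those covariances are zero.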

Outline
What is Econometrics?
The simple linear regression model
Forming an estimator for the parameters of interest
Algebraic properties of OLS statistics
Method of moments Estimation
Least squares estimation
Reading: Wooldridge (2009), Introductory Econometrics, Chapters 1 and 2.

What is Econometrics?
• Literally interpreted, econometrics means “economic measurement.”
• Econometrics is based upon the development of statistical methods for
estimating economic relationships, testing economic theories, and
evaluating and implementing government and business policy.
• It focuses on the problems inherent in collecting and analysing
non-experimental economic data, or observational data, i.e. data not
accumulated through controlled experiments (as is typical in the natural
sciences).
• Econometrics is useful either when we have an economic theory to test or
when we have a relationship in mind that has some importance for
business decisions or government macroeconomic policy analysis.
• An empirical analysis uses data to test a theory or to estimate a
relationship.
Example: study hours and GPA
Question of interest:
• How does an individual's GPA respond to changes in study hours?
Empirical relationship of interest:
• Understand GPA as a function of study hours.
Policy questions:
• How do study hours affect an individual's GPA?
• How do different study patterns affect GPA?
Methodological Approach to Econometrics
Classical /traditional methodology
1. Statement of theory or hypothesis / formulation of the question of
interest.
2. Specification of the mathematical model of the theory (i.e. the possible
direction of the relationship(s))
3. Specification of the functional form of econometric model along with
assumptions about the nature of the error term
4. Obtaining the data
5. Estimation of the parameters of the econometric model
6. Hypothesis testing to check the validity of the assumptions we have
made (for example, whether the explanatory variables are statistically
significant)
7. Forecasting or prediction
8. Using the model for control or policy purposes.
Statistical vs Deterministic relationships
Statistical - considers variables that are random or stochastic. A random or
stochastic variable is one that has a non-degenerate probability
distribution function. Examples of statistical relations:
• the effect of corruption on growth,
• the effect of corruption on inflation.
Deterministic (Functional) - involves variables that are non-random or
non-stochastic. Examples of deterministic relations are Newton's laws of gravity
and motion. Deterministic relations are found in classical physics.
In this course we abstract from deterministic relations and deal only with
statistical relations.
Structure of Economic Data
Cross-sectional Data - Data on one or more variables for individuals,
firms, cities, states, countries or other units of observation collected at
the same point in time.
Time Series Data - A collection of observations on the values that a
variable takes at different points in time. Intervals can be daily, monthly,
yearly, etc.
Pooled Cross Section - Combining sets of cross-sectional data to increase
sample size. Example: a cross-sectional household survey in two different
years (two different random samples).
Panel or Longitudinal Data - A time series for each cross-sectional
member in the data set. Example: wage data on a set of
individuals over a 25-year period.
Regression Analysis
"Regression Analysis is concerned with the study of the
dependence of one variable, the dependent variable, on one or
more other variables, the explanatory variable(s), with a view
to estimate and/or predict the (population) mean or average
value of the former in terms of the known or fixed (in repeated
sampling) value of the latter" (Gujarati).
The Simple linear Regression Model
If y and x are two random variables representing some population,
we are interested in "explaining y in terms of x"
or in "studying how y varies with changes in x".
Example
• y is monthly output and x is the monthly stock of money
• y is GPA and x is study hours per day
IMPORTANT ISSUES
1. How do we account for the other variables that may influence y?
2. What is the functional relationship between x and y?
3. How do we ensure that the relationship between x and y is estimated
holding all other factors constant (ceteris paribus)?
The Simple linear Regression Model
We write the simple (two-variable) linear regression model as
$y_i = b_0 + b_1 x_i + u_i$ for i = 1, 2, ..., n
Definition:
y ... dependent variable, the explained variable, the response variable, or the
regressand;
x ... independent variable, the explanatory variable, the control variable, the
regressor, or the covariate;
u_i ... error term or disturbance; it represents the "unobserved" factors other
than x that affect y;
b_0 ... intercept, or constant;
b_1 ... slope parameter showing the linear relationship between y and x;
n ... the number of data points.
Linearity
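We have to distinguish between linearity in parameters and
linearity in variables.
Here, linearity means linear in parameters,
i.e. each parameter enters with a power of one and is not multiplied or divided by another parameter; the
model need not be linear in the variables.
For example, y = b0 + b1 x^2 + u is linear in the parameters although it is not linear in x, whereas y = b0 + x^(b1) + u is not linear in the parameters.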
Example
The simple linear regression model in our example would look like this:
$\text{Earnings} = b_0 + b_1\,\text{Education} + u$
where Earnings measures income per annum and
Education measures years of schooling.
IMPORTANT
What makes up the error term ui?
• Measurement error in the dependent variable
• Other variables that may have an impact on the dependent variable (for example avgGrade), such as
individual effort, eating habits and ability
[Figure: E(y|x) = b0 + b1 x as a linear function of x, where for any x (e.g. x1, x2) the distribution f(y) of y is centered about E(y|x).]
To estimate our parameters b0 and b1 we must make some assumptions about
the unobserved error term u and how it relates to the explanatory variable x.
Assumption 1: Normalization of the error term
E(u) = 0
i.e. the average value of u in the population is zero.
As long as b0 is included in the regression, nothing is lost by assuming that
the average of the other factors that affect y is zero, because any non-zero
mean of u can be absorbed into the intercept b0.
Example
In our example with study hours and GPA, if u represents ability, we can
assume that average ability is zero, without loss of generality.
Assumption 2: zero conditional mean (mean independence)
• If x and u are two random variables, the expected value of u given x is
zero.
The average value of u does not depend on the value of x:
E(u|x) = E(u) = 0
• Let u represent ability; then Assumption 2 implies that average ability
does not depend on hours of study.
Example
• If, for example, hours studied increases from 6 to 9, average ability
should not change.
• If we believe average ability depends on hours of study, then Assumption 2 would be
false.
• We cannot observe average ability.
Zero covariance between the error term and x
cov(u,x) = E(ux) = 0
This is derived from Assumptions 1 and 2: cov(u,x) = E(ux) - E(u)E(x) = E(ux) because E(u) = 0, and E(ux) = E[x E(u|x)] = 0 because E(u|x) = 0.
Population Regression Function (PRF)

Given Assumption 2, the conditional expectation of y given x can be
defined as
$E(y|x) = b_0 + b_1 x$
• For any given value of x, the distribution of y is centered about E(y|x).
• This tells us how the average value of y changes with values of x; it does not tell us
how each individual value of y changes.
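The distinction between the average of y and individual values of y can be seen in a small simulation. This is a sketch under stated assumptions, none of which come from the lecture: the parameter values `b0` and `b1`, the three x values, and a standard normal error drawn independently of x (so that E(u|x) = 0 holds by construction).

```python
import random

random.seed(0)
b0, b1 = 1.0, 0.5            # assumed population parameters (illustrative only)
x_values = [1.0, 2.0, 3.0]   # assumed values taken by x
draws_per_x = 20000

cond_mean = {}  # sample average of y at each x, an estimate of E(y|x)
first_y = {}    # one individual draw of y at each x

for xv in x_values:
    # u is drawn independently of x with mean zero, so E(u|x) = 0.
    ys = [b0 + b1 * xv + random.gauss(0.0, 1.0) for _ in range(draws_per_x)]
    cond_mean[xv] = sum(ys) / draws_per_x
    first_y[xv] = ys[0]

for xv in x_values:
    # Columns: x, average of y at x, b0 + b1*x, one individual y at x.
    print(xv, round(cond_mean[xv], 3), b0 + b1 * xv, round(first_y[xv], 3))
```

The averages in the second column should sit close to b0 + b1 x, while the individual draws in the last column can be well above or below it.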
Example
• x corresponds to experience.
• E(y|x) corresponds to the mean earnings of individuals who have a
certain amount of experience.
• If x increases, the PRF does not say that each individual's earnings will change by the same amount; it
says how earnings change on average. For some individuals the change may be more, for some it
may be less.
Under Assumption 2, y can be broken into two components:
1. the systematic part of y: $E(y|x) = b_0 + b_1 x$
2. the unsystematic part of y: $y - E(y|x) = u$
Estimation Procedure
The estimator we will employ here is Ordinary
Least Squares (OLS).
We will focus mainly on two procedures:
1. Method of moments estimation
2. Least squares estimation
Note: Maximum likelihood is a third estimation procedure; however, it is outside
the scope of this course.
Estimation procedure: Methods of moments
The procedure is as follows:
1. Start from a moment of our population, e.g. E(u) = 0.
2. Form the estimator using the analog principle:
• Replace the population moment by its sample counterpart.
• That is, replace the expectation E(·) by the sample average, i.e. using
the summation sign.
3. Compute the estimator by solving for the parameter of
interest.
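The three steps above can be sketched in code for the simple regression model. The sketch assumes the two population moments E(u) = 0 and E(ux) = 0 with u = y - b0 - b1 x; the data are made-up numbers, and the names `sample_mean`, `b0_hat` and `b1_hat` are introduced here for illustration only.

```python
# Hypothetical sample (illustrative numbers only, not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

def sample_mean(values):
    """Sample average: the analog of the expectation operator."""
    return sum(values) / len(values)

# Step 2: replace population moments by their sample counterparts.
x_bar = sample_mean(x)
y_bar = sample_mean(y)
xy_bar = sample_mean([xi * yi for xi, yi in zip(x, y)])
xx_bar = sample_mean([xi * xi for xi in x])

# Step 3: solve the two sample moment conditions for b0 and b1:
#   y_bar  - b0         - b1 * x_bar  = 0   (analog of E(u) = 0)
#   xy_bar - b0 * x_bar - b1 * xx_bar = 0   (analog of E(ux) = 0)
b1_hat = (xy_bar - x_bar * y_bar) / (xx_bar - x_bar ** 2)
b0_hat = y_bar - b1_hat * x_bar

# Check: both sample moments of the residuals are zero up to rounding error.
resid = [yi - b0_hat - b1_hat * xi for xi, yi in zip(x, y)]
moment1 = sample_mean(resid)
moment2 = sample_mean([xi * ui for xi, ui in zip(x, resid)])

print(b0_hat, b1_hat, moment1, moment2)
```

Solving the second condition requires that x varies in the sample, since otherwise the denominator `xx_bar - x_bar ** 2` is zero.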
Method of Moments
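This method relies on Assumptions 1 and 2 and
their implications, i.e.
$E(u) = 0$ and $\mathrm{cov}(u, x) = E(ux) = 0$
Which implies that, since u = y - b0 - b1 x,
$E(y - b_0 - b_1 x) = 0$ and $E[x(y - b_0 - b_1 x)] = 0$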

Method of Moments
The purpose of OLS is to make the residuals, the sample counterparts of the error u, as small as possible.
Ordinary Least Squares (OLS)
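• The least-squares procedure selects $\hat{b}_0$ and $\hat{b}_1$,
which minimize the sum of squared residuals.
• That is:
$\min_{\hat{b}_0, \hat{b}_1} \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{b}_0 - \hat{b}_1 x_i)^2$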
OLS Procedure
OLS: Estimation of the Intercept Parameter
OLS: Estimation of the Slope Parameter
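The formulas on these slides did not survive extraction, so the sketch below uses the standard textbook closed-form solution of the least-squares problem rather than anything recovered from the lecture: the slope estimate is the sum of (x_i - x̄)(y_i - ȳ) divided by the sum of (x_i - x̄)^2, and the intercept estimate is ȳ minus the slope estimate times x̄. The data and the names `ssr`, `b0_hat` and `b1_hat` are hypothetical.

```python
# Hypothetical sample (illustrative numbers only, not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

def ssr(b0, b1):
    """Sum of squared residuals for a candidate intercept b0 and slope b1."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Standard closed-form OLS estimates (requires some variation in x).
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1_hat = sxy / sxx
b0_hat = y_bar - b1_hat * x_bar

# Nudging either estimate away from the OLS values should raise the SSR.
best = ssr(b0_hat, b1_hat)
steps = (-0.1, 0.0, 0.1)
nearby = [ssr(b0_hat + d0, b1_hat + d1)
          for d0 in steps for d1 in steps if (d0, d1) != (0.0, 0.0)]

print(b0_hat, b1_hat, best, min(nearby) > best)
```

On this sample the estimates coincide with what the method of moments gives from the sample analogs of E(u) = 0 and E(ux) = 0, which is why the two procedures are usually presented together.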
