Unit-4
SIMPLE LINEAR REGRESSION MODEL: ESTIMATION
Structure
4.0 Objectives
4.1 Linear Regression Model
4.2 Population Regression Function (PRF)
4.2.1 Deterministic Component
4.0 OBJECTIVES
After going through this unit, you should be able to
describe the classical linear regression model;
differentiate between Population Regression Function (PRF) and Sample
Regression Function (SRF);
find out the Ordinary Least Squares (OLS) estimators;
describe the properties of OLS estimators;
explain the concept of goodness of fit of regression equation; and
describe the coefficient of determination and its properties.
Dr. Pooja Sharma, Assistant Professor, Daulat Ram College, University of Delhi
Regression Model: Two Variables Case

4.1 INTRODUCTION
In Unit 5 of the course BECC 107: Statistical Methods for Economics we
discussed the topics of correlation and regression. In that Unit we gave a brief
idea about the concept of regression. You already know that there are two types
of variables in regression analysis: i) dependent (or explained) variable, and ii)
independent (or explanatory) variable. As the names (explained and explanatory)
suggest, the dependent variable is explained by the independent variable.
Usually we denote the dependent variable as Y and the independent variable as
X. Suppose we took up a household survey and collected n pairs of observations
on X and Y. The relationship between X and Y can take many forms. The general
practice is to express the relationship in terms of some mathematical equation.
The simplest of these equations is the linear equation. It means that the
relationship between X and Y is in the form of a straight line, and therefore, it is
called linear regression. When the equation represents curves (not a straight line)
the regression is called non-linear or curvilinear.
Thus in general terms we can express the relationship between X and Y as
follows in equation (4.1).
𝑌 = 𝑓(𝑋) … (4.1)
In this block (Units 4, 5 and 6) we will consider simple linear regression models
with two variables only. The multiple regression model comprising more than
one explanatory variable will be discussed in the next block.
Regression analysis may have the following objectives:
To estimate the mean or average value of the dependent variable, given the
values of the independent variables.
To test the hypotheses regarding the underlying economic theory. For
example, one may test the hypothesis that the price elasticity of demand is
(–)1, that is, the demand is unitary elastic, assuming other factors affecting
the demand are held constant.
To predict the mean value of the dependent variable given the values of the
independent variable.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
2) Why does the average value of the dependent variable differ from the
actual value?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
[Figure: Two sample regression lines, SRL1 and SRL2, fitted to a scatter of
expenditure (Y) against personal disposable income (PDI, X). Each sample
regression line (SRL) is of the form Ŷi = b1 + b2Xi, while the population
regression line (PRL) is E(Y|Xi) = β1 + β2Xi. The vertical distance ei of an
observation from the SRL is the residual, while its distance ui from the PRL is
the error term.]

[Figure: The population regression line E(Y|Xi) = β1 + β2Xi with observations
scattered around it; observations above the line have positive errors (+ui) and
those below have negative errors (−ui).]
[Figure: Two panels, each showing a population regression function
PRF: Yi = β1 + β2Xi plotted in the X–Y plane.]
ei = Yi − Ŷi
Σei² = Σ(Yi − Ŷi)² ... (4.11)
The first-order condition of minimization requires that the partial derivatives are
equal to zero. Note that we have to decide on the values of b1 and b2 such that
the residual sum of squares, Σei², is the minimum. Thus, we have to take partial
derivatives with respect to b1 and b2. This implies that
∂(Σei²)/∂b1 = 0 … (4.13)
and
∂(Σei²)/∂b2 = 0 … (4.14)
From (4.13) we obtain
2Σ(Yi − b1 − b2Xi)(−1) = 0
which simplifies to
ΣYi = nb1 + b2ΣXi … (4.15)
Similarly, from (4.14),
2Σ(Yi − b1 − b2Xi)(−Xi) = 0
which simplifies to
ΣXiYi = b1ΣXi + b2ΣXi² … (4.16)
Equations (4.15) and (4.16) are called the normal equations. We have two
equations with two unknowns (b1 and b2). Thus, by solving these two normal
equations we can find out unique values of b1 and b2.
By solving the normal equations (4.15) and (4.16) we find that
b1 = Ȳ − b2X̄ … (4.17)
and
b2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
In deviation form, with xi = Xi − X̄ and yi = Yi − Ȳ, this becomes
b2 = Σxiyi / Σxi² … (4.18)
As you can see from the formula for b2, it is simpler to write the estimator of the
slope coefficient in deviation form. Expressing the values of a variable as
deviations from its mean does not change the ranking of the values, since we are
subtracting the same constant from each value. It is crucial to note that b1 and b2
are expressed in terms of quantities computed from the sample, as given by the
formulae in expressions (4.17) and (4.18).
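As a worked illustration (not from the text), the computation in (4.17) and (4.18) can be sketched in Python with a small hypothetical data set:

```python
# Hypothetical sample of n = 5 observations (for illustration only).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 6]

n = len(X)
X_bar = sum(X) / n          # mean of X
Y_bar = sum(Y) / n          # mean of Y

# Deviation-form slope estimator, equation (4.18): b2 = Σx_i y_i / Σx_i²,
# where x_i = X_i − X̄ and y_i = Y_i − Ȳ.
Sxy = sum((xi - X_bar) * (yi - Y_bar) for xi, yi in zip(X, Y))
Sxx = sum((xi - X_bar) ** 2 for xi in X)
b2 = Sxy / Sxx

# Intercept estimator, equation (4.17): b1 = Ȳ − b2 X̄
b1 = Y_bar - b2 * X_bar

print(b1, b2)   # approximately 1.8 and 0.8 for this sample
```

For this sample, the fitted line is Ŷ = 1.8 + 0.8X; any other sample would of course yield different estimates.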
We mention below the formulae for the variance and standard error of the
estimators b1 and b2:
Var(b1) = (ΣXi² / (n Σxi²)) σ² … (4.19)
SE(b1) = √Var(b1) … (4.20)
Var(b2) = σ² / Σxi²
SE(b2) = √Var(b2) … (4.21)
Here σ² is the variance of the error term ui. Since σ² is not known, we estimate it
by
σ̂² = Σei² / (n − 2) = RSS / (n − 2) … (4.22)
where RSS is the residual sum of squares and (n − 2) is the degrees of freedom.
Under the assumption that the error term is normally distributed, the OLS
estimators are also normally distributed:
b1 ~ N(β1, Var(b1)) … (4.24)
b2 ~ N(β2, Var(b2)) … (4.25)
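A minimal sketch (hypothetical data, not from the text) of estimating σ² by (4.22) and computing the standard errors in (4.19)–(4.21):

```python
import math

# Hypothetical sample (for illustration only).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 6]
n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n
Sxx = sum((xi - X_bar) ** 2 for xi in X)          # Σx_i²
b2 = sum((xi - X_bar) * (yi - Y_bar) for xi, yi in zip(X, Y)) / Sxx
b1 = Y_bar - b2 * X_bar

# Residuals and the estimate of the error variance, equation (4.22).
e = [yi - (b1 + b2 * xi) for xi, yi in zip(X, Y)]
sigma2_hat = sum(ei ** 2 for ei in e) / (n - 2)   # RSS / (n − 2)

# Variances and standard errors, equations (4.19)–(4.21).
var_b1 = sigma2_hat * sum(xi ** 2 for xi in X) / (n * Sxx)
var_b2 = sigma2_hat / Sxx
se_b1, se_b2 = math.sqrt(var_b1), math.sqrt(var_b2)
print(se_b1, se_b2)
```

Note the divisor (n − 2): two degrees of freedom are lost because two parameters (b1 and b2) are estimated from the sample.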
Check Your Progress 2
1) Distinguish between the error term and the residual by using appropriate
diagram.
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
2) Prove that the sample regression line passes through the mean values of X
and Y.
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
a) The sample regression line passes through the mean values of X and Y:
Ȳ = b1 + b2X̄ …(4.26)
The mean value of the residuals, ē, is always zero: ē = Σei/n = 0. This
implies that, on average, the positive and negative residuals cancel each
other out.
b) ΣeiXi = 0 …(4.27)
c) ΣeiŶi = 0 …(4.28)
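These properties can be verified numerically. A minimal sketch (hypothetical data, not from the text):

```python
# Fit the two-variable OLS model to a hypothetical sample, then check
# that the residuals satisfy ē = 0, Σe_i X_i = 0 and Σe_i Ŷ_i = 0.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 6]
n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n
Sxx = sum((xi - X_bar) ** 2 for xi in X)
b2 = sum((xi - X_bar) * (yi - Y_bar) for xi, yi in zip(X, Y)) / Sxx
b1 = Y_bar - b2 * X_bar

Y_hat = [b1 + b2 * xi for xi in X]               # fitted values
e = [yi - yh for yi, yh in zip(Y, Y_hat)]        # residuals

print(sum(e))                                    # ≈ 0: ē = 0
print(sum(ei * xi for ei, xi in zip(e, X)))      # ≈ 0: property (4.27)
print(sum(ei * yh for ei, yh in zip(e, Y_hat)))  # ≈ 0: property (4.28)
```

All three sums are zero (up to floating-point rounding), because they are exactly the first-order conditions the OLS estimates were chosen to satisfy.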
Or, equivalently,
Σyi² = b2² Σxi² + Σei² … (4.34)
R² = ESS / TSS … (4.37)
Dividing (4.34) by the total sum of squares, TSS = Σyi², and noting that
ESS = b2²Σxi² and RSS = Σei², we obtain
1 = ESS/TSS + RSS/TSS = R² + RSS/TSS
Therefore,
R² = 1 − RSS/TSS … (4.38)
You should note that R² gives the proportion of the total sum of squares (TSS)
that is explained by the regression (ESS). Thus, if R² = 0.75, we can say that 75
per cent of the variation in the dependent variable is explained by the
explanatory variable in the regression model. The value of R², or the coefficient
of determination, lies between 0 and 1. This is mainly because it represents the
ratio of the explained sum of squares to the total sum of squares.
Now let us look into the algebraic properties of R² and interpret it. When R² = 0
we have ESS = 0. It indicates that none of the variation in the dependent variable
is explained by the regression. If R² = 1, the sample regression is a perfect fit: all
the observations lie on the estimated regression line. A higher value of R²
implies a better fit of the regression model.
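A quick numerical check (hypothetical data, not from the text) that (4.37) and (4.38) give the same R²:

```python
# Fit the OLS line to a hypothetical sample, then compute R² both ways.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 6]
n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n
Sxx = sum((xi - X_bar) ** 2 for xi in X)
b2 = sum((xi - X_bar) * (yi - Y_bar) for xi, yi in zip(X, Y)) / Sxx
b1 = Y_bar - b2 * X_bar

TSS = sum((yi - Y_bar) ** 2 for yi in Y)                        # Σy_i²
RSS = sum((yi - (b1 + b2 * xi)) ** 2 for xi, yi in zip(X, Y))   # Σe_i²
ESS = b2 ** 2 * Sxx                                             # b2² Σx_i², from (4.34)

R2_a = ESS / TSS          # equation (4.37)
R2_b = 1 - RSS / TSS      # equation (4.38)
print(R2_a, R2_b)         # both ≈ 0.727 for this sample
```

The two formulae agree because TSS = ESS + RSS by the decomposition in (4.34), and both values necessarily lie between 0 and 1.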
4.7.2 F-Statistic for Goodness of Fit
The statistical significance of a regression model is tested by the F-statistic. By
using the t-test we can test the statistical significance of a particular parameter of
the regression model. For example, the null hypothesis H0: β2 = 0 implies that
there is no relationship between Y and X in the population. By using the F-
statistic, we can test the null hypothesis that all the slope parameters in the
model are zero. Therefore, we use the F-statistic for goodness of fit.
Therefore,
F = [Σ(Ŷi − Ȳ)² / 1] ÷ [Σei² / (n − 2)] ... (4.42)
Since Ŷi − Ȳ = b2(Xi − X̄), we can write this as
F = [(n − 2)/Σei²] · b2² Σ(Xi − X̄)² …(4.44)
We know that
var(b2) = σ̂²/Σxi², where σ̂² = Σei²/(n − 2)
Therefore,
F = b2² Σxi² / σ̂² = b2²/var(b2) = [b2/SE(b2)]² = t² ... (4.45)
Therefore, the F-statistic is equal to the square of the t-statistic (F = t²). The
above result, however, is true for the two-variable model only. If the number of
explanatory variables increases in a regression model, the above result may not
hold.
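The equality F = t² in the two-variable model can be checked numerically. A minimal sketch (hypothetical data, not from the text):

```python
import math

# Fit the OLS line to a hypothetical sample, then compare F with t².
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 6]
n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n
Sxx = sum((xi - X_bar) ** 2 for xi in X)
b2 = sum((xi - X_bar) * (yi - Y_bar) for xi, yi in zip(X, Y)) / Sxx
b1 = Y_bar - b2 * X_bar

RSS = sum((yi - (b1 + b2 * xi)) ** 2 for xi, yi in zip(X, Y))
ESS = b2 ** 2 * Sxx
sigma2_hat = RSS / (n - 2)

F = (ESS / 1) / (RSS / (n - 2))        # equation (4.42)
t = b2 / math.sqrt(sigma2_hat / Sxx)   # t-statistic for H0: β2 = 0
print(F, t ** 2)                       # equal: F = t², equation (4.45)
```

With more than one explanatory variable the F-statistic tests all slope coefficients jointly, so this one-to-one correspondence with a single t-statistic no longer holds.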
Check Your Progress 3
1) Is it possible to carry out F-test on the basis of the coefficient of
determination? Explain how.
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
2) Can the coefficient of determination be greater than 1? Explain why.
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
2) The relationship between Y and X is stochastic in nature. There is an error
term added to the regression equation. The inclusion of the random error
term leads to a difference between the expected value and the actual value
of the dependent variable.
3) There are three reasons for inclusion of the error term in the regression
model. See Sub-Section 4.2.2 for details.
Check Your Progress 2
1) Go through Section 4.3. You should explain the difference between the
error term and the residual by using Fig. 4.3.
2) In the OLS method we minimise Σei² by equating its partial derivatives to
zero. The condition ∂(Σei²)/∂b1 = 0 gives us the first normal equation:
ΣYi = nb1 + b2ΣXi. If we divide this equation by the sample size, n, we
obtain Ȳ = b1 + b2X̄. Thus, the estimated regression line passes through
the point (X̄, Ȳ).
Check Your Progress 3
1) Yes, we can carry out F-test on the basis of the 𝑅 value. Go through
equation (4.40).
2) The value of R2 or the coefficient of determination lies between 0 and 1.
This is mainly because it represents the ratio of ESS to TSS. It indicates
the proportion of variation in Y that has been explained by the
explanatory variables. The numerator ESS cannot be more than the TSS.
Therefore, R2 cannot be greater than 1.