EC212: Introduction To Econometrics Simple Regression Model (Wooldridge, Ch. 2)
Tatiana Komarova
Summer 2021
1
1. Definition of simple regression
model
(Wooldridge, Ch. 2.1)
2
Simple regression
3
Issues
4
Simple linear regression model
5
How does the model deal with three issues?
y = β0 + β1 x + u
6
• The equation also addresses Issue #3, the ceteris paribus issue. In
y = β0 + β1 x + u,
∆y = β1 ∆x + ∆u
= β1 ∆x when ∆u = 0
7
Examples
• A model to explain crop yield by fertilizer use is
yield = β0 + β1 fertilizer + u
• Wage equation
wage = β0 + β1 educ + u
8
Impose assumptions to estimate β0 and β1
y = β0 + β1 x + u
9
Are we done with the causality issue?
• Thus, β1 measures the effect of x on y , holding all other
factors (in u) fixed
• Are we done with the causality issue? – This seems too easy!
• Unfortunately, no
11
x and u are viewed as random variables (cont.)
12
Notation for random variables
13
First assumption: E (u) = 0
E (u) = 0
14
• Also, the presence of β0 makes the assumption E (u) = 0 innocuous:
y = β0 + β1 x + u
= (β0 + α0 ) + β1 x + (u − α0 )
= β0∗ + β1 x + u∗
15
Second “key” assumption: E (u|x) = E (u)
16
Example: Ability and education
17
Example: Land quality and fertilizer amount
18
Implications of assumptions
19
Population regression and distribution of y given x
• The straight line is the population regression function
E (y |x) = β0 + β1 x. Conditional distributions of y at three
different values of x are superimposed. Remember that
y = β0 + β1 x + u and u has a distribution in the population
20
2. Deriving ordinary least squares
estimates
(Wooldridge, Ch. 2.2)
21
Purpose here
22
23
Conditions for β0 and β1 in population
E (u) = 0
E (xu) = 0
E (u|x) = 0
24
Conditions for parameters
E (u) = E (y − β0 − β1 x) = 0
E (xu) = E [x(y − β0 − β1 x)] = 0
25
Method of moments
26
Method of moments for β0 and β1
27
• By properties of the summation operator, the first equation is
0 = (1/n) Σ_{i=1}^n (yi − β̂0 − β̂1 xi)
= (1/n) Σ_{i=1}^n yi − (1/n) Σ_{i=1}^n β̂0 − β̂1 (1/n) Σ_{i=1}^n xi
= (1/n) Σ_{i=1}^n yi − β̂0 − β̂1 ((1/n) Σ_{i=1}^n xi)
= ȳ − β̂0 − β̂1 x̄
which means
β̂0 = ȳ − β̂1 x̄
28
• Plugging this into the second equation (and multiplying both
sides by n), we get
Σ_{i=1}^n xi [yi − (ȳ − β̂1 x̄) − β̂1 xi ] = 0
Σ_{i=1}^n xi (yi − ȳ) = β̂1 [Σ_{i=1}^n xi (xi − x̄)]
• We can conclude
β̂1 = Σ_{i=1}^n xi (yi − ȳ) / Σ_{i=1}^n xi (xi − x̄)
if Σ_{i=1}^n xi (xi − x̄) ≠ 0
29
One step ahead
• We now have a solution for β̂1. To get a more intuitive form for
β̂1, we use these facts about the summation operator:
Σ_{i=1}^n (xi − x̄) = 0
Σ_{i=1}^n xi (yi − ȳ) = Σ_{i=1}^n (xi − x̄)(yi − ȳ)
Σ_{i=1}^n xi (xi − x̄) = Σ_{i=1}^n (xi − x̄)²
30
OLS estimate
Σ_{i=1}^n (xi − x̄)(yi − ȳ) = β̂1 [Σ_{i=1}^n (xi − x̄)²]
• If Σ_{i=1}^n (xi − x̄)² > 0, then we get
β̂1 = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / Σ_{i=1}^n (xi − x̄)² = Sample Covariance(xi , yi ) / Sample Variance(xi )
β̂0 = ȳ − β̂1 x̄
31
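As a numerical check (an illustrative sketch, not part of the slides; all data are simulated with made-up parameters), the OLS formulas above can be computed directly with NumPy:

```python
import numpy as np

# Simulated data (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
x = rng.normal(12, 3, size=200)                   # regressor, e.g. years of education
y = 1.0 + 0.5 * x + rng.normal(0, 2, size=200)    # outcome with true slope 0.5

# OLS slope: sum of cross-deviations over sum of squared deviations of x
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# OLS intercept: beta0_hat = ybar - beta1_hat * xbar
beta0_hat = y.mean() - beta1_hat * x.mean()

# Same ratio via sample covariance / sample variance
# (both use the same 1/(n-1) factor, which cancels)
beta1_alt = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
```

By construction the estimates also satisfy the two moment conditions: the residuals yi − β̂0 − β̂1 xi sum to zero, and so does their product with xi.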
Remarks
• For reasons we will see, β̂0 and β̂1 are called ordinary least
squares (OLS) estimates
32
Ordinary least squares
• Where does the name “ordinary least squares” come from?
∆ŷ = β̂1 ∆x
34
Example: Effects of education on hourly wage
(WAGE1.dta)
. des wage educ
• Model
wage = β0 + β1 educ + u
E (u|educ) = 0
• Estimate β0 and β1 by OLS. General form of Stata command:
reg y x
• Plugging in educ = 0 gives the silly prediction ŵage = −0.90. In
a range where data are sparse, we may get strange predictions
(only 18 people have educ < 8)
• When educ = 8, the predicted wage is −0.90 + 0.54(8) = 3.42
37
Subsample with educ = 8
. list wage if educ == 8
wage
4. 6
30. 3.3
58. 10
89. 9.9
120. 3
128. 1.5
203. 10
214. 7.4
220. 5.8
221. 3.5
226. 3
266. 2.9
284. 8.5
287. 5
303. 3
331. 3
367. 4.1
406. 8.4
412. 4.4
425. 3
463. 3
487. 2.2
38
Reminder
• In the population model
wage = β0 + β1 educ + u, E (u|educ) = 0
we do not know β0 and β1
• Rather, β̂0 = −0.90 and β̂1 = 0.54 are our estimates from
particular sample of 526 workers. These estimates may or may
not be close to β0 and β1 . If we obtain another sample of 526
workers, estimates will change
39
Summary so far
y = β0 + β1 x + u, E (u|x) = 0
40
3. Properties of OLS on any sample
from population
(Algebraic properties of OLS)
(Wooldridge, Ch. 2.3)
41
OLS fitted values and residuals
• Once we have the OLS estimates β̂0 and β̂1 , we get the OLS fitted
values by plugging xi into the equation
42
OLS, then save ŷi and ûi
. reg wage educ
. predict wagehat
(option xb assumed; fitted values)
. predict uhat, residuals
43
List for i = 1, . . . , 15
. list wage educ wagehat uhat in 1/15
44
• Some residuals are positive, others are negative. None in the
first 15 is especially close to zero
• For this purpose, we derive some useful properties of ûi and ŷi
45
Recall conditions for OLS
• Recall that OLS estimates β̂0 and β̂1 are chosen to satisfy
Σ_{i=1}^n (yi − β̂0 − β̂1 xi ) = 0
Σ_{i=1}^n xi (yi − β̂0 − β̂1 xi ) = 0
46
Algebraic properties
47
Implications (nice exercises)
ȳ = (1/n) Σ_{i=1}^n ŷi , i.e. the sample average of the fitted values equals ȳ
48
Goodness-of-fit
• For each observation, write
yi = ŷi + ûi
• Define the total sum of squares (SST), explained sum of squares
(SSE) (“model sum of squares” in Stata) and residual sum of
squares (SSR) as
SST = Σ_{i=1}^n (yi − ȳ)²
SSE = Σ_{i=1}^n (ŷi − ȳ)²
SSR = Σ_{i=1}^n ûi²
R² = SSE/SST = 1 − SSR/SST
which is called the R-squared of the regression
50
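These definitions can be verified numerically (a sketch on simulated data; none of the numbers come from the slides), including the identity SST = SSE + SSR:

```python
import numpy as np

# Simulated data with made-up parameters
rng = np.random.default_rng(1)
x = rng.normal(0, 1, 100)
y = 2 + 3 * x + rng.normal(0, 1, 100)

# OLS fit
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x          # fitted values
uhat = y - yhat             # residuals

SST = np.sum((y - y.mean()) ** 2)        # total sum of squares
SSE = np.sum((yhat - y.mean()) ** 2)     # explained sum of squares
SSR = np.sum(uhat ** 2)                  # residual sum of squares

R2 = SSE / SST                           # equals 1 - SSR/SST
```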
Remark on R² (1)
51
Remark on R² (2)
52
Remark on R² (3)
Simulation example 1
53
Remark on R² (4)
Simulation example 1
54
Remark on R² (5)
55
Remark on R² (6)
Simulation example 2
56
Remark on R² (7)
Simulation example 2
57
Remark on R² (8)
Simulation example 2
58
Example: Wage equation (WAGE1.dta)
. reg wage educ
59
4. Units of measurement and
functional form
(Wooldridge, Ch. 2.4)
60
Units of measurement
61
Regress salary on roe
. reg salary roe
62
Change unit of x = roe
63
Regress salary on roedec = roe/100
64
Change unit of y = salary
65
Two ways to find answer
• Use the definitions:
β̂0 = ȳ − β̂1 x̄
β̂1 = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / Σ_{i=1}^n (xi − x̄)²
66
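A quick numerical check of the unit-change logic (a sketch on simulated data; the roe and salary values are made up, not the textbook sample): dividing x by 100 multiplies β̂1 by 100 and leaves β̂0 unchanged.

```python
import numpy as np

# Made-up salary/ROE data, for illustration only
rng = np.random.default_rng(2)
roe = rng.uniform(0, 60, 150)                          # ROE in percent
salary = 900 + 20 * roe + rng.normal(0, 500, 150)

def ols(x, y):
    """Return (intercept, slope) of the OLS fit of y on x."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

b0, b1 = ols(roe, salary)                  # salary on roe
b0_dec, b1_dec = ols(roe / 100, salary)    # salary on roedec = roe/100
# The slope scales up by 100; the intercept is unchanged
```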
Using natural logarithm
• Recall wage example
• Now consider
log(wage) = β0 + β1 educ + u
β1 = ∆ log(wage)/∆educ
• By the property of the log, 100 · ∆ log(wage) ≈ %∆wage
• Thus 100 · β1 is interpreted as the approximate percentage
change in wage given one more year of education
68
Regress log(wage) on educ
. drop lwage
log(salary ) = β0 + β1 log(sales) + u
then
β1 = ∆ log(salary)/∆ log(sales) ≈ %∆salary/%∆sales
• Elasticity is free of units of salary and sales
70
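The unit-invariance of the elasticity can also be checked numerically (a sketch with simulated firm data, not the textbook sample): rescaling sales only shifts log(sales) by a constant, so the log-log slope is unchanged.

```python
import numpy as np

# Simulated firm data (hypothetical parameters)
rng = np.random.default_rng(3)
log_sales = rng.normal(8, 1, 200)
log_salary = 4 + 0.25 * log_sales + rng.normal(0, 0.3, 200)
sales, salary = np.exp(log_sales), np.exp(log_salary)

def slope(x, y):
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

b1 = slope(np.log(sales), np.log(salary))
# Measuring sales in thousands shifts log(sales) by log(1000),
# which leaves the slope (the elasticity) unchanged
b1_rescaled = slope(np.log(sales / 1000), np.log(salary))
```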
Regress log(salary ) on log(sales)
. reg lsalary lsales
71
Summary: Interpretation of β1
72
5. Expected value and variance of OLS
(Wooldridge, Ch. 2.5)
73
Statistical property of OLS
• Our analysis so far has been purely algebraic and holds for
any sample
y = β0 + β1 x + u
75
Assumptions SLR.1
Population model is
y = β0 + β1 x + u
76
Assumptions SLR.2
yi = β0 + β1 xi + ui
for each i
77
Assumptions SLR.3
78
Assumptions SLR.4
E (u) = 0
79
Basic approach to show unbiasedness
• We focus on β̂1 . Want to show
E (β̂1 ) = β1
X = {(xi1 , . . . , xik ) : i = 1, . . . , n}
• When we say “conditional on X”, we mean that the sample
outcomes of the regressors X are treated as non-random
(fixed or constant)
81
Show unbiasedness: Step 1
• What we want to show is
E (β̂1 ) = β1
conditional on X
β̂1 = Σ_{i=1}^n (xi − x̄) yi / SSTx
82
Show unbiasedness: Step 2
• Step 2: Replace yi with β0 + β1 xi + ui (which uses SLR.1-2)
84
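Writing out the substitution in Step 2 (a standard reconstruction of the omitted algebra, following the textbook derivation):

```latex
\hat\beta_1
= \frac{\sum_{i=1}^n (x_i - \bar x)\, y_i}{SST_x}
= \frac{\sum_{i=1}^n (x_i - \bar x)(\beta_0 + \beta_1 x_i + u_i)}{SST_x}
= \beta_1 + \frac{\sum_{i=1}^n (x_i - \bar x)\, u_i}{SST_x},
```

using Σ_{i=1}^n (xi − x̄) = 0 and Σ_{i=1}^n (xi − x̄)xi = SSTx. Taking expectations conditional on X then gives E(β̂1) = β1, since E(ui |X) = 0 under SLR.2 and SLR.4.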
Show unbiasedness: Step 3
E (β̂1 ) = β1
85
Theorem (Unbiasedness of OLS)
E (β̂1 ) = β1
E (β̂0 ) = β0
86
Simulation
y = 3 + 2x + u
where
x ∼ Normal(0, 9)
u ∼ Normal(0, 36)
87
Generate data
. range x 0 16 250
. replace x = 3*invnormal(uniform())
(250 real changes made)
. gen u = 6*invnormal(uniform())
88
Estimate by 1st sample
. gen y = 3 + 2*x + u
. reg y x
89
Estimate by 2nd sample
. * Now repeat the process, but keep x the same
. replace u = 6*invnormal(uniform())
(250 real changes made)
. replace y = 3 + 2*x + u
(250 real changes made)
. reg y x
90
Estimate by 3rd sample
. replace u = 6*invnormal(uniform())
(250 real changes made)
. replace y = 3 + 2*x + u
(250 real changes made)
. reg y x
91
Estimate by 4th sample
. replace u = 6*invnormal(uniform())
(250 real changes made)
. replace y = 3 + 2*x + u
(250 real changes made)
. reg y x
92
• The 1st and 2nd generated data sets give β̂1 ≈ 1.98, very close
to β1 = 2. The 4th data set gives β̂1 ≈ 2.15, a bit farther off
93
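The repeated-sampling exercise can be replicated outside Stata (a Python sketch mirroring the setup above: x held fixed across samples, u redrawn each time; the seed and number of replications are my own choices):

```python
import numpy as np

# Mirror the slides' simulation: y = 3 + 2x + u, x ~ N(0, 9), u ~ N(0, 36)
rng = np.random.default_rng(4)
n, reps = 250, 2000
x = rng.normal(0, 3, n)            # sd 3 => Var(x) = 9; x fixed across samples
xd = x - x.mean()

b1_draws = np.empty(reps)
for r in range(reps):
    u = rng.normal(0, 6, n)        # sd 6 => Var(u) = 36; redrawn each sample
    y = 3 + 2 * x + u
    b1_draws[r] = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)

# Individual estimates vary around 2, but their average is close to
# beta1 = 2, illustrating unbiasedness
```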
Remarks on theorem
94
Variance of OLS estimator
• Under SLR.1-4, OLS estimators are unbiased, i.e. estimates
are equal to population values on average
• We also need measure of dispersion in sampling distribution
of the estimators. We employ variance (and standard
deviation)
95
Additional assumption
• SLR.5 (Homoskedasticity): Var (u|x) = σ²
where σ² is unknown
96
Remark on SLR.5
E (y |x) = β0 + β1 x
Var (y |x) = σ²
97
Example: Saving and income
98
Theorem (Sampling variance of OLS)
Var (β̂1 ) = σ² / Σ_{i=1}^n (xi − x̄)² = σ²/SSTx
Var (β̂0 ) = σ² (n⁻¹ Σ_{i=1}^n xi²) / SSTx
99
Derive variance formula
Var (β̂1 ) = Var(Σ_{i=1}^n wi ui ), where wi = (xi − x̄)/SSTx
100
• Since the ui ’s are independent, they are uncorrelated with each
other. So the variance of the sum equals the sum of the variances
(see Property VAR.4 in App. B.4)
• Thus, conditional on X,
Var(Σ_{i=1}^n wi ui ) = Σ_{i=1}^n Var (wi ui )
= Σ_{i=1}^n wi² Var (ui )
= Σ_{i=1}^n wi² σ²
= σ² Σ_{i=1}^n wi²
• Since Σ_{i=1}^n wi² = 1/SSTx , we get
Var (β̂1 ) = σ²/SSTx
conditional on X
102
Remark on Var (β̂1 ) = σ²/SSTx
103
• Since SSTx /n is the sample variance of x, we can say
SSTx ≈ nσx²
and thus
Var (β̂1 ) ≈ σ²/(nσx²)
which means Var (β̂1 ) shrinks at rate 1/n
105
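The variance formula can be checked against the simulated sampling variance of β̂1 on a fixed design (a sketch; n, σ, and the distribution of x are assumed for illustration):

```python
import numpy as np

# Check Var(beta1_hat) = sigma^2 / SST_x, conditional on X
rng = np.random.default_rng(5)
n, reps, sigma = 100, 5000, 6.0
x = rng.normal(0, 3, n)                  # design held fixed: conditional on X
xd = x - x.mean()
sst_x = np.sum(xd ** 2)

b1 = np.empty(reps)
for r in range(reps):
    u = rng.normal(0, sigma, n)
    y = 3 + 2 * x + u
    b1[r] = np.sum(xd * (y - y.mean())) / sst_x

theoretical_var = sigma ** 2 / sst_x     # formula from the theorem
empirical_var = b1.var(ddof=1)           # sampling variance across replications
```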
Standard error of β̂1
106
Example: Return to education (WAGE1.dta)
. reg lwage educ
107