Chapter - 2
SIMPLE
REGRESSION I
Rozina Econometrics
Introducing Simple Regression
y (the effect) is affected by x (the cause); common names for the two variables:
y                        x
dependent variable       independent variable
explained variable       explanatory variable
response variable        control variable
predicted variable       predictor variable
regressand               regressor
we are actually going to derive the linear regression model in three very different
ways
these three ways reflect three types of econometric questions we
discussed in the first lecture (descriptive, causal and forecasting)
while the math for doing it is identical, conceptually they are very different
ideas
Descriptive Approach
Why do we need a regression model?
Estimation & interpretation.
Correlation vs. causation.
Introducing Simple Regression
y = β0 + β1x + u
y: dependent variable, explained variable, response variable, …
x: independent variable, explanatory variable, regressor, …
u: error term, disturbance, unobservables, …
Introducing Simple Regression
By how much does the dependent variable change if the independent variable is increased by one unit? By the slope coefficient β1, as long as the unobserved factors in u do not change.
This interpretation is only correct if all other things remain equal when the independent variable is increased by one unit.
The simple linear regression model is rarely applicable in practice, but its discussion is useful for pedagogical reasons.
Introducing Simple Regression
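Example: yield and fertilizer, yield = β0 + β1·fertilizer + u
u contains, e.g., rainfall, land quality, presence of parasites, …
β1 measures the effect of fertilizer on yield, holding all other factors fixed
Example: wage and education
u contains, e.g., intelligence, …
for this interpretation to hold, education must not be correlated with intelligence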
Introducing Simple Regression
E[y|x] = β0 + β1x
This means that the average value of the dependent variable can be expressed as a linear function of the explanatory variable
Introducing Simple Regression
[Figure: wage (€) plotted against education (years)]
Introducing Simple Regression (cont’d)
x is continuous: it can take an infinite number of possible values. Continuous random variables are usually measurements; examples include height, weight, the amount of sugar in an orange, and the time required to run a mile.
problem: no categories of x within which to average y
example: CEO’s salary vs. company’s ROE
ROE = return on equity = net income as a percentage of common equity (ROE =
10 means if I invest $100 in equity in a firm I earn $10 a year)
does a CEO’s salary (typically huge) reflect her performance?
[Figure: CEO’s salary ($1,000s) plotted against company’s ROE]
Introducing Simple Regression (cont’d)
therefore, we need a model for E[y|x]
in other words, we need to find a “good” mathematical expression for f in E[y|x] = f(x)
the simplest model I can think of is E[y|x] = β0 + β1x
interpretation:
intercept: β0 = E[y|x = 0]
slope: β1 = ΔE[y|x] / Δx
[Figure: the line E[y|x] = β0 + β1x, with intercept β0 and slope β1 per unit increase in x]
Estimation
once we’ve decided about the functional form of the model (in our case, it’s E[y|x] = β0 + β1x), we need to develop techniques to obtain estimates of the parameters β0 and β1
we base our estimates on a sample of the population → we need to make
inferences about the whole population
sample:
n observations (people, countries, etc.) indexed with i
y and x values for the ith observation are denoted yi, xi (for i = 1,…,n)
Preliminaries
it will be useful to define u = y – β0 – β1x
what do we know about u?
first, because E[y|x] = β0 + β1x, we have
E[u|x] = E[y – β0 – β1x | x] = E[y|x] – β0 – β1x = β0 + β1x – β0 – β1x = 0
Estimation (cont’d)
this means two things:
1. E[u] = 0 (this should be intuitive: E[u|x] = 0 for all x → E[u] = 0)
2. the expected value of u does not change when we change x
the second property has numerous implications, the most important being:
cov(x,u) = 0
E[xu] = 0 (because cov(x,u) = E[xu] – E[x]E[u] and E[u] = 0)
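The step from E[u|x] = 0 to E[xu] = 0 rests on the law of iterated expectations, which is not spelled out here. A short derivation in the same notation, offered as a sketch of the reasoning:

```latex
\begin{align*}
E[xu] &= E\big[\,E[xu \mid x]\,\big] = E\big[\,x\,E[u \mid x]\,\big] = E[x \cdot 0] = 0,\\
\operatorname{cov}(x,u) &= E[xu] - E[x]\,E[u] = 0 - E[x]\cdot 0 = 0.
\end{align*}
```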
how is this useful in estimation?
we arrived at two important facts about expectations of u:
E[u] = 0
E[xu] = 0
typically, expectations are estimated using a sample mean
(e.g., how would you estimate mean wage with a sample of 10 people?)
we’ll use the idea of sample analogue
Estimation (cont’d)
before we move on, we’ll review some more statistical concepts connected with random samples
remember, we need to make inferences about the population based on our sample and its characteristics
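Population vs. sample characteristics (population on the left, sample on the right):
$$ \mu_x = E[x] = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad\qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i $$
$$ \operatorname{var} x = E(x-\mu_x)^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i-\mu_x)^2 \qquad\qquad \frac{1}{n-1}\sum_{i=1}^{n} (x_i-\bar{x})^2 $$
$$ \operatorname{cov}(x,y) = E(x-\mu_x)(y-\mu_y) = \frac{1}{N}\sum_{i=1}^{N} (y_i-\mu_y)(x_i-\mu_x) \qquad\qquad \frac{1}{n-1}\sum_{i=1}^{n} (y_i-\bar{y})(x_i-\bar{x}) $$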
the expressions on the right are called sample mean, sample variance
and sample covariance
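The three sample characteristics can be computed directly from their definitions. The sketch below uses plain Python; the function names and the five data points (years of education and hourly wage) are hypothetical and only serve as an illustration.

```python
def sample_mean(values):
    """Sample mean: (1/n) * sum of the values."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sample variance with the n - 1 divisor."""
    m = sample_mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    mx, my = sample_mean(xs), sample_mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Hypothetical data, not taken from the lecture.
education = [8, 10, 12, 14, 16]
wage = [5.0, 6.5, 7.0, 9.5, 11.0]

mean_educ = sample_mean(education)
var_educ = sample_variance(education)
cov_educ_wage = sample_covariance(education, wage)

print("sample mean of education:", mean_educ)
print("sample variance of education:", var_educ)
print("sample covariance of education and wage:", cov_educ_wage)
```

Note that the sample covariance of a variable with itself is its sample variance, which is a quick sanity check on both functions.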
Estimation (cont’d)
for the ith person (i.e., observation) in our sample, we have the population regression model
yi = β0 + β1xi + ui,
where β0 and β1 are the unknown parameters to be estimated
to think about estimation, let’s define the sample regression model
yi = β̂0 + β̂1xi + ûi
β̂0 and β̂1 are our estimates of β0 and β1 from the sample
ûi is defined as ûi = yi – β̂0 – β̂1xi
note: if β̂0 and β̂1 are like β0 and β1, then ûi should be like ui
sample analogue:
in order to make inferences about the population, we have to believe
that our sample looks like the population
if it is so, then let’s force things to be true in the sample which we
know would be true in the population
Estimation (cont’d)
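from the discussion about (the population’s) u, we know that
E[u] = 0
E[xu] = 0
the sample analogue to this is:
$$ 0 = \frac{1}{n}\sum_{i=1}^{n} \hat{u}_i = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) $$
$$ 0 = \frac{1}{n}\sum_{i=1}^{n} x_i \hat{u}_i = \frac{1}{n}\sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) $$
as yi and xi are the data, the above equations are in fact two linear equations in the variables β̂0 and β̂1; they can be solved (fairly) easily (see Wooldridge, pages 28 and 29)
note that the first equation tells us that β̂0 = ȳ – β̂1x̄, where x̄ and ȳ are the sample means of the xi’s and yi’s
solving the equations to get β̂1 yields:
$$ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} $$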
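The solution of the two sample moment conditions can be sketched in a few lines of Python. The function name `ols_simple` and the data are hypothetical; the code only illustrates the formulas for the slope and intercept estimates and then checks that both sample conditions hold for the resulting residuals.

```python
def ols_simple(xs, ys):
    """Return (b0_hat, b1_hat) for the simple regression of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1_hat = sxy / sxx
    # The first moment condition gives the intercept.
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat


# Hypothetical data, not taken from the lecture.
xs = [8, 10, 12, 14, 16]
ys = [5.0, 6.5, 7.0, 9.5, 11.0]

b0_hat, b1_hat = ols_simple(xs, ys)
residuals = [y - b0_hat - b1_hat * x for x, y in zip(xs, ys)]

# Sample analogues of E[u] = 0 and E[xu] = 0.
mean_resid = sum(residuals) / len(residuals)
mean_x_resid = sum(x * r for x, r in zip(xs, residuals)) / len(residuals)

print("intercept estimate:", b0_hat)
print("slope estimate:", b1_hat)
print("mean residual:", mean_resid)
print("mean of x times residual:", mean_x_resid)
```

Both printed means are zero up to floating-point rounding, which is exactly what the estimates were constructed to achieve.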
Estimation
Example: CEO salary and return on equity
Fitted regression: predicted salary = 963.191 + 18.501 · ROE
Interpretation:
- intercept β̂0 = 963.191: if ROE is zero, the predicted CEO salary is 963.191 (in $1,000s)
- slope β̂1 = 18.501: if ROE increases by one percentage point, the predicted salary increases by 18.501 (in $1,000s)
- the sign is positive => positive relationship between salary and ROE
Causal interpretation?
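The fitted line can be used to compute predicted salaries for given ROE values. The sketch below plugs in the two estimates reported above; the function name `predicted_salary` and the chosen ROE values are illustrative assumptions.

```python
# Estimates reported for the CEO salary example (salary in $1,000s, ROE in percent).
B0_HAT = 963.191
B1_HAT = 18.501


def predicted_salary(roe):
    """Predicted CEO salary in $1,000s for a given ROE in percent."""
    return B0_HAT + B1_HAT * roe


# Illustrative ROE values, not taken from the lecture.
for roe in (0, 10, 30):
    print(f"ROE = {roe:2d}  ->  predicted salary = {predicted_salary(roe):.3f}")

# The difference between two predictions one ROE point apart is the slope.
one_point_effect = predicted_salary(11) - predicted_salary(10)
print(f"effect of one more ROE point: {one_point_effect:.3f}")
```

These are descriptive predictions from the fitted line; they say nothing yet about whether ROE causes salary.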
Estimation
Fitted regression (hourly wage on years of education)
In the sample, one more year of education was associated with an increase in hourly wage of $0.54
Causal interpretation?
Correlation Vs. Causation
looking at the relationship between x and y from the causal standpoint, the association (or correlation) between x and y can represent three basic situations:
y ← x (x causes y), y → x (y causes x), y ← z → x (a third factor z causes both)
the descriptive approach makes no claims about which one is the case
Causal Approach
The need for causal analysis. Structural
model and its assumptions. Estimation.
Examples.
Causal Analysis
to say something about causality we need to make some more
assumptions
in practice, these assumptions will have to be checked using an economic theory /
common sense + knowledge of the real problem
mathematically, the formulation of these assumptions consists in writing
down a structural data-generating model
this will look similar to what we have been doing, but conceptually it is very
different
for the description-type analysis, we started with the data and then
asked which model could help summarize the conditional expectation
for the structural (causal) case we start with the model and then use it to say
what the data will look like (even before we actually collect them)
Structural model
How About u?
in some textbooks, you can read that u contains a couple more things:
measurement errors (or poor proxy variables)
the intrinsic randomness (in human behaviour etc.)
model specification errors
→ these will not be all that relevant for our causal discussion
Crucial Assumption: E[u|x] = 0
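as before, the assumption that E[u|x] = 0 really means two separate things, one of which is a big deal, the other is not:
E[u] = 0: not a big deal, because the intercept β0 can absorb a nonzero mean of u
E[u|x] does not change with x: this is the big deal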
E[u|x] = 0 Vs. Causation
what does the assumption tell us about causation?
remember the three causation schemes between x and y:
y ← x, y → x, y ← z → x
in the causal analysis, we want to rule out all but the first one
this is exactly what the assumption about E[u|x] effectively does
the “arrow scheme” now contains three letters: y, x and u
we know that u affects y (by definition), y←u
note that E[u|x] = constant implies cov(x,u) = 0
therefore, we’d like the arrow scheme to look like this: x → y ← u, with no arrow between x and u
E[u|x] = 0 Vs. Causation (cont’d)
imagine we have the reversed causality y → x:
here u affects y, which in turn affects x (u → y → x), so u affects x, and u and x are correlated
therefore cov(x,u) ≠ 0 and the assumption is necessarily violated
now consider the case y ← z → x:
if there’s any z that affects y, it is a part of u (by definition), and therefore z (and u) affect x
u contains z, so u affects x, meaning the two are correlated
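The case y ← z → x can be illustrated with a small simulation. Everything in the sketch below is an assumption made for illustration: the data-generating process, the true slope of 2, the sample size and the seed. A common factor z enters both x and u, so cov(x,u) ≠ 0 and the estimated slope no longer recovers the structural effect.

```python
import random


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def ols_slope(xs, ys):
    """Slope estimate: sample covariance of x and y over sample variance of x."""
    return covariance(xs, ys) / covariance(xs, xs)


def simulate(n=20000, seed=0):
    """Hypothetical structural model y = 1 + 2x + u, where z drives both x and u."""
    rng = random.Random(seed)
    xs, us, ys = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)              # common cause
        x = z + rng.gauss(0, 1)          # z affects x
        u = 3 * z + rng.gauss(0, 1)      # z is part of u
        y = 1 + 2 * x + u                # true causal effect of x on y is 2
        xs.append(x)
        us.append(u)
        ys.append(y)
    return ols_slope(xs, ys), covariance(xs, us)


slope, cov_xu = simulate()
print("estimated slope:", round(slope, 3))
print("sample cov(x, u):", round(cov_xu, 3))
```

In this setup the population values are cov(x,u) = 3 and var(x) = 2, so the slope estimate settles near 2 + 3/2 = 3.5 rather than the structural value 2.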
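Estimation
the sample analogue of E[u] = 0 and E[xu] = 0 is again:
$$ 0 = \frac{1}{n}\sum_{i=1}^{n} \hat{u}_i = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) $$
$$ 0 = \frac{1}{n}\sum_{i=1}^{n} x_i \hat{u}_i = \frac{1}{n}\sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) $$
which gives us
$$ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} $$
exactly as before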
it is only the interpretation that has changed
Estimation
Goodness-of-Fit
“How well does the explanatory variable explain the dependent variable?”
Measures of Variation
… if years of education
are increased by one year
Estimation
Fitted regression
For example: +1 % sales → +0.257 % salary
The log-log form postulates a constant elasticity model, whereas the semi-log
form assumes a semi-elasticity model
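The constant-elasticity reading of the log-log form can be checked numerically. The sketch below assumes the model log(salary) = β0 + β1·log(sales) with β1 = 0.257, the elasticity quoted above; the function name and the chosen percentage changes are illustrative. It compares the exact percentage change in salary with the "elasticity times percentage change" approximation.

```python
# Elasticity quoted in the example: +1 % sales is associated with +0.257 % salary.
ELASTICITY = 0.257


def exact_pct_change_salary(pct_change_sales, elasticity=ELASTICITY):
    """Exact % change in salary implied by log(salary) = b0 + elasticity * log(sales)."""
    return 100 * ((1 + pct_change_sales / 100) ** elasticity - 1)


def approx_pct_change_salary(pct_change_sales, elasticity=ELASTICITY):
    """Usual approximation: elasticity times the % change in sales."""
    return elasticity * pct_change_sales


# Illustrative changes in sales, not taken from the lecture.
for pct in (1, 10, 50):
    exact = exact_pct_change_salary(pct)
    approx = approx_pct_change_salary(pct)
    print(f"+{pct} % sales: exact {exact:.3f} %, approximation {approx:.3f} %")
```

For small changes the two numbers nearly coincide; the approximation drifts from the exact value as the change in sales becomes large.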
Standard assumptions for the linear regression model
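Model 1: OLS, using observations 1-209
Dependent variable: salary
quantities reported alongside the estimates β̂0 and β̂1:
$$ \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad sd(y) = \sqrt{\frac{SST}{n-1}} $$
$$ R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST} $$
$$ \hat{\sigma} = \sqrt{\frac{SSR}{n-k-1}} $$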
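The goodness-of-fit quantities can be computed by hand for a small data set. The sketch below assumes the convention in which SST is the total, SSE the explained and SSR the residual sum of squares, and k = 1 regressor; the function name and the data are hypothetical.

```python
import math


def fit_and_decompose(xs, ys):
    """Fit y on x by OLS and return SST, SSE, SSR, R-squared and sigma-hat."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]

    sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
    sse = sum((f - y_bar) ** 2 for f in fitted)           # explained variation
    ssr = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # residual variation

    k = 1  # one explanatory variable in simple regression
    r2 = 1 - ssr / sst
    sigma_hat = math.sqrt(ssr / (n - k - 1))
    return sst, sse, ssr, r2, sigma_hat


# Hypothetical data, not taken from the lecture.
xs = [8, 10, 12, 14, 16]
ys = [5.0, 6.5, 7.0, 9.5, 11.0]

sst, sse, ssr, r2, sigma_hat = fit_and_decompose(xs, ys)
print("SST:", round(sst, 4), "SSE:", round(sse, 4), "SSR:", round(ssr, 4))
print("R-squared:", round(r2, 4))
print("sigma-hat:", round(sigma_hat, 4))
```

The decomposition SST = SSE + SSR holds for OLS with an intercept, which is why the two expressions for R-squared agree.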
Standard assumptions for the linear regression model
Interpretation of unbiasedness
The estimated coefficients may be smaller or larger, depending on the sample
that is the result of a random draw
However, on average, they will be equal to the values that characterize the true relationship between y and x in the population
“On average” means if sampling was repeated, i.e. if drawing the random sample and doing the estimation was repeated many times
In a given sample, estimates may differ considerably from true values
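Repeated sampling can be imitated with a small Monte Carlo experiment. The data-generating process below (true intercept 1, true slope 0.5, normal errors), the sample sizes and the seed are all assumptions chosen for illustration. Each simulated sample yields a different slope estimate, while their average lies close to the true slope.

```python
import random

TRUE_B0 = 1.0   # hypothetical true intercept
TRUE_B1 = 0.5   # hypothetical true slope


def ols_slope(xs, ys):
    """Slope estimate from the simple regression of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def simulate_slopes(n_samples=2000, n_obs=30, seed=1):
    """Draw many random samples and estimate the slope in each one."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        xs = [rng.uniform(0, 10) for _ in range(n_obs)]
        # The error is drawn independently of x, so E[u|x] = 0 holds by construction.
        ys = [TRUE_B0 + TRUE_B1 * x + rng.gauss(0, 2) for x in xs]
        slopes.append(ols_slope(xs, ys))
    return slopes


slopes = simulate_slopes()
average_slope = sum(slopes) / len(slopes)
print("smallest estimate:", round(min(slopes), 3))
print("largest estimate:", round(max(slopes), 3))
print("average estimate:", round(average_slope, 3))
```

The spread between the smallest and largest estimate shows how far a single sample can be from the truth, even though the estimator is right on average.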
Standard assumptions for the linear regression model
Conclusion:
The sampling variability of the estimated regression coefficients is higher the larger the variability of the unobserved factors, and lower the larger the variation in the explanatory variable
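The conclusion can be made concrete with the textbook variance formula for the slope estimator. The sketch below assumes homoskedasticity, Var(u|x) = σ², which is an additional assumption beyond E[u|x] = 0:

```latex
\operatorname{Var}(\hat{\beta}_1 \mid x_1,\dots,x_n)
  = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

The numerator grows with the variability of the unobserved factors, and the denominator grows with the variation in the explanatory variable, which matches the two directions stated in the conclusion.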
The Simple
Regression Model
Estimating the error variance
descriptive approach
goal: find the association between x and y expressed in terms of
conditional expectation E[y|x]
assumptions: approximate functional form of E[y|x]
causal approach
goal: find the causal effect of x on y
assumptions:
structural model for y (“how y is created”)
assumptions about u: E[u|x] = 0 (and thus, E[xu] = 0)
forecasting approach
goal: predict future values of y based on the knowledge of future x
assumptions: approximate functional form of the relation “y vs. x”
→ different goals, different assumptions, same formulas for estimates
→ the econometric software has only one procedure for all three cases,
you have to know what you’re doing, check the assumptions etc.