ECM Class 1 2 3
Quantitative Methods and Econometrics
Economic and Econometric Models
• How can we structure an empirical analysis?
• We must make an assumption about the form of the function that relates the
variables
• Econometric model of the CAPM: excess returns for a stock depend on the market's excess return and on many other factors we cannot observe
• $\varepsilon$ is called the error term; it collects all unobserved factors affecting the variable on the left-hand side of the equation, together with any approximation error that arises because the linear functional form we have assumed may be only an approximation to reality
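For reference, the CAPM regression is typically written as below (a standard formulation; the notation on the original slide may differ), where $r$ is the stock's return, $r_f$ the risk-free rate and $r_m$ the market return:
$$r - r_f = \beta_0 + \beta_1 (r_m - r_f) + \varepsilon$$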
• The fourth step consists in using the data collected to estimate the
parameters of the population model (this is also called model
estimation)
• The sixth step is the evaluation of the estimated model in the light of the theory: are the parameter estimates of the sign and size required by the theory?
[Flowchart: the steps of an empirical analysis, including Collection of Data and Model Estimation, with a Yes/No evaluation loop]
Data
The Simple Linear Regression Model
• In the next slides we will examine in detail the Simple Linear Regression
Model
• As a running example, consider a model that relates a CEO's salary to the return on equity (ROE) of the CEO's firm
• Many factors other than ROE may determine the variability of CEOs' salaries (CEO's tenure, firm's size, type of industry, gender, CEO's skills, …). How do we take these factors into account?
• The variable $\varepsilon$, called the error term or disturbance, captures all factors affecting y other than x, as well as any approximation error that may arise because we approximate the relationship between x and y with a linear form
y: dependent variable, regressand, effect variable, explained variable
x: independent variable, regressor, causal variable, explanatory variable
• If all the factors in $\varepsilon$ are held fixed, so that the change in $\varepsilon$ is zero ($\Delta\varepsilon = 0$), then x has a linear effect on y, and the change in y ($\Delta y$) is simply $\beta_1$ multiplied by the change in x ($\Delta x$): $\Delta y = \beta_1 \Delta x$
• $\beta_1$ measures the ceteris paribus (all other factors being equal) effect of x on y. It is the slope parameter in the relationship between y and x
• Can we really be sure that $\beta_1$ measures the ceteris paribus effect of x on y? Not really: we cannot hold all other factors constant when we do not even know what they are
• A first assumption: the error term has zero mean in the population,
$$E(\varepsilon) = 0$$
• For example, with reference to the CEO salary equation, we can assume that unobservable things like average skill are zero in the population of all CEOs
• The crucial assumption we make is that the expected value of $\varepsilon$ does not depend on the value of x: it is the same for any value of x. This corresponds to writing:
$$E(\varepsilon|x) = E(\varepsilon)$$
• Suppose that $\varepsilon$ represents the innate ability of CEOs. Then the zero conditional mean assumption $E(\varepsilon|x) = E(\varepsilon) = 0$ says that the average level of ability of CEOs is the same, regardless of the level of the firm's ROE
• This also means that knowledge of ROE does not help in predicting CEO's ability
• This assumption implies zero covariance, $\mathrm{Cov}(x, \varepsilon) = 0$, and hence zero correlation between $\varepsilon$ and x
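To see why the zero conditional mean assumption implies zero covariance, apply the law of iterated expectations:
$$\mathrm{Cov}(x, \varepsilon) = E(x\varepsilon) - E(x)E(\varepsilon) = E\big(x\,E(\varepsilon|x)\big) - 0 = E(x \cdot 0) = 0$$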
$$\frac{\Delta E(y|x)}{\Delta x} = \beta_1$$
• The PRF (population regression function), $E(y|x) = \beta_0 + \beta_1 x$, can tell us only how the average value of y changes with x
$$\frac{\Delta E(\mathit{salary}|\mathit{ROE})}{\Delta \mathit{ROE}} = \beta_1$$
• The PRF tells us that if ROE takes on a given value, then the expected value of salary will be $\beta_0 + \beta_1 \mathit{ROE}$. Whether a specific CEO earns more or less than the average salary will depend on the unobservable factors
[Figure: fictitious probability distributions of CEO salary for firms with 10% and 20% ROE, with conditional means $\mu_{y|10}$ and $\mu_{y|20}$ on the population regression line]
Estimating the Regression Parameters: OLS
• The population parameters $\beta_0$ and $\beta_1$ are unknown; however, with a sample of data from the population we can estimate them
• Let $\{(x_i, y_i): i = 1, 2, \ldots, n\}$ denote a random sample of size n from the population. For each observation we can write:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
• Suppose we have data on salary and ROE from a random sample of 15 CEOs
• As a first step we plot one variable against the other with a scatter plot:
[Figure: scatter plot of salary (thousands of $) against ROE (%)]
• How can we use this data to estimate the unknown parameters of the PRF $E(y|x) = \beta_0 + \beta_1 x$?
[Figure: scatter plot of salary (thousands of $) against ROE]
• We could draw a line through the middle of the points and use the intercept and the slope of this line as estimates of the parameters of the PRF
• We need a rule that tells us how to draw the line. Many rules are possible, but the one we will use is based on the ordinary least squares (OLS) principle
• The line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ is called the fitted line; a hat (ˆ) over a variable or parameter denotes an estimated value
[Figure: three data points $y_1, y_2, y_3$ around the fitted line, with fitted values $\hat{y}_1, \hat{y}_2, \hat{y}_3$ and residuals $\hat{\varepsilon}_1, \hat{\varepsilon}_2, \hat{\varepsilon}_3$ shown as the vertical distances between each point and the line]
• The OLS principle asserts that to fit a line to the data we should make the sum of the squares of the vertical distances from each point to the line as small as possible. The vertical distances are called residuals, thus the residual sum of squares (RSS) is:
$$\mathrm{RSS} = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$$
• By minimising the RSS we get the formulae that allow us to estimate the unknown population parameters given a sample of data
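To make the principle concrete, the RSS can also be minimised numerically; a minimal Python sketch with made-up data (hypothetical, not from the slides) recovers the same estimates as the closed-form formulae derived next:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up illustrative sample (hypothetical, not the CEO data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def rss(params):
    """Residual sum of squares for a candidate intercept b0 and slope b1."""
    b0, b1 = params
    return np.sum((y - b0 - b1 * x) ** 2)

result = minimize(rss, x0=[0.0, 0.0])   # numerical minimisation of the RSS
print(result.x)                         # approx [2.2, 0.6], the OLS estimates
```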
$$\min_{\hat{\beta}_0, \hat{\beta}_1} \; \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$
• Take the first-order partial derivatives using the chain rule and set them equal to zero (first-order conditions, FOCs):
$$\frac{\partial \mathrm{RSS}}{\partial \hat{\beta}_0} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \quad \text{(eq. 1)}$$
$$\frac{\partial \mathrm{RSS}}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \quad \text{(eq. 2)}$$
• Expanding eq. 1 gives $\sum_{i=1}^{n} y_i - n\hat{\beta}_0 - \hat{\beta}_1 \sum_{i=1}^{n} x_i = 0$. Since $\sum_{i=1}^{n} y_i = n\bar{y}$ and $\sum_{i=1}^{n} x_i = n\bar{x}$ (where $\bar{y}$ and $\bar{x}$ are the sample means), we can rewrite this as:
$$n\bar{y} - n\hat{\beta}_0 - \hat{\beta}_1 n\bar{x} = 0 \quad\Longrightarrow\quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
Estimating the Regression Parameters: OLS
𝑛𝑛
� 𝑥𝑥𝑖𝑖 (𝑦𝑦𝑖𝑖 − 𝑦𝑦� + 𝛽𝛽̂1 𝑥𝑥̅ − 𝛽𝛽̂1 𝑥𝑥𝑖𝑖 ) = 0
𝑖𝑖=1
• So as to get:
𝑛𝑛 𝑛𝑛 𝑛𝑛 𝑛𝑛
� 𝑥𝑥𝑖𝑖 𝑦𝑦𝑖𝑖 − 𝑦𝑦� � 𝑥𝑥𝑖𝑖 + 𝛽𝛽̂1 𝑥𝑥̅ � 𝑥𝑥𝑖𝑖 − 𝛽𝛽̂1 � 𝑥𝑥𝑖𝑖2 = 0
𝑖𝑖=1 𝑖𝑖=1 𝑖𝑖=1 𝑖𝑖=1
𝑛𝑛 𝑛𝑛
𝛽𝛽̂1 � 𝑥𝑥𝑖𝑖2 − 𝛽𝛽̂1 𝑛𝑛𝑥𝑥̅ 2 =� 𝑥𝑥𝑖𝑖 𝑦𝑦𝑖𝑖 − 𝑛𝑛𝑦𝑦� 𝑥𝑥̅
𝑖𝑖=1 𝑖𝑖=1
@Elisabetta Pellini 38
• Solving for $\hat{\beta}_1$:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$$
which can also be written as
$$\hat{\beta}_1 = \frac{s_{xy}}{s_x^2}$$
(the sample covariance between x and y divided by the sample variance of x)
• But also:
$$\hat{\beta}_1 = \frac{s_{xy}}{s_x^2} = \frac{s_{xy}}{s_x s_y} \cdot \frac{s_y}{s_x} = r_{xy} \cdot \frac{s_y}{s_x}$$
(the sample correlation multiplied by the ratio of the sample standard deviations)
• The formulae:
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \qquad\qquad \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
are known as the OLS ESTIMATORS of the true values of $\beta_0$ and $\beta_1$. OLS estimators are random variables
• The numerical values that are obtained from applying the formulae to the sample of data are known as OLS ESTIMATES
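As an illustration (a sketch, not part of the original slides), the estimators translate directly into Python:

```python
import numpy as np

def ols_simple(x, y):
    """OLS estimators for the simple regression y = beta0 + beta1*x + eps.

    beta1_hat = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    beta0_hat = ybar - beta1_hat * xbar
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat
```

Applying ols_simple to a given sample returns the OLS estimates for that sample; over repeated samples the returned values would vary, which is exactly why the estimators are random variables.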
• To find the OLS estimates we plug the data into the OLS estimators:
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \qquad\qquad \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
$$\bar{y} = (1095 + 1001 + 1122 + 578 + 1368 + 1145 + 1078 + 1094 + 1237 + 833 + 567 + 933 + 1339 + 937 + 2011)/15 = 1089.2$$
$$\bar{x} = (14.1 + 10.9 + 23.5 + 5.9 + 13.8 + 20 + 16.4 + 16.3 + 10.5 + 26.3 + 25.9 + 26.8 + 14.8 + 22.3 + 56.3)/15 = 20.25$$
Interpreting the Estimates
$$\widehat{\mathit{salary}}_i = 767.529 + 15.885\,\mathit{ROE}_i$$
• The hat over salary indicates that this is an estimated equation. It is the equation of the SRF (sample regression function), the fitted line for this sample of 15 data points
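Plugging the 15 observations listed above into the ols_simple sketch (assuming the salary and ROE values pair up in the order they were listed) reproduces the slide's estimates up to rounding; the intercept 767.529 follows from using the rounded means $\bar{y} = 1089.2$ and $\bar{x} = 20.25$:

```python
# CEO data from the slides: salary in thousands of $, ROE in percent
salary = [1095, 1001, 1122, 578, 1368, 1145, 1078, 1094,
          1237, 833, 567, 933, 1339, 937, 2011]
roe = [14.1, 10.9, 23.5, 5.9, 13.8, 20.0, 16.4, 16.3,
       10.5, 26.3, 25.9, 26.8, 14.8, 22.3, 56.3]

b0, b1 = ols_simple(roe, salary)
print(b0, b1)   # approx 767.5 and 15.885
```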
[Figure: scatter plot of salary (thousands of $) against ROE with the fitted line $\widehat{\mathit{salary}} = 767.529 + 15.885\,\mathit{ROE}$]
• With our estimated model, we predict that a CEO working for a firm with an ROE of 20% will earn on average $1,085,229
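The prediction follows directly from the fitted line:
$$\widehat{\mathit{salary}} = 767.529 + 15.885 \times 20 = 1085.229 \text{ thousand \$} \approx \$1{,}085{,}229$$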
Prediction
….
….
Algebraic Properties of OLS Estimates
• Deviations from the regression line (the residuals) sum up to zero. This comes from the first FOC of the minimisation problem:
$$\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$
• The sample covariance between the deviations and the regressor is zero. This comes from the second FOC:
$$\sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$
• How close are $\hat{\beta}_0 = 767.529$ and $\hat{\beta}_1 = 15.885$ to the true values? This question is not answerable: we will never know the true values of the population parameters $\beta_0$ and $\beta_1$, so we cannot say how close our estimates are to them
Unbiasedness
• The specific estimate that we get may be smaller or larger than the true parameter, depending on the sample that is selected. However, if sampling is done repeatedly and estimates are computed each time, then the average of these estimates will be equal to the parameter
[Figure: sampling distribution $f(\hat{\theta})$ of an unbiased estimator, centred at the true parameter $\theta$]
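The repeated-sampling idea can be illustrated with a small simulation (a sketch, not from the slides; the "true" values $\beta_0 = 750$ and $\beta_1 = 16$ and the error distribution are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 750.0, 16.0        # assumed true population parameters
n, n_samples = 15, 10_000         # sample size and number of repeated samples

slopes = []
for _ in range(n_samples):
    x = rng.uniform(5, 60, size=n)       # draw a fresh sample of regressors
    eps = rng.normal(0, 300, size=n)     # errors with E(eps | x) = 0
    y = beta0 + beta1 * x + eps          # population model
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    slopes.append(b1)

print(np.mean(slopes))   # close to 16: on average the estimator hits beta1
print(np.var(slopes))    # the sampling variance, relevant for the next slide
```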
Efficiency
• The variance of an estimator measures the precision of the estimator in the sense
that it tells us how much the estimates can vary from sample to sample
• The smaller the variance of an estimator is, the greater the sampling precision of
that estimator
• If we compare two unbiased estimators, say $\hat{\theta}$ and $\tilde{\theta}$, the one with the smallest variance is said to be efficient relative to the other:
$$\mathrm{Var}(\hat{\theta}) < \mathrm{Var}(\tilde{\theta})$$
[Figure: sampling distributions $f(\hat{\theta})$ and $f(\tilde{\theta})$ of two unbiased estimators; the sampling variance of $\hat{\theta}$ is smaller than that of $\tilde{\theta}$, so its distribution is more tightly centred about $\theta$]
Gauss-Markov Assumptions
• "Linear in the parameters" means that the model can be seen as a linear combination of the parameters, i.e. a weighted sum of the parameters with the variables acting as weights
• The model $y = \dfrac{1}{\beta_0 + \beta_1 x} + \varepsilon$ is not a linear model, as it is not a linear combination of the parameters
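For instance, a model may be nonlinear in the variables yet still linear in the parameters:
$$y = \beta_0 + \beta_1 x^2 + \varepsilon \quad \text{(linear in the parameters)} \qquad\qquad y = \frac{1}{\beta_0 + \beta_1 x} + \varepsilon \quad \text{(not linear in the parameters)}$$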
• Zero conditional mean assumption: the expected value of the error term given any value of x is zero,
$$E(\varepsilon|x) = 0$$
• When this assumption (homoscedasticity: the error term has the same variance, $\sigma^2$, for any value of x) does not hold, we say that the error term is heteroscedastic
• Remark: the conditional variance of the dependent variable is:
$$\mathrm{Var}(y|x) = \mathrm{Var}(\beta_0 + \beta_1 x + \varepsilon \,|\, x) = \mathrm{Var}(\varepsilon|x)$$
(the term $\beta_0 + \beta_1 x$ is not random when we condition on x, that is, when we treat x as if it were known)
• Because of this relationship, we can say that heteroscedasticity is present whenever the conditional variance of y is a function of x
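A simple illustration (not from the slides): under homoscedasticity the error variance is the same constant at every x, whereas a specification in which the variance grows with x is heteroscedastic:
$$\mathrm{Var}(\varepsilon|x) = \sigma^2 \quad \text{(homoscedastic)} \qquad\qquad \mathrm{Var}(\varepsilon|x) = \sigma^2 x^2 \quad \text{(heteroscedastic)}$$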
The Variance of OLS Estimators
$$\mathrm{Var}(\hat{\beta}_0) = \frac{\sigma^2 \sum x_i^2}{n \sum (x_i - \bar{x})^2} \qquad\qquad \mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$$
• The sampling variance of the OLS estimators is higher the larger the variance of the unobserved factors ($\sigma^2$), and lower the greater the variation in the explanatory variable ($\sum (x_i - \bar{x})^2$)
Estimating the Error Variance
• The previous formulae require knowledge of the variance of the error term, $\sigma^2$. This quantity is unknown, given that the error term is unobserved
• An unbiased estimator of the error variance is $s^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{\varepsilon}_i^2$
• Having an unbiased estimator of the error variance, we can get estimators of the variances of the OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ by replacing the unknown variance $\sigma^2$ with $s^2$:
$$\widehat{\mathrm{Var}}(\hat{\beta}_0) = \frac{s^2 \sum x_i^2}{n \sum (x_i - \bar{x})^2} \qquad\qquad \widehat{\mathrm{Var}}(\hat{\beta}_1) = \frac{s^2}{\sum (x_i - \bar{x})^2}$$
• The square roots are called the standard errors of $\hat{\beta}_0$ and $\hat{\beta}_1$:
$$\mathrm{SE}(\hat{\beta}_0) = s\sqrt{\frac{\sum x_i^2}{n \sum (x_i - \bar{x})^2}} \qquad\qquad \mathrm{SE}(\hat{\beta}_1) = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$$
• $\mathrm{SE}(\hat{\beta}_0)$ and $\mathrm{SE}(\hat{\beta}_1)$ are random variables when we think of OLS running over different samples. For one given sample they are just numbers that give us an estimate of the precision of the OLS estimators (not of the accuracy of a specific set of parameter estimates)
Estimating the Variance of the OLS estimators
$$s = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{\varepsilon}_i^2} = \sqrt{\frac{1178055}{13}} = 301.03$$
$$\mathrm{SE}(\hat{\beta}_0) = s\sqrt{\frac{\sum x_i^2}{n \sum (x_i - \bar{x})^2}} = 301.03 \sqrt{\frac{8106.78}{15 \times 1953.817}} = 158.32$$
$$\mathrm{SE}(\hat{\beta}_1) = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{301.03}{\sqrt{1953.817}} = 6.81$$
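Continuing the earlier Python sketch (same assumed pairing of the CEO data), these numbers can be checked directly:

```python
import numpy as np

x = np.asarray(roe)                      # arrays defined in the earlier sketch
y = np.asarray(salary)
b0, b1 = ols_simple(x, y)

resid = y - b0 - b1 * x                  # OLS residuals
rss = np.sum(resid ** 2)                 # approx 1,178,055
s = np.sqrt(rss / (len(x) - 2))          # approx 301.03

sxx = np.sum((x - x.mean()) ** 2)        # approx 1953.817
se_b0 = s * np.sqrt(np.sum(x ** 2) / (len(x) * sxx))   # approx 158.3
se_b1 = s / np.sqrt(sxx)                 # approx 6.81
print(s, se_b0, se_b1)
```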
Gauss-Markov Theorem
• Under the Gauss-Markov assumptions, the OLS estimators are BLUE: Best Linear Unbiased Estimators
• Best: the OLS estimators have the smallest variance when compared to similar estimators that are linear and unbiased
• Unbiased: $E(\hat{\beta}_0) = \beta_0$ and $E(\hat{\beta}_1) = \beta_1$; on average, the values of $\hat{\beta}_0$ and $\hat{\beta}_1$ will be equal to the true values $\beta_0$ and $\beta_1$
• The Gauss-Markov theorem justifies the use of the OLS method rather than other competing methods (or estimators): when the assumptions hold, no linear unbiased estimator will be better than OLS
• However, if any of the assumptions fails, then OLS is no longer guaranteed to be BLUE