Lesson 2: Multiple Linear Regression Model (I)


2.1. FORMULATION AND BASIC ASSUMPTIONS OF THE MODEL
2.2. ORDINARY LEAST SQUARES ESTIMATION (OLS). STATISTICAL PROPERTIES
2.3. RESIDUAL ANALYSIS
2.4. ESTIMATION OF THE VARIANCE OF THE DISTURBANCE TERM
2.5. MAXIMUM LIKELIHOOD ESTIMATION (MLE). STATISTICAL PROPERTIES
2.6. MODEL VALIDATION AND MEASURES OF GOODNESS OF FIT
2.7. THE MULTIPLE LINEAR REGRESSION MODEL IN DEVIATIONS
2.8. CHANGE OF ORIGIN AND SCALE OF THE VARIABLES. UNITS OF MEASUREMENT
SELF-EVALUATION EXERCISES

OBJECTIVES:
In this lesson we study the specification of the multiple linear
regression model (MLRM), which establishes a linear relation between
the endogenous variable and the exogenous variables.
We then set out the basic assumptions underlying the model and
estimate its parameters. Finally, we present the steps to be followed
in order to validate the estimated model: the coefficient of
determination and its corrected version, which measure the goodness
of fit, and the analysis of the economic and statistical significance
of the estimated parameters.
KEYWORDS:
Endogenous and exogenous variables, parameters, error term, basic
assumptions, OLS estimation, ML estimation, statistical properties
of the estimators, residuals, estimated variance of the error term,
goodness of fit, $R^2$ coefficient, economic significance, individual or
joint statistical significance.
REFERENCES:
Wooldridge, Jeffrey M. (Michigan State University). Introductory
Econometrics: A Modern Approach, 5th edition. 2012. ISBN-10:
1111531048, ISBN-13: 9781111531041. Chapters 2, 3 (no
multicollinearity) and 4 up to 4.4, approximately.
Greene, William H. (Stern School of Business, New York University).
Econometric Analysis, 7th edition. Prentice Hall. 2012. Chapters 2, 3
(3.1, 3.2, 3.5, 3.6), 4 (4.1-4.8), 5 (5.1-5.2), 17 (17.1-17.4).

2.1. FORMULATION AND BASIC ASSUMPTIONS OF THE REGRESSION MODEL

A) FORMULATION OF THE REGRESSION MODEL
B) DEFINITION OF THE BASIC ASSUMPTIONS

A) Formulation of the regression model

A multiple linear regression model (MLRM) is a model in which
an endogenous variable (y) is explained by k exogenous
variables (x):

$$y = f(x_1, x_2, x_3, \ldots, x_k)$$

The causal relationship is linear and unidirectional.

Multiple Linear Regression Model (MLRM)

$$y = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_k x_k$$

Setting $x_1 = 1$:

$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_k x_k$$

$\beta_1$ : independent term or constant.

$$\beta_j = \frac{\partial y}{\partial x_j}, \qquad j = 2, 3, \ldots, k$$

$\beta_j$ : parameters or coefficients. Each $\beta_j$ measures the
amount by which y changes when $x_j$ increases by one unit, holding
the other explanatory variables constant.
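The interpretation of a coefficient as a partial effect can be checked numerically. The following is a minimal Python sketch, assuming illustrative coefficient values and a helper named `predict` (both hypothetical, not part of the lesson): raising one regressor by one unit while holding the other fixed changes y by exactly that regressor's coefficient.

```python
import numpy as np

# Hypothetical coefficients: beta[0] is the constant (x1 = 1), beta[1:] are the slopes.
beta = np.array([2.0, 0.5, -1.5])

def predict(x):
    """Deterministic part of the model: y = beta1 + beta2*x2 + beta3*x3."""
    return beta[0] + beta[1:] @ np.asarray(x, dtype=float)

x_base = np.array([10.0, 4.0])            # values of x2 and x3
x_plus = x_base + np.array([1.0, 0.0])    # increase x2 by one unit, hold x3 fixed

effect_x2 = predict(x_plus) - predict(x_base)
print(effect_x2, beta[1])                 # the change in y equals beta2
```

The same check applied to x3 would return the third coefficient, which is the sense in which each coefficient is a partial derivative.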

However, the model specified so far is missing an additional element.


Deterministic Relationship

Stochastic Relationship

Introducing the Error Term


For instance: Keynesian Consumption Function

$$C_i = \beta_1 + \beta_2 I_i$$

This implies a deterministic relationship. It supposes that once
the value of Income is known, the amount of Consumption is known
exactly. Moreover, it supposes that all families with the same
Income have the same Consumption level. However, the reality is
different:
[Figure: scatter plot of observed Consumption (C, vertical axis); the points (x) are dispersed around the line $C_i = \beta_1 + \beta_2 I_i$]


In fact, factors other than Income also influence Consumption
decisions. Hence, we have to introduce a term that includes:

- The effect of other variables that are also important in
  explaining consumption behaviour but cannot be included among
  the explanatory variables (and whose joint effect on consumption
  is null on average).
- The random behaviour of Consumption (and, in general, of
  economic relationships).
- Measurement errors in the variables included in the model,
  or possible specification errors.
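The difference between the deterministic and the stochastic version of the consumption function can be illustrated with a short simulation. This is only a sketch under assumed values: the intercept, the slope, the income level and the error standard deviation are hypothetical numbers chosen for illustration, and the error is drawn from a normal distribution with zero mean.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible

beta1, beta2 = 100.0, 0.8        # hypothetical intercept and slope
income = np.full(5, 1000.0)      # five families with the same income

# Deterministic relationship: identical income implies identical consumption.
c_deterministic = beta1 + beta2 * income

# Stochastic relationship: add an error term with zero mean.
u = rng.normal(loc=0.0, scale=50.0, size=income.size)
c_stochastic = c_deterministic + u

print(c_deterministic)   # all five values coincide
print(c_stochastic)      # values differ across families
```

With the error term included, families with the same income no longer share the same consumption level, which is what the scatter of observed data suggests.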

Therefore, every econometric model has to reflect a stochastic,
not a deterministic, relationship between the variables:

$$y = \underbrace{\beta_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_k x_k}_{\text{deterministic part}} + \underbrace{u}_{\text{stochastic (random) part: not observable}}$$

$y$ : Endogenous variable, dependent variable or explained variable
$x_j$ : Exogenous variable, independent variable or explanatory variable (j = 2, 3, ..., k)
$u$ : Error term or disturbance
$\beta_1$ : Coefficient of the "constant" (independent term)
$\beta_j$ : Other coefficients or parameters (j = 2, 3, ..., k)

Then we have to select a spatial or time dimension, and we need
a sample of observations to estimate the model. There are two
types of data:

Cross-sectional data

$$y_i = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + \ldots + \beta_k x_{ki} + u_i, \qquad i = 1, 2, \ldots, N$$

Time series data

$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \ldots + \beta_k x_{kt} + u_t, \qquad t = 1, 2, \ldots, T$$

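We continue with an example using cross-sectional data, where
for each individual we can specify an equation:

$$
\begin{aligned}
y_1 &= \beta_1 + \beta_2 x_{21} + \beta_3 x_{31} + \ldots + \beta_k x_{k1} + u_1 \\
y_2 &= \beta_1 + \beta_2 x_{22} + \beta_3 x_{32} + \ldots + \beta_k x_{k2} + u_2 \\
y_3 &= \beta_1 + \beta_2 x_{23} + \beta_3 x_{33} + \ldots + \beta_k x_{k3} + u_3 \\
&\;\;\vdots \\
y_N &= \beta_1 + \beta_2 x_{2N} + \beta_3 x_{3N} + \ldots + \beta_k x_{kN} + u_N
\end{aligned}
$$

$y_i$ : endogenous variable for observation i.
$x_{ji}$ : exogenous variable $x_j$ for observation i.
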


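This system can be written in matrix form:

$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}
=
\begin{pmatrix}
1 & x_{21} & x_{31} & \cdots & x_{k1} \\
1 & x_{22} & x_{32} & \cdots & x_{k2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{2N} & x_{3N} & \cdots & x_{kN}
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}
+
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{pmatrix}
$$

$$Y_{N \times 1} = X_{N \times k}\, \beta_{k \times 1} + U_{N \times 1}$$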
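The matrix form can be mirrored directly in NumPy. The sketch below assumes a small sample (N = 6, k = 3) with made-up parameter values and randomly generated regressors, all of which are hypothetical. It builds the design matrix with a first column of ones for the constant and checks that the matrix expression reproduces the equation-by-equation system.

```python
import numpy as np

rng = np.random.default_rng(1)

N, k = 6, 3                           # observations and parameters (constant included)
beta = np.array([1.0, 2.0, -0.5])     # hypothetical parameter vector (k x 1)

# Design matrix X (N x k): a first column of ones for the constant, then x2 and x3.
X = np.column_stack([np.ones(N), rng.uniform(0, 10, size=(N, k - 1))])
U = rng.normal(0.0, 1.0, size=N)      # error vector (N x 1)

Y = X @ beta + U                      # matrix form: Y = X beta + U

# The same model written equation by equation, one per observation i.
Y_rowwise = np.array([beta[0] + beta[1] * X[i, 1] + beta[2] * X[i, 2] + U[i]
                      for i in range(N)])

print(X.shape, Y.shape)
print(np.allclose(Y, Y_rowwise))
```

Writing the model as a single matrix product is what later allows the estimators to be expressed compactly in terms of X and Y.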


B) Definition of the Basic Assumptions
In order to determine the properties of the estimators and the
tests that can be performed, it is necessary to formulate the
following assumptions:
a) Basic Assumptions upon the model
b) Basic Assumptions upon the error term
c) Basic Assumptions upon the exogenous variables
d) Basic Assumptions upon the parameters

a) Basic Assumptions upon the model

We have three basic assumptions:

1) The model is stochastic.
2) The model is linear (or it is possible to make it linear).
Two types of relationship can be established: linear and
nonlinear. A nonlinear relationship can give rise to:
- Strictly linear models: nonlinear relationships that can
  be made linear by some transformation of the model.
  They are linear with respect to the parameters, but not
  with respect to the variables.
  Example: Cobb-Douglas Production Function

  $$Y_i = A L_i^{\beta_1} K_i^{\beta_2}$$
  $$\ln Y_i = \ln A + \beta_1 \ln L_i + \beta_2 \ln K_i$$

- Inherently nonlinear models: they cannot be made
  linear. They are nonlinear with respect to the
  parameters.
3) The size of the data sample. The minimum requirement
is that the number of observations is greater than or equal to
the number of parameters to be estimated. Although this is the
minimum requirement, a large sample is advisable to guarantee
a good and feasible estimation.

$$N \geq k \quad\Longleftrightarrow\quad N - k \geq 0 \quad \text{(degrees of freedom)}$$
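The Cobb-Douglas example can be verified numerically. The sketch below assumes hypothetical values for A and the two exponents and generates labour and capital at random. The least-squares call (`np.linalg.lstsq`) is used here only as a convenient way to recover the coefficients of the log-linear equation. Because no error term is included, the fit is exact up to rounding. The last lines compute the degrees of freedom N - k for this sample.

```python
import numpy as np

rng = np.random.default_rng(2)

A, beta1, beta2 = 1.5, 0.6, 0.3          # hypothetical technology level and exponents
L = rng.uniform(1.0, 100.0, size=50)     # labour
K = rng.uniform(1.0, 100.0, size=50)     # capital

Y = A * L**beta1 * K**beta2              # Cobb-Douglas in levels: nonlinear in L and K

# After taking logs, the relationship is linear in ln A, beta1 and beta2.
X = np.column_stack([np.ones(L.size), np.log(L), np.log(K)])
coef, *_ = np.linalg.lstsq(X, np.log(Y), rcond=None)

N, k = X.shape
dof = N - k                              # degrees of freedom, must be >= 0

print(coef)                              # approximately [ln A, beta1, beta2]
print(dof)
```

The recovered constant is ln A rather than A itself, so the original parameter is obtained by exponentiating it.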

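b) Basic Assumptions upon the error term

There are 4 basic assumptions:
1) The expected value of the error term is equal to 0:

$$E(u_i) = 0 \quad \forall i \qquad\Longleftrightarrow\qquad E(U) = 0_{N \times 1}$$

$$
E(U) = \begin{pmatrix} E(u_1) \\ E(u_2) \\ \vdots \\ E(u_N) \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
$$

2) The variance of the error term is constant
(homoskedasticity assumption):

$$VAR(u_i) = \sigma_u^2 \quad \forall i$$

When this assumption is not satisfied, there is
heteroskedasticity, and the variance differs across observations:

$$VAR(u_i) = \sigma_{u_i}^2, \qquad VAR(u_j) = \sigma_{u_j}^2$$

We assume that the error term is homoskedastic.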

3) There is no autocorrelation in the error terms, i.e. the
errors of different observations are uncorrelated with one
another:

$$COV(u_i, u_j) = E\big[(u_i - E(u_i))(u_j - E(u_j))\big] = E[u_i u_j] = 0, \qquad \forall\, i, j = 1, 2, \ldots, N, \quad i \neq j$$

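If the error term is homoskedastic and there is no
autocorrelation, we say that the error term is spherical,
and in this case the variance-covariance matrix of the error
term ($\Sigma$) is a scalar matrix.
What is the dimension of $\Sigma$?

$$
\Sigma = VAR(U) = E[UU'] =
E\left[
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{pmatrix}
\begin{pmatrix} u_1 & u_2 & \cdots & u_N \end{pmatrix}
\right]
= E
\begin{pmatrix}
u_1 u_1 & u_1 u_2 & \cdots & u_1 u_N \\
u_2 u_1 & u_2 u_2 & \cdots & u_2 u_N \\
\vdots & \vdots & & \vdots \\
u_N u_1 & u_N u_2 & \cdots & u_N u_N
\end{pmatrix}
$$

Since $E(u_i) = 0$, we have $E(u_i^2) = VAR(u_i)$ and
$E(u_i u_j) = COV(u_i, u_j)$ for $i \neq j$. By the basic
hypotheses, $VAR(u_i) = \sigma_u^2$ and $COV(u_i, u_j) = 0$, so:

$$
\Sigma =
\begin{pmatrix}
\sigma_u^2 & 0 & \cdots & 0 \\
0 & \sigma_u^2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \sigma_u^2
\end{pmatrix}
= \sigma_u^2
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}
= \sigma_u^2 I_{N \times N}
$$

Therefore, with no heteroskedasticity and no autocorrelation
we have:

$$\Sigma = VAR(U) = \sigma_u^2 I_N$$

$\Sigma$ is a square, symmetric and positive definite matrix.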
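A Monte Carlo approximation of E[UU'] gives a concrete picture of the spherical error structure. The sketch below rests on assumptions made for illustration only: a small sample size (N = 4), a hypothetical error variance of 2, normally distributed errors and 200,000 simulated error vectors. Averaging the outer products of the draws should yield a matrix close to the error variance times the identity matrix.

```python
import numpy as np

rng = np.random.default_rng(3)

N = 4             # sample size, so the variance-covariance matrix is N x N
sigma2 = 2.0      # hypothetical error variance
reps = 200_000    # number of simulated error vectors

# Each row is one draw of U = (u1, ..., uN) with independent N(0, sigma2) errors.
U = rng.normal(0.0, np.sqrt(sigma2), size=(reps, N))

# Monte Carlo estimate of E[UU']: the average of the outer products u u'.
Sigma_hat = U.T @ U / reps

print(np.round(Sigma_hat, 2))   # diagonal close to sigma2, off-diagonal close to 0
```

The estimate is N x N, symmetric by construction, and its diagonal and off-diagonal entries approximate the variance and the zero covariances respectively.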

A way to summarize the basic assumptions of
homoskedasticity and absence of autocorrelation is:

$$
E(u_i u_j) =
\begin{cases}
\sigma_u^2 & i = j \\
0 & i \neq j
\end{cases}
$$


4) The error term follows a normal distribution.

In this way the basic hypotheses on the error term can be
summarized and expressed as follows:

$$u_i \sim N(0, \sigma_u^2)$$

$$U \sim N(0, \sigma_u^2 I_N)$$

c) Basic Assumptions upon the explanatory variables

There are 5 basic assumptions:
1) The explanatory variables are fixed or deterministic (they
are not random).
2) The explanatory variables are uncorrelated with the error
term. Therefore:

$$E(x_{ji} u_i) = 0, \qquad i = 1, 2, \ldots, N, \quad j = 1, 2, \ldots, k$$

3) There is no exact linear relationship between the
explanatory variables, i.e. there is no perfect multicollinearity.
The columns of the matrix X are linearly independent:

$$\operatorname{rank}(X) = k$$

(the matrix X has full column rank)

4) The explanatory variables are not measured with error.
5) The model neither omits relevant variables nor includes
irrelevant ones.
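The full-rank condition in assumption 3 can be checked numerically. The following sketch uses two small, made-up design matrices (the data values are hypothetical): in the first the columns are linearly independent, and in the second the third column is an exact linear combination of the constant and x2, which is a case of perfect multicollinearity.

```python
import numpy as np

ones = np.ones(5)
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x3 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

# Full column rank: no column is an exact linear combination of the others.
X_ok = np.column_stack([ones, x2, x3])

# Perfect multicollinearity: the third column equals 2*x2 + 3*ones.
X_bad = np.column_stack([ones, x2, 2.0 * x2 + 3.0])

rank_ok = np.linalg.matrix_rank(X_ok)
rank_bad = np.linalg.matrix_rank(X_bad)
print(rank_ok, rank_bad)   # rank equals k = 3 only for X_ok
```

When the rank falls below k, the columns of X carry redundant information and the k parameters cannot be separately identified from the data.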
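At this point we can ask: what are the expected value and the
variance of Y?

$$E(Y) = E(X\beta + U) = X\beta + E(U) = X\beta$$

$$VAR(Y) = E\big[(Y - E(Y))(Y - E(Y))'\big] = E\big[(X\beta + U - X\beta)(X\beta + U - X\beta)'\big] = E[UU'] = \sigma_u^2 I_N$$

so that for each observation $VAR(y_i) = E(u_i^2) = \sigma_u^2$.
Therefore Y follows a normal distribution with expected
value $X\beta$ and variance $\sigma_u^2 I_N$:

$$Y \sim N(X\beta, \sigma_u^2 I_N)$$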
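The moments of Y can also be approximated by simulation. The sketch below is an illustration under assumed values: fixed regressors, hypothetical parameters, a hypothetical error variance of 1.5 and normally distributed, mutually independent errors. Across many simulated samples, the average of Y should approach X times the parameter vector, and the sample covariance matrix of Y should approach the error variance times the identity matrix.

```python
import numpy as np

rng = np.random.default_rng(4)

N, sigma2, reps = 3, 1.5, 200_000
beta = np.array([1.0, 0.5])                          # hypothetical parameters
X = np.column_stack([np.ones(N), [2.0, 4.0, 6.0]])   # fixed (non-random) regressors

# Draw many error vectors and build Y = X beta + U for each one.
U = rng.normal(0.0, np.sqrt(sigma2), size=(reps, N))
Y = X @ beta + U                                     # shape (reps, N)

mean_Y = Y.mean(axis=0)            # should be close to X beta
cov_Y = np.cov(Y, rowvar=False)    # should be close to sigma2 * I_N

print(mean_Y, X @ beta)
print(np.round(cov_Y, 2))
```

Because X is held fixed across replications, all the randomness in Y comes from U, which is why Y inherits the variance-covariance matrix of the error term.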

d) Basic Assumptions upon the parameters

The $\beta_j$ coefficients are constant for the entire data sample
(assumption of structural stability).

Once we have analyzed the basic assumptions, note that one or
more of them may not be met in practice:

- Heteroskedasticity
- Autocorrelation
- Non-normality of the error term U
- Problem of endogeneity of the X
- Perfect multicollinearity
- Measurement errors
- Specification error of the X
- Structural change

These problems are dealt with in Econometrics II, Econometrics III
and Econometrics I (Lessons 5 and 6).
