Chapter Three: Multiple Regression
3.1 Introduction
In simple regression we study the relationship between a dependent variable and a single explanatory (independent) variable. But it is rarely the case that economic relationships involve just two variables. Rather, a dependent variable Y can depend on a whole series of explanatory variables or regressors. For instance, in demand studies we study the relationship between the quantity demanded of a good and the price of the good, the prices of substitute goods and the consumer's income. The model we assume is:
$Y_i = \beta_0 + \beta_1 P_1 + \beta_2 P_2 + \beta_3 X_i + u_i$ -------------------- (3.1)
where $X_i$ is the consumer's income, $P_1$ and $P_2$ are the price of the good and the price of substitute goods respectively, the $\beta$'s are unknown parameters and $u_i$ is the disturbance term.
Equation (3.1) is a multiple regression with three explanatory variables. In general, for $k$ explanatory variables we can write the model as follows:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$ ------- (3.2)
where the $X_{ji}$ ($j = 1, 2, 3, \ldots, k$) are the explanatory variables, $Y_i$ is the dependent variable, the $\beta_j$ ($j = 0, 1, 2, \ldots, k$) are unknown parameters and $u_i$ is the disturbance term.
The disturbance term is of similar nature to that in simple regression, reflecting:
- the basic random nature of human responses
- errors of aggregation
- errors of measurement
- errors in the specification of the mathematical form of the model, and any other omitted influences.
3. Homoscedasticity: The variance of each $u_i$ is the same for all the $X_i$ values, i.e. $E(u_i^2) = \sigma_u^2$ (constant).
This condition is automatically fulfilled if we assume that the values of the X's are a set of fixed numbers in all (hypothetical) samples.
7. No perfect multicollinearity: The explanatory variables are not perfectly linearly
correlated.
We cannot exhaustively list all the assumptions, but the above are some of the basic assumptions that enable us to proceed with our analysis.
The model
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$ …………………................(3.3)
is a multiple regression with two explanatory variables. The expected value of the above model is called the population regression equation, i.e.
$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$, since $E(u_i) = 0$. …………………................(3.4)
$\beta_1$ and $\beta_2$ are also sometimes known as the regression slopes (partial regression coefficients). Note that $\beta_2$, for example, measures the effect on $E(Y)$ of a unit change in $X_2$ when $X_1$ is held constant.
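For example (with purely hypothetical numbers, not taken from the text), if the population regression equation were
$$E(Y) = 10 + 2X_1 + 0.5X_2,$$
then a one-unit increase in $X_2$, holding $X_1$ constant, would raise $E(Y)$ by 0.5 units, while a one-unit increase in $X_1$, holding $X_2$ constant, would raise it by 2 units.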
To obtain expressions for the least squares estimators, we partially differentiate $\sum e_i^2$ with respect to $\hat\beta_0$, $\hat\beta_1$ and $\hat\beta_2$ and set the partial derivatives equal to zero.
$\dfrac{\partial\left[\sum e_i^2\right]}{\partial \hat\beta_0} = -2\sum\left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}\right) = 0$ ………………………. (3.8)
$\dfrac{\partial\left[\sum e_i^2\right]}{\partial \hat\beta_1} = -2\sum X_{1i}\left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}\right) = 0$ ……………………. (3.9)
$\dfrac{\partial\left[\sum e_i^2\right]}{\partial \hat\beta_2} = -2\sum X_{2i}\left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}\right) = 0$ ………………….(3.10)
Summing from 1 to n, the above equations produce the following three normal equations:
$\sum Y_i = n\hat\beta_0 + \hat\beta_1 \sum X_{1i} + \hat\beta_2 \sum X_{2i}$ …………………………………….(3.11)
$\sum X_{1i} Y_i = \hat\beta_0 \sum X_{1i} + \hat\beta_1 \sum X_{1i}^2 + \hat\beta_2 \sum X_{1i} X_{2i}$ ………………………...(3.12)
$\sum X_{2i} Y_i = \hat\beta_0 \sum X_{2i} + \hat\beta_1 \sum X_{1i} X_{2i} + \hat\beta_2 \sum X_{2i}^2$ ………………………...(3.13)
From (3.11) we obtain $\hat\beta_0$:
$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2$ ------------------------------------------------- (3.14)
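As a purely illustrative sketch, the three normal equations (3.11)-(3.13) can be assembled and solved numerically as a 3-by-3 linear system. The data values and variable names below are hypothetical, invented only to show the arithmetic:

```python
import numpy as np

# Hypothetical sample: Y with two regressors X1 and X2
X1 = np.array([2.0, 3.0, 5.0, 4.0, 6.0])
X2 = np.array([1.0, 2.0, 2.0, 3.0, 4.0])
Y  = np.array([5.0, 8.0, 11.0, 10.0, 15.0])
n  = len(Y)

# Coefficient matrix and right-hand side of the normal equations (3.11)-(3.13)
A = np.array([
    [n,        X1.sum(),      X2.sum()],
    [X1.sum(), (X1**2).sum(), (X1*X2).sum()],
    [X2.sum(), (X1*X2).sum(), (X2**2).sum()],
])
b = np.array([Y.sum(), (X1*Y).sum(), (X2*Y).sum()])

beta_hat = np.linalg.solve(A, b)   # [beta0_hat, beta1_hat, beta2_hat]
print(beta_hat)
```

By construction, the first element of `beta_hat` agrees with what (3.14) gives when the sample means of $Y$, $X_1$ and $X_2$ are plugged in.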
$\Rightarrow \underbrace{\sum y_i^2}_{\substack{\text{Total sum of squares} \\ \text{(total variation)}}} = \underbrace{\hat\beta_1 \sum x_{1i} y_i + \hat\beta_2 \sum x_{2i} y_i}_{\substack{\text{Explained sum of squares} \\ \text{(explained variation)}}} + \underbrace{\sum e_i^2}_{\substack{\text{Residual sum of squares} \\ \text{(unexplained variation)}}}$ ----------------- (3.18)
where the lower-case letters denote deviations of the variables from their sample means.
$\therefore R^2 = \dfrac{ESS}{TSS} = \dfrac{\hat\beta_1 \sum x_{1i} y_i + \hat\beta_2 \sum x_{2i} y_i}{\sum y_i^2}$ ----------------------------------(3.27)
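As a small numerical illustration (the sums are hypothetical, chosen only to show the arithmetic), suppose $\hat\beta_1 \sum x_{1i} y_i + \hat\beta_2 \sum x_{2i} y_i = 80$ and $\sum y_i^2 = 100$. Then
$$R^2 = \frac{80}{100} = 0.8,$$
so the two regressors account for 80 percent of the sample variation in $Y$, and the residual sum of squares is $100 - 80 = 20$.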
As in simple regression, $R^2$ is also viewed as a measure of the prediction ability of the model over the sample period, or as a measure of how well the estimated regression fits the data. The value of $R^2$ is also equal to the squared sample correlation coefficient between $Y_t$ and $\hat Y_t$. Since the sample correlation coefficient measures the linear association between two variables, a high $R^2$ means there is a close association between the values of $Y_t$ and the values predicted by the model, $\hat Y_t$. In this case, the model is said to "fit" the data well. If $R^2$ is low, there is little association between the values of $Y_t$ and the values predicted by the model, $\hat Y_t$, and the model does not fit the data well.
3.3.3 Adjusted Coefficient of Determination ($\bar R^2$)
One difficulty with $R^2$ is that it can be made large by adding more and more variables, even if the added variables have no economic justification. Algebraically, as variables are added the sum of squared errors (RSS) goes down (it can remain unchanged, but this is rare) and thus $R^2$ goes up. If the model contains $n-1$ variables, then $R^2 = 1$. Manipulating the model just to obtain a high $R^2$ is not wise. An alternative measure of goodness of fit, called the adjusted $R^2$ and often symbolized as $\bar R^2$, is usually reported by regression programs. It is computed as:
$\bar R^2 = 1 - \dfrac{\sum e_i^2/(n-k)}{\sum y^2/(n-1)} = 1 - \left(1 - R^2\right)\left(\dfrac{n-1}{n-k}\right)$ --------------------------------(3.28)
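Continuing the hypothetical figures above ($R^2 = 0.8$) with, say, $n = 25$ observations and $k = 3$ estimated parameters:
$$\bar R^2 = 1 - (1 - 0.8)\left(\frac{25-1}{25-3}\right) = 1 - 0.2\left(\frac{24}{22}\right) \approx 0.78,$$
slightly below $R^2$, as the degrees-of-freedom correction intends.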
This measure does not always go up when a variable is added, because the degrees-of-freedom term $n-k$ appears in the numerator. As the number of variables $k$ increases, RSS goes down, but so does $n-k$. The net effect on $\bar R^2$ depends on whether the fall in RSS outweighs the fall in $n-k$. While solving one problem, this corrected measure of goodness of fit unfortunately introduces another: it loses its interpretation; $\bar R^2$ is no longer the percentage of variation explained. This modified $\bar R^2$ is sometimes used, and misused, as a device for selecting the appropriate set of explanatory variables.
3.4. General Linear Regression Model and Matrix Approach
So far we have discussed regression models containing one or two explanatory variables. Let us now generalize the model, assuming that it contains $k$ explanatory variables. It will be of the form:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + U$
There are $k+1$ parameters to be estimated. The system of normal equations consists of $k+1$ equations, in which the unknowns are the parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ and the known terms are the sums of squares and the sums of products of all the variables in the structural equation.
Least squares estimators of the unknown parameters are obtained by minimizing the sum of squared residuals:
$\sum e_i^2 = \sum \left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i} - \cdots - \hat\beta_k X_{ki}\right)^2$
Partially differentiating with respect to each coefficient and setting the result equal to zero gives, for the last coefficient,
$\dfrac{\partial \sum e_i^2}{\partial \hat\beta_k} = -2\sum \left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \cdots - \hat\beta_k X_{ki}\right)X_{ki} = 0$
The general form of the above equations (except the first) may be written as:
$\dfrac{\partial \sum e_i^2}{\partial \hat\beta_j} = -2\sum \left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \cdots - \hat\beta_k X_{ki}\right)X_{ji} = 0$, where $j = 1, 2, \ldots, k$.
The normal equations of the general linear regression model are:
$\sum Y_i = n\hat\beta_0 + \hat\beta_1 \sum X_{1i} + \hat\beta_2 \sum X_{2i} + \cdots + \hat\beta_k \sum X_{ki}$
$\sum Y_i X_{1i} = \hat\beta_0 \sum X_{1i} + \hat\beta_1 \sum X_{1i}^2 + \hat\beta_2 \sum X_{1i} X_{2i} + \cdots + \hat\beta_k \sum X_{1i} X_{ki}$
$\sum Y_i X_{2i} = \hat\beta_0 \sum X_{2i} + \hat\beta_1 \sum X_{1i} X_{2i} + \hat\beta_2 \sum X_{2i}^2 + \cdots + \hat\beta_k \sum X_{2i} X_{ki}$
$\vdots$
$\sum Y_i X_{ki} = \hat\beta_0 \sum X_{ki} + \hat\beta_1 \sum X_{1i} X_{ki} + \hat\beta_2 \sum X_{2i} X_{ki} + \cdots + \hat\beta_k \sum X_{ki}^2$
Solving the above normal equations directly involves considerable algebraic complexity, but the task becomes easy with matrix algebra. Hence, in the next section we discuss the matrix approach to the linear regression model.
$U$ = the stochastic disturbance term and $i$ = the $i^{th}$ observation, $n$ being the sample size. Since $i$ represents the $i^{th}$ observation, we shall have $n$ equations with $n$ observations on each variable:
$Y_1 = \beta_0 + \beta_1 X_{11} + \beta_2 X_{21} + \beta_3 X_{31} + \cdots + \beta_k X_{k1} + U_1$
$Y_2 = \beta_0 + \beta_1 X_{12} + \beta_2 X_{22} + \beta_3 X_{32} + \cdots + \beta_k X_{k2} + U_2$
$Y_3 = \beta_0 + \beta_1 X_{13} + \beta_2 X_{23} + \beta_3 X_{33} + \cdots + \beta_k X_{k3} + U_3$
$\vdots$
$Y_n = \beta_0 + \beta_1 X_{1n} + \beta_2 X_{2n} + \beta_3 X_{3n} + \cdots + \beta_k X_{kn} + U_n$
Writing this system in matrix form:
$$\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_{11} & X_{21} & \cdots & X_{k1} \\ 1 & X_{12} & X_{22} & \cdots & X_{k2} \\ 1 & X_{13} & X_{23} & \cdots & X_{k3} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{1n} & X_{2n} & \cdots & X_{kn} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} U_1 \\ U_2 \\ U_3 \\ \vdots \\ U_n \end{bmatrix}$$
In short, $Y = X\beta + U$, where $Y$ is $n \times 1$, $X$ is $n \times (k+1)$, $\beta$ is $(k+1) \times 1$ and $U$ is $n \times 1$.
To derive the OLS estimators of $\beta$, under the usual (classical) assumptions mentioned earlier, we define two vectors $\hat\beta$ and $e$ as:
$$\hat\beta = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{bmatrix} \quad \text{and} \quad e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}$$
Thus we can write: $Y = X\hat\beta + e$ and $e = Y - X\hat\beta$.
We have to minimize:
$$\sum_{i=1}^{n} e_i^2 = e_1^2 + e_2^2 + e_3^2 + \cdots + e_n^2 = \begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} = e'e$$
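Minimizing $e'e$ with respect to $\hat\beta$ yields the well-known matrix solution $\hat\beta = (X'X)^{-1}X'Y$. The sketch below shows that matrix computation numerically, reusing the hypothetical data from the earlier sketch (none of these numbers come from the text):

```python
import numpy as np

# Hypothetical data: n = 5 observations, two regressors plus a constant
X1 = np.array([2.0, 3.0, 5.0, 4.0, 6.0])
X2 = np.array([1.0, 2.0, 2.0, 3.0, 4.0])
Y  = np.array([5.0, 8.0, 11.0, 10.0, 15.0])

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(len(Y)), X1, X2])

# OLS estimator: beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ Y)

# Residual vector and residual sum of squares e'e
e = Y - X @ beta_hat
print(beta_hat, e @ e)
```

In practice, `np.linalg.lstsq(X, Y, rcond=None)` gives the same estimates and is numerically preferable to forming the explicit inverse.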
We have seen in simple linear regression that the OLS estimators ($\hat\alpha$ and $\hat\beta$) satisfy the small-sample properties of an estimator, i.e. the BLUE property. In multiple regression, the OLS estimators also satisfy the BLUE property:
1. Linearity
2. Unbiasedness
3. Minimum variance
Dear Students! We hope that, from the discussion made so far on the multiple regression model, you may in general make the following summary of results.
Let $Y = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + e_i$ ………………………………… (3.51)
A. $H_0: \beta_1 = 0$
   $H_1: \beta_1 \neq 0$
B. $H_0: \beta_2 = 0$
   $H_1: \beta_2 \neq 0$
The null hypothesis (A) states that, holding $X_2$ constant, $X_1$ has no (linear) influence on $Y$. Similarly, hypothesis (B) states that, holding $X_1$ constant, $X_2$ has no influence on the dependent variable $Y_i$. To test these null hypotheses we will use the following tests:
i- Standard error test: under this and the following testing methods we test only $\hat\beta_1$; the test for $\hat\beta_2$ is done in the same way.
$SE(\hat\beta_1) = \sqrt{\operatorname{var}(\hat\beta_1)} = \sqrt{\dfrac{\hat\sigma^2 \sum x_{2i}^2}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2}}$, where $\hat\sigma^2 = \dfrac{\sum e_i^2}{n-3}$.
$t^* = \dfrac{\hat\beta_2}{SE(\hat\beta_2)}$
If $t^* < t$ (tabulated), we accept the null hypothesis, i.e. we conclude that $\hat\beta_2$ is not significant and hence the regressor does not appear to contribute to the explanation of the variation in $Y$.
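For instance, with purely hypothetical values $\hat\beta_2 = 0.75$ and $SE(\hat\beta_2) = 0.30$:
$$t^* = \frac{0.75}{0.30} = 2.5.$$
With, say, $n = 23$ observations ($n-3 = 20$ degrees of freedom), the two-tailed 5 percent critical value is about 2.09, so $t^*$ exceeds it and we would reject $H_0: \beta_2 = 0$, concluding that $\hat\beta_2$ is statistically significant.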
In this section we extend this idea to a joint test of the relevance of all the included explanatory variables. Now consider the following:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + U_i$
$H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_k = 0$
$H_1:$ at least one of the $\beta_k$ is non-zero
This null hypothesis is a joint hypothesis that $\beta_1, \beta_2, \ldots, \beta_k$ are jointly or simultaneously equal to zero, i.e. that $Y$ is not linearly related to $X_1, X_2, \ldots, X_k$.
Can the joint hypothesis be tested by testing the significance of the coefficients individually? The answer is no, because each individual test of significance implicitly assumes that it is based on a different sample. Thus, in testing the significance of $\hat\beta_2$ under the hypothesis that $\beta_2 = 0$, it was assumed tacitly that the testing was based on a different sample from the one used in testing the significance of $\hat\beta_3$ under the null hypothesis that $\beta_3 = 0$. But if we use the same sample to test the joint hypothesis above, we shall be violating the assumption underlying the test procedure.
If the null hypothesis is true, we expect that the data are compatible with the conditions placed on the parameters. Thus, there would be little change in the sum of squared errors when the null hypothesis is assumed to be true.
$F = \dfrac{R^2/(k-1)}{(1-R^2)/(n-k)}$ …………………………………………..(3.55)
This implies that the computed value of F can be calculated either from the sums of squares or from $R^2$ and $1-R^2$. If the null hypothesis is not true, then the difference between the restricted residual sum of squares (RRSS, here equal to TSS) and the unrestricted residual sum of squares (URSS, i.e. RSS) becomes large, implying that the constraints placed on the model by the null hypothesis have a large effect on the ability of the model to fit the data, and the value of F tends to be large. Thus, we reject the null hypothesis if the F test statistic becomes too large. This value is compared with the critical value of F that leaves a probability of $\alpha$ in the upper tail of the F-distribution with $k-1$ and $n-k$ degrees of freedom.
If the computed value of F is greater than the critical value of F (k-1, n-k), then the
parameters of the model are jointly significant or the dependent variable Y is linearly
related to the independent variables included in the model.
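As a brief illustration, the F statistic of equation (3.55) and its critical value can be computed as in the sketch below; the values $R^2 = 0.90$, $n = 20$ and $k = 3$ are hypothetical, chosen only to show the mechanics:

```python
from scipy.stats import f

# Hypothetical values: R-squared, sample size, number of estimated parameters
R2, n, k = 0.90, 20, 3
alpha = 0.05

# F statistic of equation (3.55)
F_stat = (R2 / (k - 1)) / ((1 - R2) / (n - k))

# Upper-tail critical value with (k-1, n-k) degrees of freedom
F_crit = f.ppf(1 - alpha, k - 1, n - k)

print(F_stat, F_crit)   # 76.5 versus roughly 3.59
if F_stat > F_crit:
    print("Reject H0: the regressors are jointly significant")
```

Here the computed F (76.5) far exceeds the 5 percent critical value with (2, 17) degrees of freedom, so under these hypothetical numbers the explanatory variables would be judged jointly significant.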