3.1 Multiple Linear Regression Model
A fitted linear regression model always leaves some residual variation. There
might be another systematic cause for the variability in the observations $y_i$. If we
have data on other explanatory variables, we can ask whether they can be used to
explain some of the residual variation in $Y$. If this is the case, we should take it into
account in the model so that the errors are purely random. We could write
\[
Y_i = \beta_0 + \beta_1 x_i + \underbrace{\beta_2 z_i + \varepsilon^\star_i}_{\text{previously } \varepsilon_i}.
\]
Analysis of Variance
Source DF SS MS F P
Regression 1 18299.8 18299.8 44.03 0.000
Error 19 7896.4 415.6
Total 20 26196.2
Figure 3.1: (a) Fitted line plot for Dwine Studios versus per capita disposable personal income in the community. (b) Residual plots.
The regression is highly significant, but $R^2$ is rather small. This suggests that there
could be other factors which are also important for the sales. We have
data on the number of persons aged 16 or younger in the community, so we will
examine whether the residuals of the above fit are related to this variable. If so,
then including it in the model may improve the fit.
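This residual check can also be reproduced numerically. Below is a minimal Python/numpy sketch; the arrays income, young and sales are hypothetical stand-ins for the actual observations (the numbers are made up), so the output will not match the Minitab results above.

import numpy as np

# hypothetical data vectors (replace with the actual observations)
income = np.array([65.4, 70.1, 58.2, 80.5, 62.3, 74.8])   # per capita disposable income
young  = np.array([16.7, 18.2, 14.1, 20.0, 15.5, 19.3])   # persons aged 16 or younger
sales  = np.array([174.4, 199.0, 154.1, 230.9, 165.2, 210.7])

# fit the simple regression of sales on income
X1 = np.column_stack([np.ones_like(income), income])
b1, *_ = np.linalg.lstsq(X1, sales, rcond=None)
resid = sales - X1 @ b1

# regress the residuals on the second variable: a clearly non-zero slope
# suggests that adding it to the model may explain some residual variation
X2 = np.column_stack([np.ones_like(young), young])
b2, *_ = np.linalg.lstsq(X2, resid, rcond=None)
print("slope of residuals on 'young':", b2[1])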
Indeed, the residuals show a possible relationship with the number of persons aged
16 or younger in the community. We will fit the model with both variables, $X_1$
and $X_2$, included, that is
\[
Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i, \qquad i = 1, \ldots, n.
\]
The model fit is as follows:
Analysis of Variance
Source DF SS MS F P
Regression 2 24015 12008 99.10 0.000
Residual Error 18 2181 121
Total 20 26196
Here we see that the intercept parameter is not significantly different from zero
(p = 0.226), and so the model without the intercept was fitted. $R^2$ is now close to
100% and both parameters are highly significant.
Regression Equation
Y = 1.62 X1 + 4.75 X2
Coefficients
Term Coef SE Coef T P
X1 1.62175 0.154948 10.4664 0.000
X2 4.75042 0.583246 8.1448 0.000
Analysis of Variance
Source DF Seq SS Adj SS Adj MS F P
Regression 2 718732 718732 359366 2917.42 0.000
Error 19 2340 2340 123
Total 21 721072
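For completeness, here is a Python/numpy sketch of the final no-intercept fit. The data are again made up, so the coefficients will not agree with the output above; the point is only to show the construction of the fit.

import numpy as np

# placeholder data; substitute the real observations
x1 = np.array([16.7, 18.2, 14.1, 20.0, 15.5, 19.3])
x2 = np.array([20.1, 22.3, 17.9, 26.0, 19.2, 24.4])
y  = np.array([174.4, 199.0, 154.1, 230.9, 165.2, 210.7])

# design matrix WITHOUT a column of ones, since the intercept was dropped
X = np.column_stack([x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# for a no-intercept fit, R^2 is conventionally computed about the origin,
# which is why the Total SS in the output above is uncorrected
r2 = 1 - resid @ resid / (y @ y)
print("beta_hat:", beta_hat, " R^2 (about the origin):", r2)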
Figure 3.3: Fitted surface plot and the Dwine Studios observations.
A Multiple Linear Regression (MLR) model for a response variable $Y$ and explanatory variables $X_1, X_2, \ldots, X_{p-1}$ is
\[
\begin{aligned}
E(Y \mid X_1 = x_{1i}, \ldots, X_{p-1} = x_{p-1,i}) &= \beta_0 + \beta_1 x_{1i} + \ldots + \beta_{p-1} x_{p-1,i} \\
\operatorname{var}(Y \mid X_1 = x_{1i}, \ldots, X_{p-1} = x_{p-1,i}) &= \sigma^2, \qquad i = 1, \ldots, n \\
\operatorname{cov}(Y \mid X_1 = x_{1i}, \ldots, X_{p-1} = x_{p-1,i},\; Y \mid X_1 = x_{1j}, \ldots, X_{p-1} = x_{p-1,j}) &= 0, \qquad i \neq j
\end{aligned}
\]
As in the SLR model we denote
\[
Y_i = (Y \mid X_1 = x_{1i}, \ldots, X_{p-1} = x_{p-1,i})
\]
and we usually omit the condition on the $X$s and write
\[
\begin{aligned}
\mu_i = E(Y_i) &= \beta_0 + \beta_1 x_{1i} + \ldots + \beta_{p-1} x_{p-1,i} \\
\operatorname{var}(Y_i) &= \sigma^2, \qquad i = 1, \ldots, n \\
\operatorname{cov}(Y_i, Y_j) &= 0, \qquad i \neq j
\end{aligned}
\]
or
\[
\begin{aligned}
Y_i &= \beta_0 + \beta_1 x_{1i} + \ldots + \beta_{p-1} x_{p-1,i} + \varepsilon_i \\
E(\varepsilon_i) &= 0 \\
\operatorname{var}(\varepsilon_i) &= \sigma^2, \qquad i = 1, \ldots, n \\
\operatorname{cov}(\varepsilon_i, \varepsilon_j) &= 0, \qquad i \neq j
\end{aligned}
\]
For testing we need the assumption of Normality, i.e., we assume that
\[
Y_i \overset{\text{ind}}{\sim} N(\mu_i, \sigma^2)
\]
or
\[
\varepsilon_i \overset{\text{ind}}{\sim} N(0, \sigma^2).
\]
To simplify the notation we write the MLR model in a matrix form
\[
Y = X\beta + \varepsilon, \tag{3.1}
\]
that is,
\[
\underbrace{\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}}_{:=\, Y}
=
\underbrace{\begin{pmatrix}
1 & x_{1,1} & \cdots & x_{p-1,1} \\
1 & x_{1,2} & \cdots & x_{p-1,2} \\
\vdots & \vdots & & \vdots \\
1 & x_{1,n} & \cdots & x_{p-1,n}
\end{pmatrix}}_{:=\, X}
\underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{pmatrix}}_{:=\, \beta}
+
\underbrace{\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}}_{:=\, \varepsilon}
\]
Here Y is the vector of responses, X is often called the design matrix, β is the
vector of unknown, constant parameters and ε is the vector of random errors.
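As an illustration of the matrix form (3.1), the following Python/numpy sketch builds a design matrix with a leading column of ones and simulates a response vector. The sample size, coefficient values and error standard deviation below are arbitrary choices made only for the example.

import numpy as np

rng = np.random.default_rng(1)
n = 50
beta = np.array([2.0, 1.5, -0.7])          # beta_0, beta_1, beta_2 (arbitrary values)
sigma = 0.5

x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
X = np.column_stack([np.ones(n), x1, x2])  # design matrix: column of ones, then the x's
eps = rng.normal(0.0, sigma, n)            # independent N(0, sigma^2) errors
Y = X @ beta + eps                         # Y = X beta + eps, as in (3.1)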
To derive the least squares estimator (LSE) for the parameter vector β we min-
imise the sum of squares of the errors, that is
\[
\begin{aligned}
S(\beta) &= \sum_{i=1}^{n} \left[ Y_i - \{\beta_0 + \beta_1 x_{1,i} + \cdots + \beta_{p-1} x_{p-1,i}\} \right]^2 \\
&= \sum \varepsilon_i^2 \\
&= \varepsilon^T \varepsilon \\
&= (Y - X\beta)^T (Y - X\beta) \\
&= (Y^T - \beta^T X^T)(Y - X\beta) \\
&= Y^T Y - Y^T X\beta - \beta^T X^T Y + \beta^T X^T X \beta \\
&= Y^T Y - 2\beta^T X^T Y + \beta^T X^T X \beta,
\end{aligned}
\]
since $Y^T X\beta$ is a scalar and therefore equal to its transpose $\beta^T X^T Y$.
The least squares estimator of $\beta$ is
\[
\hat{\beta} = (X^T X)^{-1} X^T Y .
\]
To see that this minimises $S(\beta)$, write $\beta_0 = (X^T X)^{-1} X^T Y$, so that $X^T X \beta_0 = X^T Y$. Then, for any $\beta$,
\[
\begin{aligned}
S(\beta) - S(\beta_0)
&= Y^T Y - 2\beta^T X^T Y + \beta^T X^T X \beta - Y^T Y + 2\beta_0^T X^T Y - \beta_0^T X^T X \beta_0 \\
&= -2\beta^T X^T X \beta_0 + \beta^T X^T X \beta + 2\beta_0^T X^T X \beta_0 - \beta_0^T X^T X \beta_0 \\
&= \beta^T X^T X \beta - 2\beta^T X^T X \beta_0 + \beta_0^T X^T X \beta_0 \\
&= \beta^T X^T X \beta - \beta^T X^T X \beta_0 - \beta^T X^T X \beta_0 + \beta_0^T X^T X \beta_0 \\
&= \beta^T X^T X \beta - \beta^T X^T X \beta_0 - \beta_0^T X^T X \beta + \beta_0^T X^T X \beta_0 \\
&= \beta^T (X^T X \beta - X^T X \beta_0) - \beta_0^T (X^T X \beta - X^T X \beta_0) \\
&= (\beta^T - \beta_0^T)(X^T X \beta - X^T X \beta_0) \\
&= (\beta - \beta_0)^T X^T X (\beta - \beta_0) \\
&= \{X(\beta - \beta_0)\}^T \{X(\beta - \beta_0)\} \geq 0,
\end{aligned}
\]
so $S(\beta) \geq S(\beta_0)$ for all $\beta$ and hence $\hat{\beta} = \beta_0$ minimises $S(\beta)$.
Note that, as we did for the SLM in Chapter 2, it is possible to obtain this result
by differentiating S(β) with respect to β and setting it equal to 0.
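As a numerical sketch of the least squares estimator, the snippet below solves the normal equations $X^T X \beta = X^T Y$ directly rather than forming $(X^T X)^{-1}$ explicitly, and compares the result with numpy's built-in least squares routine. The simulated data are the arbitrary example from above, regenerated here so the snippet is self-contained.

import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([2.0, 1.5, -0.7]) + rng.normal(0.0, 0.5, n)

# beta_hat = (X^T X)^{-1} X^T Y, obtained by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# the same estimate via numpy's built-in least squares routine
beta_hat_ls, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat, beta_hat_ls)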
Theorem 3.2. If
\[
Y = X\beta + \varepsilon, \qquad \varepsilon \sim N_n(0, \sigma^2 I),
\]
then
\[
\hat{\beta} \sim N_p\!\left(\beta, \sigma^2 (X^T X)^{-1}\right).
\]
The expectation and variance-covariance matrix can be shown in the same way as
in Theorem 2.7.
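Theorem 3.2 is what lies behind the standard errors reported in regression output such as the tables above. The sketch below illustrates the computation, estimating $\sigma^2$ by the residual mean square $SSE/(n-p)$; the data are simulated, so only the formulas, not the numbers, matter.

import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([2.0, 1.5, -0.7]) + rng.normal(0.0, 0.5, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
resid = Y - X @ beta_hat
mse = resid @ resid / (n - p)            # estimate of sigma^2

# estimated variance-covariance matrix of beta_hat and the standard errors
var_beta = mse * XtX_inv
se_beta = np.sqrt(np.diag(var_beta))
print("beta_hat:", beta_hat)
print("SE(beta_hat):", se_beta)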
Remark 3.1. The vector of fitted values is given by
\[
\hat{\mu} = \hat{Y} = X\hat{\beta} = X(X^T X)^{-1} X^T Y = HY .
\]
Note that
\[
H^T = H
\]
and also
\[
HH = H,
\]
that is, $H$ is symmetric and idempotent.
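These two properties are easy to confirm numerically for any design matrix of full column rank; a small sketch with an arbitrary simulated design matrix:

import numpy as np

rng = np.random.default_rng(3)
n = 20
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix H = X (X^T X)^{-1} X^T
print(np.allclose(H.T, H))             # H^T = H
print(np.allclose(H @ H, H))           # HH = H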
The vector of residuals is $e = Y - \hat{Y} = (I - H)Y$.

Lemma 3.2. $\operatorname{Var}(e) = \sigma^2 (I - H)$.

Proof.
\[
\begin{aligned}
\operatorname{Var}(e) &= (I - H)\operatorname{var}(Y)(I - H)^T \\
&= (I - H)\,\sigma^2 I\,(I - H) \\
&= \sigma^2 (I - H),
\end{aligned}
\]
since $I - H$ is symmetric and idempotent.
Lemma 3.3. The sum of squares of the residuals is $Y^T (I - H) Y$.

Proof.
\[
\sum_{i=1}^{n} e_i^2 = e^T e = Y^T (I - H)^T (I - H) Y = Y^T (I - H) Y .
\]
Lemma 3.4. The elements of the residual vector $e$ sum to zero, i.e.
\[
\sum_{i=1}^{n} e_i = 0.
\]
Proof. Suppose, for a contradiction, that $\sum_{i=1}^{n} e_i = c \neq 0$. Replace the estimate $\hat{\beta}_0$ by $\hat{\beta}_0 + c/n$, leaving the remaining estimates unchanged. The residuals then become $e_i - c/n$, and their sum of squares is
\[
\sum_{i=1}^{n} \left(e_i - \frac{c}{n}\right)^2 = \sum e_i^2 - \frac{2c}{n}\sum e_i + n\frac{c^2}{n^2} = \sum e_i^2 - \frac{c^2}{n} < \sum e_i^2 .
\]
But we know that $\sum e_i^2$ is the minimum value of $S(\beta)$, so there cannot exist values with a smaller sum of squares, and this gives the required contradiction. So $c = 0$.
Corollary 3.1.
\[
\frac{1}{n}\sum_{i=1}^{n} \hat{Y}_i = \bar{Y}.
\]

Proof. The residuals are $e_i = Y_i - \hat{Y}_i$, so $\sum e_i = \sum (Y_i - \hat{Y}_i)$, but $\sum e_i = 0$. Hence $\sum Y_i = \sum \hat{Y}_i$ and the result follows.
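Lemma 3.3, Lemma 3.4 and Corollary 3.1 can all be checked numerically in a few lines. A sketch with simulated data follows; note that the last two results rely on the design matrix containing a column of ones.

import numpy as np

rng = np.random.default_rng(4)
n = 30
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([2.0, 1.5, -0.7]) + rng.normal(0.0, 0.5, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = (np.eye(n) - H) @ Y                            # residuals e = (I - H) Y

print(np.isclose(e.sum(), 0.0))                    # Lemma 3.4: residuals sum to zero
print(np.isclose((H @ Y).mean(), Y.mean()))        # Corollary 3.1: mean of fitted values = Y-bar
print(np.isclose(e @ e, Y @ (np.eye(n) - H) @ Y))  # Lemma 3.3: SSE = Y^T (I - H) Y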
As in simple linear regression, the total sum of squares decomposes as $SST = SSR + SSE$, where $SST = \sum (Y_i - \bar{Y})^2$, $SSR = \sum (\hat{Y}_i - \bar{Y})^2$ and $SSE = \sum (Y_i - \hat{Y}_i)^2 = \sum e_i^2$. In matrix form,
\[
SST = Y^T Y - n\bar{Y}^2, \qquad SSR = Y^T H Y - n\bar{Y}^2, \qquad SSE = Y^T (I - H) Y .
\]

Proof.
\[
SST = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - n\bar{Y}^2 = Y^T Y - n\bar{Y}^2 .
\]
\[
\begin{aligned}
SSR &= \sum (\hat{Y}_i - \bar{Y})^2 \\
&= \sum \hat{Y}_i^2 - 2\bar{Y}\underbrace{\sum \hat{Y}_i}_{=\, n\bar{Y}} + n\bar{Y}^2 \\
&= \sum \hat{Y}_i^2 - n\bar{Y}^2 \\
&= \hat{Y}^T \hat{Y} - n\bar{Y}^2 \\
&= \hat{\beta}^T X^T X \hat{\beta} - n\bar{Y}^2 \\
&= Y^T X (X^T X)^{-1} \underbrace{X^T X (X^T X)^{-1}}_{=\, I} X^T Y - n\bar{Y}^2 \\
&= Y^T H Y - n\bar{Y}^2 .
\end{aligned}
\]
By Lemma 3.3, $SSE = \sum e_i^2 = Y^T (I - H) Y$, and so
\[
SSR + SSE = \left(Y^T H Y - n\bar{Y}^2\right) + Y^T (I - H) Y = Y^T Y - n\bar{Y}^2 = SST .
\]
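The decomposition is easy to confirm numerically using the matrix forms above; a sketch with simulated data:

import numpy as np

rng = np.random.default_rng(5)
n = 30
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([2.0, 1.5, -0.7]) + rng.normal(0.0, 0.5, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
ybar = Y.mean()

sst = Y @ Y - n * ybar**2                  # SST = Y^T Y - n Y-bar^2
ssr = Y @ H @ Y - n * ybar**2              # SSR = Y^T H Y - n Y-bar^2
sse = Y @ (np.eye(n) - H) @ Y              # SSE = Y^T (I - H) Y
print(np.isclose(sst, ssr + sse))          # SST = SSR + SSE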
To assess whether the regression is significant overall we test
\[
H_0: \beta_1 = \beta_2 = \ldots = \beta_{p-1} = 0
\]
against
\[
H_1: \neg H_0,
\]
which means that at least one of the coefficients is non-zero. Under $H_0$ the model reduces to the null model
\[
Y = \mathbf{1}\beta_0 + \varepsilon,
\]
where $\mathbf{1}$ is the $n$-vector of ones.
In testing H0 we are asking if there is sufficient evidence to reject the null model.
Source              d.f.    SS                   MS                   VR
Overall regression  p − 1   Y^T H Y − n Ȳ^2     MSR = SSR/(p − 1)    MSR/MSE
Residual            n − p   Y^T (I − H) Y       MSE = SSE/(n − p)
Total               n − 1   Y^T Y − n Ȳ^2
Under $H_0$,
\[
\frac{SSR}{\sigma^2} \sim \chi^2_{p-1},
\]
and, whether or not $H_0$ holds,
\[
\frac{SSE}{\sigma^2} \sim \chi^2_{n-p}.
\]
The two statistics are independent, hence
\[
\frac{MSR}{MSE} \underset{H_0}{\sim} F_{p-1,\, n-p}.
\]
This is a test function for the null hypothesis
\[
H_0: \beta_1 = \beta_2 = \ldots = \beta_{p-1} = 0
\]
versus
\[
H_1: \neg H_0 .
\]
We reject H0 at the 100α% level of significance if