Multiple Regression and Model Building: Dr. Subhradev Sen Alliance School of Business


Multiple Regression

and
Model Building

Dr. Subhradev Sen


Alliance School of Business
Contents
11.1 Multiple Regression Models
Part I: First-Order Models with Quantitative
Independent Variables
11.2 Estimating and Making Inferences about the 
Parameters
11.3 Evaluating Overall Model Utility
11.4 Using the Model for Estimation and Prediction
Contents
Part II: Model Building in Multiple Regression
11.5 Interaction Models
11.6 Quadratic and Other Higher-Order
Models
11.7 Qualitative (Dummy) Variable Models
11.8 Models with Both Quantitative and
Qualitative Variables
11.9 Comparing Nested Models
Contents
11.10 Stepwise Regression
Part III: Multiple Regression Diagnostics
11.11 Residual Analysis: Checking the
Regression Assumptions
11.12 Some Pitfalls: Estimability,
Multicollinearity, and Extrapolation
Learning Objectives
• To introduce a multiple regression model
as a means of relating a dependent variable
y to two or more independent variables
• To present several different multiple
regression models involving both
quantitative and qualitative independent
variables
Learning Objectives
• To assess how well the multiple regression
model fits the sample data
• To show how an analysis of the model’s
residuals can aid in detecting violations of
model assumptions and in identifying
model modifications
11.1

Multiple Regression Models
The General Multiple
Regression Model
y   0   1 x1   2 x 2     k x k  
where
y is the dependent variable.
x1, x2, …, xk are the independent variables.
E(y) = 0 + 1x1 + 2x2 +…+ kxk is the deterministic
portion of the model.
i determines the contribution of the independent
variable xi.
Note: The symbols x1, x2, …, xk may represent higher-
order terms for quantitative predictors or terms that
represent qualitative predictors.
Analyzing a Multiple
Regression Model
Step 1 Hypothesize the deterministic component
of the model. This component relates the
mean, E(y), to the independent variables
x1, x2, … , xk. This involves the choice of
the independent variables to be included
in the model.
Step 2 Use the sample data to estimate the
unknown model parameters β0, β1, β2, …,
βk in the model.
Analyzing a Multiple
Regression Model

Step 3 Specify the probability distribution of the
random error term, ε, and estimate the
standard deviation of this distribution, σ.
Step 4 Check that the assumptions on ε are
satisfied and make model modifications if
necessary.
Analyzing a Multiple
Regression Model

Step 5 Statistically evaluate the usefulness of the
model.
Step 6 When satisfied that the model is useful,
use it for prediction, estimation, and other
purposes.
Assumptions for the Random Error ε
For any given set of values of x1, x2, … , xk, the
random error ε has a probability distribution with
the following properties:
1. Mean equal to 0
2. Variance equal to σ²
3. Normal distribution
4. Random errors are independent (in a
probabilistic sense).
Part I: First-Order Models
with Quantitative
Independent Variables
11.2

Estimating and Making Inferences about the β Parameters
First-Order Model in Five
Quantitative Independent
(Predictor) Variables
E( y)  0  1 x1  2 x2  3 x3   4 x4  5 x5

where x1, x2, … , x5 are all quantitative variables
that are not functions of other independent
variables.
Note: βi represents the slope of the line relating y
to xi when all the other x's are held fixed.
Estimator of σ2 for a Multiple
Regression Model with k
Independent Variables
s² = SSE / [n – number of estimated β parameters]
   = SSE / [n – (k + 1)]
Interpretation of Estimated
Coefficients
1. Slope (β̂k)
• Estimated y changes by β̂k for each 1-unit
increase in xk, holding all other variables
constant

2. y-Intercept (β̂0)
• Average value of y when all the x's equal 0
Interpretation of Estimated
Coefficients
In first-order models, the relationship between
E(y) and one of the variables, holding the
others constant, is a straight line, and we get
parallel straight lines as the values of the
other variables change.
A 100(1 – )% Confidence
Interval for a  Parameter

φi  t 2 sφ
i

where t/2 is based on n – (k + 1) degrees of


freedom and
n = Number of observations
k + 1 = Number of  parameters in the model
Test of an Individual Parameter Coefficient
in the Multiple Regression Model
One-Tailed Test
H0: βi = 0
Ha: βi < 0 (or Ha: βi > 0)

Test Statistic: t = β̂i / s_β̂i

Rejection region: t < –tα (or t > tα when Ha: βi > 0)

where tα is based on n – (k + 1) degrees of freedom
n = Number of observations
k + 1 = Number of β parameters in the model
Test of an Individual Parameter Coefficient
in the Multiple Regression Model
Two-Tailed Test
H0: βi = 0
Ha: βi ≠ 0

Test Statistic: t = β̂i / s_β̂i

Rejection region: | t | > tα/2
where tα/2 is based on n – (k + 1) degrees of freedom
n = Number of observations
k + 1 = Number of β parameters in the model
Conditions Required for Valid
Inferences about the β Parameters

For any given set of values of x1, x2, … , xk, the
random error ε has a probability distribution with
the following properties:
1. Mean equal to 0
2. Variance equal to σ²
3. Normal distribution
4. Random errors are independent (in a
probabilistic sense).
First–Order Multiple
Regression Model
Relationship between 1 dependent and 2 or
more independent variables is a linear function:

y = β0 + β1x1 + β2x2 + … + βkxk + ε

where y is the dependent (response) variable,
x1, … , xk are the independent (explanatory)
variables, β0 is the population y-intercept,
β1, … , βk are the population slopes, and ε is the
random error.
1st-Order Model Example

You work in advertising for the New York Times.
You want to find the effect of ad size (sq. in.)
and newspaper circulation (000) on the number
of ad responses (00). Estimate the unknown
parameters. You've collected the following data:

Resp (y)  Size (x1)  Circ (x2)
   1          1          2
   4          8          8
   1          3          1
   3          5          7
   2          6          4
   4         10          6
Parameter Estimation
Computer Output
Parameter Estimates
Variable   DF  Estimate (β̂)  Std Error  T for H0: Param=0  Prob>|T|
INTERCEP    1     0.0640       0.2599         0.246          0.8214
ADSIZE      1     0.2049       0.0588         3.656          0.0399
CIRC        1     0.2805       0.0686         4.089          0.0264

ŷ = .0640 + .2049x1 + .2805x2
Interpretation of Coefficients
Solution
1. Slope (β̂1)
• Number of responses to ad is expected to
increase by 20.49 for each 1-sq.-in. increase in ad
size, holding circulation constant

2. Slope (β̂2)
• Number of responses to ad is expected to increase
by 28.05 for each 1-unit (1,000) increase in
circulation, holding ad size constant
Calculating s² and s

Example
You work in advertising for the
New York Times. You want to
find the effect of ad size (sq.
in.), x1, and newspaper
circulation (000), x2, on the
number of ad responses (00), y.
Find SSE, s2, and s.
Analysis of Variance
Computer Output
Analysis of Variance

Source DF SS MS F P
Regression 2 9.249736 4.624868 55.44 .0043
Residual Error 3 .250264 .083421
Total 5 9.5

s² = SSE / [n – (k + 1)] = .250264 / (6 – 3) = .083421
s = √.083421 = .2888
Working with SPSS
(Screenshots: Data View, Variable View, regression dialogs, and SPSS output.)
11.3

Evaluating Overall Model Utility
Use Caution When Conducting
t-tests on the  Parameters
It is dangerous to conduct t-tests on the
individual β parameters in a first-order linear
model for the purpose of determining which
independent variables are useful for predicting y
and which are not. If you fail to reject H0: βi = 0,
several conclusions are possible:
1. There is no relationship between y and xi.
2. A straight-line relationship between y and xi
exists (holding the other x's in the model
fixed), but a Type II error occurred.
Use Caution When Conducting
t-tests on the  Parameters
3. A relationship between y and xi (holding the
other x's in the model fixed) exists but is more
complex than a straight-line relationship (e.g.,
a curvilinear relationship may be appropriate).
The most you can say about a β parameter test
is that there is either sufficient (if you reject
H0: βi = 0) or insufficient (if you do not reject
H0: βi = 0) evidence of a linear (straight-line)
relationship between y and xi.
The Multiple Coefficient
of Determination, R²

R² = 1 – SSE/SSyy = (SSyy – SSE)/SSyy
   = Explained Variability / Total Variability
The Multiple Coefficient
of Determination, R²

• Proportion of variation in y 'explained' by
all x variables taken together
• Never decreases when new x variable is
added to model
— Only y values determine SSyy
— Disadvantage when comparing models
The Adjusted Multiple
Coefficient of Determination
 n  1   SSE 
Ra2  1    
 n  (k  1)   SS yy 
 n 1 
 1  
 n  (k  1) 

1 R 2

Note : Ra  R
2 2

• Takes into account n and number of


parameters
• Similar interpretation to R2
Estimation of R² and Ra²

Example
You work in advertising for the
New York Times. You want to
find the effect of ad size (sq. in.),
x1, and newspaper circulation
(000), x2, on the number of ad
responses (00), y. Find R² and
Ra².
Excel Computer Output
Solution
(Excel output: R² = .9737 and Ra² = .9561 highlighted.)
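Both quantities follow directly from SSE and SSyy in the ANOVA output. A small sketch using the values from the newspaper-ad fit:

```python
sse, ss_yy = 0.250264, 9.5   # from the ANOVA table: SSE and total sum of squares
n, k = 6, 2                  # 6 observations, 2 predictors

r2 = 1 - sse / ss_yy
r2_adj = 1 - (n - 1) / (n - (k + 1)) * (sse / ss_yy)
print(round(r2, 4), round(r2_adj, 4))   # 0.9737 0.9561
```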
Testing Global Usefulness of the Model:
The Analysis of Variance F-Test
H0: β1 = β2 = … = βk = 0
(All model terms are unimportant for predicting y)
Ha: At least one βi ≠ 0
(At least one model term is useful for predicting y)

Test Statistic:
F = [(SSyy – SSE) / k] / [SSE / (n – (k + 1))]
  = [R² / k] / [(1 – R²) / (n – (k + 1))]
  = Mean Square (Model) / Mean Square (Error)
Testing Global Usefulness of the Model:
The Analysis of Variance F-Test

where n is the sample size and k is the number of terms in
the model.

Rejection region: F > Fα, with k numerator degrees of
freedom and [n – (k + 1)] denominator degrees of freedom.
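For the newspaper-ad model, the global F statistic can be computed from R² (a sketch; SSE and SSyy are taken from the ANOVA output):

```python
sse, ss_yy = 0.250264, 9.5   # from the ANOVA table
n, k = 6, 2

r2 = 1 - sse / ss_yy
f = (r2 / k) / ((1 - r2) / (n - (k + 1)))
print(round(f, 2))   # 55.44, which exceeds the tabled F.05 = 9.55 with (2, 3) df, so reject H0
```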
Recommendation for Checking the
Utility of a Multiple Regression Model

1. First, conduct a test of overall model adequacy
using the F-test; that is, test
H0: β1 = β2 = … = βk = 0
If the model is deemed adequate (that is, if you
reject H0), then proceed to step 2. Otherwise,
you should hypothesize and fit another model.
The new model may include more independent
variables or higher-order terms.
Recommendation for Checking the
Utility of a Multiple Regression Model

2. Conduct t-tests on those β parameters in which
you are particularly interested (that is, the "most
important" β's). These usually involve only the
β's associated with higher-order terms (x², x1x2,
etc.). However, it is a safe practice to limit the
number of β's that are tested. Conducting a
series of t-tests leads to a high overall Type I
error rate α.
Testing Overall
Significance Example
You work in advertising for the
New York Times. You want to
find the effect of ad size (sq. in.),
x1, and newspaper circulation
(000), x2, on the number of ad
responses (00), y. Conduct the
global F–test of model
usefulness. Use α = .05.
Testing Overall Significance
Solution
• H0: β1 = β2 = 0
• Ha: At least one βi is not zero
• α = .05
• ν1 = 2, ν2 = 3
• Critical Value: F.05 = 9.55
(Rejection region: F > 9.55)
Testing Overall Significance
Computer Output

Analysis of Variance
Source    DF              Sum of Squares  Mean Square  F Value  Prob>F
Model      2 (= k)             9.2497       4.6249      55.440  0.0043
Error      3 (= n – (k+1))     0.2503       0.0834
C Total    5                   9.5000

F Value = MS(Model) / MS(Error)
Testing Overall Significance
Solution
• H0: β1 = β2 = 0
• Ha: At least one βi is not zero
• α = .05
• ν1 = 2, ν2 = 3
Test Statistic: F = 4.6249 / .0834 = 55.44
Decision: Reject H0 at α = .05 (55.44 > 9.55)
Conclusion: There is evidence that at least one
of the coefficients is not zero
Testing Overall Significance
Computer Output Solution

Analysis of Variance
Source    DF  Sum of Squares  Mean Square  F Value  Prob>F
Model      2      9.2497        4.6249      55.440  0.0043
Error      3      0.2503        0.0834
C Total    5      9.5000

F Value = MS(Model) / MS(Error) = 55.440; P-Value = 0.0043
R-Code
# Multiple Linear Regression
fit <- lm(y ~ x1 + x2 + x3, data=mydata)
summary(fit) # show results

For the newspaper data:

d <- read.csv(file.choose())
fit <- lm(res ~ size + cir, data=d)
summary(fit)
11.4

Using the Model for Estimation and Prediction
Example Estimation and
Prediction
A collector of antique grandfather clocks sold at
auction knows that the price y received for the
clocks increases linearly with the age x1 of the
clocks and the number of bidders x2 and is
modeled with the first-order equation:
E(y) = 0 + 1x1 + 2x2
y = auction price of grandfather clock
x1 = age of clock
x2 = number of bidders
Example Estimation and Prediction
(Table 11.1: grandfather clock auction data.)
Example Estimation and
Prediction
a. Estimate the average auction price for all 150-
year-old clocks sold at auctions with 10
bidders using a 95% confidence interval.
Interpret the result.
Here, the key words average and for all imply
we want to estimate the mean of y, E(y). We
want a 95% confidence interval for E(y) when x1
= 150 years and x2 = 10 bidders. A Minitab
printout for this analysis is shown . . .
Example Estimation and Prediction
(Minitab printout with 95% CI and 95% PI for x1 = 150, x2 = 10.)
The confidence interval (highlighted under “95%
CI”) is (1,381.4, 1,481.9). Thus, we are 95%
confident that the mean auction price for all 150-
year-old clocks sold at an auction with 10
bidders lies between $1,381.40 and $1,481.90.
Example Estimation and
Prediction
b. Predict the auction price for a single 150-year-old
clock sold at an auction with 10 bidders using a
95% prediction interval. Interpret the result.
The key words predict and for a single imply that we
want a 95% prediction interval for y when x1 = 150
years and x2 = 10 bidders. This interval (highlighted
under “95% PI” on the Minitab printout) is (1,154.1,
1,709.3). We say, with 95% confidence, that the
auction price for a single 150-year-old clock sold at
an auction with 10 bidders falls between $1,154.10
and $1,709.30.
Example Estimation and
Prediction
c. Suppose you want to predict the auction price for
one clock that is 50 years old and has 2 bidders.
How should you proceed?
Now, we want to predict the auction price, y, for a
single (one) grandfather clock when x1 = 50 years
and x2 = 2 bidders. Consequently, we desire a 95%
prediction interval for y. However, before we form
this prediction interval, we should check to make
sure that the selected values of the independent
variables, x1 = 50 and x2 = 2, are both reasonable and
within their respective sample ranges.
Example Estimation and
Prediction
If you examine the sample data shown in Table 11.1
you will see that the range for age is 108 ≤ x1 ≤ 194,
and the range for number of bidders is 5 ≤ x2 ≤ 15.
Thus, both selected values fall well outside their
respective ranges. Recall the Caution warning about
the dangers of using the model to predict y for a
value of an independent variable that is not within
the range of the sample data. Doing so may lead to
an unreliable prediction.
Part II: Model Building in
Multiple Regression
11.5

Interaction Models
An Interaction Model Relating
E(y) to Two Quantitative
Independent Variables
E( y)  0  1 x1   2 x2  3 x1 x2

where
(1 + 3x2) represents the change in E(y) for
every 1-unit increase in x1, holding x2 fixed
(2 + 3x1) represents the change in E(y) for
every 1-unit increase in x2, holding x1 fixed
An Interaction Model Relating
E(y) to Two Quantitative
Independent Variables
A three-dimensional graph of an interaction
model in two quantitative x’s. If we slice the
twisted plane at a fixed
value of x2, we obtain a
straight line relating
E(y) to x1; however, the
slope of the line will
change as we change
the value of x2.
Interaction Model With
2 Independent Variables
• Hypothesizes interaction between pairs of x
variables
— Response to one x variable varies at different
levels of another x variable
• Contains two-way cross product terms
E ( y )   0   1 x1   2 x 2   3 x1 x 2
• Can be combined with other models
— Example: dummy-variable model
Effect of Interaction
Given:
E(y) = β0 + β1x1 + β2x2 + β3x1x2
• Without interaction term, the effect of x1 on y is
measured by β1
• With interaction term, the effect of x1 on y is
measured by β1 + β3x2
— The effect changes with the value of x2
(it increases as x2 increases when β3 > 0)
Interaction Model Relationships
E(y) = 1 + 2x1 + 3x2 + 4x1x2
At x2 = 1: E(y) = 1 + 2x1 + 3(1) + 4x1(1) = 4 + 6x1
At x2 = 0: E(y) = 1 + 2x1 + 3(0) + 4x1(0) = 1 + 2x1
Effect (slope) of x1 on E(y) depends on x2 value
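The changing slope can be checked numerically from the illustrative equation above:

```python
def e_y(x1, x2):
    """E(y) = 1 + 2*x1 + 3*x2 + 4*x1*x2, the illustrative interaction model."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

# Slope in x1 is the change in E(y) per 1-unit increase in x1, holding x2 fixed.
slope_at_x2_0 = e_y(1, 0) - e_y(0, 0)   # beta1 + beta3*0 = 2
slope_at_x2_1 = e_y(1, 1) - e_y(0, 1)   # beta1 + beta3*1 = 6
print(slope_at_x2_0, slope_at_x2_1)     # 2 6
```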
Interaction Example

You work in advertising for the


New York Times. You want to
find the effect of ad size (sq. in.),
x1, and newspaper circulation
(000), x2, on the number of ad
responses (00), y. Conduct a test
for interaction. Use α = .05.
Excel Computer Output
Solution
Global F-test indicates at least one parameter is not zero.
(Excel output: F statistic and P-value highlighted.)
Interaction Test
Solution
• H0: β3 = 0
• Ha: β3 ≠ 0
• α = .05
• df = 6 – 4 = 2
• Critical Values: t.025 = ±4.303; reject H0 if | t | > 4.303
Excel Computer Output
Solution

t = β̂3 / s_β̂3
Interaction Test
Solution
• H0: β3 = 0
• Ha: β3 ≠ 0
• α = .05
• df = 6 – 4 = 2
Test Statistic: t = 1.8528
Decision: Do not reject H0 at α = .05 (|1.8528| < 4.303)
Conclusion: There is no evidence of interaction
11.6

Quadratic and Other Higher-Order Models
A Quadratic (Second-Order) Model
in a Single Quantitative
Independent Variable

E ( y)   0  1x   2 x 2

where
0 is the y-intercept of the curve.
1 is a shift parameter.
2 is the rate of curvature.
Second-Order Model
Relationships
(Curves: β2 > 0 produces upward (concave-up) curvature;
β2 < 0 produces downward (concave-down) curvature.)
2nd-Order Model Example

The data shows the number of weeks employed
and the number of errors made per day for a
sample of assembly line workers. Find a 2nd-order
model, conduct the global F-test, and test if
β2 ≠ 0. Use α = .05 for all tests.

Errors (y)  Weeks (x)
   20           1
   18           1
   16           2
   10           4
    8           4
    4           5
    3           6
    1           8
    2          10
    1          11
    0          12
    1          12
Excel Computer Output
Solution

ŷ = 23.728 – 4.784x + .242x²
Overall Model Test Solution
Global F-test indicates at least one parameter is not zero.
β2 Parameter Test Solution
The β2 test indicates a curvilinear relationship exists.
(Excel output: F, t, and P-values highlighted.)
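One way to sanity-check the reported second-order fit is to compute its R² against the data (a sketch; the coefficients are the ones reported in the solution):

```python
# (weeks x, errors y) pairs from the assembly-line example
data = [(1, 20), (1, 18), (2, 16), (4, 10), (4, 8), (5, 4),
        (6, 3), (8, 1), (10, 2), (11, 1), (12, 0), (12, 1)]

yhat = [23.728 - 4.784 * x + 0.242 * x * x for x, _ in data]  # fitted 2nd-order model
ybar = sum(y for _, y in data) / len(data)
sse = sum((y - yh) ** 2 for (_, y), yh in zip(data, yhat))
ss_yy = sum((y - ybar) ** 2 for _, y in data)
print(round(1 - sse / ss_yy, 3))   # 0.975, so the quadratic explains most of the variation
```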
A Complete Second-Order Model
with Two Quantitative Independent
Variables
E( y)  0  1 x1  2 x2  3 x2 x2  4 x12  5 x22
Comments on the Parameters
0 : y-intercept, the value of E(y) when x1 = x2 = 0
1: 2 changing 1 and 2 causes the surface to shift
along the x1- and x2-axes
3: controls the rotation of the surface
4: 5 signs and values of these parameters control
the type of surface and the rates of curvature
Second-Order Model
Relationships

E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²

(Surfaces: β4 + β5 > 0 gives upward curvature,
β4 + β5 < 0 gives downward curvature, and
β3² > 4β4β5 gives a saddle-shaped surface.)
11.7

Qualitative (Dummy) Variable Models
A Model Relating E(y) to a
Qualitative Independent Variable
with Two Levels

E y  0  1 x
where
1 if level A
x
0 if level B
Interpretation of ’s:
=B (mean for base level)
=A – B
Dummy-Variable Model

• Involves categorical x variable with 2 levels


— e.g., male-female; college-no college
• Variable levels coded 0 and 1
• Number of dummy variables is 1 less than
number of levels of variable
• May be combined with quantitative variable
(1st order or 2nd order model)
Interpreting Dummy-
Variable Model Equation
Given: yˆ i  ˆ 0  ˆ1 x 1 i  ˆ 2 x 2 i
y = Starting salary of college graduates
x1 = GPA
0 if Male
x2 =
1 if Female
Same slopes
Male ( x2 = 0 ):
yˆ i  ˆ 0  ˆ 1 x 1 i  ˆ 2 ( 0 )  ˆ 0  ˆ 1 x 1 i
Female ( x2 = 1 ):
 
yˆ i  ˆ 0  ˆ1 x 1 i  ˆ 2 (1)  ˆ 0  ˆ 2  ˆ1 x 1 i
Dummy-Variable Model
Example
Computer Output: ŷ = 3 + 5x1 + 7x2
x2 = {0 if Male, 1 if Female}
Same slopes:
Male (x2 = 0): ŷ = 3 + 5x1 + 7(0) = 3 + 5x1
Female (x2 = 1): ŷ = 3 + 5x1 + 7(1) = (3 + 7) + 5x1 = 10 + 5x1
Dummy-Variable Model
Relationships
(Graph: two parallel lines with common slope β̂1; the
Female line has intercept β̂0 + β̂2 and the Male line
has intercept β̂0.)
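The parallel-lines structure can be verified numerically from the fitted equation ŷ = 3 + 5x1 + 7x2 (a sketch):

```python
def yhat(x1, female):
    """Fitted model yhat = 3 + 5*x1 + 7*x2, with x2 = 1 for Female and 0 for Male."""
    return 3 + 5 * x1 + 7 * (1 if female else 0)

# Two parallel lines: same slope (5), intercepts 7 units apart.
male_intercept = yhat(0, False)                  # 3
female_intercept = yhat(0, True)                 # 3 + 7 = 10
male_slope = yhat(1, False) - yhat(0, False)     # 5
female_slope = yhat(1, True) - yhat(0, True)     # 5
print(male_intercept, female_intercept, male_slope, female_slope)   # 3 10 5 5
```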
A Model Relating E(y) to One
Qualitative Independent Variable
with k Levels
y   0
 1
x1
  2
x 2
 L   k 1
x k1
 
where xi is the dummy variable for level i + 1 and
1 if y is observed at level i + 1
xi  
0 otherwise
Then, for this system of coding BA
A =  C =  +  CA
B =  + D =  + DA


11.8

Models with Both Quantitative and Qualitative Variables
Example
Substitute the appropriate values of the dummy
variables in the model to obtain the equations of
the three response lines in the figure.

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3

Example
The complete model that characterizes the three
lines in the figure is

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3

where
x1 = advertising expenditure
x2 = {1 if radio medium, 0 if not}
x3 = {1 if television medium, 0 if not}
Example
Examining the coding, you can see that x2 = x3 = 0
when the advertising medium is newspaper.
Substituting these values into the expression for
E(y), we obtain the newspaper medium line:

E(y) = β0 + β1x1 + β2(0) + β3(0) + β4x1(0) + β5x1(0)
E(y) = β0 + β1x1
Example
Similarly, we substitute the appropriate values of
x2 and x3 into the expression for E(y) to obtain the
radio medium line (x2 = 1, x3 = 0):

E(y) = β0 + β1x1 + β2(1) + β3(0) + β4x1(1) + β5x1(0)
E(y) = (β0 + β2) + (β1 + β4)x1
        y-intercept   Slope
Example
and the television medium line (x2 = 0, x3 = 1):

E(y) = β0 + β1x1 + β2(0) + β3(1) + β4x1(0) + β5x1(1)
E(y) = (β0 + β3) + (β1 + β5)x1
        y-intercept   Slope
Example
Why bother fitting a model that combines all three
lines (model 3) into the same equation? The
answer is that you need to use this procedure if
you wish to use statistical tests to compare the
three media lines. We need to be able to express a
practical question about the lines in terms of a
hypothesis that a set of parameters in the model
equals 0. You could not do this if you were to
perform three separate regression analyses and fit
a line to each set of media data.
11.9

Comparing Nested Models
Nested Models
Two models are nested if one model contains all
the terms of the second model and at least one
additional term. The more complex of the two
models is called the complete (or full) model,
and the simpler of the two is called the reduced
model.
Comparing Nested Models
• Contains a subset of terms in the complete (full)
model
• Tests the contribution of a set of x variables to the
relationship with y
• Null hypothesis H0: g+1 = ... = k = 0
— Variables in set do not improve significantly
the model when all other variables are
included
• Used in selecting x variables or models
• Part of most computer programs
F-Test for Comparing
Nested Models
Reduced model:
E(y) = β0 + β1x1 + … + βgxg
Complete model:
E(y) = β0 + β1x1 + … + βgxg + βg+1xg+1 + … + βkxk

H0: βg+1 = βg+2 = … = βk = 0
Ha: At least one of the β parameters under test is
nonzero.
F-Test for Comparing
Nested Models
Test statistic:

F = [(SSER – SSEC) / (k – g)] / [SSEC / (n – (k + 1))]
  = [(SSER – SSEC) / (number of β's tested in H0)] / MSEC
F-Test for Comparing
Nested Models
where
SSER = Sum of squared errors for the reduced
model
SSEC = Sum of squared errors for the complete
model
MSEC = Mean square error (s2)for the complete
model
k – g = Number of β parameters specified in H0
(i.e., number of β parameters tested)
F-Test for Comparing
Nested Models
where
k + 1 = Number of β parameters in the complete
model (including β0)
n = Total sample size

Rejection region: F > Fα

where Fα is based on ν1 = k – g numerator degrees
of freedom and ν2 = n – (k + 1) denominator
degrees of freedom.
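The nested-model F statistic is a direct computation once both SSEs are known. A sketch using made-up (hypothetical) values, not numbers from the slides:

```python
def nested_f(sse_r, sse_c, n, k, g):
    """F-test for comparing nested models:
    F = [(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k + 1))]."""
    return ((sse_r - sse_c) / (k - g)) / (sse_c / (n - (k + 1)))

# Hypothetical example: complete model with k = 5 terms, reduced model with g = 3.
f = nested_f(sse_r=160.44, sse_c=152.66, n=30, k=5, g=3)
print(round(f, 2))   # 0.61, well below F.05 ≈ 3.40 with (2, 24) df: fail to reject H0,
                     # so the reduced (more parsimonious) model is preferred
```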
Parsimonious Models

A parsimonious model is a general linear
model with a small number of β parameters. In
situations where two competing models have
essentially the same predictive power (as
determined by an F-test), choose the more
parsimonious of the two.
Guidelines for Selecting
Preferred Model in a
Nested Model F-Test

Conclusion Preferred Model

Reject H0 Complete Model

Fail to reject H0 Reduced Model


11.10

Stepwise Regression
Stepwise Regression

The user first identifies the response, y, and the


set of potentially important independent
variables, x1, x2, … , xk, where k is generally
large. The response and independent variables
are then entered into the computer software,
and the stepwise procedure begins.
Stepwise Regression
Step 1 The software fits all possible one-
variable models of the form
E(y) = β0 + β1xi
to the data, where xi is the ith
independent variable, i = 1, 2, … , k.
Test the null hypothesis H0: β1 = 0
against the alternative Ha: β1 ≠ 0. The
independent variable that produces the
largest (absolute) t-value is declared the
best one-variable predictor of y; call it x1.
Stepwise Regression
Step 2 The stepwise program now begins to
search through the remaining (k – 1)
independent variables for the best two-
variable model of the form
E(y) = β0 + β1x1 + β2xi
Again, the variable having the largest
t-value is retained; call it x2.
Stepwise Regression
Step 3 The stepwise procedure now checks for
a third independent variable to include
in the model with x1 and x2; that is, we
seek the best model of the form
E(y) = β0 + β1x1 + β2x2 + β3xi
Again, the variable having the largest
t-value is retained; call it x3.
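Step 1 of the procedure can be sketched in standard-library Python, using the identity t = r·sqrt((n − 2)/(1 − r²)) for the one-variable model t statistic; the candidate data here reuse the newspaper-ad example, and the `t_value` helper is illustrative:

```python
import math

def t_value(x, y):
    """t statistic for H0: beta1 = 0 in the simple regression of y on x,
    computed as t = r * sqrt((n - 2) / (1 - r^2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))

# Candidate predictors (newspaper-ad data): pick the largest |t| as the step-1 variable.
y = [1, 4, 1, 3, 2, 4]
candidates = {"size": [1, 8, 3, 5, 6, 10], "circ": [2, 8, 1, 7, 4, 6]}
best = max(candidates, key=lambda name: abs(t_value(candidates[name], y)))
print(best)   # circ
```

Here circulation wins step 1: its one-variable t (about 5.1) exceeds ad size's (about 4.4).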
Stepwise Regression
The result of the stepwise procedure is a model
containing only those terms with t-values that
are significant at the specified α level.
Thus, in most practical situations, only several
of the large number of independent variables
remain. We have very probably included some
unimportant independent variables in the
model (Type I errors) and eliminated some
important ones (Type II errors).
Stepwise Regression
There is a second reason why we might not
have arrived at a good model. When we choose
the variables to be included in the stepwise
regression, we may often omit higher-order
terms (to keep the number of variables
manageable). Consequently, we may have
initially omitted several important terms from
the model. Thus, we should recognize stepwise
regression for what it is: an objective variable
screening procedure.
Part III: Multiple Regression
Diagnostics
11.11

Residual Analysis: Checking the Regression Assumptions
Regression Residual

A regression residual, ε̂, is defined as the
difference between an observed y value and its
corresponding predicted value:

ε̂ = y – ŷ = y – (β̂0 + β̂1x1 + β̂2x2 + … + β̂kxk)
Properties of
Regression Residual
1. The mean of the residuals is equal to 0. This
property follows from the fact that the sum of
the differences between the observed y values
and their least squares predicted ŷ values is
equal to 0.

Σ(residuals) = Σ(y – ŷ) = 0
Properties of
Regression Residual
2. The standard deviation of the residuals is
equal to the standard deviation of the fitted
regression model, s. This property follows
from the fact that the sum of the squared
residuals is equal to SSE, which when
divided by the error degrees of freedom is
equal to the variance of the fitted regression
model, s2.
Properties of
Regression Residual
2. The square root of the variance is both the
standard deviation of the residuals and the
standard deviation of the regression model.

Σ(residuals)² = Σ(y – ŷ)² = SSE

s = √[ Σ(residuals)² / (n – (k + 1)) ] = √[ SSE / (n – (k + 1)) ]
Regression Outlier

A regression outlier is a residual that is larger
than 3s (in absolute value).
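Once the residuals and s are available, flagging regression outliers by the 3s rule is a one-line filter (a sketch with hypothetical numbers):

```python
s = 0.5   # estimated standard deviation of the fitted model (illustrative value)
residuals = [0.31, -0.22, 1.72, 0.05, -0.41, -1.64]   # hypothetical residuals

# A regression outlier is a residual larger than 3s in absolute value.
outliers = [e for e in residuals if abs(e) > 3 * s]
print(outliers)   # [1.72, -1.64]
```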
Residual Analysis

• Graphical analysis of residuals


— Plot estimated errors versus xi values
— Plot histogram or stem-&-leaf of residuals
• Purposes
— Examine functional form (linear v. non-linear
model)
— Evaluate violations of assumptions
Residual Plot
for Functional Form
(Plots: a curved residual pattern signals the need to
add an x² term; a random scatter indicates correct
specification.)
Residual Plot
for Equal Variance
(Plots: a fan-shaped pattern indicates unequal
variance; a uniform band indicates correct
specification. Standardized residuals are typically
used.)
Residual Plot
for Independence
(Plots, in the sequence the data were collected: a
systematic pattern indicates dependence; a random
scatter indicates correct specification.)
Residual Analysis
Computer Output
Dep Var Predict Student
Obs SALES Value Residual Residual -2-1-0 1 2
1 1.0000 0.6000 0.4000 1.044 | |** |
2 1.0000 1.3000 -0.3000 -0.592 | *| |
3 2.0000 2.0000 0 0.000 | | |
4 2.0000 2.7000 -0.7000 -1.382 | **| |
5 4.0000 3.4000 0.6000 1.567 | |*** |

Plot of standardized
(student) residuals
Steps in a
Residual Analysis
1. Check for a misspecified model by plotting
the residuals against each of the quantitative
independent variables. Analyze each plot,
looking for a curvilinear trend. This shape
signals the need for a quadratic term in the
model. Try a second-order term in the
variable against which the residuals are
plotted.
Steps in a
Residual Analysis
2. Examine the residual plots for outliers. Draw
lines on the residual plots at 2- and 3-
standard-deviation distances below and above
the 0 line. Examine residuals outside the 3-
standard-deviation lines as potential outliers
and check to see that no more than 5% of the
residuals exceed the 2-standard-deviation
lines. Determine whether each outlier can be
explained as an error in data collection or
transcription, corresponds to a member of a
population different from that of the
remainder of the sample, or simply represents
an unusual observation. If the observation is
determined to be an error, fix it or remove it.
Even if you cannot determine the cause, you
may want to rerun the regression analysis
without the observation to determine its
effect on the analysis.
Steps in a
Residual Analysis
3. Check for nonnormal errors by plotting a
frequency distribution of the residuals, using a
stem-and-leaf display or a histogram. Check
to see if obvious departures from normality
exist. Extreme skewness of the frequency
distribution may be due to outliers or could
indicate the need for a transformation of the
dependent variable. (Normalizing
transformations are beyond the scope of this
book, but you can find information in the
references.)
Steps in a
Residual Analysis
4. Check for unequal error variances by plotting
the residuals against the predicted values, ŷ.
If you detect a cone-shaped pattern or some
other pattern that indicates that the variance
of ε is not constant, refit the model using an
appropriate variance-stabilizing
transformation on y, such as ln(y). (Consult
the references for other useful variance-
stabilizing transformations.)
11.12

Some Pitfalls: Estimability, Multicollinearity, and Extrapolation
Regression Pitfalls
• Parameter Estimability
— Number of levels of observed x–values must be
one more than order of the polynomial in x
• Multicollinearity
— Two or more x–variables in the model are
correlated
• Extrapolation
— Predicting y–values outside sampled range
• Correlated Errors
Multicollinearity

• High correlation between x variables


• Coefficients measure combined effect
• Leads to unstable coefficients depending on
x variables in model
• Always exists – matter of degree
• Example: using both age and height as
explanatory variables in same model
Detecting Multicollinearity
• Significant correlations between pairs of
independent variables
• Nonsignificant t-tests for all of the
individual β parameters when the F-test for
overall model adequacy is significant
• Signs opposite from what is expected in the
estimated β parameters
Using the Correlation
Coefficient r to Detect
Multicollinearity
• Extreme multicollinearity: | r | ≥ .8
• Moderate multicollinearity: .2 ≤ | r | < .8
• Low multicollinearity: | r | < .2
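A quick screen for multicollinearity computes the pairwise correlation between predictors and classifies |r| by the thresholds above (the age/height data here are hypothetical, echoing the earlier example):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two predictor columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def collinearity_level(r):
    """Classify |r| using the slide thresholds."""
    r = abs(r)
    if r >= 0.8:
        return "extreme"
    if r >= 0.2:
        return "moderate"
    return "low"

# Hypothetical age and height columns for children, predictors likely to be correlated:
age = [4, 6, 7, 9, 10, 12]
height = [100, 112, 120, 131, 138, 149]
r = pearson_r(age, height)
print(collinearity_level(r))   # extreme
```

With |r| near 1, keeping both predictors would give unstable coefficient estimates.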
Solutions to Some Problems
Created by Multicollinearity in
Regression
1. Drop one or more of the correlated
independent variables from the model. One
way to decide which variables to keep in the
model is to employ stepwise regression.
Solutions to Some Problems
Created by Multicollinearity in
Regression
2. If you decide to keep all the independent
variables in the model,
a. Avoid making inferences about the individual
β parameters based on the t-tests.
b. Restrict inferences about E(y) and future y
values to values of the x’s that fall within the
range of the sample data.
Extrapolation
(Graph: predictions within the sampled range of x are
interpolation; predictions outside the sampled range
are extrapolation.)
Key Ideas
Multiple Regression Variables
y = Dependent variable (quantitative)
x1, x2,…, xk = Independent variables (quantitative or
qualitative)

First-Order Model in k Quantitative x's

E(y) = β0 + β1x1 + β2x2 + … + βkxk

Each βi represents the change in y for every 1-
unit increase in xi, holding all other x's fixed.
Key Ideas
Interaction Model in 2 Quantitative x's

E(y) = β0 + β1x1 + β2x2 + β3x1x2

(β1 + β3x2) represents the change in y for every
1-unit increase in x1, for fixed value of x2
(β2 + β3x1) represents the change in y for every
1-unit increase in x2, for fixed value of x1
Key Ideas
Quadratic Model in 1 Quantitative x

E(y) = β0 + β1x + β2x²

β2 represents the rate of curvature in y for x
β2 > 0 implies upward curvature
β2 < 0 implies downward curvature
Key Ideas
Complete Second-Order Model in 2
Quantitative x's

E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²

β4 represents the rate of curvature in y for x1,
holding x2 fixed
β5 represents the rate of curvature in y for x2,
holding x1 fixed
Key Ideas
Dummy-Variable Model for One
Qualitative x with k Levels

E(y) = β0 + β1x1 + β2x2 + … + βk–1xk–1
x1 = {1 if level 1, 0 if not}
x2 = {1 if level 2, 0 if not}
xk–1 = {1 if level k – 1, 0 if not}
β0 = E(y) for level k (base level) = μk
β1 = μ1 – μk
β2 = μ2 – μk
Key Ideas
Complete Second-Order Model in 1
Quantitative x and 1 Qualitative x
(Two Levels, A and B)

E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x1x2 + β5x1²x2

x2 = {1 if level A, 0 if level B}
Key Ideas
Adjusted Coefficient of Determination, Ra²

Cannot be "forced" to 1 by adding independent
variables to the model.
Interaction between x1 and x2
Implies that the relationship between y and one
x depends on the other x.
Parsimonious Model
A model with a small number of β parameters.
Key Ideas
Recommendation for Assessing Model
Adequacy
1. Conduct global F-test; if significant then:
2. Conduct t-tests on only the most important
β's (interaction or squared terms)
3. Interpret value of 2s
4. Interpret value of Ra²
Key Ideas
Recommendation for Testing Individual β's
1. If curvature (x²) is deemed important, do not
conduct a test for the first-order (x) term in
the model.
2. If interaction (x1x2) is deemed important, do
not conduct tests for the first-order terms (x1
and x2) in the model.
Key Ideas
Extrapolation
Occurs when you predict y for values of x’s that
are outside of range of sample data.

Nested Models
Are models where one model (the complete
model) contains all the terms of another model
(the reduced model) plus at least one additional
term.
Key Ideas
Multicollinearity
Occurs when two or more x’s are correlated.
Indicators of multicollinearity:
1. Highly correlated x’s
2. Significant global F-test, but all t-tests
nonsignificant
3. Signs on β's opposite from expected
Key Ideas
Problems with Using Stepwise
Regression Model as the “Final” Model
1. Extremely large number of t-tests inflate
overall probability of at least one Type I
error.
2. No higher-order terms (interactions or
squared terms) are included in the model.
Key Ideas
Analysis of Residuals
1. Detect misspecified model: plot residuals
vs. quantitative x (look for trends, e.g.,
curvilinear trend)
2. Detect nonconstant error variance: plot
residuals vs. ŷ (look for patterns, e.g., cone
shape)
Key Ideas
Analysis of Residuals
3. Detect nonnormal errors: histogram, stem-
leaf, or normal probability plot of residuals
(look for strong departures from normality)
4. Identify outliers: residuals greater than 3s in
absolute value (investigate outliers before
deleting)
