Statistics for Business and Economics: Chapter 16
JOHN S. LOUCKS
St. Edward's University
Chapter 16
Regression Analysis: Model Building
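First-Order Model with One Predictor Variable
$y = \beta_0 + \beta_1 x_1 + \varepsilon$
Second-Order Model with One Predictor Variable
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \varepsilon$
Second-Order Model with Two Predictor Variables with Interaction
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$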
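As a rough illustration of the second-order model with two predictor variables and interaction, the sketch below builds the five regressor terms for one observation and evaluates the model. The function names and the coefficient values are hypothetical, chosen only to show the arithmetic.

```python
def second_order_terms(x1, x2):
    """Regressor values [x1, x2, x1^2, x2^2, x1*x2] for one observation."""
    return [x1, x2, x1 ** 2, x2 ** 2, x1 * x2]


def predict(betas, x1, x2):
    """Evaluate b0 + b1*x1 + b2*x2 + b3*x1^2 + b4*x2^2 + b5*x1*x2."""
    b0, *rest = betas
    return b0 + sum(b * z for b, z in zip(rest, second_order_terms(x1, x2)))


# Hypothetical coefficients b0..b5 (not taken from the slides).
betas = [1.0, 2.0, 3.0, 0.5, 0.25, -1.0]

print(second_order_terms(2.0, 4.0))  # [2.0, 4.0, 4.0, 16.0, 8.0]
print(predict(betas, 2.0, 4.0))      # 15.0
```

Because the squared and interaction terms are just extra columns, the model stays linear in the coefficients and can be fitted with ordinary multiple regression.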
Logarithmic Transformations
Most statistical packages provide the ability to apply logarithmic transformations using either the base-10 (common log) or the base e = 2.71828... (natural log).
Reciprocal Transformation
Use 1/y as the dependent variable instead of y.
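The logarithmic and reciprocal transformations can be sketched with Python's standard library as follows. The response values are hypothetical, and the sketch assumes y is strictly positive (a requirement for logarithms) and nonzero (for the reciprocal).

```python
import math

# Hypothetical positive response values.
y = [0.5, 2.0, 10.0, 100.0]

log10_y = [math.log10(v) for v in y]  # base-10 (common log)
ln_y = [math.log(v) for v in y]       # base e (natural log)
recip_y = [1.0 / v for v in y]        # reciprocal transformation

# Predictions made on a transformed scale must be converted back to y.
back_from_log10 = [10 ** v for v in log10_y]
back_from_ln = [math.exp(v) for v in ln_y]
back_from_recip = [1.0 / v for v in recip_y]

print(log10_y[3])  # 2.0
print(recip_y[1])  # 0.5
```

The transformed values replace y as the dependent variable in the regression; the back-transform step is what returns fitted values to the original units.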
F Test
To test whether the addition of x2 to a model involving x1 (or the deletion of x2 from a model involving x1 and x2) is statistically significant:
$$F = \frac{\left(\mathrm{SSE(reduced)} - \mathrm{SSE(full)}\right) / \text{number of extra terms}}{\mathrm{MSE(full)}}$$
$$F = \frac{\left(\mathrm{SSE}(x_1) - \mathrm{SSE}(x_1, x_2)\right) / 1}{\mathrm{SSE}(x_1, x_2) / (n - p - 1)}$$
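The F statistic above reduces to a few lines of arithmetic. The sketch below is one way to write it; the function name, argument names, and the SSE values are hypothetical and are not results from the slides.

```python
def partial_f(sse_reduced, sse_full, extra_terms, n, p_full):
    """F statistic for adding `extra_terms` variables to a reduced model.

    n is the number of observations and p_full is the number of
    independent variables in the full model, so MSE(full) has
    n - p_full - 1 degrees of freedom.
    """
    mse_full = sse_full / (n - p_full - 1)
    return ((sse_reduced - sse_full) / extra_terms) / mse_full


# Hypothetical values: adding x2 to a model with x1 drops SSE from 120 to 80.
f_value = partial_f(sse_reduced=120.0, sse_full=80.0, extra_terms=1, n=23, p_full=2)
print(f_value)  # 10.0
```

The result is compared with an F critical value that has `extra_terms` numerator and n - p - 1 denominator degrees of freedom.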
Variable-Selection Procedures
Stepwise Regression
At each iteration, the first consideration is to
see whether the least significant variable
currently in the model can be removed
because its F value, FMIN, is less than the
user-specified or default F value, FREMOVE.
If no variable can be removed, the
procedure checks to see whether the most
significant variable not in the model can be
added because its F value, FMAX, is greater
than the user-specified or default F value,
FENTER.
If no variable can be removed and no
variable can be added, the procedure stops.
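The stepwise loop described above might be sketched with NumPy as follows. This is a minimal illustration, not Minitab's implementation: the helper names, the default thresholds of 4.0, the iteration cap, and the synthetic data are all assumptions.

```python
import numpy as np


def sse(X, y, cols):
    """Error sum of squares for a least-squares fit: intercept plus columns `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def partial_f(X, y, cols, j):
    """F value of variable j given the other variables in `cols` (j must be in cols)."""
    full = sse(X, y, cols)
    reduced = sse(X, y, [c for c in cols if c != j])
    return (reduced - full) / (full / (len(y) - len(cols) - 1))


def stepwise(X, y, f_enter=4.0, f_remove=4.0, max_iter=100):
    selected = []
    for _ in range(max_iter):
        # First consideration: can the least significant variable be removed?
        if selected:
            f_min, worst = min((partial_f(X, y, selected, j), j) for j in selected)
            if f_min < f_remove:
                selected.remove(worst)
                continue
        # Otherwise: can the most significant outside variable be added?
        outside = [j for j in range(X.shape[1]) if j not in selected]
        if outside:
            f_max, best = max((partial_f(X, y, selected + [j], j), j) for j in outside)
            if f_max > f_enter:
                selected.append(best)
                continue
        break  # nothing removed and nothing added: stop
    return selected


# Synthetic data: only columns 0 and 2 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 3 + 2 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=60)
chosen = stepwise(X, y)
print(sorted(chosen))
```

Keeping FENTER at least as large as FREMOVE helps prevent a variable from being added and then immediately removed; the iteration cap is only a safety guard.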
Variable-Selection Procedures
Forward Selection
This procedure is similar to stepwise regression, but does not permit a variable to
be deleted.
This forward-selection procedure starts with
no independent variables.
It adds variables one at a time as long as a
significant reduction in the error sum of
squares (SSE) can be achieved.
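A minimal sketch of forward selection, assuming "significant reduction in SSE" is judged by an F-to-enter threshold. The function names, the threshold of 4.0, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def sse(X, y, cols):
    """Error sum of squares for a least-squares fit: intercept plus columns `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_selection(X, y, f_enter=4.0):
    selected = []                        # start with no independent variables
    remaining = list(range(X.shape[1]))
    current = sse(X, y, selected)
    while remaining:
        # The candidate with the smallest SSE gives the largest reduction.
        new_sse, best = min((sse(X, y, selected + [j]), j) for j in remaining)
        df_error = len(y) - (len(selected) + 1) - 1   # n - p - 1 for the larger model
        f_value = (current - new_sse) / (new_sse / df_error)
        if f_value <= f_enter:
            break                        # reduction not significant: stop
        selected.append(best)            # once added, a variable is never deleted
        remaining.remove(best)
        current = new_sse
    return selected


# Synthetic data: only columns 0 and 2 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 3 + 2 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=60)
order = forward_selection(X, y)
print(order)
```

The returned list records the order of entry, which is useful when comparing the path with stepwise regression on the same data.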
Variable-Selection Procedures
Backward Elimination
This procedure begins with a model that
includes all the independent variables the
modeler wants considered.
It then attempts to delete one variable at a
time by determining whether the least
significant variable currently in the model
can be removed because its F value, FMIN,
is less than the user-specified or default F
value, FREMOVE.
Once a variable has been removed from the
model it cannot reenter at a subsequent
step.
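Backward elimination can be sketched in the same style. Again this is an illustration under assumptions: the helper names, the FREMOVE default of 4.0, and the synthetic data are not from the slides.

```python
import numpy as np


def sse(X, y, cols):
    """Error sum of squares for a least-squares fit: intercept plus columns `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def backward_elimination(X, y, f_remove=4.0):
    selected = list(range(X.shape[1]))   # begin with all candidate variables
    while selected:
        full = sse(X, y, selected)
        mse_full = full / (len(y) - len(selected) - 1)
        # F value of each variable, given the others currently in the model.
        f_values = [
            ((sse(X, y, [c for c in selected if c != j]) - full) / mse_full, j)
            for j in selected
        ]
        f_min, worst = min(f_values)
        if f_min >= f_remove:
            break                        # least significant variable stays: stop
        selected.remove(worst)           # removed variables never reenter
    return selected


# Synthetic data: only columns 0 and 2 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 3 + 2 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=60)
kept = backward_elimination(X, y)
print(kept)
```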
Variable-Selection Procedures
Best-Subsets Regression
The three preceding procedures are one-variable-at-a-time methods offering no
guarantee that the best model for a given
number of variables will be found.
Some software packages include best-subsets regression that enables the user to
find, given a specified number of
independent variables, the best regression
model.
Minitab output identifies the two best one-variable estimated regression equations, the
two best two-variable equations, and so on.
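For a small number of candidate variables, best-subsets regression can be approximated by exhaustive search. The sketch below ranks subsets of a fixed size by SSE (equivalent to ranking by R-sq when the size is fixed) and keeps the two best, mirroring the "two best" convention mentioned above. The function names and data are hypothetical, and this is not how Minitab computes its output.

```python
import itertools

import numpy as np


def sse(X, y, cols):
    """Error sum of squares for a least-squares fit: intercept plus columns `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def best_subsets(X, y, size, keep=2):
    """Return the `keep` lowest-SSE subsets of exactly `size` columns."""
    scored = [
        (sse(X, y, list(cols)), cols)
        for cols in itertools.combinations(range(X.shape[1]), size)
    ]
    return sorted(scored)[:keep]


# Synthetic data: only columns 0 and 2 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 3 + 2 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=60)

for size in range(1, X.shape[1] + 1):
    for value, cols in best_subsets(X, y, size):
        print(size, cols, round(value, 3))

top_two_variable = best_subsets(X, y, 2)
```

The number of subsets grows as 2^k, so exhaustive search is only practical for a modest number of candidate variables.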
Sample Data
Drive   Fair   Green   Putt    Sand   Score
277.6   .681   .667    1.768   .550   69.10
259.6   .691   .665    1.810   .536   71.09
269.1   .657   .649    1.747   .472   70.12
267.0   .689   .673    1.763   .672   69.88
267.3   .581   .637    1.781   .521   70.71
255.6   .778   .674    1.791   .455   69.76
272.9   .615   .667    1.780   .476   70.19
265.4   .718   .699    1.790   .551   69.73
Sample Data (continued; the Drive, Fair, and Green columns are missing for these rows)
Putt    Sand   Score
1.803   .431   69.97
1.774   .493   70.33
1.809   .492   70.32
1.765   .599   70.09
1.784   .500   70.46
1.752   .603   69.49
1.813   .529   69.88
1.754   .576   70.27
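Best-Subsets Regression Output (only the s column is recoverable; the C-p values and the X marks showing which variables enter each model are lost)
Vars   s
1      .39685
1      .43183
2      .32872
2      .32891
3      .31318
3      .31957
4      .26913
4      .32011
5      .27499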
Minitab Output
The regression equation
Score = 74.678 - .0398(Drive) - 6.686(Fair) - 10.342(Green) + 9.858(Putt)

Predictor   Coef      Stdev    t-ratio   p
Constant    74.678    6.952    10.74     .000
Drive       -.0398    .01235   -3.22     .004
Fair        -6.686    1.939    -3.45     .003
Green       -10.342   3.561    -2.90     .009
Putt        9.858     3.180    3.10      .006

s = .2691   R-sq = 72.4%   R-sq(adj) = 66.8%
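As a quick check on how the output is read, the sketch below recomputes each t-ratio as Coef / Stdev and evaluates the estimated regression equation for the first row of the sample data (Drive 277.6, Fair .681, Green .667, Putt 1.768). The dictionary and function names are illustrative.

```python
# Coefficients and standard deviations copied from the Minitab output.
coef = {"Constant": 74.678, "Drive": -0.0398, "Fair": -6.686,
        "Green": -10.342, "Putt": 9.858}
stdev = {"Constant": 6.952, "Drive": 0.01235, "Fair": 1.939,
         "Green": 3.561, "Putt": 3.180}

# Each t-ratio is the coefficient divided by its standard deviation.
t_ratio = {name: coef[name] / stdev[name] for name in coef}


def predict_score(drive, fair, green, putt):
    """Evaluate the estimated regression equation for one golfer."""
    return (coef["Constant"] + coef["Drive"] * drive + coef["Fair"] * fair
            + coef["Green"] * green + coef["Putt"] * putt)


for name, t in t_ratio.items():
    print(f"{name:8s} {t:6.2f}")

predicted = predict_score(277.6, 0.681, 0.667, 1.768)
print(round(predicted, 2))  # 69.61
```

The predicted score of about 69.61 compares with the observed 69.10 for that row, a residual of roughly -0.51.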
Minitab Output
Analysis of Variance
SOURCE       DF   SS        MS       F       P
Regression    4   3.79469   .94867   13.10   .000
Error        20   1.44865   .07243
Total        24   5.24334
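The entries in the analysis of variance table are tied together by simple arithmetic. The sketch below recomputes them from the sums of squares, assuming 4 regression degrees of freedom (the four predictors in the estimated equation), 20 error degrees of freedom, and therefore n = 25 observations. Variable names are illustrative.

```python
import math

# Sums of squares from the analysis of variance output.
ssr, sse, sst = 3.79469, 1.44865, 5.24334
n, p = 25, 4  # assumed: total DF 24 gives n = 25; regression DF 4 gives p = 4

msr = ssr / p                  # mean square regression
mse = sse / (n - p - 1)        # mean square error
f_stat = msr / mse             # overall F statistic
s = math.sqrt(mse)             # standard error of the estimate
r_sq = ssr / sst
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - p - 1)

print(f"F = {f_stat:.2f}")              # F = 13.10
print(f"s = {s:.4f}")                   # s = 0.2691
print(f"R-sq = {r_sq:.1%}")             # R-sq = 72.4%
print(f"R-sq(adj) = {r_sq_adj:.1%}")    # R-sq(adj) = 66.8%
```

These values agree with the F, s, R-sq, and R-sq(adj) figures reported in the Minitab output.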
End of Chapter 16