Regression Models
CHAPTER 4: REGRESSION MODELS
= 0 if otherwise
This variable can then be factored into the regression equation to determine whether
employees with a college education are more productive for the company than those
without one.
4.3 Discuss how the coefficient of determination and the coefficient of correlation are
related and how they are used in regression analysis.
Both the coefficient of determination and the coefficient of correlation measure the
strength of the linear relationship between the variables. The coefficient of
determination is represented by r^2. It is the proportion of variability in Y that is
explained by the regression equation, and it ranges from 0 to 1. The closer r^2 is to 1,
the stronger the linear relationship. The coefficient of correlation is represented by r
and ranges from -1 to +1. Its magnitude is the square root of the coefficient of
determination, and its sign matches the sign of the slope: a positive r is associated
with a positive slope and a negative r with a negative slope.
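The link between r and r^2 can be checked numerically. The sketch below is only an illustration: the five data points are hypothetical (they are not from the chapter) and the function name `correlation` is an assumption. It shows that squaring r gives r^2, and that the sign of r has to come from the data rather than from the square root.

```python
import math

def correlation(xs, ys):
    """Return (r, r_squared) for paired samples, using sums of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    return r, r * r

# Hypothetical data with a downward trend (not taken from the chapter).
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 2]

r, r2 = correlation(xs, ys)
print(f"r = {r:.4f}, r^2 = {r2:.4f}")
# The square root of r^2 recovers only the magnitude of r; here r is negative.
print(f"sqrt(r^2) = {math.sqrt(r2):.4f}")
```

Because the slope of this hypothetical data set is negative, r is negative even though the square root of r^2 is positive.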
4.4 Explain how a scatter diagram can be used to identify the type of regression to use.
A scatter diagram is a graph of the data. It shows whether a relationship exists
between the variables. The pattern of the points indicates whether the relationship is
linear, nonlinear, or absent, and therefore which type of regression to use. The
direction of the relationship, positive or negative, can also be read from the diagram.
4.5
In a regression model, when it is not clear which variables should be included or
excluded, the adjusted r^2 should be used. It helps determine whether adding more
variables to the regression model is useful. It gives a better picture than the ordinary
r^2, which never decreases when a variable is added: the adjusted r^2 starts to fall when
more variables than necessary are added to the model, which helps in deciding how many
variables to use.
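As a rough illustration of how the adjusted r^2 penalizes extra variables, the sketch below applies the standard formula, adjusted r^2 = 1 - (1 - r^2)(n - 1)/(n - k - 1). The r^2 values are those read from the two Excel outputs for Problem 4.24 later in this chapter (0.886137 with three variables, 0.885616 with two, n = 17); the function name `adjusted_r2` is an assumption.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 for a model with k independent variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# r^2 values as read from the Problem 4.24 outputs (17 observations).
three_var = adjusted_r2(0.886137, n=17, k=3)  # square footage, bedrooms, age
two_var = adjusted_r2(0.885616, n=17, k=2)    # square footage, age

print(f"adjusted r^2 with 3 variables: {three_var:.4f}")
print(f"adjusted r^2 with 2 variables: {two_var:.4f}")
```

The ordinary r^2 is slightly higher for the three-variable model, but the adjusted r^2 is higher for the two-variable model, which suggests that the extra variable adds little.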
4.6
The F test is used to determine whether there is a statistically significant
relationship between the dependent variable and the independent variables. A large F
value (with a correspondingly small significance level) indicates that the model is
significant and that there is a relationship between the independent and dependent
variables, while a small F value indicates that the model is not significant.
4.10
a)
[Scatter diagram: Demand for Bass Drums versus Green Shades TV Appearances]
b)
Ybar = 6.5

Y     X     (Y - Ybar)^2    Yhat    (Y - Yhat)^2    (Yhat - Ybar)^2
3     3     12.25           4       1               6.25
6     4     0.25            5       1               2.25
7     7     0.25            8       1               2.25
5     6     2.25            7       4               0.25
10    8     12.25           9       1               6.25
8     5     2.25            6       4               0.25
            SST = 29.5              SSE = 12.0      SSR = 17.5
c)
If the Green Shades performed on TV 6 times during the last month, the regression
model Y = 1 + 1X gives a demand for bass drums of Y = 1 + 1(6) = 7.
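The calculations in parts b and c can be reproduced with a few lines of code. This is a minimal sketch, assuming the X and Y values tabulated in part b; the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Data from part b: TV appearances (X) and demand for bass drums (Y).
x = [3, 4, 7, 6, 8, 5]
y = [3, 6, 7, 5, 10, 8]

b0, b1 = fit_line(x, y)
y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variability
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))    # unexplained
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)             # explained

forecast = b0 + b1 * 6  # demand when the band appears 6 times

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"SST = {sst:.2f}, SSE = {sse:.2f}, SSR = {ssr:.2f}")
print(f"forecast for X = 6: {forecast:.2f}")
```

The three sums of squares satisfy SST = SSR + SSE, which is a quick check on a hand-built table.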
4.12
              d.f.    SS      MS      F Values    Significance F
Regression    1       17.5    17.5    5.83        0.073
Residuals     4       12.0    3.0
Total                 29.5

              Coefficient    Standard Error    t-Stat    p-value
Intercept     1              2.385             0.419     0.697
Sales         1              0.414             2.415     0.073
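The entries of this output are linked by simple arithmetic, which the sketch below re-derives from the sums of squares, coefficients, and standard errors reported for this problem. The variable names are assumptions. In a regression with one independent variable, the square of the slope's t statistic equals F (up to rounding), which is why both carry the same significance level of 0.073.

```python
# Values taken from the ANOVA and coefficient tables for this problem.
ssr, sse = 17.5, 12.0
df_regression, df_residual = 1, 4

msr = ssr / df_regression   # mean square regression
mse = sse / df_residual     # mean square error
f_stat = msr / mse

# t statistic = coefficient / standard error
t_intercept = 1 / 2.385
t_slope = 1 / 0.414

print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
print(f"t (intercept) = {t_intercept:.3f}, t (slope) = {t_slope:.3f}")
print(f"t (slope) squared = {t_slope ** 2:.2f}")
```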
4.13
A)
Regression model:
b1 = Σ(X - Xbar)(Y - Ybar) / Σ(X - Xbar)^2
b0 = Ybar - b1(Xbar)

(Y)     (X)     (X - Xbar)(Y - Ybar)    (X - Xbar)^2
93      98      236.444                 285.23
78      77      4.111                   16.901
84      88      34.444                  47.457
73      80      6.667                   1.235
84      96      74.444                  221.679
64      61      301.667                 404.457
64      66      226.667                 228.346
95      95      222.222                 192.901
76      69      36.333                  146.679
711     730     1143                    1544.9

b1 = 1143/1544.9 = 0.74
b0 = (711/9) - b1(730/9) = 18.99 (using the unrounded slope, 0.7399)
Yhat = 18.99 + 0.74X
B)
Y= 18.99+ 0.74* 83
= 80.41
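Parts A and B can be cross-checked in code. The sketch below is an illustration under the assumption that the nine (X, Y) pairs are those listed in the table for part A; `fit_line` is a hypothetical helper. With unrounded coefficients the prediction is about 80.40, and rounding the coefficients to 18.99 and 0.74 first gives the 80.41 reported above.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    return b0, b1

# First test grade (X) and final average (Y) from the table in part A.
x = [98, 77, 88, 80, 96, 61, 66, 95, 69]
y = [93, 78, 84, 73, 84, 64, 64, 95, 76]

b0, b1 = fit_line(x, y)
prediction = b0 + b1 * 83                # unrounded coefficients
rounded_prediction = 18.99 + 0.74 * 83   # coefficients rounded as in the solution

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"prediction for X = 83: {prediction:.2f}")
print(f"with rounded coefficients: {rounded_prediction:.2f}")
```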
C)
(Y - Ybar)^2    (Yhat - Ybar)^2
196             156.135
1               9.252
25              25.977
36              0.676
25              121.345
225             221.396
225             124.994
256             105.592
9               80.291
SST = 998       SSR = 845.659

r^2 = SSR/SST = 845.659/998 = 0.8473 ≈ 0.85
r = sqrt(0.8473) = 0.92048 ≈ 0.92
4.14
(Y - Ybar)^2    (Yhat - Ybar)^2    (Y - Yhat)^2
196             156.135            2.2617
1               9.252              4.1689
25              25.977             0.0094
36              0.676              26.8106
25              121.345            36.1959
225             221.396            0.0143
225             124.994            14.5871
256             105.592            32.7596
9               80.291             35.5335
SST = 998       SSR = 845.659      SSE = 152.341

MSE = SSE/(n-k-1) = 152.341/(9-1-1) = 21.76
MSR = SSR/k = 845.659/1 = 845.659
F = MSR/MSE = 845.659/21.76 = 38.9
The critical value of F at the 0.05 significance level with 1 and 7 degrees of freedom
is 5.59. Because 38.9 > 5.59, the relationship between the first test grade and the
final average from Problem 4.13 is statistically significant.
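The F calculation follows directly from the sums of squares. The following sketch assumes SSR = 845.659 and SSE = 152.341 from this problem, with n = 9 observations and k = 1 independent variable; the function name `f_statistic` is hypothetical, and the critical value 5.59 is the one quoted in the solution rather than something the code computes.

```python
def f_statistic(ssr, sse, n, k):
    """Return (MSR, MSE, F) for a regression with k independent variables."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr, mse, msr / mse

msr, mse, f_stat = f_statistic(ssr=845.659, sse=152.341, n=9, k=1)

F_CRITICAL = 5.59  # table value quoted in the solution
is_significant = f_stat > F_CRITICAL

print(f"MSR = {msr:.3f}, MSE = {mse:.3f}, F = {f_stat:.2f}")
print(f"significant: {is_significant}")
```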
4.16
The formula is
Yhat = 13,473 + 37.65X
A)
X = 1,860
Yhat = 13,473 + 37.65(1,860)
= $83,502
B)
The model predicts a selling price of $83,502 for a house of 1,860 square feet. The
prediction is an average based on other homes sold in the neighborhood, and selling
price depends on more than square footage alone. A particular home can therefore sell
for less or more than the predicted price.
C) Other variables that could be included in the model are the location of the house,
the number of bedrooms, the number of bathrooms, and whether it is one story or two.
4.17
A)
Distance traveled= x2= 300 miles
Days out of town= x1= 5 days
B)
The reimbursement request was for $685, while the expected expenses were $452.50.
Because the request is well above the expected amount, Williams should provide receipts
for the 5-day trip to justify the higher expenses.
C)
Travel expenses depend on many variables, including food, gas, rental vehicles, and
hotels. Business trips can also involve meetings, conferences, or other events. The
proposed model explains only 46% of the variability in cost; the remaining 54% is due
to variables that are not in the model.
4.24
Use the data in Problem 4-22 and develop a regression model to predict
selling price based on the square footage, number of bedrooms, and age.
Use this to predict the selling price of a 10-year-old, 2,000-square-foot
house with 3 bedrooms.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.941348
R Square             0.886137
Adjusted R Square    0.859861
Standard Error       13439.77
Observations         17

ANOVA
              df     SS          MS          Significance F
Regression    ...    1.83E+10    ...         ...
Residual      13     2.35E+09    1.81E+08
Total         16     2.06E+10

              Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept     ...             ...               ...         ...         32478.23     131893.1
Sq Footage    ...             ...               ...         ...         ...          ...
Bedrooms      -2151.74        8826.087          -0.24379    0.811196    -21219.3     16915.86
Age Years     -1711.54        327.1908          -5.23101    0.000162    -2418.39     -1004.68
CHAPTER 4: REGRESSION
MODELS
SUMMARY
OUTPUT
Regressio
n
Statistics
Multiple R
0.94107
2
R Square
0.88561
6
Adjusted
R Square
0.86927
6
Standard
Error
12980.4
5
Observati
ons
17
ANOVA
16
CHAPTER 4: REGRESSION
MODELS
df
Regressio
n
SS
MS
Significa
nce F
1.83E
+10
Residual
14
2.36E
+09
1.68E
+08
Total
16
2.06E
+10
Coefficie Standa
nts
rd
Error
t Stat
Pvalue
Lower
95%
Upper
95%
Lower
95.0%
Upper
95.0%
12072 38062.
1.4
11
12072
1.4
Intercept
38062.1
1
Sq
Footage
Age Years
- 315.99
1712.21
77 5.4184
9.06E- -2389.96
05
1034.4 2389.9 1034.4
17
CHAPTER 4: REGRESSION
MODELS
4.27
A sample of 20 automobiles was taken, and the miles per gallon (MPG),
horsepower, and the total weight were recorded. Develop a linear
regression model to predict MPG, using horsepower as the only independent
variable. Develop another model with weight as the independent variable.
Which of these two models is better? Explain.
Use the data in problem 4-27 to find the best quadratic regression model.
(There is more than one to consider.) How does this compare to the models
in 4-27?
a.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.899791
R Square             0.809624
Adjusted R Square    0.785827
Standard Error       3.960713
Observations         19

ANOVA
              df     SS          MS          Significance F
Regression    ...    ...         ...         1.73E-06
Residual      16     250.996     15.68725
Total         18     1318.421

              Coefficients    Standard Error    t Stat      P-value
Intercept     ...             ...               ...         ...
...           -0.57691        0.213474          -2.70248    0.015692
...           ...             ...               ...         ...
Weight Quadratic
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.880033
R Square             0.774458
Adjusted R Square    0.746265
Standard Error       4.311027
Observations         19

ANOVA
              df     SS          MS          Significance F
Regression    ...    ...         ...         6.7E-06
Residual      16     297.3593    18.58495
Total         18     1318.421

              Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept     ...             ...               ...        ...        ...          ...
x             -0.03071        0.010187          -3.0148    0.0082     -0.05231     -0.0091
x^2           3.45E-06        ...               ...        ...        -1.4E-07     7.05E-06
Y = 84.207 - 0.0307x + 0.00000345x^2
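To see how the quadratic weight model behaves, it can be evaluated directly. The sketch below is only an illustration: the function name `predict_mpg` and the weight of 3,000 are hypothetical choices, and x is assumed to be in the same weight units as the Problem 4-27 data. It also locates the turning point of the parabola, beyond which the fitted curve rises again, so the model should not be used for weights much outside the range of the data.

```python
def predict_mpg(weight):
    """MPG predicted by the quadratic weight model stated above."""
    return 84.207 - 0.0307 * weight + 0.00000345 * weight ** 2

# Hypothetical weight chosen only to illustrate the calculation.
example_mpg = predict_mpg(3000)

# The parabola a + b*x + c*x^2 has its minimum at x = -b / (2c).
turning_point = 0.0307 / (2 * 0.00000345)

print(f"predicted MPG at weight 3000: {example_mpg:.3f}")
print(f"fitted curve reaches its minimum near weight {turning_point:.0f}")
```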
Case Study
-The first concern is the maintenance cost. As the age of the aircraft increases, so does the
cost of maintenance.
-Maintenance should be looked at for both airlines as maintenance costs vary greatly.
-The data provided are not sufficient to draw firm conclusions. Maintenance cost seems
to depend on the airline rather than on the age of the aircraft.
-Northern Airline appears more efficient as the cost of maintenance has little variation
from year to year.
-Southeast Airline appears to have a steady increase in engine and airframe maintenance
cost.
Based on the overall data, Southeast Airline seems more efficient at repairs that
require immediate attention. Northern Airline, however, with its high costs, is better
suited for preventative maintenance.