STAT 252 2018 Fall HW4 Solutions PDF
STAT 252 Homework 4 Solutions (63 marks) – Due Wednesday, November 28 by 5pm
For questions that state “SHOW ALL STEPS”, write all the steps of a hypothesis test or confidence interval as
indicated below. For other questions that say do “NOT” show all steps, read the question carefully and
follow the exact instructions regarding what is required.
Whenever you are asked to “carry out the most appropriate test” and “SHOW ALL STEPS”:
i) Select the most appropriate hypothesis test and define the parameter(s) of interest.
ii) State clearly the null and alternative hypothesis in terms of the parameter(s).
iii) Calculate the test statistic, being sure to state its components.
iv) Calculate df. Determine the P-value for the test AND state the strength of the evidence against H0.
State whether P is less than or greater than alpha and, based on this comparison, decide whether to
reject or not reject H0. If the exact P-value is given in output, then report it as is. If not, then you must
estimate the P-value (within a range of values) using the appropriate statistical table.
v) Based on the research problem and referring to the significance level given, write a conclusion in
words.
Whenever you are asked to calculate a “confidence interval” and “SHOW ALL STEPS”:
Note: If you need to use the t-table or F-table and the degrees of freedom you need are NOT on the
table, round your degrees of freedom DOWN to the nearest one.
1. (Nine parts; 30 marks in total) A researcher wanted to determine the effect of driving experience and
driving violations (and the interaction of these two variables) on auto insurance premiums (the response
variable). He took a random sample of 17 drivers insured by a certain company and, for each driver, he
recorded driving experience (in years), the number of driving violations committed (within the last 3 years),
and the monthly premium paid (in dollars). He calculated the interaction term (driving experience x
violations). Based on the partial computer output from multiple linear regression analysis shown below,
answer parts (a) – (i) below. Assume that all the required assumptions for this model are satisfied.
Model Summary^b
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .983^a   .966       .958                8.796
a. Predictors: (Constant), Interaction, Driving_Experience, Violations
b. Dependent Variable: Monthly_Premium
Stat 252 – Homework # 4 Solutions – Fall 2018
ANOVA^a
Model          Sum of Squares   df   Mean Square   F         Sig.
1 Regression   28715.964        3    9571.988      123.704   .000^b
  Residual     1005.919         13   77.378
  Total        29721.882        16
a. Dependent Variable: Monthly_Premium
b. Predictors: (Constant), Interaction, Driving_Experience, Violations
Coefficients^a
(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Lower and Upper are the bounds of the 95.0% confidence interval for B.)
Model                  B        Std. Error   Beta    t        Sig.   Lower    Upper
1 (Constant)           96.740   9.898                9.774    .000   75.357   118.124
  Driving_Experience   -1.441   .676         -.177   -2.131   .053   -2.901   .020
  Violations           22.313   2.372        .955    9.408    .000   17.189   27.436
  Interaction          -.772    .244         -.229   -3.159   .008   -1.299   -.244
a. Dependent Variable: Monthly_Premium
(a) (5 marks) At the 5% significance level, perform a hypothesis test to determine whether the overall
multiple regression model is useful for making predictions about monthly premiums. Show ALL steps.
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_a:$ at least one $\beta_i \neq 0$, $i = 1, 2, 3$ (1 mark)
$F = \frac{SSR/k}{SSE/(n-(k+1))} = \frac{MSR}{MSE} = \frac{28715.964/3}{1005.919/(17-(3+1))} = \frac{9571.98800}{77.37838} = 123.704$ (1.5 marks)
At the 5% significance level, the data provide sufficient evidence to conclude that the overall regression
model is useful for making predictions about monthly premiums. OR, at least one of the variables (driving
experience, driving violations, or their interaction) has an effect on monthly premiums.
(1 mark)
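As a quick arithmetic check of the F statistic in part (a), the following Python sketch recomputes MSR, MSE, and F from the sums of squares in the ANOVA table. The function name `overall_f` is illustrative only and is not part of the original solution; the P-value itself is read from the output (Sig. = .000) rather than computed here.

```python
def overall_f(ssr, sse, n, k):
    """Return (MSR, MSE, F) for the overall regression F-test.

    ssr: regression sum of squares, sse: residual sum of squares,
    n: sample size, k: number of predictors (excluding the constant).
    """
    df_model = k
    df_error = n - (k + 1)
    msr = ssr / df_model
    mse = sse / df_error
    return msr, mse, msr / mse


# Values taken from the ANOVA table above.
msr, mse, f_stat = overall_f(ssr=28715.964, sse=1005.919, n=17, k=3)
print(f"MSR = {msr:.3f}, MSE = {mse:.5f}, F = {f_stat:.3f}")
```

The numerator and denominator degrees of freedom used here (3 and 13) match the df column of the ANOVA table.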
(b) (2 marks) What percentage of variation in monthly premiums is explained by the regression model?
(Determine the unadjusted percentage.)
$R^2 = \frac{SS_{REGR}}{SS_{TOTAL}} = \frac{28715.964}{29721.882} = 0.96616$
Therefore, 96.6% (unadjusted) of the variation in monthly premiums is explained by the regression model.
(c) (2 marks) What percentage of variation in monthly premiums is explained by the regression model?
(Determine the adjusted percentage.)
$MSE = \frac{SSE}{n-(k+1)} = \frac{1005.919}{17-(3+1)} = 77.37838$
Standard error of the model is: $\hat{\sigma} = \sqrt{MSE} = \sqrt{77.37838} = 8.796498 \approx 8.796$
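The three summary quantities asked about in parts (b) through (d) can be cross-checked against the Model Summary table with a few lines of Python. This is only a sketch: the adjusted R-squared line uses the standard textbook formula 1 - MSE / (SSTotal / (n - 1)), which is an assumption here because the original working for that part is not shown in this text.

```python
import math

# Sums of squares from the ANOVA table; n drivers, k predictors.
ssr, sse, sst = 28715.964, 1005.919, 29721.882
n, k = 17, 3

r2 = ssr / sst                          # unadjusted R^2
mse = sse / (n - (k + 1))               # mean squared error
adj_r2 = 1 - mse / (sst / (n - 1))      # adjusted R^2 (standard formula, assumed)
sigma_hat = math.sqrt(mse)              # standard error of the model

print(f"R^2 = {r2:.5f}, adjusted R^2 = {adj_r2:.3f}, sigma-hat = {sigma_hat:.3f}")
```

Rounded to three decimals, these agree with the R Square, Adjusted R Square, and Std. Error of the Estimate columns of the Model Summary table.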
(e) (5 marks) Since it is virtually impossible that more years of driving experience would lead to a
driver having to pay higher monthly premiums, it is legitimate to perform a left-tailed test. Therefore, perform
the most appropriate test (at the 5% significance level) to determine whether there is a negative
relationship between years of driving experience and monthly premiums. Show ALL steps. Give both the
exact P-value from the computer output and the P-value from the appropriate statistical table.
Note: A one-tailed test requires a regression t-test, not a regression ANOVA F-test.
$H_0: \beta_2 = 0$ (There is no relationship between years of driving experience and monthly premiums.)
$H_a: \beta_2 < 0$ (There is a negative relationship between years of driving experience and monthly premiums.)
(1 mark)
$t = \frac{\hat{\beta}_2}{SE(\hat{\beta}_2)} = \frac{-1.441}{0.676} = -2.132$ (1 mark)
$df = n-(k+1) = 17-(3+1) = 13$ (0.5 marks)
From the computer output, the exact P-value = 0.053/2 = 0.0265 (0.5 marks)
From the t-table: 0.025 < P < 0.05 (0.5 marks)
There is strong evidence against H0.
Since P < α (0.05), reject H0 (1 mark)
Conclusion: At the 5% significance level, the data provide sufficient evidence to conclude that there is a
significant negative relationship between years of driving experience and monthly premiums.
(0.5 marks)
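The steps of the left-tailed t-test in part (e) can be retraced with the short Python sketch below. The critical values 1.771 and 2.160 are the usual t-table entries for df = 13 at upper-tail areas 0.05 and 0.025; the variable names are illustrative and not part of the original solution.

```python
def slope_t(estimate, std_error):
    """t statistic for testing that a regression slope equals zero."""
    return estimate / std_error


# Driving_Experience row of the Coefficients table.
t_stat = slope_t(estimate=-1.441, std_error=0.676)

# The output reports a two-sided P-value of 0.053. Halving it is valid for
# the left-tailed test because the estimate is negative, as Ha predicts.
p_one_sided = 0.053 / 2

# t-table critical values for df = 13 (upper-tail areas 0.05 and 0.025).
T_05, T_025 = 1.771, 2.160
p_between_025_and_05 = T_05 < abs(t_stat) < T_025

print(f"t = {t_stat:.3f}, one-sided P = {p_one_sided:.4f}")
print("0.025 < P < 0.05 from the table:", p_between_025_and_05)
```

The small difference from the t value of -2.131 in the output comes from using the rounded coefficient and standard error.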
(f) (4 marks) Calculate a 95% confidence interval for the slope of the interaction term (representing
interaction between years of driving experience and the number of driving violations). SHOW ALL STEPS.
Based on this confidence interval, what conclusion can you make about whether the interaction between
these two predictor variables has a significant effect on monthly premiums? Explain your answer.
Conclusion: It is estimated with 95% confidence that the slope of the interaction term is between -1.299
and -0.245. (0.5 marks)
Since 0 is NOT inside this interval, we can say, with 95% confidence, that the interaction between years of
driving experience and the number of driving violations has a significant effect on monthly premiums.
(1 mark)
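The interval quoted in part (f) can be reproduced, under the usual form estimate ± t* × SE with df = 13, by the following Python sketch. The helper name `t_interval` is hypothetical, and the critical value 2.160 is the t-table entry for df = 13 at 95% confidence.

```python
def t_interval(estimate, std_error, t_crit):
    """Two-sided confidence interval: estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


# Interaction row of the Coefficients table; t* for df = 13, 95% confidence.
lower, upper = t_interval(estimate=-0.772, std_error=0.244, t_crit=2.160)
print(f"95% CI for the interaction slope: ({lower:.3f}, {upper:.3f})")

# Zero lies outside the interval when both bounds have the same sign.
excludes_zero = lower * upper > 0
print("Interval excludes 0:", excludes_zero)
```

The upper bound differs from the output's -.244 in the third decimal only because the hand calculation starts from rounded values.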
(g) (2 marks) Suppose that a driver who has 10 years of driving experience and has committed 3 driving
violations within the past 3 years has to pay a monthly premium of $100. What is the residual or error of
this observation?
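One way to approach part (g) is to plug the driver's values into the fitted equation from the Coefficients table and subtract the fitted value from the observed premium. The Python sketch below does this; the dictionary and function names are illustrative only. The fitted value it produces matches the 126.109 used in part (h).

```python
# Estimated coefficients from the Coefficients table.
COEF = {
    "const": 96.740,
    "experience": -1.441,
    "violations": 22.313,
    "interaction": -0.772,
}


def fitted_premium(experience, violations):
    """Fitted monthly premium, including the experience x violations term."""
    return (COEF["const"]
            + COEF["experience"] * experience
            + COEF["violations"] * violations
            + COEF["interaction"] * experience * violations)


observed = 100.0
fitted = fitted_premium(experience=10, violations=3)
residual = observed - fitted   # residual = observed - fitted
print(f"fitted = {fitted:.3f}, residual = {residual:.3f}")
```

A negative residual here means the driver pays less than the model predicts for that combination of experience and violations.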
(h) (4 marks) Based on the values of the predictor variables given in part (g) (a driver who has 10 years of
driving experience and has committed 3 violations within the past 3 years), what is the 95% prediction
interval for all single observation responses of monthly premiums at those values of the predictor
variables? SHOW ALL STEPS. [Note: SE(Fit) = 3.249]
$\hat{y}_p \pm t_{\alpha/2}\sqrt{\hat{\sigma}^2 + [SE(Fit)]^2}$
$= 126.109 \pm 2.160\sqrt{(8.796498)^2 + (3.249)^2}$
$= 126.109 \pm 2.160 \times 9.3773$
$= 126.109 \pm 20.2550$
$= (105.854, 146.364)$ (2 marks)
It is estimated with 95% confidence that all single observation responses of monthly premiums at those
values of the predictor variables given in part (g) are between $105.854 and $146.364. (1 mark)
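The prediction interval arithmetic in part (h) can be verified with the Python sketch below. It uses the fitted value 126.109, SE(Fit) = 3.249, and the t-table value 2.160 for df = 13 exactly as given above; the function name `prediction_interval` is hypothetical.

```python
import math


def prediction_interval(fit, mse, se_fit, t_crit):
    """Prediction interval for a single response at given predictor values."""
    se_pred = math.sqrt(mse + se_fit ** 2)   # combines model error and fit error
    margin = t_crit * se_pred
    return se_pred, (fit - margin, fit + margin)


mse = 1005.919 / 13   # sigma-hat squared from the ANOVA table
se_pred, (pi_low, pi_high) = prediction_interval(
    fit=126.109, mse=mse, se_fit=3.249, t_crit=2.160
)
print(f"SE(pred) = {se_pred:.4f}, 95% PI = ({pi_low:.3f}, {pi_high:.3f})")
```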
(i) (4 marks) Based on the values of the predictor variables given in part (g) (a driver who has 10 years of
driving experience and has committed 3 violations within the past 3 years), what is the 95% confidence
interval for mean monthly premium at those values of the predictor variables? SHOW ALL STEPS. [Note
again: SE(Fit) = 3.249]
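For part (i), a confidence interval for the mean response conventionally uses SE(Fit) alone, without the extra model-error term that appears in the prediction interval of part (h). The Python sketch below applies that standard form, fitted value ± t* × SE(Fit), with the same fitted value (126.109) and t-table value (2.160, df = 13) used in part (h). It is a sketch under that assumption, since the original working for this part is not shown in this text.

```python
def mean_response_interval(fit, se_fit, t_crit):
    """Confidence interval for the mean response: fit +/- t_crit * SE(Fit)."""
    margin = t_crit * se_fit
    return fit - margin, fit + margin


ci_low, ci_high = mean_response_interval(fit=126.109, se_fit=3.249, t_crit=2.160)
print(f"95% CI for the mean premium: ({ci_low:.3f}, {ci_high:.3f})")
```

The interval is narrower than the prediction interval in part (h) because it describes the mean premium at those predictor values, not a single driver's premium.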
2. (Eight parts; 33 marks in total) A study was conducted with the objective of increasing muzzle velocity of
mortar-like antipersonnel weaponry with grenade-type golf-ball-size ammunition. Some researchers
determined that the addition of an O-ring can reduce propellant gas escape in the muzzle and increase
muzzle velocity. Three explanatory variables were considered in the study: vent hole volume (in cubic
inches), the presence of an O-ring (with or without), and discharge hole area (in inches) which might affect
the pressure pulse of propellant gases. The researcher measured the muzzle velocity 8 times for each
combination of the four levels of discharge hole area (0.016, 0.03, 0.048, and 0.062) as well as with or
without an O-ring, for a total of 64 observations. In each trial, vent hole volume was also observed.
For parts (a) – (e): (No output is required for these parts) Consider the regression model below
(which will be referred to as the “original model” for parts (a) – (e)) for average muzzle velocity given the
variables, volume, ring, and area (where ring and area are treated as categorical variables and volume is a
numerical variable).
The indicator variables for ring and area are defined as follows:
Ring:
  with = 1, if an O-ring is present (with O-ring)
  with = 0, if an O-ring is not present (without O-ring)
Area:
  d1 = 1, if discharge hole area is 0.016; 0, otherwise
  d2 = 1, if discharge hole area is 0.030; 0, otherwise
  d3 = 1, if discharge hole area is 0.048; 0, otherwise
a) (6 marks) Referring to the original model, in terms of the regression coefficients, what is the effect of
volume on mean velocity? Find this effect in general, and then summarize the effect for all
combinations of levels of ring and area in the following chart.
(4 marks) Thus, for each combination of levels of ring and area, we have:
b) (2 marks) Modify the original model to specify that the effect of volume on the mean of velocity is the
same with and without an O-ring, provided the discharge hole area is the same; otherwise, the effect of
volume on the mean of velocity is possibly different with and without O-ring when the discharge hole
area is not the same. Just state the constraint(s) needed. You do not have to rewrite the model.
c) (3 marks) Referring to the original model, set up a test to explore whether or not the effect of volume is
any different for the different areas when an O-ring is not present. Write out the null and alternative
hypotheses in terms of the regression coefficients and identify the null distribution of the test statistic.
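If the effect of volume on the mean velocity is the same for every discharge hole area when no O-ring is used,
then
$\beta_1 + \beta_7 = \beta_1 + \beta_8 = \beta_1 + \beta_9 = \beta_1$,
that is, $\beta_7 = \beta_8 = \beta_9 = 0$.
(2 marks) Thus,
$H_0: \beta_7 = \beta_8 = \beta_9 = 0$
$H_a:$ at least one $\beta_i \neq 0$, $i = 7, 8, 9$
(1 mark) The null distribution of the test statistic is an $F_{df(r)-df(f),\, df(f)}$ distribution, with numerator df $= 3$ and $df(f) = n-(k+1) = 64-(12+1) = 51$, i.e. an $F_{3,\,51}$
distribution.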
d) (4 marks) Referring to the original model, in terms of the regression coefficients, what is the effect of
ring (with O-ring vs. without O-ring) on the mean velocity? Find this effect in general, and then
summarize the effect for all levels of area in the following chart.
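(2 marks) The effect of ring (with O-ring vs. without O-ring) on the mean velocity is (by definition):
$\mu(velocity \mid volume, ring = with, area) - \mu(velocity \mid volume, ring = without, area)$
$= \{\beta_0 + \beta_1 volume + \beta_2 + \beta_3 d_1 + \beta_4 d_2 + \beta_5 d_3 + \beta_6 volume + \beta_7 (volume \times d_1) + \beta_8 (volume \times d_2)$
$+ \beta_9 (volume \times d_3) + \beta_{10} (volume \times d_1) + \beta_{11} (volume \times d_2) + \beta_{12} (volume \times d_3)\}$
$- \{\beta_0 + \beta_1 volume + \beta_3 d_1 + \beta_4 d_2 + \beta_5 d_3 + \beta_7 (volume \times d_1) + \beta_8 (volume \times d_2) + \beta_9 (volume \times d_3)\}$
$= \beta_2 + \beta_6 volume + \beta_{10} (volume \times d_1) + \beta_{11} (volume \times d_2) + \beta_{12} (volume \times d_3)$
Discharge Hole Area   Effect of O-ring (with vs. without) on the mean velocity
0.016                 $\beta_2 + (\beta_6 + \beta_{10}) volume$
0.030                 $\beta_2 + (\beta_6 + \beta_{11}) volume$
0.048                 $\beta_2 + (\beta_6 + \beta_{12}) volume$
0.062                 $\beta_2 + \beta_6 volume$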
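The chart in part (d) can be sanity-checked numerically. The Python sketch below encodes the 13-coefficient model as it appears in the derivation above (the three-way terms volume × with × d are inferred from that derivation, since the original model equation is not reproduced in this text) and compares "with minus without" against each chart entry. The coefficient values and the volume are hypothetical placeholders, not estimates from the study.

```python
import math


def mean_velocity(beta, volume, with_ring, d1, d2, d3):
    """Mean velocity under the assumed original model (beta[0]..beta[12])."""
    return (beta[0] + beta[1] * volume + beta[2] * with_ring
            + beta[3] * d1 + beta[4] * d2 + beta[5] * d3
            + beta[6] * volume * with_ring
            + beta[7] * volume * d1 + beta[8] * volume * d2 + beta[9] * volume * d3
            + beta[10] * volume * with_ring * d1
            + beta[11] * volume * with_ring * d2
            + beta[12] * volume * with_ring * d3)


def ring_effect(beta, volume, d1, d2, d3):
    """Mean velocity with an O-ring minus mean velocity without one."""
    return (mean_velocity(beta, volume, 1, d1, d2, d3)
            - mean_velocity(beta, volume, 0, d1, d2, d3))


# Hypothetical coefficients and volume, chosen only to exercise the algebra.
beta = [0.5 * (j + 1) for j in range(13)]
volume = 2.0

# Indicator settings (d1, d2, d3) for each discharge hole area.
AREAS = {0.016: (1, 0, 0), 0.030: (0, 1, 0), 0.048: (0, 0, 1), 0.062: (0, 0, 0)}
# Index of the area-specific coefficient in each chart row (none for 0.062).
EXTRA = {0.016: 10, 0.030: 11, 0.048: 12}

for area, dummies in AREAS.items():
    extra = beta[EXTRA[area]] if area in EXTRA else 0.0
    chart_value = beta[2] + (beta[6] + extra) * volume
    assert math.isclose(ring_effect(beta, volume, *dummies), chart_value)
    print(f"area {area}: ring effect = {chart_value:.2f}")
```

Because every term without the `with` indicator cancels in the subtraction, the check passes for any choice of coefficients, which is the point of the derivation.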
e) (2 marks) Referring to the original model, consider a test to explore whether or not there is any O-ring
effect on mean velocity. Write out the reduced model, under the null hypothesis, for this test.
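The reduced model under the null hypothesis would state that O-ring has no effect on mean velocity.
This would mean that the ring effect found in part (d) is zero at every volume and area, i.e. $\beta_2 = \beta_6 = \beta_{10} = \beta_{11} = \beta_{12} = 0$, which leaves
$\mu(velocity \mid volume, ring, area) = \beta_0 + \beta_1 volume + \beta_3 d_1 + \beta_4 d_2 + \beta_5 d_3 + \beta_7 (volume \times d_1) + \beta_8 (volume \times d_2) + \beta_9 (volume \times d_3)$.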
For parts (f) – (g): Refer to the table below for the group definitions of a k = 8-mean model.
Let $\mu_i$, $i = 1, 2, \ldots, 8$, correspond to the mean of velocity for groups 1 to 8, respectively.
Also, use the computer output in Tables 1 – 3 to answer parts (f) – (g):
ANOVA
Velocity
Total 51823.944 ?
Table 2:
Table 3:
Contrast Tests
Contrast   Value of Contrast   Std. Error   t   df   Sig. (2-tailed)
f) (5 marks) Carry out the most appropriate test to determine if there are any significant differences in the
mean velocity among the 8 different groups. SHOW ALL STEPS. Use a 5% significance level.
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu_6 = \mu_7 = \mu_8$ (One-mean model)
(1 mark)
$H_a:$ Not all population means are equal (8-mean model)
(1.5 marks) The P-value is: P < 0.001 OR $P(F_{7,\,56} \geq 67.69) < 0.001$.
There is extremely strong evidence against H0. Since P < α (0.05), reject H0.
(0.5 marks) Conclusion: At the 5% significance level, the data provide sufficient evidence to conclude
that there is some difference in mean velocity among the 8 different groups (at least two means are
different).
g) (5 marks) What is the overall effect of ring (with O-ring vs. without O-ring) on the mean of velocity?
First, define a linear contrast that will define the overall effect of ring on mean velocity. Then, calculate
a 95% confidence interval for this effect. SHOW ALL STEPS. Based on this confidence interval, does it
appear that there is a difference in mean velocity with vs. without an O-ring?
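$\gamma = \frac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4} - \frac{\mu_5 + \mu_6 + \mu_7 + \mu_8}{4}$
(1 mark)
(0.5 marks) Estimate: $\hat{\gamma} = \frac{1}{4}(179.225) = 44.806$
(0.5 marks) $S.E.(Estimate) = S.E.(\hat{\gamma}) = \frac{1}{4}(9.889986) = 2.472$
(1 mark) For 95% confidence, $C.V. = t^*_{56,\,0.025} \approx t^*_{50,\,0.025} = 2.009$.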
(1 mark) Conclusion: It is estimated with 95% confidence that the mean of velocity is between 39.839
and 49.774 units larger when an O-ring is present. Since the confidence interval does not include
zero, we would reject $H_0: \gamma = 0$ in favour of $H_a: \gamma \neq 0$ at significance level 0.05. Thus, at the 5%
significance level, there is significant evidence of a difference in mean velocity with vs. without an O-ring.
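The contrast interval in part (g) can be retraced with the Python sketch below. It takes the contrast value 179.225 and its standard error 9.889986 as quoted in the working above, divides both by 4, and applies the t-table value 2.009 (df rounded down from 56 to 50). The function name `contrast_interval` is hypothetical.

```python
def contrast_interval(total_estimate, total_se, divisor, t_crit):
    """Scale a contrast and its SE by 1/divisor, then build estimate +/- t * SE."""
    estimate = total_estimate / divisor
    se = total_se / divisor
    margin = t_crit * se
    return estimate, se, (estimate - margin, estimate + margin)


estimate, se, (low, high) = contrast_interval(
    total_estimate=179.225, total_se=9.889986, divisor=4, t_crit=2.009
)
print(f"estimate = {estimate:.3f}, SE = {se:.4f}")
print(f"95% CI = ({low:.3f}, {high:.3f})")
```

The endpoints agree with the stated interval up to rounding in the last decimal place.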
h) (6 marks) For this part, consider the three models defined below and use the three tables of computer
output analyzing those three models. Carry out the most appropriate test to determine if there are any
differences in the mean velocity for the four different levels of the discharge hole area, after accounting
for the volume and whether or not an O-ring is included. SHOW ALL STEPS. Use a 1% significance level.
(1.5 marks) The P-value is: P < 0.001 OR $P(F_{3,\,58} \geq 42.67) \approx P(F_{3,\,50} \geq 42.669) < 0.001$.
There is extremely strong evidence against H0. Since P < α (0.01), reject H0.
(0.5 marks) Conclusion: At the 1% significance level, the data provide sufficient evidence to conclude
that there are differences in the mean velocity among the four levels of the discharge hole area, after
accounting for the volume and whether or not an O-ring is included.