Chap 13
USING STATISTICS @ Sunflowers Apparel
In this chapter and the next two chapters, you learn how regression analysis enables you to develop a model to predict the values of a numerical variable, based on the value of other variables.

In regression analysis, the variable you wish to predict is called the dependent variable. The variables used to make the prediction are called independent variables. In addition to predicting values of the dependent variable, regression analysis also allows you to identify the type of mathematical relationship that exists between a dependent and an independent variable, to quantify the effect that changes in the independent variable have on the dependent variable, and to identify unusual observations. For example, as the director of planning, you may wish to predict sales for a Sunflowers store, based on the size of the store. Other examples include predicting the monthly rent of an apartment, based on its size, and predicting the monthly sales of a product in a supermarket, based on the amount of shelf space devoted to the product.

This chapter discusses simple linear regression, in which a single numerical independent variable, X, is used to predict the numerical dependent variable Y, such as using the size of a store to predict the annual sales of the store. Chapters 14 and 15 discuss multiple regression models, which use several independent variables to predict a numerical dependent variable. For example, you could use the amount of advertising expenditures, price, and the amount of shelf space devoted to a product to predict its monthly sales.
13.1 TYPES OF REGRESSION MODELS
In Section 2.5, you used a scatter plot (also known as a scatter diagram) to examine the relationship between an X variable on the horizontal axis and a Y variable on the vertical axis. The nature of the relationship between two variables can take many forms, ranging from simple to extremely complicated mathematical functions. The simplest relationship consists of a straight line, or linear relationship. An example of this relationship is shown in Figure 13.1.
FIGURE 13.1 A positive straight-line relationship (figure labels: $\Delta Y$ = "change in Y", $\Delta X$ = "change in X")
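Equation (13.1) represents the straight-line (linear) model.

SIMPLE LINEAR REGRESSION MODEL
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \tag{13.1}$$
where
$\beta_0$ = Y intercept for the population
$\beta_1$ = slope for the population
$\varepsilon_i$ = random error in Y for observation i
$Y_i$ = dependent variable for observation i
$X_i$ = independent variable for observation i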
FIGURE 13.2 Examples of types of relationships found in scatter plots (Panels A through F; surviving panel titles: Panel D, U-shaped curvilinear relationship; Panel F, no relationship between X and Y)
Figure 13.3 displays the scatter plot for the data in Table 13.1. Observe the increasing relationship between square feet (X) and annual sales (Y). As the size of the store increases, annual sales increase approximately as a straight line. Thus, you can assume that a straight line provides a useful mathematical model of this relationship. Now you need to determine the specific straight line that is the best fit to these data.
FIGURE 13.3 Microsoft Excel scatter plot for the Sunflowers Apparel data (horizontal axis: Square Feet (000)). See Section E2.12 to create this.
The Least-Squares Method
In the preceding section, a statistical model is hypothesized to represent the relationship between two variables, square footage and sales, in the entire population of Sunflowers Apparel stores. However, as shown in Table 13.1, the data are from only a random sample of stores. If certain assumptions are valid (see Section 13.4), you can use the sample Y intercept, $b_0$, and the sample slope, $b_1$, as estimates of the respective population parameters, $\beta_0$ and $\beta_1$. Equation (13.2) uses these estimates to form the simple linear regression equation. This straight line is often referred to as the prediction line.
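$$\hat{Y}_i = b_0 + b_1 X_i \tag{13.2}$$
where
$\hat{Y}_i$ = predicted value of Y for observation i
$X_i$ = value of X for observation i
$b_0$ = sample Y intercept
$b_1$ = sample slope

$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
Because $\hat{Y}_i = b_0 + b_1 X_i$,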
FIGURE 13.4 Microsoft Excel results for the Sunflowers Apparel data

$$\hat{Y}_i = 0.9645 + 1.6699 X_i$$
The slope, $b_1$, is +1.6699. This means that for each increase of 1 unit in X, the mean value of Y is estimated to increase by 1.6699 units. In other words, for each increase of 1.0 thousand square feet in the size of the store, the mean annual sales are estimated to increase by 1.6699 millions of dollars. Thus, the slope represents the portion of the annual sales that are estimated to vary according to the size of the store.
The Y intercept, $b_0$, is +0.9645. The Y intercept represents the mean value of Y when X equals 0. Because the square footage of the store cannot be 0, this Y intercept has no practical interpretation. Also, the Y intercept for this example is outside the range of the observed values of the X variable, and therefore interpretations of the value of $b_0$ should be made cautiously. Figure 13.5 displays the actual observations and the prediction line. To illustrate a situation in which there is a direct interpretation for the Y intercept, $b_0$, see Example 13.1.
FIGURE 13.5 Microsoft Excel scatter plot and prediction line for the Sunflowers Apparel data (chart label: y = 1.6699x + 0.9645)

$$\hat{Y}_i = 35.0 + 3X_i$$
VISUAL EXPLORATIONS Exploring Simple Linear Regression Coefficients
Use the Visual Explorations Simple Linear Regression procedure to produce a prediction line that is as close as possible to the prediction line defined by the least-squares solution. Open the Visual Explorations add-in workbook and select Visual Explorations > Simple Linear Regression (Excel 97-2003) or Add-ins > Visual Explorations > Simple Linear Regression (Excel 2007). (See Section E1.6 to learn about using add-ins.)
When a scatter plot of the Sunflowers Apparel data of Table 13.1 on page 515 with an initial prediction line appears, click the spinner buttons to change the values for $b_1$, the slope of the prediction line, and $b_0$, the Y intercept of the prediction line. Try to produce a prediction line that is as close as possible to the prediction line defined by the least-squares estimates, using the chart display and the Difference from Target SSE value as feedback (see page 525 for an explanation of SSE). Click Finish when you are done with this exploration. At any time, click Reset to reset the $b_1$ and $b_0$ values, Help for more information, or Solution to reveal the prediction line defined by the least-squares method.
To work with your own worksheet data, open the add-in workbook and select Visual Explorations > Simple Linear Regression with your worksheet data (Excel 97-2003) or Add-ins > Visual Explorations > Simple Linear Regression with your worksheet data (Excel 2007). In the procedure's dialog box, enter your Y variable cell range as the Y Variable Cell Range and your X variable cell range as the X Variable Cell Range. Click First cells in both ranges contain a label, enter a title as the Title, and click OK. When the scatter plot with an initial prediction line appears, use the instructions in the first part of this section to try to produce the prediction line defined by the least-squares method.
$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \left[ Y_i - (b_0 + b_1 X_i) \right]^2$$
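COMPUTATIONAL FORMULA FOR THE SLOPE, $b_1$
$$b_1 = \frac{SSXY}{SSX} \tag{13.3}$$
where
$$SSXY = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}$$
$$SSX = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}$$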
COMPUTATIONAL FORMULA FOR THE Y INTERCEPT, $b_0$
$$b_0 = \bar{Y} - b_1 \bar{X} \tag{13.4}$$
where
$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n} \qquad \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
of the product of X and Y. For the Sunflowers Apparel data, the number of square feet is used to predict the annual sales in a store. Table 13.2 presents the computations of the various sums.

TABLE 13.2 Computations for the Sunflowers Apparel Data

Store    Square Feet (X)   Annual Sales (Y)   X^2      Y^2      XY
1        1.7               3.7                2.89     13.69    6.29
2        1.6               3.9                2.56     15.21    6.24
3        2.8               6.7                7.84     44.89    18.76
4        5.6               9.5                31.36    90.25    53.20
5        1.3               3.4                1.69     11.56    4.42
6        2.2               5.6                4.84     31.36    12.32
7        1.3               3.7                1.69     13.69    4.81
8        1.1               2.7                1.21     7.29     2.97
9        3.2               5.5                10.24    30.25    17.60
10       1.5               2.9                2.25     8.41     4.35
11       5.2               10.7               27.04    114.49   55.64
12       4.6               7.6                21.16    57.76    34.96
13       5.8               11.8               33.64    139.24   68.44
14       3.0               4.1                9.00     16.81    12.30
Totals   40.9              81.8               157.41   594.90   302.30
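Using Equations (13.3) and (13.4), you can compute the values of $b_0$ and $b_1$:
$$b_1 = \frac{SSXY}{SSX}$$
$$SSXY = \sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n} = 302.3 - \frac{(40.9)(81.8)}{14} = 63.32715$$
$$SSX = \sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n} = 157.41 - \frac{(40.9)^2}{14} = 157.41 - 119.48642 = 37.92358$$
so that
$$b_1 = \frac{63.32715}{37.92358} = 1.6699$$
and
$$b_0 = \bar{Y} - b_1 \bar{X}$$
$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n} = \frac{81.8}{14} = 5.842857$$
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{40.9}{14} = 2.92143$$
$$b_0 = 5.842857 - (1.6699)(2.92143) = 0.9645$$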
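The hand computation above can be checked with a few lines of Python. This is a minimal sketch, assuming the 14 (X, Y) pairs of Table 13.2; the function name `least_squares` and the variable names are illustrative and do not come from the text.

```python
# Sunflowers Apparel sample from Table 13.2:
# X = store size (thousands of square feet), Y = annual sales (millions of dollars).
square_feet = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
annual_sales = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]


def least_squares(x, y):
    """Return (b0, b1) using the computational formulas (13.3) and (13.4)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    ssxy = sum_xy - sum_x * sum_y / n   # SSXY
    ssx = sum_x2 - sum_x ** 2 / n       # SSX
    b1 = ssxy / ssx                     # slope, Equation (13.3)
    b0 = sum_y / n - b1 * sum_x / n     # Y intercept, Equation (13.4)
    return b0, b1


b0, b1 = least_squares(square_feet, annual_sales)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

Rounded to four decimal places, the printed coefficients should agree with the values 0.9645 and 1.6699 obtained above.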
… the number of orders will help in the planning of the order-fulfillment process. A sample of 25 mail shipments is selected that range from 200 to 700 pounds. The results (stored in a data file) are as follows:

Weight of Mail (Pounds)   Orders (Thousands)   Weight of Mail (Pounds)   Orders (Thousands)
216                       6.1                  432                       13.6
283                       9.1                  409                       12.8
237                       7.2                  553                       16.5
203                       7.5                  572                       17.1

1     950     850            14    1,800   1,369
2     1,600   [illegible]    15    1,400   1,115
3     1,200   1,085          16    1,450   1,225
4     1,500   [illegible]    17    1,100   1,245
5     950     718            18    1,700   1,259
6     1,700   1,485          19    1,200   1,150
7     1,650   1,136          20    1,150   896
8     935     726            21    1,600   1,361
9     875     700            22    1,650   1,040
10    1,150   956            23    1,200   755
11    1,400   1,100          24    800     1,000
12    1,650   1,285          25    1,750   1,200
13    2,300   1,985
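FIGURE 13.6 Measures of variation (figure labels: total sum of squares $\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = SST$; regression sum of squares $\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 = SSR$; error sum of squares $\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 = SSE$; prediction line $\hat{Y}_i = b_0 + b_1 X_i$)

$$SST = SSR + SSE \tag{13.5}$$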
ERROR SUM OF SQUARES (SSE)
The error sum of squares (SSE) is equal to the sum of the squared differences between the observed value of Y and the predicted value of Y.
SSE = Unexplained variation or error sum of squares
$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \tag{13.8}$$
FIGURE 13.7 Microsoft Excel sum of squares for the Sunflowers Apparel data

             df    SS         MS         F          Significance F
Regression   1     105.7476   105.7476   113.2335   0.0000
Residual     12    11.2067    0.9339
Total        13    116.9543
COEFFICIENT OF DETERMINATION
The coefficient of determination is equal to the regression sum of squares (that is, explained variation) divided by the total sum of squares (that is, total variation).
$$r^2 = \frac{\text{Regression sum of squares}}{\text{Total sum of squares}} = \frac{SSR}{SST} \tag{13.9}$$
The coefficient of determination measures the proportion of variation in Y that is explained by the independent variable X in the regression model. For the Sunflowers Apparel data, with SSR = 105.7476, SSE = 11.2067, and SST = 116.9543,
$$r^2 = \frac{105.7476}{116.9543} = 0.9042$$
FIGURE 13.8 Partial Microsoft Excel regression results for the Sunflowers Apparel data (rows shown: Multiple R, R Square, Adjusted R Square, Standard Error; the Standard Error row is labeled $S_{YX}$ = 0.9664). See Section E13.1 to create the worksheet that contains this area.
EXAMPLE 13.4 COMPUTING THE COEFFICIENT OF DETERMINATION
Compute the coefficient of determination, $r^2$, for the Sunflowers Apparel data.
SOLUTION You can compute SST, SSR, and SSE, that are defined in Equations (13.6), (13.7), and (13.8) on pages 524-525, by using Equations (13.10), (13.11), and (13.12).
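COMPUTATIONAL FORMULA FOR SST
$$SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - \frac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n} \tag{13.10}$$
COMPUTATIONAL FORMULA FOR SSR
$$SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = b_0 \sum_{i=1}^{n} Y_i + b_1 \sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n} \tag{13.11}$$
COMPUTATIONAL FORMULA FOR SSE
$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} Y_i^2 - b_0 \sum_{i=1}^{n} Y_i - b_1 \sum_{i=1}^{n} X_i Y_i \tag{13.12}$$

$$SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - \frac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n} = 594.9 - \frac{(81.8)^2}{14} = 594.9 - 477.94571 = 116.95429$$
$$SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = b_0 \sum_{i=1}^{n} Y_i + b_1 \sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n} = (0.964478)(81.8) + (1.66986)(302.3) - \frac{(81.8)^2}{14} = 105.74726$$
$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} Y_i^2 - b_0 \sum_{i=1}^{n} Y_i - b_1 \sum_{i=1}^{n} X_i Y_i = 594.9 - (0.964478)(81.8) - (1.66986)(302.3) = 11.2067$$
Therefore,
$$r^2 = \frac{105.74726}{116.95429} = 0.9042$$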
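As a cross-check on Example 13.4, the three sums of squares can be computed from the column totals alone. The sketch below assumes the totals of Table 13.2 and the rounded coefficients quoted in the solution; the variable names are illustrative.

```python
# Column totals from Table 13.2 and the coefficients used in Example 13.4.
n = 14
sum_y, sum_y2, sum_xy = 81.8, 594.90, 302.30
b0, b1 = 0.964478, 1.66986

correction = sum_y ** 2 / n                    # (sum of Y)^2 / n
sst = sum_y2 - correction                      # Equation (13.10)
ssr = b0 * sum_y + b1 * sum_xy - correction    # Equation (13.11)
sse = sum_y2 - b0 * sum_y - b1 * sum_xy        # Equation (13.12)
r_squared = ssr / sst                          # Equation (13.9)

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}, r^2 = {r_squared:.4f}")
```

Because the coefficients are rounded, SSR and SSE can differ from the Excel output in the fourth decimal place, while SSR + SSE still reproduces SST as Equation (13.5) requires.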
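$$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}} \tag{13.13}$$
where
$Y_i$ = actual value of Y for a given $X_i$
$\hat{Y}_i$ = predicted value of Y for a given $X_i$
SSE = error sum of squares

From Equation (13.8) and Figure 13.4 on page 516, SSE = 11.2067. Thus,
$$S_{YX} = \sqrt{\frac{11.2067}{14 - 2}} = 0.9664$$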
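The same arithmetic in Python, as a small sketch that takes SSE and n from the text (variable names are illustrative):

```python
import math

n = 14          # number of stores in the sample
sse = 11.2067   # error sum of squares, from Figure 13.4

# Equation (13.13): divide by n - 2 because two coefficients were estimated.
s_yx = math.sqrt(sse / (n - 2))
print(f"S_YX = {s_yx:.4f}")
```

The result is in the units of Y, so here it is read as roughly 0.97 millions of dollars.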
The third assumption, normality, requires that the errors ($\varepsilon_i$) are normally distributed at each value of X. Like the t test and the ANOVA F test, regression analysis is fairly robust against departures from the normality assumption. As long as the distribution of the errors at each level of X is not extremely different from a normal distribution, inferences about $\beta_0$ and $\beta_1$ are not seriously affected.
The fourth assumption, equal variance or homoscedasticity, requires that the variance of the errors ($\varepsilon_i$) be constant for all values of X. In other words, the variability of Y values is the same when X is a low value as when X is a high value. The equal variance assumption is important when making inferences about $\beta_0$ and $\beta_1$. If there are serious departures from this assumption, you can use either data transformations or weighted least-squares methods (see reference 4).
13.5 RESIDUAL ANALYSIS
In Section 13.1, regression analysis was introduced. In Sections 13.2 and 13.3, a regression model was developed using the least-squares approach for the Sunflowers Apparel data. Is this the correct model for these data? Are the assumptions introduced in Section 13.4 valid? In this section, a graphical approach called residual analysis is used to evaluate the assumptions and determine whether the regression model selected is an appropriate model.
The residual or estimated error value, $e_i$, is the difference between the observed ($Y_i$) and predicted ($\hat{Y}_i$) values of the dependent variable for a given value of $X_i$. Graphically, a residual appears on a scatter plot as the vertical distance between an observed value of Y and the prediction line. Equation (13.14) defines the residual.
RESIDUAL
The residual is equal to the difference between the observed value of Y and the predicted value of Y.
$$e_i = Y_i - \hat{Y}_i \tag{13.14}$$

Evaluating the Assumptions
Recall from Section 13.4 that the four assumptions of regression (known by the acronym LINE) are linearity, independence, normality, and equal variance.
Linearity To evaluate linearity, you plot the residuals on the vertical axis against the corresponding $X_i$ values of the independent variable on the horizontal axis. If the linear model is appropriate for the data, there is no apparent pattern in this plot. However, if the linear model is not appropriate, there is a relationship between the $X_i$ values and the residuals, $e_i$. You can see such a pattern in Figure 13.9. Panel A shows a situation in which, although there is an increasing trend in Y as X increases, the relationship seems curvilinear because the upward trend decreases for increasing values of X. This quadratic effect is highlighted in Panel B, where there is a clear relationship between $X_i$ and $e_i$. By plotting the residuals, the linear trend of X with Y has been removed, thereby exposing the lack of fit in the simple linear model. Thus, a quadratic model is a better fit and should be used in place of the simple linear model. (See Section 15.1 for further discussion of fitting quadratic models.)
To determine whether the simple linear regression model is appropriate, return to the evaluation of the Sunflowers Apparel data. Figure 13.10 provides the predicted and residual values of the response variable (annual sales) computed by Microsoft Excel.
FIGURE 13.9 Studying the appropriateness of the simple linear regression model (Panel A: scatter plot of Y against X; Panel B: residuals against X)
FIGURE 13.10 Microsoft Excel residual statistics for the Sunflowers Apparel data

Observation   Predicted Annual Sales   Residuals
1             3.803239598              -0.103239598
2             3.636253367              0.263746633
3             5.640088147              1.059911853
4             10.31570263              -0.815702635
5             3.135294672              0.264705328
6             4.638170757              0.961829243
7             3.135294672              0.564705328
8             2.801322208              -0.101322208
9             6.308033074              -0.808033074
10            3.469267135              -0.569267135
11            9.647757708              1.052242292
12            8.645840318              -1.045840318
13            10.6496751               1.1503249
14            5.974060611              -1.874060611

See Section E13.3 to create the worksheet that contains this area.
To assess linearity, the residuals are plotted against the independent variable (store size, in thousands of square feet) in Figure 13.11. Although there is widespread scatter in the residual plot, there is no apparent pattern or relationship between the residuals and $X_i$. The residuals appear to be evenly spread above and below 0 for the differing values of X. You can conclude that the linear model is appropriate for the Sunflowers Apparel data.
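The residuals behind this plot follow directly from Equation (13.14). The sketch below recomputes them in Python, assuming the Table 13.2 data and least-squares coefficients rounded to six decimal places; it prints the values rather than drawing the plot, and all names are illustrative.

```python
# Table 13.2 data: store size (thousands of sq ft) and annual sales ($ millions).
square_feet = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
annual_sales = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]
b0, b1 = 0.964478, 1.669862   # least-squares estimates, rounded (assumed precision)

predicted = [b0 + b1 * x for x in square_feet]
residuals = [y - y_hat for y, y_hat in zip(annual_sales, predicted)]   # e_i = Y_i - Y_hat_i

for i, (x, e) in enumerate(zip(square_feet, residuals), start=1):
    print(f"{i:2d}  X = {x:3.1f}  residual = {e:+.4f}")

# Least-squares residuals sum to zero, up to rounding of the coefficients.
print(f"sum of residuals = {sum(residuals):+.6f}")
```

Scanning the printed residuals against X is a quick substitute for the plot: a run of same-signed residuals at one end of the X range would hint at a curvilinear pattern.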
FIGURE 13.11 Microsoft Excel plot of residuals against the square footage of a store for the Sunflowers Apparel data (horizontal axis: Square Feet). See Section E2.12 to create this.
Normality You can evaluate the assumption of normality in the errors by tallying the residuals into a frequency distribution and displaying the results in a histogram (see Section …). For the Sunflowers Apparel data, the residuals have been tallied into a frequency distribution in Table 13.3. (There are an insufficient number of values, however, to construct a histogram.) You can also evaluate the normality assumption by comparing the actual versus theoretical values of the residuals or by constructing a normal probability plot of the residuals (see Section 6.3). Figure 13.12 is a normal probability plot of the residuals for the Sunflowers Apparel data.
FIGURE 13.12 Microsoft Excel normal probability plot of the residuals for the Sunflowers Apparel data (horizontal axis: Z Value). See Section E6.2 to create this.
Equal Variance You can evaluate the assumption of equal variance from a plot of the residuals with $X_i$. For the Sunflowers Apparel data of Figure 13.11 on page 531, there do not appear to be major differences in the variability of the residuals for different $X_i$ values. Thus, you can conclude that there is no apparent violation in the assumption of equal variance at each level of X.
To examine a case in which the equal variance assumption is violated, observe Figure 13.13, which is a plot of the residuals with $X_i$ for a hypothetical set of data. In this plot, the variability of the residuals increases dramatically as X increases, demonstrating the lack of homogeneity in the variances of $Y_i$ at each level of X. For these data, the equal variance assumption is invalid.
FIGURE 13.13 Plot of residuals against X for a hypothetical data set (equal variance assumption violated)
Learning the Basics
The results below provide the X values, residuals, and a residual plot from a regression analysis:
13.24 The results below show the X values, residuals, and a residual plot from a regression analysis:
13.6 MEASURING AUTOCORRELATION: THE DURBIN-WATSON STATISTIC
One of the basic assumptions of the regression model is the independence of the errors. This assumption is sometimes violated when data are collected over sequential time periods because a residual at any one time period may tend to be similar to residuals at adjacent time periods. This pattern in the residuals is called autocorrelation. When a set of data has substantial autocorrelation, the validity of a regression model can be in serious doubt.
TABLE 13.4 Customers and Sales for a Period of 15 Consecutive Weeks

Week   Customers   Sales (Thousands of Dollars)   Week   Customers   Sales (Thousands of Dollars)
1      794         9.33                           9      880         12.07
2      799         8.26                           10     905         12.55
3      831         7.48                           11     886         11.92
4      855         9.08                           12     843         10.27
5      845         9.83                           13     904         11.80
6      844         10.09                          14     950         12.15
7      863         11.01                          15     841         9.64
8      875         11.49
FIGURE 13.14 Microsoft Excel results for the package delivery store data of Table 13.4. See Section E13.1 to create this.
FIGURE 13.15 Microsoft Excel residual plot for the package delivery store data of Table 13.4. See Section E13.3 to create this.
The Durbin-Watson Statistic
The Durbin-Watson statistic is used to measure autocorrelation. This statistic measures the correlation between each residual and the residual for the time period immediately preceding the one of interest. Equation (13.15) defines the Durbin-Watson statistic.
DURBIN-WATSON STATISTIC
$$D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} \tag{13.15}$$
where
$e_i$ = residual at the time period i
FIGURE 13.16 Microsoft Excel results of the Durbin-Watson statistic for the package delivery store data. See Section E13.4 to create this.

You need to determine when the autocorrelation is large enough to make the Durbin-Watson statistic, D, fall sufficiently below 2 to conclude that there is significant positive autocorrelation. After computing D, you compare it to the critical values of the Durbin-Watson statistic found in Table E.10, a portion of which is presented in Table 13.5. The critical values depend on $\alpha$, the significance level chosen, n, the sample size, and k, the number of independent variables in the model (in simple linear regression, k = 1).
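To make the behavior of D concrete, here is a minimal Python sketch of the Durbin-Watson computation. The function name `durbin_watson` and both residual series are made up for illustration; neither series comes from the package delivery store data.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic D; `residuals` must be in time order."""
    # Numerator: squared differences between each residual and the one before it.
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    # Denominator: sum of squared residuals over all time periods.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals, for illustration only.
drifting = [1.0, 0.8, 0.5, 0.1, -0.3, -0.6, -0.9]   # adjacent residuals are similar
alternating = [1.0, -1.0, 1.0, -1.0]                # adjacent residuals flip sign

print(durbin_watson(drifting))      # far below 2
print(durbin_watson(alternating))   # 3.0, above 2
```

Slowly drifting residuals give small successive differences and therefore a D well below 2, which is the pattern the critical values in Table 13.5 are meant to detect. Whether a particular D is significantly below 2 still has to be judged against those tabled values.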
TABLE 13.5 Finding Critical Values of the Durbin-Watson Statistic ($\alpha$ = .05)
Learning the Basics
13.32 The residuals for 10 consecutive time periods are as follows:

Time Period   Residual   Time Period   Residual
…

…
b. Compute the Durbin-Watson statistic. At the 0.05 level of significance, is there evidence of positive autocorrelation among the residuals?
c. Based on (a) and (b), what conclusion can you reach about the autocorrelation of the residuals?

Applying the Concepts
13.35 A mail-order catalog business that sells personal computer supplies, software, and hardware maintains a centralized warehouse for the distribution of products ordered. Management is currently examining the process of distribution from the warehouse and is interested in studying the factors that affect warehouse distribution costs. Currently, a small handling fee is added to the order, regardless of the amount of the order. Data have been collected over the past 24 months, indicating the warehouse distribution costs and the number of orders received. They are stored in a data file. The results are as follows:

Months   Distribution Cost (Thousands of Dollars)   Number of Orders
1        52.95                                      4,015
2        71.66                                      3,806
3        85.58                                      5,309
4        63.69                                      4,262
5        72.81                                      4,296
6        68.44                                      4,097
7        52.46                                      3,213
8        70.77                                      4,809
9        82.03                                      5,237
10       74.39                                      4,732
11       70.84                                      4,413
12       54.08                                      2,921
13       62.98                                      3,977
14       72.30                                      4,428
15       58.99                                      3,964
16       79.38                                      4,592
17       94.44                                      5,582
18       59.74                                      3,450
19       90.50                                      5,079
20       93.24                                      5,735
21       69.33                                      4,269
22       53.71                                      3,708
23       89.18                                      5,387
24       66.80                                      4,161

a. Assuming a linear relationship, use the least-squares method to find the regression coefficients $b_0$ and $b_1$.
b. Predict the monthly warehouse distribution costs when the number of orders is 4,500.
c. Plot the residuals versus the time period.
d. Compute the Durbin-Watson statistic. At the 0.05 level of significance, is there evidence of positive autocorrelation among the residuals?
e. Based on the results of (c) and (d), is there reason to question the validity of the model?

13.37 A freshly brewed shot of espresso has three distinct components: the heart, body, and crema. The separation of these three components typically lasts only 10 to 20 seconds. To use the espresso shot in making a latte, cappuccino, or other drinks, the shot must be poured into the beverage during the separation of the heart, body, and crema. If the shot is used after the separation occurs, the drink becomes excessively bitter and acidic, ruining the final drink. Thus, a longer separation time allows the drink-maker more time to pour the shot and ensure that the beverage will meet expectations. An employee at a coffee shop hypothesized that the harder the espresso grounds were tamped down into the portafilter before brewing, the longer the separation time would be. An experiment using 24 observations was conducted to test this relationship. The independent variable Tamp measures the distance, in inches, between the espresso grounds and the top of the portafilter (that is, the harder the tamp, the larger the distance). The dependent variable Time is the number of seconds the heart, body, and crema are separated (that is, the amount of time after the shot is poured before it must be used for the customer's beverage). The data are stored in a data file:

Shot   Tamp   Time   Shot   Tamp   Time
1      0.20   14     13     0.50   …
2      0.50   14     14     0.50   13
3      0.50   18     15     0.35   19
4      0.20   16     16     0.35   19
5      0.20   16     17     0.20   17
6      0.50   13     18     0.20   18
7      0.20   12     19     0.20   15
8      0.35   15     20     0.20   16
9      0.50   9      21     0.35   18
10     0.35   15     22     0.35   16
11     0.50   11     23     0.35   14
12     0.50   16     24     0.35   16

a. Determine the prediction line, using Time as the dependent variable and Tamp as the independent variable.
b. Predict the mean separation time for a Tamp distance of 0.50 inch.
c. Plot the residuals versus the time order of experimentation. Are there any noticeable patterns?
d. Compute the Durbin-Watson statistic. At the 0.05 level of significance, is there evidence of positive autocorrelation among the residuals?
e. Based on the results of (c) and (d), is there reason to question the validity of the model?

13.38 The owner of a chain of ice cream stores would like to study the effect of atmospheric temperature on sales during the summer season. A sample of 21 consecutive days is selected, with the results stored in a data file. (Hint: Determine which are the independent and dependent variables.)
a. Assuming a linear relationship, use the least-squares method to find the regression coefficients $b_0$ and $b_1$.
b. Predict the sales per store for a day in which the temperature is 83°F.
c. Plot the residuals versus the time period.
d. Compute the Durbin-Watson statistic. At the 0.05 level of significance, is there evidence of positive autocorrelation among the residuals?
e. Based on the results of (c) and (d), is there reason to question the validity of the model?
13.7 INFERENCES ABOUT THE SLOPE AND CORRELATION COEFFICIENT
In Sections 13.1 through 13.3, regression was used solely for descriptive purposes. You learned how the least-squares method determines the regression coefficients and how to predict Y for a given value of X. In addition, you learned how to compute and interpret the standard error of the estimate and the coefficient of determination.
When residual analysis, as discussed in Section 13.5, indicates that the assumptions of a least-squares regression model are not seriously violated and that the straight-line model is appropriate, you can make inferences about the linear relationship between the variables in the population.
$$t = \frac{b_1 - \beta_1}{S_{b_1}} \tag{13.16}$$
where
$$S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}}$$
$$SSX = \sum_{i=1}^{n} (X_i - \bar{X})^2$$
FIGURE 13.17 Microsoft Excel t test for the slope for the Sunflowers Apparel data

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     0.9645         0.5262           1.8329    0.0917    -0.1820     2.1110
Square Feet   1.6699         0.1569           10.6411   0.0000    1.3280      2.0118

See Section E13.1 to create the worksheet that contains this area.

From Figure 13.17,
$$b_1 = +1.6699 \qquad n = 14 \qquad S_{b_1} = 0.1569$$
and
$$t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{1.6699 - 0}{0.1569} = 10.6411$$

FIGURE 13.18 Testing a hypothesis about the population slope at the 0.05 level of significance, with 12 degrees of freedom (critical values at t = -2.1788 and t = +2.1788; regions of rejection lie beyond the critical values, with the region of nonrejection between them)
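The t test for the slope can be reproduced from three summary quantities. The following sketch assumes values carried at more decimal places than the text prints ($S_{YX}$ = 0.966380, SSX = 37.923571, $b_1$ = 1.669862) and takes the critical value 2.1788 from the text rather than computing it.

```python
import math

b1 = 1.669862        # sample slope
s_yx = 0.966380      # standard error of the estimate
ssx = 37.923571      # sum of squares of X
t_critical = 2.1788  # two-tail critical value, 12 df, 0.05 level (quoted in the text)

s_b1 = s_yx / math.sqrt(ssx)   # standard error of the slope
t_stat = (b1 - 0) / s_b1       # Equation (13.16) with hypothesized slope of 0
reject_h0 = abs(t_stat) > t_critical

print(f"S_b1 = {s_b1:.4f}, t = {t_stat:.4f}, reject H0: {reject_h0}")
```

Dividing the rounded figures 1.6699 by 0.1569 gives about 10.643 rather than 10.6411, which is why the sketch carries extra decimal places.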
$$F = \frac{MSR}{MSE} \tag{13.17}$$
where
$$MSR = \frac{SSR}{k}$$
$$MSE = \frac{SSE}{n - k - 1}$$
k = number of independent variables in the regression model
Table 13.6 organizes the complete set of results into an ANOVA table.

TABLE 13.6 ANOVA Table for Testing the Significance of a Regression Coefficient

Source       df          Sum of Squares   Mean Square (Variance)    F
Regression   k           SSR              MSR = SSR / k             F = MSR / MSE
Error        n - k - 1   SSE              MSE = SSE / (n - k - 1)
Total        n - 1       SST
FIGURE 13.19 Microsoft Excel F test for the Sunflowers Apparel data

ANOVA
             df    SS         MS         F          Significance F
Regression   1     105.7476   105.7476   113.2335   0.0000
Residual     12    11.2067    0.9339
Total        13    116.9543

See Section E13.1 to create the worksheet that contains this area.
Using a level of significance of 0.05, from Table E.5, the critical value of the F distribution, with 1 and 12 degrees of freedom, is 4.75 (see Figure 13.20). Because F = 113.2335 > 4.75 or because the p-value = 0.0000 < 0.05, you reject $H_0$ and conclude that the size of the store is significantly related to annual sales. Because the F test in Equation (13.17) on page 540 is equivalent to the t test on page 539, you reach the same conclusion.
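The F statistic and its link to the t statistic can be verified numerically. This sketch uses SSR and SSE from the ANOVA output and the critical value 4.75 quoted above; the variable names are illustrative.

```python
n, k = 14, 1                   # sample size and number of independent variables
ssr, sse = 105.7476, 11.2067   # sums of squares from the ANOVA table
f_critical = 4.75              # F critical value, 1 and 12 df, 0.05 level (from the text)

msr = ssr / k              # mean square regression
mse = sse / (n - k - 1)    # mean square error
f_stat = msr / mse         # Equation (13.17)

t_stat = 10.6411           # t statistic for the slope, from Figure 13.17
print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}, reject H0: {f_stat > f_critical}")
```

In simple linear regression the two tests are equivalent because F equals the square of the t statistic, apart from rounding.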
FIGURE 13.20 Regions of rejection and nonrejection when testing for significance of slope at the 0.05 level of significance, with 1 and 12 degrees of freedom (critical value F = 4.75; region of nonrejection below the critical value, region of rejection above it)
$$b_1 \pm t_{n-2} S_{b_1} \tag{13.18}$$
$$b_1 \pm t_{n-2} S_{b_1} = 1.6699 \pm (2.1788)(0.1569) = 1.6699 \pm 0.3419$$
$$1.3280 \le \beta_1 \le 2.0118$$
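The interval arithmetic is short enough to script. A minimal sketch using the rounded values printed in the text:

```python
b1 = 1.6699          # sample slope
s_b1 = 0.1569        # standard error of the slope
t_critical = 2.1788  # t value for 95% confidence with 12 df (from the text)

margin = t_critical * s_b1                 # half-width of the interval
lower, upper = b1 - margin, b1 + margin    # Equation (13.18)

print(f"{lower:.4f} <= beta_1 <= {upper:.4f}")
print("interval excludes 0:", not (lower <= 0 <= upper))
```

An interval that excludes 0 leads to the same conclusion as the two-tail t test at the same level of significance.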
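$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \tag{13.19}$$
where
$r = +\sqrt{r^2}$ if $b_1 > 0$
$r = -\sqrt{r^2}$ if $b_1 < 0$
The test statistic t follows a t distribution with n - 2 degrees of freedom.
$$t = \frac{r - 0}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.9509}{\sqrt{\dfrac{1 - (0.9509)^2}{14 - 2}}} = 10.6411$$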
Using the 0.05 level of significance, because t = 10.6411 > 2.1788, you reject the null hypothesis. You conclude that there is evidence of an association between annual sales and store size. This t statistic is equivalent to the t statistic found when testing whether the population slope, $\beta_1$, is equal to zero (see Figure 13.17 on page 540).
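The correlation test can be checked from $r^2$ alone. In this sketch, `math.copysign` gives r the sign of the slope, as Equation (13.19) requires; $r^2$ = 0.9042 and $b_1$ = 1.6699 are the rounded values from the text.

```python
import math

n = 14
r_squared = 0.9042   # coefficient of determination
b1 = 1.6699          # sample slope; its sign determines the sign of r

r = math.copysign(math.sqrt(r_squared), b1)             # coefficient of correlation
t_stat = (r - 0) / math.sqrt((1 - r_squared) / (n - 2)) # Equation (13.19), rho = 0

print(f"r = {r:.4f}, t = {t_stat:.4f}")
```

With the rounded $r^2$, t comes out near 10.64, matching the slope test up to rounding.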
When inferences concerning the population slope were discussed, confidence intervals and tests of hypothesis were used interchangeably. However, developing a confidence interval for the correlation coefficient is more complicated because the shape of the sampling distribution of the statistic r varies for different values of the population correlation coefficient. Methods for developing a confidence interval estimate for the correlation coefficient are presented in reference 4.
… move only 60% as much as the overall market. Stocks with negative beta values tend to move in a direction opposite that of the overall market. The following table gives some beta values for some widely held stocks:

Company     Ticker Symbol   Beta
…           T               0.80
…           IBM             1.20
… Company   DIS             1.40
…           AA              2.26
… Logic     LSI             3.61

Source: Extracted from finance.yahoo.com, May 31, 2006.

a. For each of the five companies, interpret the beta value.
b. How can investors use the beta value as a guide for investing?

13.50 Index funds are mutual funds that try to mimic the movement of leading indexes, such as the S&P 500 Index, the NASDAQ 100 Index, or the Russell 2000 Index. The beta values for these funds (as described in Problem 13.49) are therefore approximately 1.0. The estimated market models for these funds are approximately

(% weekly change in index fund) = 0.0 + 1.0 (% weekly change in the index)

Leveraged index funds are designed to magnify the movement of major indexes. An article in Mutual Funds (L. O'Shaughnessy, "Reach for Higher Returns," Mutual Funds, July 1999, pp. 44-49) described some of the risks and rewards associated with these funds and gave details on some of the most popular leveraged funds, including those in the following table:

(Ticker Symbol)       Fund Description
… Small Cap (POSCX)   125% of Russell 2000 Index
… "Inv" Nova          150% of the S&P 500 Index
… UltraOTC (UOPIX)    Double (200%) the NASDAQ 100 Index

… approximately 12.5%. On the downside, if the same index loses 20%, POSCX loses approximately 25%.
a. Consider the leveraged mutual fund ProFund UltraOTC "Inv" (UOPIX), whose description is 200% of the performance of the S&P 500 Index. What is its approximate market model?
b. If the NASDAQ gains 30% in a year, what return do you expect UOPIX to have?
c. If the NASDAQ loses 35% in a year, what return do you expect UOPIX to have?
d. What type of investors should be attracted to leveraged funds? What type of investors should stay away from these funds?

13.51 The data in a data file represent the calories and fat (in grams) of 16-ounce iced coffee drinks at Dunkin' Donuts and Starbucks:

Product                                                                  Calories   Fat
Dunkin' Donuts Iced Mocha Swirl latte (whole milk)                       240        8.0
Starbucks Coffee Frappuccino blended coffee                              260        3.5
Dunkin' Donuts Coffee Coolatta (cream)                                   350        22.0
Starbucks Iced Coffee Mocha Espresso (whole milk and whipped cream)      350        20.0
Starbucks Mocha Frappuccino blended coffee (whipped cream)               420        16.0
Starbucks Chocolate Brownie Frappuccino blended coffee (whipped cream)   510        22.0
Starbucks Chocolate Frappuccino Blended Crème (whipped cream)            530        19.0

Source: Extracted from "Coffee as Candy at Dunkin' Donuts and Starbucks," Consumer Reports, June 2004, p. 9.

a. Compute and interpret the coefficient of correlation, r.
b. At the 0.05 level of significance, is there a significant linear relationship between the calories and fat?

13.52 There are several methods for calculating fuel economy. The following table (contained in a data file) indicates the mileage as calculated by owners and by current government standards:
a. Compute and interpret the coefficient of correlation, r.
b. At the 0.05 level of significance, is there a significant linear relationship between the mileage as calculated by owners and by current government standards?

13.53 College basketball is big business, with coaches' salaries, revenues, and expenses in millions of dollars. The data in a data file represent the coaches' salaries and revenues for college basketball at selected schools in a recent year (extracted from R. Adams, "Pay for Playoffs," The Wall Street Journal, March 11-12, 2006, pp. P1, P8).
a. Compute and interpret the coefficient of correlation, r.
b. At the 0.05 level of significance, is there a significant linear relationship between a coach's salary and revenue?

13.54 College football players trying out for the NFL are given the Wonderlic standardized intelligence test. The data in a data file represent the average Wonderlic scores of football players trying out for the NFL and the graduation rates for football players at selected schools (extracted from S. Walker, "The NFL's Smartest Team," The Wall Street Journal, September 30, 2005, pp. W1, W10).
a. Compute and interpret the coefficient of correlation, r.
b. At the 0.05 level of significance, is there a significant linear relationship between the average Wonderlic score of football players trying out for the NFL and the graduation rates for football players at selected schools?
c. What conclusions can you reach about the relationship between the average Wonderlic score of football players trying out for the NFL and the graduation rates for football players at selected schools?
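$$h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX}$$

where
$\hat{Y}_i$ = predicted value of $Y$; $\hat{Y}_i = b_0 + b_1 X_i$
$S_{YX}$ = standard error of the estimate
$n$ = sample size
$X_i$ = given value of $X$

From Table E.3, $t_{12} = 2.1788$. Thus,

$$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{h_i}$$

where

$$h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX}$$

so that

$$7.6439 \pm (2.1788)(0.9664)\sqrt{\frac{1}{14} + \frac{(4 - 2.9214)^2}{37.9236}} = 7.6439 \pm 0.6728$$

so

$$6.9711 \le \mu_{Y|X=4} \le 8.3167$$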
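The arithmetic in this worked example can be checked with a short Python sketch. It is not part of the original text: the helper name `mean_response_interval` is hypothetical, and the inputs are the rounded values as read from the example (n = 14, mean of X = 2.9214, SSX = 37.9236), so the result should agree with the interval above only up to rounding.

```python
import math

def mean_response_interval(y_hat, t_crit, s_yx, n, x, x_bar, ssx):
    """Confidence interval for the mean of Y at a given X (hypothetical helper)."""
    h = 1 / n + (x - x_bar) ** 2 / ssx          # the h_i statistic
    half_width = t_crit * s_yx * math.sqrt(h)   # t * S_YX * sqrt(h_i)
    return y_hat - half_width, y_hat + half_width

# Rounded values taken from the Sunflowers Apparel example (assumed as read).
lower, upper = mean_response_interval(
    y_hat=7.6439, t_crit=2.1788, s_yx=0.9664,
    n=14, x=4, x_bar=2.9214, ssx=37.9236,
)
print(f"{lower:.4f} <= mean of Y at X = 4 <= {upper:.4f}")
```

The critical value is passed in rather than computed so that the sketch depends only on the standard library; in practice it would come from a t table such as Table E.3.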
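PREDICTION INTERVAL FOR AN INDIVIDUAL RESPONSE, Y

$$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + h_i}$$

$$\hat{Y}_i - t_{n-2} S_{YX} \sqrt{1 + h_i} \le Y_{X=X_i} \le \hat{Y}_i + t_{n-2} S_{YX} \sqrt{1 + h_i} \qquad (13.21)$$

where $h_i$, $\hat{Y}_i$, $S_{YX}$, $n$, and $X_i$ are defined as in Equation (13.20) on page 546 and $Y_{X=X_i}$ is a future value of $Y$ when $X = X_i$.

For $X = 4$,

$$\hat{Y}_i = 0.9645 + 1.6699X_i = 0.9645 + 1.6699(4) = 7.6439 \text{ (millions of dollars)}$$

From Table E.3, $t_{12} = 2.1788$. Thus,

$$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + h_i}$$

where

$$h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$

so that

$$7.6439 \pm (2.1788)(0.9664)\sqrt{1 + \frac{1}{14} + \frac{(4 - 2.9214)^2}{37.9236}} = 7.6439 \pm 2.2104$$

so

$$5.4335 \le Y_{X=4} \le 9.8543$$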
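As a check on Equation (13.21), the following Python sketch recomputes the prediction interval for an individual store. It is an illustration under stated assumptions, not the book's procedure: the function name `prediction_interval` is hypothetical, and the inputs are the rounded values as read from the example, so agreement with the text is only up to rounding.

```python
import math

def prediction_interval(y_hat, t_crit, s_yx, n, x, x_bar, ssx):
    """Prediction interval for one future Y at a given X (hypothetical helper)."""
    h = 1 / n + (x - x_bar) ** 2 / ssx              # the h_i statistic
    half_width = t_crit * s_yx * math.sqrt(1 + h)   # note the extra "1 +" term
    return y_hat - half_width, y_hat + half_width

# Rounded values taken from the Sunflowers Apparel example (assumed as read).
pi_lower, pi_upper = prediction_interval(
    y_hat=7.6439, t_crit=2.1788, s_yx=0.9664,
    n=14, x=4, x_bar=2.9214, ssx=37.9236,
)
print(f"{pi_lower:.4f} <= Y at X = 4 <= {pi_upper:.4f}")
```

The only change from the interval for the mean response is the `1 + h` under the square root, which is why the prediction interval for an individual value is always wider.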
FIGURE 13.21 Excel confidence interval and prediction interval worksheet for the Sunflowers Apparel data (see Section E13.5 to create this worksheet)
=l.o X=2 fr",-x)2=zo ffi predict weekly sales.The data are stored in the
file [@!![!.
For these data, $S_{YX} = 30.81$ and $h_i = 0.1373$ when $X = 8$.
13.9 PITFALLS IN REGRESSION AND ETHICAL ISSUES
Some of the pitfalls involved in using regression analysis are as follows:
Anscombe (reference 1) showed that all four data sets given in Table 13.7 have the following identical results:

$$r^2 = 0.667$$

$$SSR = \text{Explained variation} = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 = 27.51$$
Thus, with respect to these statistics associated with a simple linear regression analysis, the four data sets are identical. Were you to stop the analysis at this point, you would fail to observe the important differences among the four data sets. By examining the scatter plots for the four data sets in Figure 13.22 on page 552, and their residual plots in Figure 13.23 on page 552, you can clearly see that each of the four data sets has a different relationship between X and Y.
From the scatter plots of Figure 13.22 and the residual plots of Figure 13.23, you see how different the data sets are. The only data set that seems to follow an approximate straight line is data set A. The residual plot for data set A does not show any obvious patterns or outlying residuals. This is certainly not true for data sets B, C, and D. The scatter plot for data set B shows that a quadratic regression model (see Section 15.1) is more appropriate. This conclusion is reinforced by the residual plot for data set B. The scatter plot and the residual plot for data set C clearly show an outlying observation. If this is the case, you may want to remove the outlier and reestimate the regression model (see reference 4). Similarly, the scatter plot for data set D represents the situation in which the model is heavily dependent on the outcome of a single response ($X_8 = 19$ and $Y_8 = 12.50$). You would have to cautiously evaluate any regression model because its regression coefficients are heavily dependent on a single observation.
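The point that identical summary statistics can hide very different data can be checked numerically. The sketch below is an illustration, not part of the original text: Table 13.7 is not reproduced in this section, so the values are the ones published in Anscombe's 1973 article, and the mapping of his four sets to data sets A through D is an assumption (it is consistent with the single response X = 19, Y = 12.50 described for data set D). The helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit; returns (b0, b1, SSR, r_squared). Hypothetical helper."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in xs)   # explained variation
    sst = sum((y - y_bar) ** 2 for y in ys)             # total variation
    return b0, b1, ssr, ssr / sst

# Anscombe's published values; the A-D labels are assumed to match Table 13.7.
x_abc = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
data_sets = {
    "A": (x_abc, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "B": (x_abc, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "C": (x_abc, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "D": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: fit_line(xs, ys) for name, (xs, ys) in data_sets.items()}
for name, (b0, b1, ssr, r2) in results.items():
    print(f"{name}: b0={b0:.2f}  b1={b1:.2f}  SSR={ssr:.2f}  r2={r2:.3f}")
```

All four fits print essentially the same intercept, slope, SSR, and r-squared, which is exactly why the scatter plots and residual plots are needed to tell the data sets apart.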
FIGURE 13.22 Scatter plots for four data sets
FIGURE 13.23 Residual plots for four data sets
In summary, scatter plots and residual plots are of vital importance to a complete regression analysis. The information they provide is so basic to a credible analysis that you should always include these graphical methods as part of a regression analysis. Thus, a strategy that you can use to help avoid the pitfalls of regression is as follows:
Perhaps you are familiar with the TV competition organized by model Tyra Banks to find "America's top model." You may be less familiar with another set of top models that are emerging from the business world.
In a BusinessWeek article from its January 23, 2006, edition (S. Baker, "Why Math Will Rock Your World: More Math Geeks Are Calling the Shots in Business. Is Your Industry Next?" BusinessWeek, pp. 54-62), Stephen Baker talks about how "quants" turned finance upside down and are moving on to other business fields. The name quants derives from the fact that "math geeks" develop models and forecasts by using "quantitative methods." These methods are built on the principles of regression analysis discussed in this chapter, although the actual models are much more complicated than the simple linear models discussed in this chapter.
Regression-based models have become the top models for many types of business analyses. Some examples include

Advertising and marketing: Managers use econometric models (in other words, regression models) to determine the effect of an advertisement on sales, based on a set of factors. Also, managers use data mining to predict patterns of behavior of what customers will buy in the future, based on historic information about the consumer.
Finance: Any time you read about a financial "model," you should understand that some type of regression model is being used. For example, a New York Times article on June 18, 2006, titled "An Old Formula That Points to New Worry" by Mark Hulbert (p. BU8) discusses a market timing model that predicts the return of stocks in the next three to five years, based on the dividend yield of the stock market and the interest rate of 90-day Treasury bills.
Food and beverage: Believe it or not, Enologix, a California consulting company, has developed a "formula" (a regression model) that predicts a wine's quality index, based on a set of chemical compounds found in the wine (see D. Darlington, "The Chemistry of a 90+ Wine," The New York Times Magazine, August 7, 2005, pp. 36-39).
Publishing: A study of the effect of price changes at Amazon.com and BN.com on sales (again, regression analysis) found that a 1% price change at BN.com pushed sales down 4%, but it pushed sales down only 0.5% at Amazon.com. (You can download the paper at http://gsbadg.uchicago.edu/vitae.htm.)
Transportation: Farecast.com uses data mining and predictive technologies to objectively predict airfare pricing (see D. Darlin, "Airfares Made Easy (Or Easier)," The New York Times, July 1, 2006, pp. C1, C6).
Real estate: Zillow.com uses information about the features contained in a home and its location to develop estimates about the market value of the home, using a "formula" built with a proprietary algorithm.

In the article, Baker stated that statistics and probability will become core skills for businesspeople and consumers. Those who are successful will know how to use statistics, whether they are building financial models or making marketing plans. He also strongly endorsed the need for everyone in business to have knowledge of Microsoft Excel to be able to produce statistical analysis and reports.
As you can see from the chapter roadmap in Figure 13.24, this chapter develops the simple linear regression model and discusses the assumptions and how to evaluate them. Once you are assured that the model is appropriate, you can predict values by using the prediction line and test for the significance of the slope.
FIGURE 13.24 Roadmap for simple linear regression
Key Equations

Regression Equation: The Prediction Line

$$SSR = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$$
[...] the total number of hits the student had on the Web site supporting the course. The following table [...] correlation coefficient for all pairs of variables. [...] correlations marked with an * are statistically significant, using α = 0.001:

Variable                             Correlation
[...], Cumulative GPA                0.72*
[...], Total Hits                    0.08
[...], Hit Consistency               0.37*
Cumulative GPA, Total Hits           0.12
Cumulative GPA, Hit Consistency      0.32*
Total Hits, Hit Consistency          0.64*
Source: Extracted from D. Baugher, A. Varanelli, and E. Weisbord, "[...] Hits in an Internet-Supported Course: How Can [...] Use Them and What Do They Mean?" Decision Sciences [...] Innovative Education, Fall 2003, 1(2), pp. 159-179.

a. What conclusions can you reach from this correlation analysis?
b. Are you surprised by the results, or are they consistent with your own observations and experiences?

13.74 Management of a soft-drink bottling company wants to develop a method for allocating delivery costs to customers. Although one cost clearly relates to travel time within a particular route, another variable cost reflects the time required to unload the cases of soft drink at the delivery point. A sample of 20 deliveries within a territory was selected. The delivery times and the numbers of cases delivered were recorded in the [...] file:

           Number     Delivery Time               Number     Delivery Time
Customer   of Cases   (Minutes)        Customer   of Cases   (Minutes)
    1         52         32.1             11        161         43.0
    2        [...]       34.8             12        184         49.4
    3         73         36.2             13        202         57.2
    4         85         37.8             14        218         56.8
    5         95         37.8             15        243         60.6
    6        103         39.7             16        254         61.2
    7        116         38.5             17        267         58.2
    8        121         41.9             18        275         63.1
    9        143         44.2             19        287         65.6
   10        157         47.1             20        298         67.3

Develop a regression model to predict delivery time, based on the number of cases delivered.
a. Use the least-squares method to compute the regression coefficients b0 and b1.
b. Interpret the meaning of b0 and b1 in this problem.
c. Predict the delivery time for 150 cases of soft drink.
d. Should you use the model to predict the delivery time for a customer who is receiving 500 cases of soft drink? Why or why not?
e. Determine the coefficient of determination, r², and explain its meaning in this problem.
f. Perform a residual analysis. Is there any evidence of a pattern in the residuals? Explain.
g. At the 0.05 level of significance, is there evidence of a linear relationship between delivery time and the number of cases delivered?
h. Construct a 95% confidence interval estimate of the mean delivery time for 150 cases of soft drink.
i. Construct a 95% prediction interval of the delivery time for a single delivery of 150 cases of soft drink.
j. Construct a 95% confidence interval estimate of the population slope.
k. Explain how the results in (a) through (j) can help allocate delivery costs to customers.

13.75 A brokerage house wants to predict the number of trade executions per day, using the number of incoming phone calls as a predictor variable. Data were collected over a period of 35 days and are stored in the file [...].
a. Use the least-squares method to compute the regression coefficients b0 and b1.
b. Interpret the meaning of b0 and b1 in this problem.
c. Predict the number of trades executed for a day in which the number of incoming calls is 2,000.
d. Should you use the model to predict the number of trades executed for a day in which the number of incoming calls is 5,000? Why or why not?
e. Determine the coefficient of determination, r², and explain its meaning in this problem.
f. Plot the residuals against the number of incoming calls and also against the days. Is there any evidence of a pattern in the residuals with either of these variables? Explain.
g. Determine the Durbin-Watson statistic for these data.
h. Based on the results of (f) and (g), is there reason to question the validity of the model? Explain.
i. At the 0.05 level of significance, is there evidence of a linear relationship between the volume of trade executions and the number of incoming calls?
j. Construct a 95% confidence interval estimate of the mean number of trades executed for days in which the number of incoming calls is 2,000.
k. Construct a 95% prediction interval of the number of trades executed for a particular day in which the number of incoming calls is 2,000.
l. Construct a 95% confidence interval estimate of the population slope.
m. Based on the results of (a) through (l), do you think the brokerage house should focus on a strategy of increasing the total number of incoming calls or on a strategy that relies on trading by a small number of heavy traders? Explain.

13.76 You want to develop a model to predict the selling price of homes based on assessed value. A sample of 30
e. Perform a residual analysis on your results and determine the adequacy of the fit of the model.
f. At the 0.05 level of significance, is there evidence of a linear relationship between the price per person and the summated rating?
g. Construct a 95% confidence interval estimate of the mean price per person for all restaurants with a summated rating of 50.
h. Construct a 95% prediction interval of the price per person for a restaurant with a summated rating of 50.
i. Construct a 95% confidence interval estimate of the slope.
j. How useful do you think the summated rating is as a predictor of price? Explain.

13.91 Refer to the discussion of beta values and market models in Problem 13.49 on pages 544-545. One hundred weeks of data, ending the week of May 22, 2006, for the S&P 500 and three individual stocks are included in the data file [...]. Note that the weekly percentage change for both the S&P 500 and the individual stocks is measured as the percentage change from the previous week's closing value to the current week's closing value. The variables included are
Week: Current week
SP500: Weekly percentage change in the S&P 500 Index
WALMART: Weekly percentage change in stock price of Wal-Mart Stores, Inc.
TARGET: Weekly percentage change in stock price of the Target Corporation
SARALEE: Weekly percentage change in stock price of the Sara Lee Corporation
Source: Extracted from finance.yahoo.com, May 31, 2006.
a. Estimate the market model for Wal-Mart Stores, Inc. (Hint: Use the percentage change in the S&P 500 Index as the independent variable and the percentage change in Wal-Mart Stores, Inc.'s stock price as the dependent variable.)
b. Interpret the beta value for Wal-Mart Stores, Inc.
c. Repeat (a) and (b) for Target Corporation.
d. Repeat (a) and (b) for Sara Lee Corporation.
e. Write a brief summary of your findings.

13.92 The data file [...] contains the stock prices of four companies, collected weekly for 53 consecutive weeks, ending May 22, 2006. The variables are
Week: Closing date for stock prices
MSFT: Stock price of Microsoft, Inc.
Ford: Stock price of Ford Motor Company
GM: Stock price of General Motors, Inc.
IAL: Stock price of International Aluminum, Inc.
Source: Extracted from finance.yahoo.com, May 31, 2006.
a. Calculate the correlation coefficient, r, for each pair of stocks. (There are six of them.)
b. Interpret the meaning of r for each pair.
c. Is it a good idea to have all the stocks in an individual's portfolio be strongly positively correlated among each other? Explain.

13.93 Is the daily performance of stocks and bonds correlated? The data file [...] contains information concerning the closing value of the Dow Jones Industrial Average and the Vanguard Long-Term Bond Index Fund for 60 consecutive business days, ending May 30, 2006. The variables included are
Date: Current day
Bonds: Closing price of Vanguard Long-Term Bond Index Fund
Stocks: Closing price of the Dow Jones Industrial Average
Source: Extracted from finance.yahoo.com, May 31, 2006.
a. Compute and interpret the correlation coefficient, r, for the variables Stocks and Bonds.
b. At the 0.05 level of significance, is there a relationship between these two variables? Explain.

Report Writing Exercises
13.94 In Problems 13.85-13.89 on page 561, you developed regression models to predict monthly sales at a sporting goods store. Now, write a report based on the models you developed. Append to your report all appropriate charts and statistical information.
Managing the Springville Herald
To ensure that as many trial subscriptions as possible are converted to regular subscriptions, the Herald marketing department works closely with the distribution department to accomplish a smooth initial delivery process for the trial subscription customers. To assist in this effort, the marketing department needs to accurately forecast the number of new regular subscriptions for the coming months.
A team consisting of managers from the marketing and distribution departments was convened to develop a better method of forecasting new subscriptions. Previously, after examining new subscription data for the prior three months, a group of three managers would develop a subjective forecast of the number of new subscriptions. Lauren Hall, who was recently hired by the company to provide special skills in quantitative forecasting methods, suggested that the department look for factors that might help in predicting new subscriptions.
Members of the team found that the forecasts in the past year had been particularly inaccurate because in some months, much more time was spent on telemarketing than
References
1. Anscombe, F. J., "Graphs in Statistical Analysis," The American Statistician 27 (1973): 17-21.
2. Hoaglin, D. C., and R. Welsch, "The Hat Matrix in Regression and ANOVA," The American Statistician 32 (1978): 17-22.
3. Hocking, R. R., "Developments in Linear Regression Methodology: 1959-1982," Technometrics 25 (1983): 219-250.
4. Kutner, M. H., C. J. Nachtsheim, J. Neter, and W. Li, Applied Linear Statistical Models, 5th ed. (New York: McGraw-Hill/Irwin, 2005).
5. Microsoft Excel 2007 (Redmond, WA: Microsoft Corp., 2007).
Open to the chart sheet that contains your scatter plot and select Chart → Add Trendline. In the Add Trendline dialog box (see Figure E13.1), click the Type tab and then click Linear. Click the Options tab and select the Automatic option. Click Display equation on chart and Display R-squared value on chart and then click OK. If you have included a label as part of your data range, you will see that label displayed in place of Series1 in this dialog box.

Open to the chart sheet that contains your scatter plot and select Layout → Trendline and in the Trendline gallery, [...]

FIGURE E13.2 Format Trendline dialog box (2007)

[...] relocate the X axis to the bottom of the chart. Open to the chart, right-click the Y axis, and select Format Axis from the shortcut menu.
If you use Excel 97-2003, select the Scale tab in the Format Axis dialog box (see Figure E13.3), and enter the value found in the Minimum box (-6 in Figure E13.3) as the Value (X) axis Crosses at value and click OK. (As you enter this value, the check box for this entry is cleared automatically.)
FIGURE E13.5

FIGURE E13.6 Completed Simple Linear Regression dialog box

[...] contain label and enter a value for the Confidence level for regression coefficients. Click the Regression Statistics Table, ANOVA and Coefficients Table, Residuals Table, and Residual Plot Regression Tool Output Options. Enter Site Selection Analysis as the Title and click Scatter Diagram. Click Confidence and Prediction Interval for X= and enter 4 in its box. Enter 95 in the Confidence level for interval estimates box. Click OK to execute the procedure.
To evaluate the assumption of linearity, you review the Residual Plot for X1 chart sheet. Note that there is no apparent pattern or relationship between the residuals and X variable.
To evaluate the normality assumption, create a normal probability plot. With your workbook open to the [...]

FIGURE E13.7 Completed Normal Probability Plot dialog box

You conclude that all assumptions are valid and that you can use this simple linear regression model for the Sunflowers Apparel data. You can now open to the SLR worksheet to view the details of the analysis or open to the Estimate worksheet to make inferences about the mean of Y and the prediction of individual values of Y.

Using Basic Excel
Open to the Data worksheet of the [...] workbook. Select Tools → Data Analysis (97-2003) or Data → Data Analysis (2007). Select Regression from the Data Analysis list, and click OK. In the procedure's dialog box (see Figure E13.8), enter C1:C15 as the Input Y Range and enter B1:B15 as the Input X Range. Click Labels, click Confidence Level and enter 95 in its box, and click Residuals. Click OK to execute the procedure.
To evaluate the assumption of linearity, you plot the residuals against the square feet (independent) variable. To simplify creating this plot, open to the Data worksheet and copy the square feet cell range B1:B15 to cell E1. Then copy the cell range of the residuals, C24:C38 on the SLR worksheet, to cell F1 of the Data worksheet. With your workbook open to the Data worksheet, use the Section E13.2 instructions on pages 564-566 to create a scatter plot. (Use E1:F15 as the Data range (Excel 97-2003) or as the cell range of the X and Y variables (Excel 2007) when creating the scatter plot.) Review the scatter plot. Observe that there is no apparent pattern or relationship between the residuals and X variable. You conclude that the linearity assumption holds.
You now evaluate the normality assumption by creating a normal probability plot. Create a Plot worksheet, using the model worksheet in the [...] workbook as your guide. In a new worksheet, enter Rank in cell A1 and then enter the series 1 through 14 in cells A2:A15. Enter Proportion in cell B1 and enter the formula =A2/15 in cell B2. Next, enter Z Value in cell C1 and the formula =NORMSINV(B2) in cell C2. Copy the residuals (including their column heading) to the cell range D1:D15. Select the formulas in cell range B2:C2 and copy them down through row 15. Open to the probability plot and observe that the data do not appear to depart substantially from a normal distribution.
To evaluate the assumption of equal variance, return to the scatter plot of the residuals and the X variable that you already developed. Observe that there do not appear to be major differences in the variability of the residuals.
You conclude that all assumptions are valid and that you can use this simple linear regression model for the Sunflowers Apparel data. You can now evaluate the details of the regression results worksheet. If you are interested in making inferences about the mean of Y and the prediction of individual values of Y, open the [...] workbook. (Usually, you would have to first make adjustments to the DataCopy worksheet, as discussed in Section E13.5, but this workbook already contains the entries for the Sunflowers Apparel analysis.) Open to the CIE and PI worksheet to make inferences about the mean of Y and the prediction of individual values of Y.
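The Rank, Proportion, and Z Value columns described for the normal probability plot can be mirrored outside Excel. The Python sketch below is an illustration under stated assumptions: the function name `normal_plot_points` is hypothetical, the 14 residuals are made-up placeholder numbers (not the Sunflowers Apparel residuals), and pairing the ranks with residuals sorted in ascending order is the usual convention for such a plot rather than something spelled out in the steps above. `NormalDist().inv_cdf` from the standard library plays the role of NORMSINV.

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    """Return (rank, proportion, z_value, residual) rows for a normal probability plot."""
    n = len(residuals)
    std_normal = NormalDist()                       # standard normal, mean 0, sd 1
    rows = []
    for rank, value in enumerate(sorted(residuals), start=1):
        proportion = rank / (n + 1)                 # the =A2/15 step when n = 14
        z_value = std_normal.inv_cdf(proportion)    # the =NORMSINV(B2) step
        rows.append((rank, proportion, z_value, value))
    return rows

# Placeholder residuals for illustration only (hypothetical values).
example_residuals = [-1.4, 0.3, 0.9, -0.2, 1.1, -0.8, 0.1,
                     0.6, -0.5, 1.7, -1.0, 0.2, -0.3, -0.7]

rows = normal_plot_points(example_residuals)
for rank, proportion, z_value, value in rows:
    print(f"{rank:2d}  {proportion:.4f}  {z_value:7.4f}  {value:5.1f}")
```

Plotting the residual column against the Z value column gives the probability plot; points that fall close to a straight line suggest no substantial departure from normality.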