Solutions to Forecasting: Methods and Applications (1998)
1.1 Look for pragmatic applications in the real world. Note that there are no fixed
answers in this problem.
(a) Dow theory: There is an element of belief that past patterns will continue
into the future. So first, look for the patterns (support and resistance levels)
and then project them ahead for the market and individual stocks. This is a
quantitative time series method.
(b) Random walk theory: This is quantitative, and involves a time series rather
than an explanatory approach. However, the forecasts are very simple because
of the lack of any meaningful information. The best prediction of tomorrow’s
closing price is today’s closing price. In other words, if we look at first differences
of closing prices (i.e., today’s closing price minus yesterday’s closing price) there
will be no pattern to discover.
(c) Prices and earnings: Here instead of dealing with only one time series (i.e., the
stock price series) we look at the relation between stock price and earnings per
share to see if there is a relationship—maybe with a lag, maybe not. There-
fore this is an explanatory approach to forecasting and would typically involve
regression analysis.
1.2 Step 1: Problem definition This would involve understanding the nature of the indi-
vidual product lines to be forecast. For example, are they high-demand prod-
ucts or specialty biscuits produced for individual clients? It is also important
to learn who requires the forecasts and how they will be used. Are the forecasts
to be used in scheduling production, or in inventory management, or for bud-
getary planning? Will the forecasts be studied by senior management, or by
the production manager, or someone else? Have there been stock shortages so
that demand has gone unsatisfied in the recent past? If so, would it be better
to try to forecast demand rather than sales so that we can try to prevent this
happening again in the future? The forecaster will also need to learn whether
the company requires one-off forecasts or whether the company is planning on
introducing a new forecasting system. If the latter, are they intending it to
be managed by their own employees and, if so, what software facilities do they
have available and what forecasting expertise do they have in-house?
Step 2: Gathering information It will be necessary to collect historical data on each
of the product lines we wish to forecast. The company may be interested in
forecasting each of the product lines for individual selling points. If so, it is
important to check that there are sufficient data to allow reasonable forecasts
to be obtained. For each variable the company wishes to forecast, at least a
few years of data will be needed.
There may be other variables which impact biscuit sales, such as economic
fluctuations, advertising campaigns, the introduction of new product lines by a
competitor, competitors' advertising campaigns, and production difficulties. This
information is best obtained from key personnel within the company. It will be
necessary to conduct a range of discussions with relevant people to try to build
an understanding of the market forces.
If there are any relevant explanatory variables, these will need to be collected.
Step 3: Preliminary (exploratory) analysis Each series of interest should be graphed
and its features studied. Try to identify consistent patterns such as trend
and seasonality. Check for outliers. Can they be explained? Do any of the
explanatory variables appear to be strongly related to biscuit sales?
Step 4: Choosing and fitting models A range of models will be fitted. These models
will be chosen on the basis of the analysis in Step 3.
Step 5: Using and evaluating a forecasting model Forecasts of each product line will
be made using the best forecasting model identified in Step 4. These forecasts
will be compared with expert in-house opinion and monitored over the period
for which forecasts have been made.
There will be work to be done in explaining how the forecasting models work
to company personnel. There may even be substantial resistance to the in-
troduction of a mathematical approach to forecasting. Some people may feel
threatened. A period of education will probably be necessary.
A review of the forecasting models should be planned.
2.1 (a) One simple answer: choose the mean temperature in June 1994 as the forecast
for June 1995. That is, 17.2°C.
(b) The time plot below shows clear seasonality with average temperature higher
in summer.
Exercise 2.1(b): Time plot of average monthly temperature in Paris (January 1994–May 1995).
2.3 (a) Smooth series with several large jumps or direction changes; very large range
of values; logs help stabilize variance.
(b) Downward trend (or early level shift); cycles of about 15 days; outlier at day
8; no transformation necessary.
(c) Cycles of about 9–10 years; large range and little variation at low points indi-
cating transformation will help; logs help stabilize variance.
(d) No clear trend; seasonality of period 12; high in July; no transformation neces-
sary.
(e) Initial trend; level shift end of 1982; seasonal period 4 (high in Q2 and Q3, low
in Q1); no transformation necessary.
2.4 1-B, 2-A, 3-D, 4-C. The easiest approach to this question is to first identify D.
Because it has a peak at lag 12, the time series must have a pattern of period 12.
Therefore it is likely to be monthly. The slow decay in plot D shows the series has
trend. The only series with both trend and seasonality of period 12 is Series 3. Next
consider plot C which has a peak at lag 10. Obviously this cannot reflect a seasonal
pattern since the only series remaining which is seasonal is series 2 and that has
period 12. Series 4 is strongly cyclic with period approximately 10 and series 1 has
no seasonal or strong cyclic patterns. Therefore C must correspond to series 4. Plot
A shows a peak at lag 12 indicating seasonality of period 12. Therefore, it must
correspond with series 2. That leaves plot B aligned with series 1.
2.5 (a)
X Y
Mean 52.99 43.70
Median 52.60 44.42
MAD 3.11 2.47
MSE 15.94 8.02
St.dev. 4.14 2.94
(b) Mean and median give a measure of center; MAD, MSE and St.dev. are mea-
sures of spread.
(c) r = −0.660. See plot on next page.
(d) It is inappropriate to compute autocorrelations since there is no time component
to these data. The data are from 14 different runners. (Autocorrelation would
be appropriate if they were data from the same runner at 14 different times.)
[Figure: scatterplot of Y (maximal aerobic capacity) against X (running times).]
[Figure: actual values with forecasts from Method 1 and Method 2.]
2.7 (a) Changes: −0.25, −0.26, 0.13, . . . , −0.09, −0.77. There are 78 observations in
the DOWJONES.DAT file. Therefore there are 77 changes.
(b) Average change: 0.1336. So the next 20 changes are each forecast to be 0.1336.
(c) The last value of the series is 121.23. So the next 20 are forecast to be:
[Figure: time plot of the Dow Jones index with forecasts.]
(e) The average change is c = (1/(n − 1)) Σ_{t=2}^{n} (Xt − Xt−1) and the forecasts are X̂n+h = Xn + hc. Therefore,

X̂n+h = Xn + (h/(n − 1)) Σ_{t=2}^{n} (Xt − Xt−1) = Xn + h(Xn − X1)/(n − 1).

This is a straight line with slope equal to (Xn − X1)/(n − 1). When h = 0, X̂n+h = Xn and when h = −(n − 1), X̂n+h = X1. Therefore, the line is drawn between the first and last observations.
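The result in (e) is easy to check numerically; a minimal sketch of the drift forecasts, assuming the series is available as an array:

    import numpy as np

    def drift_forecast(x, h):
        """Forecast X_{n+h} = X_n + h*c, where c = (X_n - X_1)/(n - 1)."""
        x = np.asarray(x, dtype=float)
        c = (x[-1] - x[0]) / (len(x) - 1)   # average change
        return x[-1] + c * np.arange(1, h + 1)

    # e.g. drift_forecast(dowjones, 20) for the DOWJONES.DAT series:
    # the forecasts lie on the line joining the first and last observations.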
2.8 (a) See the plot on the next page. The variation when the production is low is
much less than the variation in the series when the production is high. This
indicates a transformation is required.
(b) See the plot on the next page.
(c) See the table on page 84.
Exercise 2.8 (a) and (b): Time plots of Japanese automobile production and the logarithms
of Japanese automobile production.
3.1
Y 3-MA 5-MA 7-MA 3 × 3-MA 5 × 5-MA
42 55.50 70.33 81.50 62.92 81.14
69 70.33 81.50 91.60 73.50 88.71
100 94.67 91.60 99.83 93.56 96.65
115 115.67 111.40 107.57 113.22 111.10
132 129.33 128.40 126.00 129.11 125.92
141 142.33 142.60 141.86 142.33 141.60
154 155.33 155.60 156.71 155.33 156.80
171 168.33 170.00 172.86 169.56 172.32
180 185.00 187.40 189.29 185.78 189.80
204 204.00 206.00 210.71 205.11 210.96
228 226.33 230.00 236.86 228.56 236.72
247 255.33 261.40 268.29 257.78 262.54
291 291.67 298.80 283.00 295.56 289.27
337 339.67 316.50 298.80 331.78 304.09
391 364.00 339.67 316.50 351.83 318.32
[Figure: the five moving average smoothers (3-MA, 3×3 MA, 5-MA, 5×5 MA, 7-MA).]
The graph on the previous page shows the five smoothers. Because moving average
smoothers are “flat” at the ends, the best smoother in this case is the one with the
smallest number of terms, namely the 3-MA.
3.2
T̂t = (1/3)[(1/5)(Yt−3 + Yt−2 + Yt−1 + Yt + Yt+1) + (1/5)(Yt−2 + Yt−1 + Yt + Yt+1 + Yt+2) + (1/5)(Yt−1 + Yt + Yt+1 + Yt+2 + Yt+3)]
= (1/15)Yt−3 + (2/15)Yt−2 + (1/5)Yt−1 + (1/5)Yt + (1/5)Yt+1 + (2/15)Yt+2 + (1/15)Yt+3.
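The weights of a double moving average are the convolution of the two component weight vectors, which gives a quick check of the result above (a sketch):

    import numpy as np

    w3 = np.full(3, 1/3)       # 3-MA weights
    w5 = np.full(5, 1/5)       # 5-MA weights
    w = np.convolve(w3, w5)    # 3x5 MA weights
    print(w)                   # [1/15, 2/15, 1/5, 1/5, 1/5, 2/15, 1/15]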
3.3 (a) The 4 MA is designed to eliminate seasonal variation because each quarter
receives equal weight. The 2 MA is designed to center the estimated trend at
the data points. The combination 2 × 4 MA also gives equal weight to each
quarter.
(b) T̂t = (1/8)Yt−2 + (1/4)Yt−1 + (1/4)Yt + (1/4)Yt+1 + (1/8)Yt+2.
3.4 (a) Use 2×4 MA to get trend. If the end-points are ignored, we obtain the following
results.
Data: Trend:
Y1 Y2 Y3 Y4 Y1 Y2 Y3 Y4
Q1 99 120 139 160 Q1 110.250 129.250 150.125
Q2 88 108 127 148 Q2 114.875 134.500 154.750
Q3 93 111 131 150 Q3 100.375 119.635 138.875
Q4 111 130 152 170 Q4 105.500 124.375 145.125
Data – trend:
Y1 Y2 Y3 Y4 Ave
Q1 9.750 9.750 9.875 9.792
Q2 –6.875 –7.500 –6.500 –7.042
Q3 –7.375 –8.625 –8.875 –8.292
Q4 5.500 5.625 6.875 6.000
(b) Hence, the seasonal indices are:
Ŝ1 = 9.8, Ŝ2 = −7.0, Ŝ3 = −8.3 and Ŝ4 = 6.0.
The seasonal component consists of replications of these indices.
(c) End points ignored. Other approaches are possible.
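The calculations in (a) and (b) follow this pattern; a minimal sketch, assuming the quarterly series starts in Q1 (the averaged detrended values are the seasonal indices):

    import numpy as np

    def seasonal_indices(y, period=4):
        w = np.array([1, 2, 2, 2, 1]) / 8.0        # 2x4 MA weights
        trend = np.convolve(y, w, mode="valid")    # two end-points lost each side
        detrended = np.asarray(y, float)[2:-2] - trend
        # detrended[0] corresponds to the third observation (Q3 here)
        return [detrended[q::period].mean() for q in range(period)]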
3.5 (a) See the top plot on the next page. There is clear trend which appears close to
linear, and strong seasonality with a peak in August–October and a trough in
January–March.
(b) Calculations are given at the bottom of the next page. The decomposition plot
is shown at the top of the next page.
[Figure: multiplicative decomposition of the plastic sales data: data, trend-cycle, seasonal and remainder panels.]
Year J F M A M J J A S O N D
Data
1 742 697 776 898 1030 1107 1165 1216 1208 1131 971 783
2 741 700 774 932 1099 1223 1290 1349 1341 1296 1066 901
3 896 793 885 1055 1204 1326 1303 1436 1473 1453 1170 1023
4 951 861 938 1109 1274 1422 1486 1555 1604 1600 1403 1209
5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1420 1119 1013
2×12 MA Trend
1 977.0 977.0 977.1 978.4 982.7 990.4
2 1000.5 1011.2 1022.3 1034.7 1045.5 1054.4 1065.8 1076.1 1084.6 1094.4 1103.9 1112.5
3 1117.4 1121.5 1130.7 1142.7 1153.6 1163.0 1170.4 1175.5 1180.5 1185.0 1190.2 1197.1
4 1208.7 1221.3 1231.7 1243.3 1259.1 1276.6 1287.6 1298.0 1313.0 1328.2 1343.6 1360.6
5 1374.8 1382.2 1381.2 1370.6 1351.2 1331.2
Ratios
1 119.2 124.5 123.6 115.6 98.8 79.1
2 74.1 69.2 75.7 90.1 105.1 116.0 121.0 125.4 123.6 118.4 96.6 81.0
3 80.2 70.7 78.3 92.3 104.4 114.0 111.3 122.2 124.8 122.6 98.3 85.5
4 78.7 70.5 76.2 89.2 101.2 111.4 115.4 119.8 122.2 120.5 104.4 88.9
5 74.9 74.7 81.5 93.8 108.6 123.0
Seasonal indices
Ave 77.0 71.3 77.9 91.3 104.8 116.1 116.8 122.9 123.6 119.3 99.5 83.6
Exercise 3.5(a) and (b): Multiplicative classical decomposition of plastic sales data.
(c) The trend does appear almost linear except for a slight drop at the end. The
seasonal pattern is as expected. Note that it does not make much difference
whether these data are analyzed using a multiplicative decomposition or an
additive decomposition.
3.6 Period Trend Seasonal Forecast
t Tt St Ŷt = Tt St /100
61 1433.96 76.96 1103.6
62 1442.81 71.27 1028.3
63 1451.66 77.91 1131.0
64 1460.51 91.34 1334.0
65 1469.36 104.83 1540.3
66 1478.21 116.09 1716.1
67 1487.06 116.76 1736.3
68 1495.91 122.94 1839.1
69 1504.76 123.55 1859.1
70 1513.61 119.28 1805.4
71 1522.46 99.53 1515.3
72 1531.31 83.59 1280.0
3.7 (a) See the top of the figure on the previous page.
(b) The calculations are given below.
Year Q1 Q2 Q3 Q4
Data
1 362 385 432 341
2 382 409 498 387
3 473 513 582 474
4 544 582 681 557
5 628 707 773 592
6 627 725 854 661
4×2 MA
1 382.5 388.0
2 399.3 413.3 430.4 454.8
3 478.3 499.6 519.4 536.9
4 557.9 580.6 601.5 627.6
5 654.8 670.6 674.9 677.0
6 689.4 708.1
Ratios
1 112.9 87.9
2 95.7 99.0 115.7 85.1
3 98.9 102.7 112.1 88.3
4 97.5 100.2 113.2 88.7
5 95.9 105.4 114.5 87.4
6 91.0 102.4
Seasonal indices
Ave 95.8 101.9 113.7 87.5
[Figure: decomposition plot: data, trend-cycle, seasonal and remainder panels.]
3.8 (a) The top plot shows the original data followed by trend-cycle, seasonal and
irregular components. The bottom plot shows the seasonal sub-series.
(b) The trend-cycle is almost linear and the small seasonal component is very small
compared to the trend-cycle. The seasonal pattern is difficult to see in the time
plot of the original data. Values are high in March, September and December and
low in January and August. For the last six years, the December peak and
March peak have been almost constant. Before that, the December peak was
growing and the March peak was dropping. There are several possible outliers
in 1991.
(c) The recession is seen by several negative outliers in the irregular component.
This is also apparent in the data time plot. Note: the recession could be made
part of the trend-cycle component by reducing the span of the loess smoother.
3.9 (a) and (b) Calculations are given below. Note that the seasonal indices are
computed by averaging the de-trended values within each half-year.
Data   2×2 MA Trend   Detrended Data   Seasonal Component   Seasonally Adjusted Data
1.09 0.017 1.073
1.07 1.0825 -0.0125 -0.014 1.084
1.10 1.0825 0.0175 0.017 1.083
1.06 1.0750 -0.0150 -0.014 1.074
1.08 1.0625 0.0175 0.017 1.063
1.03 1.0450 -0.0150 -0.014 1.044
1.04 1.0300 0.0100 0.017 1.023
1.01 1.0225 -0.0125 -0.014 1.024
1.03 1.0075 0.0225 0.017 1.013
0.96 -0.014 0.974
(c) With more data, we could take moving averages of the detrended values for
each half-year rather than a simple average. This would result in a seasonal
component which changed over time.
4.1
Period Data MA(3) SES(α = 0.7)
t Yt Ŷt Et Ŷt Et
1974 1 1 5.4
2 2 5.3 5.40 -0.10
3 3 5.3 5.33 -0.03
4 4 5.6 5.33 0.27 5.31 0.29
1975 1 5 6.9 5.40 1.50 5.51 1.39
2 6 7.2 5.93 1.27 6.48 0.72
3 7 7.2 6.57 0.63 6.99 0.21
4 8 7.10 7.14
Accuracy statistics from period 4 through 7
ME 0.92 0.65
MAE 0.92 0.65
MAPE 13.22 9.56
MSE 1.08 0.64
Theil’s U 1.40 1.14
Theil's U statistic suggests that the naïve (or last value) method is better than
either of these. If SES is used with an optimal value of α chosen, then α = 1 is
selected. This is equivalent to the naïve method. Note that different packages may give
slightly different results for SES depending on how they initialize the method. Some
packages will also allow α > 1. Also, the accuracy values are not strictly comparable
for the MA forecasts. It would be better to use a holdout sample, but there are too
few data.
4.3 Optimizing α for SES over the period 3 through 10:
α 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
MAPE 65.60 53.46 44.43 37.60 32.32 28.16 24.82 22.08 19.80 17.86
MSE 79.34 47.24 29.95 20.10 14.17 10.41 7.91 6.17 4.92 4.00
(a) Clearly Holt’s method is better as it allows for the trend in the data.
(b) For SES, α = 1. Because of the trend, the forecasts will always lag behind the
actual values so that the forecast errors will always be at least 2. Choosing
α = 1 makes the forecast errors as small as possible for SES.
(c) See above.
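The grid search in 4.3 is easy to reproduce; a minimal sketch that initializes the forecast with the first observation and scores errors from period 3 onwards (different initializations give slightly different numbers):

    import numpy as np

    def ses_mse(y, alpha, start=2):
        f, errors = y[0], []
        for t in range(1, len(y)):
            if t >= start:                  # score from period 3 (index 2)
                errors.append(y[t] - f)
            f = f + alpha * (y[t] - f)      # SES update
        return np.mean(np.square(errors))

    # alphas = np.arange(0.1, 1.01, 0.1)
    # mse = [ses_mse(y, a) for a in alphas]   # decreases towards alpha = 1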
4.4 (a) (b) and (c) See the table on the following page.
(d) There’s not much to choose between these methods. They are both bad! Look
at Theil’s U values for instance. The last value method over the same period
(13–28) gives MSE=6.0, MAPE=2.05 and Theil’s U=1.0.
4.6 Here is a complete run for one set of values (β = 0.1 and α1 = 0.1). Note that in this
program we have chosen to make the first three values of α equal to the starting
value. This is not crucial, but it does make a difference.
t Yt Ft Et At Mt αt
1 200.0 200.00 0.00 0.00 0.00 0.100
2 135.0 200.00 -65.00 -6.50 6.50 0.100
3 195.0 193.50 1.50 -5.70 6.00 0.100
4 197.5 193.65 3.85 -4.74 5.79 0.950
5 310.0 197.31 112.69 7.00 16.48 0.820
6 175.0 289.74 -114.74 -5.18 26.30 0.425
7 155.0 241.00 -86.00 -13.26 32.27 0.197
8 130.0 224.08 -94.08 -21.34 38.45 0.411
9 220.0 185.43 34.57 -15.75 38.06 0.555
10 277.5 204.62 72.88 -6.89 41.55 0.414
11 235.0 234.77 0.23 -6.17 37.41 0.166
12 234.81 0.165
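A sketch of the adaptive-response-rate SES used above; with β = 0.1, and α held at the starting value for the first three periods as described, it reproduces the Ft and αt columns of the table:

    import numpy as np

    def arrses(y, beta=0.1, alpha_start=0.1):
        n = len(y)
        F = np.full(n + 1, np.nan)            # F[t] = one-step forecast of Y_t
        alpha = np.full(n + 1, alpha_start)   # first three alphas held fixed
        A = np.zeros(n + 1)                   # smoothed error
        M = np.zeros(n + 1)                   # smoothed absolute error
        F[1] = y[0]
        for t in range(2, n + 1):
            if t > 3:
                alpha[t] = abs(A[t-1] / M[t-1])
            F[t] = alpha[t-1] * y[t-2] + (1 - alpha[t-1]) * F[t-1]
            E = y[t-1] - F[t]
            A[t] = beta * E + (1 - beta) * A[t-1]
            M[t] = beta * abs(E) + (1 - beta) * M[t-1]
        return F[1:], alpha[1:]

    # y = [200, 135, 195, 197.5, 310, 175, 155, 130, 220, 277.5, 235]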
For other combinations of values for β and starting values for α, here is what the
final α value is:
The time series is not very long and therefore the results are somewhat fickle. In
any event, it is clear that the β value and the starting values for α have a profound
effect on the final value of α.
4.7 Holt-Winters' method is best because the data are seasonal. The variation increases
with the level, so we use Holt-Winters' multiplicative method. The optimal smoothing
parameters (giving smallest MSE) are α = 0.479, β = 0.00 and γ = 1.00. These
give the following forecasts (read left to right):
4.8 First choose any values for the three parameters. Here we have used α = β = γ = 0.1.
Different choices will give different results. Our program uses the initialization method
described in the textbook and gave the following results:
Now compare with the optimal values: α = 0.917, β = 0.234 and γ = 0.000. Using
the same initialization, we obtain the results in Table 4-11, namely
F = [304.20/(2 − 1)] / [25.80/(5 − 2)] = 35.4.
This has (2 − 1) = 1 df for the numerator and (5 − 2) = 3 df for the denominator.
From Table C in Appendix III, the P-value is slightly smaller than 0.010. (Using
a computer, it is 0.0095.) Standard errors:

s.e.(a) = (2.93)√(1/5 + 25/20) = 3.53
s.e.(b) = (2.93)√(1/20) = 0.656.

On 3 df, t* = 3.18 for a 95% confidence interval. Hence 95% intervals are
Analysis of Variance
Source DF SS MS F P
Regression 1 304.20 304.20 35.37 0.010
Error 3 25.80 8.60
Total 4 330.00
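The quoted P-value can be verified directly (a sketch using scipy):

    from scipy import stats

    p = stats.f.sf(35.37, dfn=1, dfd=3)   # P(F_{1,3} > 35.37)
    print(round(p, 4))                    # about 0.0095, as stated above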
5.3 (a) See the plot on the next page and the Minitab output on page 101. The straight
line is Ŷ = 0.46 + 0.22X.
(b) See the plot on the next page. The residuals may show a slight curvature (Λ
shaped). However, the curvature is not strong and the fitted model appears
reasonable.
(c) R2 = 90.2%. Therefore, 90.2% of the variation in melanoma rates is explained
by the linear regression.
(d) From the Minitab output:
Prediction: 9.286. Prediction interval: (6.749, 11.823)
[Figures: melanoma rate plotted against ozone with the fitted line, and residuals plotted against ozone.]
Analysis of Variance
Source DF SS MS F P
Regression 1 81.822 81.822 82.70 0.000
Error 9 8.905 0.989
Total 10 90.727
Note that it is the prediction interval (PI) we want here. Minitab also gives
the confidence interval (CI) for the line at this point, something we have not
covered in the book.
(e) This analysis has assumed that the susceptibility to melanoma among people
living in the various locations is constant. This is unlikely to be true due to
the diversity of racial mix and climate over the locations. Apart from ozone
depletion, melanoma will be affected by skin type, climate, culture (e.g. is
sun-baking encouraged?), diet, etc.
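The prediction interval in (d) comes from the usual simple-regression formula; a minimal sketch, assuming arrays x (ozone) and y (melanoma rate):

    import numpy as np
    from scipy import stats

    def prediction_interval(x, y, x0, level=0.95):
        n = len(x)
        b, a = np.polyfit(x, y, 1)                   # slope, intercept
        s = np.sqrt(np.sum((y - a - b*x)**2) / (n - 2))
        ssx = np.sum((x - x.mean())**2)
        se = s * np.sqrt(1 + 1/n + (x0 - x.mean())**2 / ssx)
        t = stats.t.ppf((1 + level) / 2, df=n - 2)
        yhat = a + b * x0
        return yhat, (yhat - t*se, yhat + t*se)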
5.4 (a) See plot on the next page and computer output on page 103.
(b) Coefficients: a = 4.184, b = 0.9431. Only b is significant, showing the relation-
ship is significant. (We could refit the model without the intercept term.)
(c) If X = 80, Ŷ = 4.184 + 0.9431(80) = 79.63. Standard error of forecast is 1.88
(from computer output).
Exercise 5.4(a): Scatterplot of production rating against manual dexterity test scores.
[Figure: production rating against manual dexterity with the fitted line and 95% prediction intervals.]
Analysis of Variance
Source DF SS MS F P
Regression 1 6576.8 6576.8 250.29 0.000
Error 18 473.0 26.3
Total 19 7049.8
(d) For confidence and prediction intervals, use Table B with 18 df. 95% CI for β
is 0.94306 ± 2.10(0.05961) = [0.82, 1.07].
(e) See output. Again it is the prediction interval (PI) we want here, not the
confidence interval (CI). The prediction intervals are shown in the plot on the
previous page.
5.5 (a) See the plot on the following page. The straight line regression model is Ŷ =
20.2−0.145X where Y = electricity consumption and X = temperature. There
is a negative relationship because heating is used for lower temperatures, but
there is no need to use heating for the higher temperatures. The temperatures
are not sufficiently high to warrant the use of air conditioning. Hence, the
electricity consumption is higher when the temperature is lower.
Exercise 5.5(a): Electricity consumption (Mwh) plotted against temperature (degrees Celsius).
Exercise 5.5(c): Residual plot for the straight line regression of electricity consumption against temperature (a possible outlier is marked on the plot).
(b) r = −0.791
(c) See the plot on the previous page. Apart from the possible outlier, the model
appears to be adequate. There are no highly influential observations.
(d) If X = 10, Ŷ = 20.2 − 0.145(10) = 18.75. If X = 35, Ŷ = 20.2 − 0.145(35) =
15.12. The first of these predictions seems reasonable. The second is unlikely.
Note that X = 35 is outside the range of the data, making prediction dangerous.
For temperatures above about 20°C, it is unlikely electricity consumption
would continue to fall because no heating would be used. Instead, at high
temperatures (such as X = 35°C), electricity consumption is likely to increase
again due to the use of air-conditioning.
5.7 (a) See the plot on the next page. The winning time has been decreasing with year.
There is an outlier in 1896.
(b) The fitted line is Ŷ = 196−0.0768X where X denotes the year of the Olympics.
Therefore the winning time has been decreasing an average 0.0768 seconds per
year.
(c) The residuals are plotted on the next page. The residuals show random scatter
about 0 with only one unusual point (the outlier in 1896). But note that the
last five residuals are positive. This suggests that the straight line is "levelling
out"—the winning time is decreasing at a slower rate now than it was earlier.
[Figure: winning.time plotted against year.]
Exercise 5.7(c): Residual plot for linear regression model of winning times.
This would smash the world record. But given the previous five results (with
positive residuals), it would seem more likely that the actual winning time
would be higher. A prediction interval is
5.8 (a) There is strong seasonality with peaks in November and December and a trough
in January. The surfing festival shows as a smaller peak in March from 1988.
The variation in the series is increasing with the level and there is a strong
positive trend due to sales growth.
(b) Logarithms are necessary to stabilize the variance so it does not increase with
the level of the series.
(c) See the plot on the next page and the computer output on page 109. The fitted
line is Ŷ = −526.57 + 0.2706X where X is the year and Y is the logged annual
sales.
(d)
X = 1994 : Ŷ = −526.57 + 0.2706(1994) = 12.98
X = 1995 : Ŷ = −526.57 + 0.2706(1995) = 13.25
X = 1996 : Ŷ = −526.57 + 0.2706(1996) = 13.52
(e) We transform the forecasts and intervals with the exponential function:
Prediction intervals:
[Figure: annual sales with forecasts.]
Analysis of Variance
Source DF SS MS F P
Regression 1 2.0501 2.0501 134.45 0.000
Error 5 0.0762 0.0152
Total 6 2.1263
These prediction intervals are very wide because we are only using annual totals
in making these predictions. A more accurate method would be to fit a model
to the monthly data allowing for the seasonal patterns. This is discussed in
Chapter 7.
(f ) One way would be to calculate the proportion of sales for each month compared
to the total sales for that year. Averaging these proportions will give a rough
guide as to how to split the annual totals into 12 monthly totals.
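A sketch of the back-transformation in (d) and (e), using the rounded coefficients from (c); rounding means the results match the text only approximately:

    import numpy as np

    a, b = -526.57, 0.2706          # fitted line for the logged annual sales
    for year in (1994, 1995, 1996):
        log_f = a + b * year
        print(year, np.exp(log_f))  # forecast on the original sales scale
    # prediction interval endpoints are back-transformed with np.exp likewise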
[Figure: percentage mortality plotted against the percentage of Type A birds.]
Ŷ = 4.38 + 0.0154X
So the t-test is significant (since P < 0.05). A 95% confidence interval for the
slope is
This suggests that the Type A birds have a higher mortality than the Type B
birds, the opposite to what the farmers claim.
(c) For a farmer using all Type A birds, X = 100. So Ŷ = 4.38 + 0.0154(100) =
5.92%. For a farmer using all Type B birds, X = 0. So Ŷ = 4.38%. Prediction
intervals for these are [2.363, 9.487] and [0.587, 8.177] respectively.
(d) R² = 2.6%. So only 2.6% of the variation in mortality is due to bird type.
[Figure: gas consumption plotted against price, with the fitted lines from Models 1 and 2.]
(e) This information suggests that heat may be a lurking variable. If Type A birds
are being used more in summer and the mortality is higher in summer, then the
increased mortality of Type A birds may be due to the summer rather than the
bird type. A proper randomized experiment would need to be done to properly
assess whether bird type is having an effect here.
5.10 (a) Cross sectional data. There is no time component.
(b) See the plot above.
(c) When the price is higher, the consumption may be lower due to the pressure of
increased cost. Therefore, we would expect b1 < b2 < 0.
(d) Model 1: First take logarithms of Yi, then use simple linear regression to obtain
a = 5.10, b = −0.0153, σ̂²e = 0.0735.
Model 2: Split the data into two groups. Fit each group separately using simple
linear regression to obtain
a1 = 221, b1 = −2.91 and a2 = 84.8, b2 = −0.447.
Using the equation given in the question, we obtain
σ̂²e = 2913.7/16 = 182.06.
The fitted lines are shown on the graph above.
The 95% PI are obtained using Ŷ ± t∗ (s.e.) where t∗ = 2.12 (from Table B with
16 df). Hence, we obtain the following values.
X Ŷ s.e. [ 95% PI ]
40 104.67 14.15 [74.7 , 134.7]
60 46.55 13.83 [17.2 , 75.9]
80 49.03 14.00 [19.3 , 78.7]
100 40.09 14.65 [ 9.0 , 71.1]
120 31.15 15.70 [ -2.1 , 64.4]
For example, at a price of 80c, the gas consumption will lie between 19.3 and
78.7 for 95% of towns.
[Figures: residuals from Model 1 and Model 2 plotted against price.]
Exercise 5.10(f): Local linear regression through the gas consumption data. The fitted line suggests that model 2 is more appropriate.
6.2 (a) The fitted model is Ĉ = 273.93 − 5.68P + 0.034P². For this model, R² = 0.8315.
[Recall: in exercise 5.6, model 1 had R² = 0.721 and model 2 had R² = 0.859.]
So the R̄² values for each model are (using R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1) with n = 47):

Model 1: R̄² = 1 − (1 − 0.721)(46/45) = 0.715.
Model 2: R̄² = 1 − (1 − 0.859)(46/43) = 0.849.
Model 3: R̄² = 1 − (1 − 0.832)(46/44) = 0.824.
These values show that model 2 is the best model, followed by model 3. The t
values for the coefficients are:
Model 1 α : t = 10.22 β : t = −5.47
Model 2 α1 : t = 10.33 β1 : t = −6.61 α2 : t = 4.11 β2 : t = −1.99
Model 3 β0 : t = 8.83 β1 : t = −5.62 β2 : t = 4.57
Of these, only β2 from model 2 is not significantly different from zero. This
suggests that a better model would be to allow the second part of model 2 to
be a constant rather than a linear function.
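The R̄² arithmetic above follows directly from the definition; a small helper (the displayed ratios imply n = 47 with k = 1, 3 and 2 parameters respectively):

    def adjusted_r2(r2, n, k):
        """R-bar-squared: penalizes R^2 for the number of predictors k."""
        return 1 - (1 - r2) * (n - 1) / (n - k - 1)

    # adjusted_r2(0.721, 47, 1)  -> 0.715   (model 1)
    # adjusted_r2(0.859, 47, 3)  -> 0.849   (model 2)
    # adjusted_r2(0.832, 47, 2)  -> 0.824   (model 3)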
(b) From the computer output the following 95% prediction intervals are obtained.
Analysis of Variance
Source DF SS MS F P
Regression 2 17327.0 8663.5 41.95 0.000
Error 17 3511.0 206.5
Total 19 20838.0
Source DF Seq SS
P 1 13005.7
Psq 1 4321.3
Exercise 6.2: Quadratic regression of gas consumption against price. 95% prediction intervals shown.
P Ĉ [ 95% PI ]
20 173.97 [ 131.21 , 216.74 ]
40 101.14 [ 69.19 , 133.10 ]
60 55.43 [ 23.38 , 87.48 ]
80 36.85 [ 4.77 , 68.92 ]
100 45.38 [ 11.52 , 79.25 ]
120 81.04 [ 31.62 , 130.46 ]
It is clear from the plot that it is dangerous predicting outside the observed
price range. In this case, the predictions at P = 20 and P = 120 are almost cer-
tainly wrong. Predicting outside the range of the explanatory variable is always
dangerous, but much more so when a quadratic (or higher-order polynomial) is
used.
(c) The correlation between P and P² is 0.990. If we were to use P, P² and P³, the
correlations among these explanatory variables would be very high and we would
have a serious multicollinearity problem on our hands. The coefficient estimates
would be unstable (i.e., have large standard errors). Multicollinearity will often be
a problem with polynomial regression.
95% confidence intervals for the parameters are calculated using a t distribution
with 6 df. So the multiplier is 2.45:
(b) F = 123.3 on (3,6) df. P = 0.000. This means that the probability of results
like this, if the three explanatory variables were not relevant, is very small.
(c) The residual plots on page 120 show the model is satisfactory. There is no
pattern in any of the residual plots.
(d) R2 = 0.984. Therefore 98.4% of the variation in Y is explained by the regression
relationship.
Analysis of Variance
Source DF SS MS F P
Regression 3 2001.54 667.18 123.32 0.000
Error 6 32.46 5.41
Total 9 2034.00
Source DF Seq SS
X1 1 1118.36
X2 1 871.67
X3 1 11.51
[Figures: residuals plotted against X1, X2 and X3.]
(e) The signs of the coefficients indicate the direction of the effect of each variable.
X1 increases heat and has the greatest effect (the largest coefficient). The other
variables are not significant, so they may not have any effect. If they do, the
coefficients suggest that X2 might increase heat and X3 might decrease heat.
(f ) For X1 = 10, X2 = 40 and X3 = 30, Ŷ = 73.40+1.52(10)+0.38(40)−0.27(30) =
95.76. 90% Prediction interval: [90.24,101.29]
6.5 The data for this exercise were taken from McGee and Carleton (1970) “Piecewise re-
gression”, Journal of the American Statistical Association, 65, 1109–1124. It might
be worthwhile to get this paper to compare what conventional regression can ac-
complish when there are special features in the data. In this case, the relationship
between the Boston dollar volume and the NYSE-AME dollar volume underwent a
series of changes over the time period of interest. In this paper, the solution was as
follows:
Notice the slope coefficients in these four equations. They are small (because
Boston's dollar volume is small relative to the big board volumes) but they get
increasingly stronger (from 61 to 114 to 205) in successive periods of commission
splitting. Then in Dec '68, the SEC said "no more commission splitting" and it
hurt the Boston dollar volume. The slope went back to 67, which is almost where it
started.
(a) The fitted equation is Ŷ = −66.2 + 0.014X. The following output was obtained
from a computer package.
Here, the regression is significant, but time is not significant. In fact, comparing
these two models shows that adding time to the regression equation is actually
worse than not adding it. See the R̄2 values. And for both analyses, the D-W
Exercise 6.5(c): Connected scatterplot for the Boston and American stock exchanges.
statistic shows that there is a lot of pattern left in the residuals. A piecewise
regression approach does far better with this data set.
(c) See the plot above.
6.6 (a) and (b) Here are the seasonality indices based on the regression equations
(6.10) and (6.12). They represent the intercept term in the regression for each
of the 12 first differences.
Using (6.10) Using (6.12)
Mar-Feb -2.6 -6.2
Apr-Mar -6.7 -10.6
May-Apr -3.5 -7.4
Jun-May -5.3 -9.2
Jul-Jun -3.6 -7.4
Aug-Jul -5.2 -9.2
Sep-Aug -5.9 -9.7
Oct-Sep -6.9 -10.7
Nov-Oct -4.1 -7.9
Dec-Nov -4.7 -8.5
Jan-Dec -0.8 -4.6
Feb-Jan -2.2 -6.2
These two sets of seasonal indices are not quite the same. In the first equa-
tion (6.10), all eleven dummy variables for seasonality were allowed to be in
the regression. In the second equation (6.12), the best subsets regression pro-
cedure did not allow the first seasonal dummy into the final equation. The
absolute values are not so important because, in the presence of different sets
of explanatory variables, we expect the intercept terms to be different.
(c) The seasonal indices should be the same regardless of which month is used as
a base.
7.1 (a) In general, the approximate standard error of the sample autocorrelations is
1/√n. So the larger the value of n, the smaller the standard error. Therefore,
the ACF has more variation for small values of n than for large values of n. All
three series show the autocorrelations mostly falling within the 95% bands. The
few that lie just outside the bands are not of concern since we would expect
about 5% of spikes to cross the bands. There is no reason to think these series
are anything but white noise.
(b) The lines shown are 95% critical values. These are calculated as ±1.96/√n. So
they are closer to zero when n is larger. The autocorrelations vary randomly,
but they mostly stay within the bounds.
7.2 The time plot shows the series has a non-stationary level. It wanders up and down
over time in a similar way to a random walk. The ACF decays very slowly which
also indicates non-stationarity in the level. Finally, the PACF has a very large value
at lag 1, indicating the data should be differenced.
7.3 The following series were generated:
AR(1): Yt = 0.6Yt−1 + et
MA(1): Yt = et + 0.6et−1
ARMA(1,1): Yt = 0.6Yt−1 + et + 0.6et−1
AR(2): Yt = −0.8Yt−1 + 0.3Yt−2 + et
MA(2): Yt = et + 0.8et−1 − 0.3et−2
In each case, we assume Yt = 0 and et = 0 for t ≤ 0. The generated data are shown
on the following two pages. There is a lot of similarity in the shapes of the series
because they are based on exactly the same errors.
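A sketch of the data generation, using one N(0,1) error sequence for all models (the seed is an arbitrary choice) and Yt = et = 0 for t ≤ 0 as stated above:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 36
    e = rng.standard_normal(n)
    ar1 = np.zeros(n); ma1 = np.zeros(n); arma = np.zeros(n)
    for t in range(n):
        e1 = e[t-1] if t > 0 else 0.0    # e_{t-1}, zero before the start
        y1 = ar1[t-1] if t > 0 else 0.0  # Y_{t-1} for the AR(1)
        ar1[t] = 0.6 * y1 + e[t]
        ma1[t] = e[t] + 0.6 * e1
        arma[t] = 0.6 * (arma[t-1] if t > 0 else 0.0) + e[t] + 0.6 * e1
    # the AR(2) and MA(2) series follow the same pattern with two lags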
7.4 (a) The ACF is slow to die out and the time plot shows the series wandering in a
non-stationary way. So we take first differences. The ACF of the first differences
show one significant spike at lag 1 indicating an MA(1) is appropriate. So the
model for the raw data is ARIMA(0,1,1).
(b) There is no consistent trend in the raw data and the differenced data have
mean close to zero. Therefore, there is no need to include a constant term.
(c) (1 − B)Yt = (1 − θ1 B)et .
(d) See the output on page 127. There may be slight differences with different
software packages and even different versions of the same package. The Ljung-
Box statistics are not significant and the ACF and PACF of residuals show no
significant differences from white noise.
[Figures: time plots of the five generated series: AR(1), MA(1), ARMA(1,1), AR(2) and MA(2).]
(e) The last observation is y30 = 3885; the last residual in the series is ê30 = −881.87
(obtained from the computer package). Now
Yt = Yt−1 + et − 0.3174et−1.
So Ŷ31 = Y30 + ê31 − 0.3174ê30 = 3885 + 0 − 0.3174(−881.87) = 4164.9
Ŷ32 = Ŷ31 + 0 − 0.3174(0) = 4164.9
Ŷ33 = Ŷ32 + 0 − 0.3174(0) = 4164.9
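The hand calculation in (e) can be reproduced with an ARIMA fit; a sketch, assuming `strikes` holds the series (the MA coefficient should come out near −0.32 in statsmodels' sign convention):

    from statsmodels.tsa.arima.model import ARIMA

    fit = ARIMA(strikes, order=(0, 1, 1), trend="n").fit()
    print(fit.params)       # the MA(1) coefficient
    print(fit.forecast(3))  # flat forecast function, as derived above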
Exercise 7.4(f ): Predicted number of strikes in USA. 95% prediction intervals shown.
the high periods with the low periods) and at lags 12, 24 and 36 they are
positive (because we are correlating high periods with high periods).
(c) The pattern in the PACF plot is not particularly revealing. However, there
is little need to try to interpret this plot when the analysis clearly shows the
dominance of the seasonality. The best approach would be to difference the
series to reduce the effect of the seasonality and then see what is left over.
(d) These graphs suggest a seasonal MA(1) because of the spike at lag 12 in the
ACF and the decreasing spikes at lags 12 and 24 in the PACF. Overall, the
suggested model is ARIMA(0,1,0)(0,1,1) 12 .
(e) Using the backshift operator: (1 − B)(1 − B^12)Yt = (1 − ΘB^12)et. Rewriting
gives
Yt − Yt−12 − Yt−1 + Yt−13 = et − Θet−12.
(c) Now
So
7.8 (a) The centered 12-MA smooth is shown in the plot on the next page. The trend
is generally linear and increasing with a flat period between 1990 and 1993.
(b) The variation does not change much with the level, so transforming will not
make much difference.
(c) The data are not stationary. There is a trend and seasonality in the data.
Differencing at lag 12 gives the data shown in the plot on page 131. These
appear stationary although it is possible another difference at lag 1 is needed.
(d) From the plots on page 131 it is clear there is a seasonal MA component of order
1. In addition there is a significant spike at lag 1 in both the ACF and PACF.
Hence plausible models are ARIMA(1,0,0)(0,1,1) 12 and ARIMA(0,0,1)(0,1,1)12 .
Comparing the two models we have the following results
ARIMA(1,0,0)(0,1,1)12 AIC=900.2
ARIMA(0,0,1)(0,1,1)12 AIC=926.9
[Figure: US electricity generation with the centered 12-MA trend, plotted against Year.]
Hence the better model is the first one. Note that different packages will give
different values for the AIC depending on how it is calculated. Therefore the
same package should be used for all calculations.
(e) The residuals from the ARIMA(1,0,0)(0,1,1) 12 are shown in the plots on page
132. Because there are significant spikes in the ACF and PACF, the model is
not adequately describing the series. These plots suggest we need to add an
MA(1) term to the model. So we fit the revised model ARIMA(1,0,1)(0,1,1) 12 .
This time, the residual plots (not shown here) look like white noise. The AIC
is 876.7. Part of the computer output for fitting the revised model is shown
below.
Approx.
Parameter Estimate Std Error T Ratio Lag
MA1,1 0.74427 0.05887 12.64 1
MA2,1 0.77650 0.09047 8.58 12
AR1,1 0.99566 0.0070613 141.00 1
So the fitted model is
(1 − 0.996B)(1 − B^12)Yt = (1 − 0.744B)(1 − 0.776B^12)et.
[Figures: time plot, ACF and PACF of the seasonally differenced data.]
Exercise 7.8(e): Residuals from ARIMA(1,0,0)(0,1,1)12 model fitted to the electricity data.
Note that the first term on the left is almost the same as differencing (1 − B).
This suggests that we probably should have taken a first difference as well as a
seasonal difference. We repeated the above analysis and arrived at the following
model: ARIMA(1,1,1)(0,1,1)12, which has AIC = 864.6.
The computer output for the final model is shown above. The figures under
the heading Chi Square concern the Ljung-Box test. Clearly the model passes
the test (see Table E in Appendix III).
(f ) Forecasts for the next 24 months are given on the following page.
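A sketch of the model search in 7.8, assuming `elec` holds the monthly series; AIC values will differ from those quoted since AIC definitions vary across packages (as noted above), but the ordering should be similar:

    from statsmodels.tsa.statespace.sarimax import SARIMAX

    for order in [(1, 0, 0), (0, 0, 1), (1, 0, 1), (1, 1, 1)]:
        fit = SARIMAX(elec, order=order,
                      seasonal_order=(0, 1, 1, 12)).fit(disp=False)
        print(order, round(fit.aic, 1))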
7.9 (a) See the plot on the following page. Note that there is strong seasonality and
a pronounced trend-cycle. One way to study the consistency of the seasonal
pattern is to compute the seasonal sub-series and see how stable each month
is. The results are given below.
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1955: 94.7 94.0 96.5 101.3 102.4 103.7 104.5 104.3 104.1 101.2 98.3 95.4
1956: 94.1 93.5 96.8 103.1 104.1 102.8 103.7 103.6 103.6 101.7 98.6 96.8
1957: 95.9 96.8 99.0 97.7 99.5 101.1 102.0 103.3 105.1 103.2 99.8 96.7
1958: 95.0 94.8 96.1 100.4 101.7 102.1 103.6 104.9 104.4 101.1 96.7 95.3
1959: 93.9 94.5 96.4 100.9 102.1 103.3 104.7 106.0 104.4 100.9 98.7 96.1
1960: 95.7 95.7 95.1 98.9 100.8 102.5 104.6 106.0 104.0 100.1 98.0 96.7
1961: 95.2 94.8 96.5 101.3 101.7 103.7 105.2 105.3 104.3 101.2 97.7 96.5
1962: 93.5 93.7 95.6 100.7 102.3 102.5 104.4 106.4 103.5 101.0 97.6 96.9
1963: 94.6 93.4 95.5 99.1 100.8 104.1 106.1 107.4 104.1 100.7 97.9 97.4
1964: 93.6 93.2 94.6 98.6 100.2 103.5 106.6 107.5 103.6 101.7 97.9 96.9
1965: 95.6 92.7 94.0 96.7 99.4 103.7 108.2 108.0 104.7 100.5 98.4 99.6
1966: 97.0 93.7 95.2 97.0 98.4 104.1 105.9 107.2 104.2 99.7 97.1 96.8
1967: 93.9 93.6 94.1 99.0 102.5 105.7 109.2 109.9 104.9 99.8 98.3 93.8
1968: 91.2 91.7 94.5 99.0 101.9 103.1 105.7 106.0 103.5 100.2 100.7 99.1
1969: 96.0 94.3 94.1 96.8 100.7 104.5 106.3 107.2 103.7 102.5 100.2 99.4
1970: 95.8 93.0 92.0 96.0 100.2 103.7 106.0 105.8 102.7 98.9 97.1 96.5
These detrended data are relatively consistent from year to year with only minor
[Figures: time plot of the series with its ACF and PACF.]
[Figure: seasonal sub-series of the detrended ratios, January–December.]
variations occurring here and there. For example, December 1967 and January
and February 1968 were noticeably lower than surrounding years.
Another way to look at seasonal patterns is via autocorrelation functions. Note
that for the raw data, the ACF shows strong seasonality over several seasonal
lags. This is further evidence of the consistency of the seasonal pattern. The
plot on the previous page shows the detrended data. Again, the seasonal pat-
tern is very consistent although the amplitude of the pattern each year varies.
Unusual results in early 1968 and early 1970 are seen.
(b) For the first 96 months, we identified an ARIMA(0,1,0)(0,1,1) 12 . For the second
96 months, we identified an ARIMA(0,1,0)(1,1,0) 12 : In practice, there is little
difference between these models. This means that once the trend has been
eliminated (by differencing), the seasonal patterns are very similar.
(c) Using the above ARIMA(0,1,0)(0,1,1) 12 model, we obtained the following fore-
casts.
[Figures: the detrended data with its ACF and PACF.]
(d) For the second half of the data we used the ARIMA(0,1,0)(1,1,0) 12 to obtain
the forecasts at the top of the following page. The actual 1971–1972 figures
are also shown. The source is “Employment and Earnings, US 1909–1978”,
published by the Department of Labor, 1979.
A good exercise would be to take these forecasts and check the MAPE for 1971
and 1972 separately. The MAPE for the first forecast year should be smaller
than the MAPE for the second year.
(e) If the objective is to forecast the next 12 months, then the latest data are obviously
the most relevant; but to get seasonal indices we have to go back several years,
and to anticipate what the next move of the large cycle is going to be, we really
need to look at as much data as possible. So a good strategy would be
i. study the trend-cycle by looking at the 12-month moving average;
ii. remove the trend-cycle and study the consistency of the seasonality;
iii. decide how much of the data series to retain for the ARIMA modeling;
iv. forecast the next 12 months and use some judgment as to how to modify
the ARIMA forecasts on the basis of anticipated trend-cycle movements.
7.10 (a) There is strong seasonality as can be seen from the time plot and the seasonal
peaks in the ACF.
(b) The trend in the series is small compared to the seasonal variation. However,
there is a period of downward trend in the first four years, followed by an
upward trend for four years. At the end the trend seems to have levelled off.
(c) The one large spike in the PACF of Figure 7-34 suggests the series needs dif-
ferencing at lag 1. This is also apparent from the slow decay in the ACF and
the non-stationary mean in the time plot.
(d) You would need to difference again at lag 1 and plot the ACF and PACF of the
new series (differenced at lags 12 and 1). It is not possible to identify a model
from Figures 7-33 and 7-34.
8.1 (a) The fitted model in Exercise 6.7 (using OLS) was
Yt = 78.7 + 0.534xt + Nt .
The computer output below shows the results for fitting the straight line re-
gression with AR(1) errors. Hence the new model is
Yt = 79.27 + 0.51xt + Nt where Nt = 0.72Nt−1 + et.
In this case, the error model makes very little difference to the parameters.
Approx.
Parameter Estimate Std Error T Ratio Lag Variable Shift
MU 79.27236 0.76093 104.18 0 SALES 0
AR1,1 0.72469 0.14647 4.95 1 SALES 0
NUM1 0.50801 0.02318 21.91 0 ADVERT 0
(b) The ACF and PACF of the errors is plotted on the following page. An AR(1)
model for the errors is appropriate since there is a single significant spike at
lag 1 in the PACF and geometric decay in the ACF. This is confirmed by the
Ljung-Box test in the computer output above. The Q* values are given under
the column Chi Square. None are significant, showing the residuals from the
full model are white noise.
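A sketch of the fit in (a), assuming `sales` and `advert` hold the two series; SARIMAX estimates the regression coefficients and the AR(1) error parameter jointly by maximum likelihood:

    from statsmodels.tsa.statespace.sarimax import SARIMAX

    fit = SARIMAX(sales, exog=advert, order=(1, 0, 0),
                  trend="c").fit(disp=False)
    print(fit.params)   # intercept, advertising coefficient, AR(1) parameter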
Exercise 8.1: Errors from regression model with AR(1) error term.
Approx.
Parameter Estimate Std Error T Ratio Lag Variable Shift
MU 9.56328 0.40537 23.59 0 HURON 0
AR1,1 0.78346 0.06559 11.94 1 HURON 0
NUM1 -0.02038 0.01066 -1.91 0 YEAR 0
8.2 (a) To reduce numerical error, we subtracted 1900 from the year to create an ex-
planatory variable. Hence the year ranged from -25 (1875) to 72 (1972). The
computer output above shows the fitted model to be
Yt = 9.56 − 0.02xt + Nt where Nt = 0.78Nt−1 + et
where xt is the year −1900.
(b) The errors are shown in the plot on the following page. This demonstrates
that a better model would have an AR(2) error term since the PACF has two
significant spikes at lags 1 and 2. The spike at lag 10 is probably due to chance.
The ACF shows geometric decay which is possible with an AR(2) model. So
the full regression model is
Yt = β0 + β1xt + Nt where Nt = φ1Nt−1 + φ2Nt−2 + et.
Fitting this model gives the output shown on page 144. So the fitted model is
Yt = 9.53 − 0.02xt + Nt where Nt = Nt−1 − 0.29Nt−2 + et.
Exercise 8.2(b): Errors from regression model with AR(1) error term.
Approx.
Parameter Estimate Std Error T Ratio Lag Variable Shift
MU 9.53078 0.30653 31.09 0 HURON 0
AR1,1 1.00479 0.09839 10.21 1 HURON 0
AR1,2 -0.29128 0.10030 -2.90 2 HURON 0
NUM1 -0.02157 0.0082537 -2.61 0 YEAR 0
8.3 (a) ARIMA(0,1,1)(2,1,0)12 . This model would have been chosen by first identifying
that differences at lags 12 and 1 are necessary to make the data stationary. Then
looking at the ACF and PACF of the differenced data would have shown two
significant spikes in the PACF at lags 12 and 24. There would have also been
a significant spike in the ACF at lag 1 and geometric decay in the early lags of
the PACF.
(b) Since both parameter estimates are positive (and significantly different from
zero), we can conclude that electricity consumption increases with both heating
degrees and cooling degrees. Because b2 is larger, we know that there is a greater
increase in electricity usage for each heating degree than for each cooling degree.
(c) To use this model for forecasting, we would first need forecasts of both X 1,t
and X2,t into the future. These could be obtained by taking averages of these
variables over the equivalent months of the previous few decades. Then the
model can be used to forecast electricity demand over the next 12 months by
forecasting the Nt series using the method discussed in chapter 7 and plugging
the forecasts of X1,t , X2,t and Nt into the formula for Yt .
(d) If the model was fitted using a standard regression package (thus modeling N t
as white noise), then the seasonality and autocorrelation in the data would have
been ignored. This would result in less efficient parameter estimates and invalid
estimates of their standard errors. In particular, tests for significance would be
incorrect, as would prediction intervals. Also, when producing forecasts of Yt,
the forecasts of Nt would all be zero. Hence, the model would not adequately
allow for the seasonality or autocorrelation in the data.
8.4 (a) b = 3, r = 1, s = 2.
(b) ARIMA(2,0,0)
(c) ω0 = −0.53, ω1 = −0.37, ω2 = −0.51, δ1 = 0.57, δ2 = 0, θ1 = θ2 = 0, φ1 = 1.53,
φ2 = −0.63.
(d) 27 seconds.
8.6 (a) The three series are shown on page 147. For Set 1, four X t values are needed
(since v1 , v2 , v3 and v4 are all non-zero). Therefore 27 Yt values can be produced.
Similarly 26 Yt values for Set 2 and 24 Yt values for Set 3 can be calculated.
(b) The first model is
Yt = [2.0B/(1 − 0.7B)]Xt + Nt.
The simplest way to generate data for this transfer function is to rewrite it as
(1 − 0.7B)Yt = 2.0BXt + (1 − 0.7B)Nt, i.e., Yt = 0.7Yt−1 + 2.0Xt−1 + Nt − 0.7Nt−1.
Exercise 8.5: Impulse response weights for the four different transfer functions.
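The impulse response weights plotted above can be computed by expanding the transfer function; a sketch for (b) of 8.6, where v(B) = 2.0B/(1 − 0.7B) gives v_j = 2.0(0.7)^(j−1) for j ≥ 1:

    def impulse_weights(omega=2.0, delta=0.7, b=1, nlags=10):
        """Weights of omega * B^b / (1 - delta*B): geometric decay after lag b."""
        return [0.0] * b + [omega * delta**j for j in range(nlags - b + 1)]

    print(impulse_weights())   # 0, 2.0, 1.4, 0.98, 0.686, ...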
Yt = a + (ν0 + ν1B + · · · + ν6B^6)Xt + Nt
where Yt denotes the average room rate, Xt denotes the CPI and Nt is an AR(1)
process. The estimated errors from this model are shown in the figure on the
previous page. They are clearly non-stationary and have some seasonality.
So we difference both Yt and Xt and refit the model with Nt specified as an
ARIMA(1,0,0)(1,0,0)12 . The parameter estimates are shown below (as given
by SAS).
Exercise 8.7(d): Errors from regression model with AR(1) error term.
Exercise 8.8: Time plot of daily perceptual speed scores for a schizophrenic patient. The
drug intervention is shown at day 61.
(b) The fitted model is Yt = ωXt + Nt, with ARIMA(0,1,1) errors so that
(1 − B)Nt = θet−1 + et, where Yt denotes the perceptual speed score and Xt denotes
the step dummy variable. The estimated coefficients were
Parameter ω θ
Estimate -22.1 0.76
(c) The drug has lowered the perceptual speed score by about 22.
(d) The new model is
Yt = [ω/(1 − δB)]Xt + Nt where (1 − B)Nt = θet−1 + et.
(An ARIMA(0,1,1) error was found to be the best model again.) Here the
estimated coefficients were
Parameter ω δ θ
Estimate −13.21 0.54 0.76
The following accuracy measures show that the delayed effect model fits the
data better.
Model Step Delayed step
MAPE 15.1 15.0
MSE 92.5 91.1
AIC 542.8 538.4
The forecasts for the two models are very similar. This is because the effect of
the step in the delayed step model is almost complete at the end of the series,
60 days after the drug intervention.
(e) The best ARIMA model we found was an ARIMA(0,1,1) with θ = 0.69. This
gave MAPE=15.4, MSE=100.8 and AIC=550.9.
This model gives a flat forecast function (since we did not include a constant
term). The forecast values are 33.9. Because the step effect is almost complete
in the delayed step model, it also gives a virtually flat forecast function with
forecast values of 34.1. Hence there is virtually no difference. If forecasts had
been made earlier (for example, at day 80), there would have been a difference
because the step effect would still be in progress and so the delayed step model
would have showed a continuing decline in perceptual speed. The real advantage
of the intervention model over the ARIMA model is that the intervention model
provides a way of measuring and evaluating the effect of an intervention.
(f) If the drug varied from day to day and the reaction times depended on dose,
then a better model would be a dynamic regression model with the quantity
of drug as an explanatory variable.
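A sketch of the step-intervention fit in (b), assuming `speed` holds the 120 daily scores with the drug given at day 61; the estimates should be close to ω = −22.1 and θ = 0.76:

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    step = (np.arange(1, len(speed) + 1) >= 61).astype(float)
    fit = SARIMAX(speed, exog=step, order=(0, 1, 1)).fit(disp=False)
    print(fit.params)   # step effect (omega) and MA(1) parameter (theta)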
8.9 (a)
[ Yt − Yt−1 ; Xt − Xt−1 ] = Φ1 [ Yt−1 − Yt−2 ; Xt−1 − Xt−2 ] + Φ2 [ Yt−2 − Yt−3 ; Xt−2 − Xt−3 ] + · · · + Φ12 [ Yt−12 − Yt−13 ; Xt−12 − Xt−13 ] + Zt.
(b)
Yt = Yt−1 − 0.38(Yt−1 − Yt−2) + 0.15(Xt−1 − Xt−2) − 0.37(Yt−2 − Yt−3) + 0.13(Xt−2 − Xt−3) + · · ·
= 0.62Yt−1 + 0.01Yt−2 + 0.15Xt−1 − 0.02Xt−2 + · · ·
(c)
• Multivariate model assumes feedback. That is, X t depends on past values
of Yt . But regression does not allow this.
• Regression model does not assume X t is random.
• Regression model allows Yt to depend on Xt as well as past values
Xt−1 , Xt−2 , . . .. Multivariate AR only allows dependence on past values
of {Xt }.
• For these data, it is unlikely room rates will substantially affect Xt although
it is possible. Small values in the lower left of the coefficient matrices suggest that
Yt is not affecting Xt (little feedback), while Yt should depend on Xt. So regression
is probably better.
8.10 (a) An AR(3) model can be written using the same procedure as the AR(2) model
described in Section 8/5/1. Thus we define X 1,t = Yt , X2,t = Yt−1 and X3,t =
Yt−2 . Then write
Xt = [ φ1 φ2 φ3 ; 1 0 0 ; 0 1 0 ] Xt−1 + [ at ; 0 ; 0 ]
and Yt = [1 0 0]Xt.
This is now in state space form with
F = [ φ1 φ2 φ3 ; 1 0 0 ; 0 1 0 ], G = I3 (the 3 × 3 identity), H = [1 0 0], et = (at, 0, 0)′ and zt = 0.
Yt = [1 1] X t−1 + et .
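A sketch of one transition of the AR(3) state space form in 8.10(a); the φ values here are arbitrary placeholders:

    import numpy as np

    phi1, phi2, phi3 = 0.5, -0.2, 0.1
    F = np.array([[phi1, phi2, phi3],
                  [1.0,  0.0,  0.0],
                  [0.0,  1.0,  0.0]])
    H = np.array([1.0, 0.0, 0.0])

    X = np.zeros(3)                        # state (Y_t, Y_{t-1}, Y_{t-2})
    a_t = np.random.default_rng(0).standard_normal()
    X = F @ X + np.array([a_t, 0.0, 0.0])  # X_t = F X_{t-1} + e_t
    y_t = H @ X                            # Y_t = [1 0 0] X_t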
9.1 There is little doubt that the trends in computer power and memory show a very
clear exponential growth while that of price is declining exponentially. It is therefore
a question of time until computers that cost only a few hundred dollars will exist
that can perform an incredible array of tasks which until now have been the sole
prerogative of humans, for example playing chess (a high-power judgmental and
creative process). It is therefore up to our imaginations to come up with future
scenarios when such computers will be used as extensively as electrical appliances
are used today. The trick is to free our thinking process so that we can come up with
scenarios that are not constrained by our perception of the present when computers
are being used mostly to make calculations.
9.2 As the cost of computers (including all of the peripherals such as printers and scan-
ners) is being reduced drastically, and at the same time we will be getting soon to
devices that will perform a great number of functions now done by separate ma-
chines, it will become more practical and economical to work at home. Furthermore,
the size of these all-purpose machines is being continuously reduced. In the next
five to ten years we will be able to have everything that is provided to us now in
an office at home with two machines: one a powerful all-inclusive computer and the
other a printer-scanner-photocopier-fax machine. Moreover these two machines will
be connected to any network we wish via modems so that we can communicate and
get information from anywhere.
9.3 As was also mentioned in the solution to Exercise 9.1, there is no doubt that computer
and equipment prices are declining exponentially at a fast rate. This would make it
possible for everyone to be able to afford them and be able to have an office not only
at home but at any other place he or she wishes, including one’s car, a hotel room,
a summer vacation residence, or a sail boat.
9.4 Statements like those referred to in Exercise 9.4 abound and demonstrate the short-
sightedness of people's attempts to predict the future. As a matter of fact, as late
as the beginning of our century people did not predict all four major inventions of
the Industrial Revolution (cars, telephones, electrical appliances and television) that
have dramatically changed our lives. Moreover, they did not predict the huge impact
of computers even as late as the beginning of the 1950s. This is why we must break
from our present mode of thinking and see things in a different, new light. This is
where scenarios and analogies can be extremely useful.
10.1 Phillips' problems have to do with the management bias of overoptimism, that is,
believing that all changes will be successful and that they can overcome people's
resistance to change. This is not true, but we tend to believe that most organisational
changes are successful because we hear and we read about the successful ones while
there is very little mention of those that fail. Introducing changes must be considered,
therefore, in an objective manner and our ability to succeed estimated correctly.
10.2 The quote by Glassman illustrates the extent to which professional, expert invest-
ment managers underperform the average of the market. Business Week, Fortune
and other business journals regularly publish summary statistics of the performance
of mutual funds and other professionally-managed funds, benchmarking them against
the Standard & Poor's or other appropriate indexes. The instructor can therefore get
some more recent comparisons than those shown in Chapter 10 and show them in
class.
10.3 Assuming that the length of cycles varies considerably, we have little way of saying
how long it will be until the expansion that started in May 1991 is interrupted. Un-
fortunately the length (and depth) of cycles varies a great deal, making it extremely
difficult to say how long an expansion will last. It will all depend on the specific
situation involved that will require judgmental inputs, structured in such a way as
to avoid biases and other problems.
10.4 There are twenty 8s that one will encounter when counting from 0 to 100. When given
this exercise most people say nine or ten because they are not counting the 8s
coming from 81 to 87 and 89 (and they usually count the two 8s in 88 only once).
11.1 The results of Table 1 are very similar to those of the previous M-Competitions. As
a matter of fact the resemblance is phenomenal given the fact that the series used
and the time horizon they refer to are completely different.
11.2 In our view the combined method will do extremely well. More specifically its accu-
racy will be higher than the individual methods being combined while its variance
of forecasting error will be smaller than that of the methods involved.
11.3 It seems that proponents of new forecasting methods usually exaggerate their ben-
efits. This has been the case with methods under the banner of neural networks,
machine learning and expert systems. These methods did not do well in the M3-IJF
Competition. In addition, only a few experts participated in the competition using
such methods, even though more than a hundred were contacted (and invited to
participate) and more than fifty expressed an interest in the M3-IJF Competition,
indicating that they would "possibly" participate. In the final analysis, it seems that
it is not so simple to run a great number of series through such methods, which
explains why so few participants used them.
The exercises for Chapter 12 are general and can be answered by referring to the text
of Chapter 12, which covers each one of the topics. Each instructor can therefore form
his/her own way of answering these exercises.