Automatic Forecasting
7/3/2009
Summary
The Automatic Forecasting procedure is designed to forecast future values of time
series data. A time series consists of a set of sequential numeric data taken at equally
spaced intervals, usually over a period of time or space. Unlike the Forecasting
procedure, which expects the user to select the forecasting model, this procedure tries
many models and selects the one that performs best according to a specified criterion.
The available criteria for selecting a model include the Akaike Information Criterion
(AIC), the Hannan-Quinn Criterion (HQC), and the Schwarz-Bayesian Criterion (SBC).
These criteria select the model with the smallest mean squared error, subject to a penalty for
the number of unknown parameters that need to be estimated.
Since the output of this procedure is similar to that of the Forecasting procedure, this document
will highlight only the unique aspects of the Automatic Forecasting procedure. For a
detailed discussion of all tables and graphs, refer to the Forecasting documentation.
Sample Data:
The file golden gate.sgd contains monthly traffic volumes on the Golden Gate Bridge in
San Francisco for a period of n = 168 months from January, 1968 through December,
1981. The table below shows a partial list of the data from that file:
Month Traffic
1/68 73.637
2/68 77.136
3/68 81.481
4/68 84.127
5/68 84.562
6/68 91.959
7/68 94.174
8/68 96.087
9/68 88.952
10/68 83.479
11/68 80.814
12/68 77.466
1/69 75.225
… …
The data were obtained from a publication of the Golden Gate Bridge.
Data Input
The data input dialog box requests the name of the column containing the time series
data:
Time indices: time, date or other index associated with each observation. Each value
in this column must be unique and arranged in ascending order.
Sampling Interval: If time indices are not provided, this defines the interval between
successive observations. For example, the data from the Golden Gate Bridge were
collected once every month, beginning in January, 1968.
Number of Forecasts: number of periods following the end of the data for which
forecasts are desired.
Withhold for Validation: number of periods m at the end of the series to withhold
for validation purposes. The data in those periods will not be used to estimate the
forecasting model. However, statistics will be calculated describing how well the
estimated model is able to forecast those observations.
In the current example, the traffic data are monthly, beginning in January, 1968, with a
seasonality of s = 12. The last m = 24 observations of the series will be withheld for
validation purposes, while forecasts will be generated for the next 36 months.
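The effect of Withhold for Validation can be pictured as a simple split of the series. The sketch below is only an illustration under stated assumptions: the function name split_for_validation is hypothetical, and the placeholder list stands in for the n = 168 traffic values rather than reproducing them.

```python
def split_for_validation(series, m):
    """Split a series into an estimation part and the last m withheld values."""
    if m < 0 or m >= len(series):
        raise ValueError("m must satisfy 0 <= m < len(series)")
    cut = len(series) - m
    return series[:cut], series[cut:]


# Placeholder values standing in for the 168 monthly observations.
series = list(range(168))
estimation, validation = split_for_validation(series, 24)
print(len(estimation), len(validation))
```

Only the estimation part would be used to fit the models; the withheld part is compared against the forecasts afterwards.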
Analysis Options
The models fit by the Automatic Forecasting procedure are controlled by the Analysis
Options dialog box:
Models to Include: specify the models that should be fit to the data. These are the
models from which the “best” model will be selected. Descriptions of each of the models
are given in the Forecasting documentation. For most of the models, additional
information must be provided:
ARIMA model: Optimize Model Order – If checked, all models with terms of
order up to those specified will be fit. If not checked, the only model fit will be
the one with terms exactly equal to the specified order.
AR Terms (p) – specify the maximum order p of the autoregressive terms in the
ARIMA model.
MA Terms(q) – specify the maximum order q of the moving average terms in the
ARIMA model. You may also elect to consider only models for which q = p – 1.
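The ARIMA options above describe a search over model orders. A minimal sketch of which (p, q) pairs would be considered follows, assuming that "up to those specified" means every order from 0 through the stated maximum; the function name and arguments are hypothetical, and differencing and seasonal terms are left out.

```python
from itertools import product


def candidate_orders(max_p, max_q, optimize=True, only_q_eq_p_minus_1=False):
    """List the (p, q) pairs implied by the ARIMA settings.

    optimize=False mimics leaving Optimize Model Order unchecked:
    only the model with exactly the specified orders is returned.
    """
    if not optimize:
        return [(max_p, max_q)]
    orders = list(product(range(max_p + 1), range(max_q + 1)))
    if only_q_eq_p_minus_1:
        # Keep only models for which q = p - 1.
        orders = [(p, q) for p, q in orders if q == p - 1]
    return orders


print(candidate_orders(2, 2))
print(candidate_orders(2, 2, only_q_eq_p_minus_1=True))
print(candidate_orders(2, 2, optimize=False))
```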
Method Selection Criterion: the criterion used to select the best model.
The procedure fits each of the models indicated and selects the model that gives the
smallest value of the selected criterion. There are six criteria to choose from:
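The Akaike Information Criterion (AIC) is calculated from
$$\mathrm{AIC} = 2\ln(\mathrm{RMSE}) + \frac{2c}{n} \qquad (1)$$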
where RMSE is the root mean squared error during the estimation period, c is the number
of estimated coefficients in the fitted model, and n is the sample size used to fit the
model. Notice that the AIC is a function of the variance of the model residuals, penalized
by the number of estimated parameters. In general, the model will be selected that
minimizes the mean squared error without using too many coefficients (relative to the
amount of data available).
Hannan-Quinn Criterion
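The Hannan-Quinn Criterion (HQC) is calculated from
$$\mathrm{HQC} = 2\ln(\mathrm{RMSE}) + \frac{2p\,\ln(\ln(n))}{n} \qquad (2)$$
where p here denotes the number of estimated parameters in the fitted model (not the AR order).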
This criterion uses a different penalty for the number of estimated parameters.
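Schwarz-Bayesian Criterion
The Schwarz-Bayesian Information Criterion (SBIC) is calculated from
$$\mathrm{SBIC} = 2\ln(\mathrm{RMSE}) + \frac{p\,\ln(n)}{n} \qquad (3)$$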
Again, the penalty for the number of estimated parameters differs from that of the other
criteria.
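To make the three penalties concrete, the sketch below evaluates equations (1) through (3) and picks the model with the smallest value. It assumes that p in equations (2) and (3) is the same parameter count as c in equation (1); the function names and the two example fits are hypothetical and are not taken from the Golden Gate output.

```python
import math


def aic(rmse, c, n):
    """Equation (1): 2 ln(RMSE) + 2c/n."""
    return 2.0 * math.log(rmse) + 2.0 * c / n


def hqc(rmse, c, n):
    """Equation (2): 2 ln(RMSE) + 2c ln(ln(n))/n."""
    return 2.0 * math.log(rmse) + 2.0 * c * math.log(math.log(n)) / n


def sbic(rmse, c, n):
    """Equation (3): 2 ln(RMSE) + c ln(n)/n."""
    return 2.0 * math.log(rmse) + c * math.log(n) / n


def select_best(fits, criterion):
    """Return the name of the fit with the smallest criterion value.

    fits maps a model name to a tuple (rmse, c, n).
    """
    return min(fits, key=lambda name: criterion(*fits[name]))


# Made-up fits for illustration only.
fits = {
    "small model": (2.10, 2, 144),
    "large model": (2.00, 8, 144),
}
for crit in (aic, hqc, sbic):
    print(crit.__name__, "->", select_best(fits, crit))
```

With these made-up numbers, the AIC prefers the larger model while the HQC and SBIC prefer the smaller one, because their penalty per coefficient is heavier at n = 144.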
Adjustments: Press this button to specify adjustments to be made to the data before
the forecasting models are fit:
After the models are fit and forecasts are made of the adjusted data values, the
adjustments are reversed to provide the final forecasts. If Apply to regressors is
checked, the same adjustments will be made to any regressor variables in the models.
Parameters: Press this button to enter values for each of the model parameters:
The entries in this dialog box are only used for models in which the Optimize
Parameters button is not checked.
Estimation: Press this button to change the default values for certain estimation
options:
Stopping Criterion 1: The algorithm is assumed to have converged when the relative
change in the residual sum of squares from one iteration to the next is less than this
value.
Stopping Criterion 2: The algorithm is assumed to have converged when the relative
change in all parameter estimates from one iteration to the next is less than this value.
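The two stopping criteria can be written as simple relative-change tests. This is a hedged sketch: the function names and tolerance values are hypothetical, the guard against division by zero is an added assumption, and the text does not say whether the algorithm stops when either test passes or only when both do, so the two tests are kept separate.

```python
def ss_converged(prev_ss, new_ss, tol):
    """Stopping Criterion 1: relative change in the residual sum of squares."""
    return abs(new_ss - prev_ss) < tol * max(abs(prev_ss), 1e-12)


def params_converged(prev_params, new_params, tol):
    """Stopping Criterion 2: relative change in every parameter estimate."""
    return all(
        abs(new - old) < tol * max(abs(old), 1e-12)
        for old, new in zip(prev_params, new_params)
    )


# Illustrative values only.
print(ss_converged(100.0, 99.9999, 1e-4))
print(params_converged([0.5, -0.3], [0.6, -0.3], 1e-5))
```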
Input Series: Press this button to enter one or more input variables to act as
regressors in the trend and ARIMA forecasting models:
Analysis Summary
The Analysis Summary gives the standard forecasting output for whichever model gives
the smallest value of the specified criterion.
Forecast Summary
Seasonal differencing of order: 1
Forecast model selected: ARIMA(0,1,2)x(2,1,2)12
Number of forecasts generated: 36
Number of periods withheld for validation: 24
Statistic   Estimation Period   Validation Period
RMSE        2.0522              2.4251
MAE         1.35877             1.16503
MAPE        1.49719             1.18882
ME          -0.0402442          0.0221547
MPE         -0.0732127          0.00847592
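The five statistics in the table can be computed from the one-step-ahead errors. The sketch below assumes the usual definitions: error = actual minus forecast, percentage errors taken relative to the actual value, and averages over all n errors. The exact divisors used by the procedure are described in the Forecasting documentation and may differ; the function name is hypothetical.

```python
import math


def forecast_error_stats(actual, forecast):
    """Return RMSE, MAE, MAPE, ME and MPE for paired actuals and forecasts."""
    errors = [a - f for a, f in zip(actual, forecast)]
    pct_errors = [100.0 * e / a for e, a in zip(errors, actual)]
    n = len(errors)
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": sum(abs(p) for p in pct_errors) / n,
        "ME": sum(errors) / n,
        "MPE": sum(pct_errors) / n,
    }


# Tiny made-up example: errors are +10 and -10.
stats = forecast_error_stats([100.0, 200.0], [90.0, 210.0])
print(stats)
```

ME and MPE keep the sign of the errors, so values near zero (as in the table) indicate little bias even when MAE and MAPE are not small.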
When searching for models, the procedure tries all of the models checked on the Analysis
Options dialog box. Note: except for Winters' Exponential Smoothing and the ARIMA
models, seasonal data will be seasonally adjusted before the forecasting models are applied.
Once the forecasts have been generated, the seasonality is put back to create the
final forecasts.
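The adjust, forecast, and restore sequence can be illustrated with a deliberately simple additive scheme. This is not the seasonal adjustment method used by the procedure (that is covered in the Forecasting documentation); the per-season mean offsets and all function names below are assumptions chosen to keep the round trip easy to follow.

```python
def seasonal_indices(series, s):
    """Additive index for each season: season mean minus overall mean."""
    overall = sum(series) / len(series)
    indices = []
    for k in range(s):
        values = series[k::s]  # every s-th value, starting at season k
        indices.append(sum(values) / len(values) - overall)
    return indices


def deseasonalize(series, indices):
    """Remove the seasonal index from each observation."""
    s = len(indices)
    return [y - indices[t % s] for t, y in enumerate(series)]


def reseasonalize(forecasts, indices, start):
    """Put the seasonality back; start is the time index of the first forecast."""
    s = len(indices)
    return [f + indices[(start + h) % s] for h, f in enumerate(forecasts)]


# Made-up series with a season length of 4.
series = [10.0, 12.0, 14.0, 12.0] * 3
idx = seasonal_indices(series, 4)
adjusted = deseasonalize(series, idx)

# Forecast the adjusted series with a constant mean, then restore seasonality.
mean = sum(adjusted) / len(adjusted)
final = reseasonalize([mean] * 4, idx, start=len(series))
print(idx, final)
```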
For the Golden Gate Bridge data, the procedure selected an ARIMA(0,1,2)x(2,1,2)12
model. Note that all of the coefficients in the model are statistically significant. As can be
seen from the plot of the fit and forecasts, the results are quite satisfactory:
[Plot: fit and forecasts for Traffic; vertical axis from 73 to 103, time axis from 1/68 to 1/88]
Model Comparisons
The Model Comparisons pane displays information on the best-fitting models of each
type requested. The top section summarizes the data and lists the fitted models:
Model Comparison
Data variable: Traffic
Number of observations = 168
Start index = 1/68
Sampling interval = 1.0 month(s)
Length of seasonality = 12
Number of periods withheld for validation: 24
Models
(A) Random walk
(B) Constant mean = 93.153
(C) Linear trend = 66.5074 + 0.0923593 t
(D) Quadratic trend = 41.5321 + 0.269169 t + -0.000306429 t^2
(E) Exponential trend = exp(4.24508 + 0.000997307 t)
(F) S-curve trend = exp(4.8175 + -80.4029 /t)
(G) Simple moving average of 2 terms
(H) Simple exponential smoothing with alpha = 0.1
(I) Brown's linear exp. smoothing with alpha = 0.1
(J) Holt's linear exp. smoothing with alpha = 0.1 and beta = 0.1
(K) Brown's quadratic exp. smoothing with alpha = 0.1
(L) Winter's exp. smoothing with alpha = 0.1, beta = 0.1, gamma = 0.1
(M) ARIMA(0,1,2)x(2,1,2)12
(N) ARIMA(2,0,2)x(1,1,2)12
(O) ARIMA(1,1,1)x(2,1,2)12
(P) ARIMA(1,1,2)x(1,1,2)12
(Q) ARIMA(1,0,0)x(0,1,1)12
The five ARIMA models in the list are those that fit best, among dozens that were fit.
The next section summarizes the performance of each model during the estimation
period:
Estimation Period
Model RMSE MAE MAPE ME MPE AIC
The column on the far right shows the value of the selected criterion for each of the
models. In the sample data, the ARIMA(0,1,2)x(2,1,2)12 model (Model M) does best,
though several other ARIMA models are very close.
The output also shows how well each model did during the validation period in
forecasting the observations that were withheld from the estimation process:
Validation Period
Model RMSE MAE MAPE ME MPE
(A) 2.72132 1.42365 1.46028 0.00416832 -0.00391219
(B) 35.7947 5.74478 5.79108 5.74478 5.79108
(C) 6.17679 2.06352 2.09452 -2.01854 -2.04853
(D) 2.49534 1.38226 1.41552 -0.368482 -0.386012
(E) 7.17915 2.2584 2.29013 -2.25692 -2.28857
(F) 2.80136 1.41307 1.4416 -0.748106 -0.767949
(G) 2.18546 1.30179 1.32752 0.136708 0.130939
(H) 2.56813 1.36359 1.40201 0.542393 0.537291
(I) 2.46638 1.39127 1.4176 0.374832 0.376192
(J) 4.15101 1.66548 1.70837 1.02078 1.01629
(K) 2.90303 1.44906 1.4631 0.424273 0.437374
(L) 4.15181 1.73076 1.76437 0.987824 0.953436
(M) 2.4251 1.16503 1.18882 0.0221547 0.00847592
(N) 2.09206 1.18016 1.22359 -0.3114 -0.330986
(O) 2.62911 1.13357 1.15323 0.301173 0.292195
(P) 2.0961 1.18316 1.22549 -0.0621341 -0.0776316
(Q) 2.26482 1.25482 1.29717 -0.260738 -0.278414
The selected ARIMA model performed well, especially on the MAPE at approximately
1.2%, although three of the other ARIMA models (N, P, and Q) beat it with respect to the
RMSE.
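The comparison among the ARIMA models can be checked directly from the validation table. The short sketch below copies the RMSE and MAPE values for models (M) through (Q) from that table and ranks them; the dictionary layout and the helper name rank are illustrative choices only.

```python
# Validation-period RMSE and MAPE for the five ARIMA models, from the table above.
validation = {
    "M": {"RMSE": 2.4251, "MAPE": 1.18882},
    "N": {"RMSE": 2.09206, "MAPE": 1.22359},
    "O": {"RMSE": 2.62911, "MAPE": 1.15323},
    "P": {"RMSE": 2.0961, "MAPE": 1.22549},
    "Q": {"RMSE": 2.26482, "MAPE": 1.29717},
}


def rank(stat):
    """Model labels ordered from smallest to largest value of stat."""
    return sorted(validation, key=lambda label: validation[label][stat])


print("By RMSE:", rank("RMSE"))
print("By MAPE:", rank("MAPE"))
```

Ranking on a different statistic can change the order, which is why the model chosen on the estimation-period criterion need not lead every column of the validation table.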