Automatic Forecasting

STATGRAPHICS – Rev. 7/3/2009
Summary
The Automatic Forecasting procedure is designed to forecast future values of time
series data. A time series consists of a set of sequential numeric data taken at equally
spaced intervals, usually over a period of time or space. Unlike the Forecasting
procedure, which expects the user to select the forecasting model, this procedure tries
many models and selects the one that performs best according to a specified criterion.
The available criteria for selecting a model include the Akaike Information Criterion
(AIC), the Hannan-Quinn Criterion (HQC), and the Schwarz-Bayesian Information Criterion (SBIC).
These criteria select the model with the smallest mean squared error, subject to a penalty for
the number of unknown parameters that need to be estimated.
Since the output of this procedure is similar to the Forecasting procedure, this document
will highlight only the unique aspects of the Automatic Forecasting procedure. For a
detailed discussion of all tables and graphs, refer to the Forecasting documentation.
Sample StatFolio: autocast.sgp
Sample Data:
The file golden gate.sgd contains monthly traffic volumes on the Golden Gate Bridge in
San Francisco for a period of n = 168 months from January, 1968 through December,
1981. The table below shows a partial list of the data from that file:
Month Traffic
1/68 73.637
2/68 77.136
3/68 81.481
4/68 84.127
5/68 84.562
6/68 91.959
7/68 94.174
8/68 96.087
9/68 88.952
10/68 83.479
11/68 80.814
12/68 77.466
1/69 75.225
… …
The data were obtained from a publication of the Golden Gate Bridge.
© 2009 by StatPoint Technologies, Inc. Automatic Forecasting - 1

Data Input
The data input dialog box requests the name of the column containing the time series
data:
• Data: numeric column containing n equally spaced numeric observations.
• Time indices: time, date or other index associated with each observation. Each value
in this column must be unique and arranged in ascending order.
• Sampling Interval: if time indices are not provided, this defines the interval between
successive observations. For example, the data from the Golden Gate Bridge were
collected once every month, beginning in January, 1968.
• Seasonality: the length of seasonality s, if any. The data is seasonal if there is a
pattern that repeats at a fixed period. For example, monthly data such as traffic on the
Golden Gate Bridge have a seasonality of s = 12. Hourly data that repeat every day
have a seasonality of s = 24. If no entry is made, the data is assumed to be
nonseasonal (s = 1).
• Trading Days Adjustment: a numeric variable with n observations used to
normalize the original observations, such as the number of working days in a month.
The observations in the Data column will be divided by these values before being
plotted or analyzed. There must be enough entries in this column to cover both the
observed data and the number of periods for which forecasts are requested.
• Select: subset selection.
• Number of Forecasts: number of periods following the end of the data for which
forecasts are desired.
• Withhold for Validation: number of periods m at the end of the series to withhold
for validation purposes. The data in those periods will not be used to estimate the
forecasting model. However, statistics will be calculated describing how well the
estimated model is able to forecast those observations.
In the current example, the traffic data is monthly, beginning in January, 1968, and has a
seasonality of s = 12. The last m = 24 observations of the series will be withheld for
validation purposes, while forecasts will be generated for the next 36 months.
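The effect of the Trading Days Adjustment and Withhold for Validation entries can be sketched in a few lines of Python. This is only an illustration of the description above, not the STATGRAPHICS implementation; the function name `prepare_series` and the placeholder values are hypothetical.

```python
def prepare_series(values, trading_days=None, withhold=0):
    """Divide by trading days (if given), then split off the last `withhold` points.

    Hypothetical helper illustrating the dialog box entries; not a STATGRAPHICS API.
    """
    if trading_days is not None:
        if len(trading_days) < len(values):
            raise ValueError("need at least one trading-day entry per observation")
        # Observations are divided by the adjustment values before analysis.
        values = [v / d for v, d in zip(values, trading_days)]
    else:
        values = list(values)
    if not 0 <= withhold < len(values):
        raise ValueError("withhold must be between 0 and n - 1")
    split = len(values) - withhold
    # The first part is used for estimation, the second only for validation.
    return values[:split], values[split:]


# Placeholder values standing in for the n = 168 monthly observations.
series = [float(t) for t in range(1, 169)]
estimation, validation = prepare_series(series, withhold=24)
print(len(estimation), len(validation))
```

With n = 168 and m = 24, the model is estimated from the first 144 observations and judged on the remaining 24.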

Analysis Options
The models fit by the Automatic Forecasting procedure are controlled by the Analysis
Options dialog box:
• Models to Include: specify the models that should be fit to the data. These are the
models from which the “best” model will be selected. Descriptions of each of the models
are given in the Forecasting documentation. For most of the models, additional
information must be provided:
Optimize parameters: If checked, unknown parameters in the model will be
estimated so as to optimize the specified forecasting criterion. If not checked,
specific values for the parameters may be entered by pressing the Parameters
button.
ARIMA model: Optimize Model Order – If checked, all models with terms of
order up to those specified will be fit. If not checked, the only model fit will be
the one with terms exactly equal to the specified order.
AR Terms (p) – specify the maximum order p of the autoregressive terms in the
ARIMA model.
MA Terms (q) – specify the maximum order q of the moving average terms in the
ARIMA model. You may also elect to consider only models for which q = p – 1.
Differencing (d) – specify the maximum order of differencing d in the ARIMA
model. Select Include constant to consider models that include a constant term
when differencing is performed.
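To give a feel for how many candidates Optimize Model Order implies, the following Python sketch enumerates the nonseasonal orders (p, d, q) up to given maxima, with the optional restriction q = p – 1. It is an assumption about how such a search space could be listed, not the procedure's actual search; seasonal terms and the constant are left out, and `candidate_orders` is a hypothetical name.

```python
from itertools import product


def candidate_orders(max_p, max_d, max_q, only_q_eq_p_minus_1=False):
    """List every (p, d, q) with orders up to the specified maxima."""
    orders = []
    for p, d, q in product(range(max_p + 1), range(max_d + 1), range(max_q + 1)):
        # Optional restriction described in the dialog: keep only q = p - 1.
        if only_q_eq_p_minus_1 and q != p - 1:
            continue
        orders.append((p, d, q))
    return orders


all_orders = candidate_orders(2, 1, 2)
restricted = candidate_orders(2, 1, 2, only_q_eq_p_minus_1=True)
print(len(all_orders))
print(restricted)
```

With maxima p = 2, d = 1 and q = 2 there are 18 nonseasonal combinations, and the restriction q = p – 1 reduces them to four.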
• Method Selection Criterion: the criterion used to select the best model.
The procedure fits each of the models indicated and selects the model that gives the
smallest value of the selected criterion. There are six criteria to choose from:
Akaike Information Criterion

The Akaike Information Criterion (AIC) is calculated from
$$\mathrm{AIC} = 2\ln(\mathrm{RMSE}) + \frac{2c}{n} \qquad (1)$$
where RMSE is the root mean squared error during the estimation period, c is the number
of estimated coefficients in the fitted model, and n is the sample size used to fit the
model. Notice that the AIC is a function of the variance of the model residuals, penalized
by the number of estimated parameters. In general, the model will be selected that
minimizes the mean squared error without using too many coefficients (relative to the
amount of data available).
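As a worked check of equation (1), the Python sketch below evaluates the AIC for the ARIMA model selected in the example. It assumes c = 6 (the six coefficients listed in the ARIMA Model Summary) and n = 144 (the 168 observations less the 24 withheld for validation); both values are inferences from the sample output rather than statements in the text.

```python
import math


def aic(rmse, c, n):
    """AIC as defined in equation (1): 2 ln(RMSE) + 2c/n."""
    return 2.0 * math.log(rmse) + 2.0 * c / n


# RMSE taken from the estimation period of the sample output.
# c = 6 and n = 144 are assumptions (see the paragraph above).
value = aic(2.0522, c=6, n=144)
print(round(value, 4))
```

Under those assumptions the result agrees closely with the AIC of 1.52116 reported for Model M in the Model Comparisons pane.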
Hannan-Quinn Criterion
The Hannan-Quinn Criterion (HQC) is calculated from
$$\mathrm{HQC} = 2\ln(\mathrm{RMSE}) + \frac{2c\,\ln(\ln(n))}{n} \qquad (2)$$
where c and n are defined as for the AIC.
This criterion uses a different penalty for the number of estimated parameters.
Schwarz-Bayesian Information Criterion

The Schwarz-Bayesian Information Criterion (SBIC) is calculated from
$$\mathrm{SBIC} = 2\ln(\mathrm{RMSE}) + \frac{c\,\ln(n)}{n} \qquad (3)$$
where c and n are defined as for the AIC.
Again, the penalty for the number of estimated parameters is different than for the other
criteria.
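The three criteria differ only in the penalty added for each estimated parameter. The Python sketch below compares those per-parameter penalties for a given sample size; the value n = 144 is an assumption based on the example (168 observations less 24 withheld), and the function name is hypothetical.

```python
import math


def penalty_per_parameter(n):
    """Penalty added to 2 ln(RMSE) for each estimated parameter."""
    return {
        "AIC": 2.0 / n,
        "HQC": 2.0 * math.log(math.log(n)) / n,
        "SBIC": math.log(n) / n,
    }


penalties = penalty_per_parameter(144)
for name, value in penalties.items():
    print(f"{name}: {value:.5f}")
```

For this sample size the AIC applies the lightest penalty and the SBIC the heaviest, so the SBIC tends to favor models with fewer parameters.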
Mean Squared Error (MSE)

The selected model is the one with the smallest mean squared error (equivalently, the
smallest RMSE), with no penalty for the number of estimated model parameters.

Mean Absolute Error (MAE)

The selected model is the one with the smallest mean absolute error.
Mean Absolute Percentage Error (MAPE)

The selected model is the one with the smallest mean absolute percentage error.
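The unpenalized error statistics can be sketched in Python as follows. These are the usual textbook definitions, with the error taken as actual minus forecast and a plain average over the number of errors; whether STATGRAPHICS uses exactly these divisors is an assumption, and `error_stats` is a hypothetical name.

```python
def error_stats(actual, forecast):
    """Common forecast error statistics (textbook definitions, assumed here)."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "MAE": sum(abs(e) for e in errors) / n,
        # Percentage errors are relative to the actual values.
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "ME": sum(errors) / n,
        "MPE": 100.0 * sum(e / a for e, a in zip(errors, actual)) / n,
    }


stats = error_stats([100.0, 200.0], [110.0, 190.0])
print(stats)
```

In this toy case the two errors cancel, so ME is zero even though MAE is 10, which is why the mean error alone is not used as a selection criterion.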
There are also four buttons that provide additional options:
• Adjustments: Press this button to specify adjustments to be made to the data before
the forecasting models are fit.
After the models are fit and forecasts are made of the adjusted data values, the
adjustments are reversed to provide the final forecasts. If Apply to regressors is
checked, the same adjustments will be made to any regressor variables in the models.
• Parameters: Press this button to enter values for each of the model parameters.
The entries in this dialog box are only used for models in which the Optimize
Parameters option is not checked.
• Estimation: Press this button to change the default values for certain estimation
options:
Stopping Criterion 1: The algorithm is assumed to have converged when the relative
change in the residual sum of squares from one iteration to the next is less than this
value.
Stopping Criterion 2: The algorithm is assumed to have converged when the relative
change in all parameter estimates from one iteration to the next is less than this value.
Maximum Iterations: Estimation stops if convergence is not achieved within this
many iterations.
Backforecasting: Uses a method called backforecasting to forecast values prior to
time t = 1. These values are used to generate the initial values which are needed to
generate forecasts for small values of t. For details, see Box, Jenkins and Reinsel
(1994).
• Input Series: Press this button to enter one or more input variables to act as
regressors in the trend and ARIMA forecasting models.
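The two stopping criteria and the iteration limit described under Estimation can be illustrated with a generic iterative loop. The Python sketch below is not the STATGRAPHICS estimation algorithm: the `step` interface, the tolerance defaults and the toy update rule are all hypothetical and serve only to show how the three stopping conditions interact.

```python
def iterate_until_converged(step, params, tol_ss=1e-4, tol_params=1e-3, max_iter=50):
    """Run step(params) -> (new_params, residual_sum_of_squares) until a stop rule fires."""
    prev_ss = None
    for iteration in range(1, max_iter + 1):
        new_params, ss = step(params)
        # Stopping Criterion 1: relative change in the residual sum of squares.
        if prev_ss is not None and prev_ss > 0 and abs(prev_ss - ss) / prev_ss < tol_ss:
            return new_params, iteration, "criterion 1"
        # Stopping Criterion 2: relative change in every parameter estimate.
        changes = [abs(new - old) / max(abs(old), 1e-12)
                   for new, old in zip(new_params, params)]
        if all(change < tol_params for change in changes):
            return new_params, iteration, "criterion 2"
        params, prev_ss = new_params, ss
    # Maximum Iterations: give up without convergence.
    return params, max_iter, "maximum iterations"


def toy_step(params):
    """Toy update: move halfway toward the value 3.0 (purely illustrative)."""
    new = [p + 0.5 * (3.0 - p) for p in params]
    ss = sum((3.0 - p) ** 2 for p in new)
    return new, ss


estimate, iterations, reason = iterate_until_converged(toy_step, [0.0])
print(estimate, iterations, reason)
```

In this toy problem the sum of squares shrinks by a constant factor at each step, so the parameter-change rule is the one that ends the loop.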

Analysis Summary
The Analysis Summary gives the standard forecasting output for whichever model gives
the smallest value of the specified criterion.
Automatic Forecasting – Traffic

Data variable: Traffic (Golden Gate Bridge Traffic Volume)
Number of observations = 168

Start index = 1/68
Sampling interval = 1.0 month(s)
Length of seasonality = 12
Forecast Summary
Seasonal differencing of order: 1
Forecast model selected: ARIMA(0,1,2)x(2,1,2)12
Number of forecasts generated: 36
Number of periods withheld for validation: 24
Statistic  Estimation Period  Validation Period
RMSE       2.0522             2.4251
MAE        1.35877            1.16503
MAPE       1.49719            1.18882
ME         -0.0402442         0.0221547
MPE        -0.0732127         0.00847592
ARIMA Model Summary

Parameter  Estimate   Stnd. Error  t         P-value
MA(1)      0.782716   0.0542184    14.4364   0.000000
MA(2)      0.858691   0.0289743    29.6363   0.000000
SAR(1)     1.04365    0.25067      4.16344   0.000052
SAR(2)     0.618683   0.14807      4.1783    0.000049
SMA(1)     1.72366    0.079256     21.748    0.000000
SMA(2)     -0.763695  0.0725868    -10.5211  0.000000
Backforecasting: yes
Estimated white noise variance = 4.61712 with 153 degrees of freedom
Estimated white noise standard deviation = 2.14875
Number of iterations: 6
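Two relationships in this output are easy to verify by hand: each t statistic is the estimate divided by its standard error, and the white noise standard deviation is the square root of the white noise variance. The short Python check below uses only the numbers printed above.

```python
import math

# (estimate, standard error, reported t) copied from the ARIMA Model Summary.
rows = {
    "MA(1)": (0.782716, 0.0542184, 14.4364),
    "MA(2)": (0.858691, 0.0289743, 29.6363),
    "SAR(1)": (1.04365, 0.25067, 4.16344),
    "SAR(2)": (0.618683, 0.14807, 4.1783),
    "SMA(1)": (1.72366, 0.079256, 21.748),
    "SMA(2)": (-0.763695, 0.0725868, -10.5211),
}

# t statistic = estimate / standard error.
t_values = {name: est / se for name, (est, se, _) in rows.items()}
for name, t in t_values.items():
    print(f"{name}: t = {t:.4f} (reported {rows[name][2]})")

# Standard deviation = square root of the estimated white noise variance.
sigma = math.sqrt(4.61712)
print(f"white noise standard deviation = {sigma:.5f}")
```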
When searching for models, the procedure tries all of the models checked on the Analysis
Options dialog box. Note: except for Winter’s Exponential Smoothing and the ARIMA
models, seasonal data will first be seasonally adjusted before the models are fit.
Once the forecasts have been generated, the seasonality will be put back to create the
final forecasts.
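The adjust, forecast and restore sequence described in the note can be sketched as follows. The Python code assumes simple multiplicative seasonal indices (the average of each season divided by the overall average); the actual seasonal adjustment method used by the procedure is not described here, so this is only one possible reading, and all function names are hypothetical.

```python
def seasonal_indices(values, s):
    """Multiplicative index per season: season average / overall average (assumed method)."""
    overall = sum(values) / len(values)
    indices = []
    for k in range(s):
        season = values[k::s]
        indices.append((sum(season) / len(season)) / overall)
    return indices


def deseasonalize(values, indices):
    """Remove seasonality by dividing each value by its season's index."""
    s = len(indices)
    return [v / indices[t % s] for t, v in enumerate(values)]


def reseasonalize(forecasts, indices, start):
    """Put seasonality back; `start` is the time position of the first forecast."""
    s = len(indices)
    return [f * indices[(start + h) % s] for h, f in enumerate(forecasts)]


# Toy series with a period of s = 2 (not the traffic data).
data = [10.0, 20.0, 10.0, 20.0]
idx = seasonal_indices(data, 2)
adjusted = deseasonalize(data, idx)
# A constant-mean forecast of the adjusted series for two periods ahead.
mean_forecast = [sum(adjusted) / len(adjusted)] * 2
final = reseasonalize(mean_forecast, idx, start=len(data))
print(final)
```

Here the adjusted series is flat, the nonseasonal model forecasts its mean, and restoring the indices reproduces the alternating pattern in the final forecasts.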
For the Golden Gate Bridge data, the procedure selected an ARIMA(0,1,2)x(2,1,2)12
model. Note that all of the coefficients in the model are statistically significant. As can be
seen from the plot of the fit and forecasts, the results are quite satisfactory:

[Figure: Time Sequence Plot for Traffic, ARIMA(0,1,2)x(2,1,2)12, showing the actual series, the forecasts and the 95.0% limits. Vertical axis: Traffic; horizontal axis: 1/68 through 1/88.]
Model Comparisons
The Model Comparisons pane displays information on the best-fitting models of each
type requested. The top section summarizes the data and lists the fitted models:
Model Comparison
Data variable: Traffic
Number of observations = 168
Start index = 1/68
Sampling interval = 1.0 month(s)
Length of seasonality = 12
Number of periods withheld for validation: 24
Models
(A) Random walk
(B) Constant mean = 93.153
(C) Linear trend = 66.5074 + 0.0923593 t
(D) Quadratic trend = 41.5321 + 0.269169 t + -0.000306429 t^2
(E) Exponential trend = exp(4.24508 + 0.000997307 t)
(F) S-curve trend = exp(4.8175 + -80.4029 /t)
(G) Simple moving average of 2 terms
(H) Simple exponential smoothing with alpha = 0.1
(I) Brown's linear exp. smoothing with alpha = 0.1
(J) Holt's linear exp. smoothing with alpha = 0.1 and beta = 0.1
(K) Brown's quadratic exp. smoothing with alpha = 0.1
(L) Winter's exp. smoothing with alpha = 0.1, beta = 0.1, gamma = 0.1
(M) ARIMA(0,1,2)x(2,1,2)12
(N) ARIMA(2,0,2)x(1,1,2)12
(O) ARIMA(1,1,1)x(2,1,2)12
(P) ARIMA(1,1,2)x(1,1,2)12
(Q) ARIMA(1,0,0)x(0,1,1)12
The five ARIMA models in the list are those that fit best, among dozens that were fit.
The next section summarizes the performance of each model during the estimation
period:
Estimation Period
Model  RMSE     MAE      MAPE     ME           MPE         AIC
(A)    2.15723  1.32165  1.458    -0.00150458  -0.0242249  1.68591
(B)    5.0849   3.86344  4.251    0.00537199   -0.292675   3.41922
(C)    3.24619  2.41982  2.66084  0.00022691   -0.120425   2.53552
(D)    3.22054  2.34518  2.57551  0.000060774  -0.116781   2.53354
(E)    3.25935  2.44179  2.68339  0.0548011    -0.0613545  2.54361
(F)    3.19448  2.34628  2.5743   0.0524653    -0.0587977  2.5034
(G)    2.48865  1.6626   1.82416  0.255358     0.232329    1.81704
(H)    2.91786  2.07536  2.27504  0.882328     0.869531    2.29448
(I)    2.65308  1.7498   1.92949  0.141644     0.126127    2.10422
(J)    2.96059  1.93325  2.13122  -0.316974    -0.389177   2.32355
(K)    2.56768  1.71834  1.89248  -0.0438619   -0.0727782  2.03878
(L)    3.04     1.96053  2.16142  -0.361903    -0.481662   2.22372
(M)    2.0522   1.35877  1.49719  -0.0402442   -0.0732127  1.52116
(N)    2.03378  1.25306  1.38098  0.0884528    0.0582524   1.53091
(O)    2.04984  1.33781  1.47744  0.0450175    0.0131063   1.53275
(P)    2.06766  1.28876  1.42164  -0.0258518   -0.0589859  1.53617
(Q)    2.11182  1.35568  1.49186  0.135221     0.102928    1.53677
The column on the far right shows the value of the selected criterion (here, the AIC) for
each of the models. In the sample data, the ARIMA(0,1,2)x(2,1,2)12 model (Model M) does best,
though several other ARIMA models are very close.
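The selection rule itself is simply "smallest value of the criterion". The Python sketch below applies it to the AIC column of the Estimation Period table; the values are copied from that table and the variable names are illustrative.

```python
# AIC values copied from the Estimation Period table.
aic_by_model = {
    "A": 1.68591, "B": 3.41922, "C": 2.53552, "D": 2.53354, "E": 2.54361,
    "F": 2.5034, "G": 1.81704, "H": 2.29448, "I": 2.10422, "J": 2.32355,
    "K": 2.03878, "L": 2.22372, "M": 1.52116, "N": 1.53091, "O": 1.53275,
    "P": 1.53617, "Q": 1.53677,
}

# The selected model is the one with the smallest criterion value.
best = min(aic_by_model, key=aic_by_model.get)
ranking = sorted(aic_by_model, key=aic_by_model.get)
print(best)
print(ranking[:5])
```

The five smallest AIC values all belong to the ARIMA models (M through Q), which differ from one another by less than 0.02.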
The output also shows how well each model did during the validation period in
forecasting the observations that were withheld from the estimation process:
Validation Period
Model  RMSE     MAE      MAPE     ME          MPE
(A)    2.72132  1.42365  1.46028  0.00416832  -0.00391219
(B)    35.7947  5.74478  5.79108  5.74478     5.79108
(C)    6.17679  2.06352  2.09452  -2.01854    -2.04853
(D)    2.49534  1.38226  1.41552  -0.368482   -0.386012
(E)    7.17915  2.2584   2.29013  -2.25692    -2.28857
(F)    2.80136  1.41307  1.4416   -0.748106   -0.767949
(G)    2.18546  1.30179  1.32752  0.136708    0.130939
(H)    2.56813  1.36359  1.40201  0.542393    0.537291
(I)    2.46638  1.39127  1.4176   0.374832    0.376192
(J)    4.15101  1.66548  1.70837  1.02078     1.01629
(K)    2.90303  1.44906  1.4631   0.424273    0.437374
(L)    4.15181  1.73076  1.76437  0.987824    0.953436
(M)    2.4251   1.16503  1.18882  0.0221547   0.00847592
(N)    2.09206  1.18016  1.22359  -0.3114     -0.330986
(O)    2.62911  1.13357  1.15323  0.301173    0.292195
(P)    2.0961   1.18316  1.22549  -0.0621341  -0.0776316
(Q)    2.26482  1.25482  1.29717  -0.260738   -0.278414
The selected ARIMA model performed well, especially on the MAPE at approximately
1.2%, although it was beaten by three of the other ARIMA models (N, P and Q) with
respect to the RMSE.
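The comparison with the selected model can be read directly from the Validation Period table. The Python sketch below lists every model whose validation RMSE is smaller than that of Model M; the values are copied from the table and the variable names are illustrative.

```python
# Validation-period RMSE values copied from the table above.
validation_rmse = {
    "A": 2.72132, "B": 35.7947, "C": 6.17679, "D": 2.49534, "E": 7.17915,
    "F": 2.80136, "G": 2.18546, "H": 2.56813, "I": 2.46638, "J": 4.15101,
    "K": 2.90303, "L": 4.15181, "M": 2.4251, "N": 2.09206, "O": 2.62911,
    "P": 2.0961, "Q": 2.26482,
}

selected = "M"
# Models with a smaller validation RMSE than the selected one, best first.
better = sorted(
    (m for m, v in validation_rmse.items() if v < validation_rmse[selected]),
    key=validation_rmse.get,
)
print(better)
```

Besides three ARIMA models, the simple moving average (Model G) also has a smaller validation RMSE than the selected model, a reminder that the model chosen on the estimation period need not forecast the withheld data best.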
