White Noise Test: Detecting Autocorrelation and Nonstationarities in Long Time Series After ARIMA Modeling
Abstract—Time series analysis has been a dominant technique for assessing relations within datasets collected over time and is becoming increasingly prevalent in the scientific community; for example, assessing brain networks by calculating pairwise correlations of time series generated from different areas of the brain. The assessment of these relations relies, in turn, on the proper calculation of interactions between time series, which is achieved by rendering each individual series stationary and nonautocorrelated (i.e., white noise), or to "prewhiten" the series. This ensures that the relations computed subsequently are due to the interactions between the series and do not reflect internal dependencies of the series themselves. An established method for prewhitening time series is to apply an Autoregressive (AR, p) Integrative (I, d) Moving Average (MA, q) model (ARIMA) and retain the residuals. To diagnostically check whether the model orders (p, d, q) are sufficient, both visualization and statistical tests (e.g., the Ljung-Box test) of the residuals are performed. However, these tests are not robust for high-order models in long time series. Additionally, as dataset size increases (i.e., the number of time series to model), it is not feasible to visually inspect each series independently. As a result, there is a need for robust alternatives to diagnostic evaluations of ARIMA modeling. Here, we demonstrate how to perform ARIMA modeling of long time series using Statsmodels, a library for statistical analysis in Python. Then, we present a comprehensive procedure (the White Noise Test) to detect autocorrelation and nonstationarities in prewhitened time series, thereby establishing that the series does not differ significantly from white noise. This test was validated using time series collected from magnetoencephalography recordings. Overall, our White Noise Test provides a robust alternative to diagnostic checks of ARIMA modeling for long time series.

Index Terms—Time series, Statsmodels, ARIMA, statistics

* Corresponding author: [email protected]
‡ Brain Sciences Center, Minneapolis VA Health Care System & University of Minnesota

Copyright © 2015 Margaret Y Mahan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Introduction

Time series are discrete, stochastic realizations of underlying data generating processes [Yaffee]. In other words, a time series is a set of consecutive samples collected over a time interval, such as temperature recordings at regular intervals. They are ubiquitous in any field where monitoring of data is involved. For example, time series can be environmental, economic, or medical. In addition, time series can provide information about trends (e.g., broad fluctuations in values) and cycles (e.g., systematic, periodic fluctuations in values). Time series analysis is also used to predict the next value in the series, given some model of its history. This is of special importance in environmental and econometric studies, where forecasting the next set of values (e.g., the weather or a stock price) may have serious practical consequences. In other fields, time series provide crucial information about an evolving process (e.g., the rate of spread of a disease or changing pollution levels), with implications about the effect of interventions. Finally, time series can provide fundamental information about the process that generates them, leading to a scientific understanding of that process (e.g., brain network analysis).

In time series analysis, there are two main investigative methods: frequency-domain and time-domain. In this paper, only analysis in the time domain is considered. Within the time domain, crosscorrelation analysis is typically utilized as a measure of the relation between two time series. It is commonly the case that a time series contains some autocorrelation, meaning that values in the time series are influenced by previous values. It is also common for a time series to exhibit nonstationarities, such as drifts or trends over time. In either case, the crosscorrelation function calculated between two series containing either autocorrelation or nonstationarities will give misleading results, such as an inflated correlation between two series where there is none. To circumvent this, time series are modeled to remove such effects, as in the case of prewhitening.
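As a toy illustration of this pitfall (our own example, not drawn from the paper's dataset), the following snippet correlates pairs of independent random walks. Although the walks are unrelated, their raw correlations are frequently large, while the correlations of their differenced (prewhitened) increments are not:

import numpy as np

rng = np.random.RandomState(0)
spurious, prewhitened = [], []
for _ in range(200):
    # Two independent random walks (nonstationary, highly autocorrelated).
    x = np.cumsum(rng.randn(1000))
    y = np.cumsum(rng.randn(1000))
    spurious.append(np.corrcoef(x, y)[0, 1])
    # Correlating the underlying white noise increments removes the effect.
    prewhitened.append(np.corrcoef(np.diff(x), np.diff(y))[0, 1])

print("median |r|, raw walks:  ", np.median(np.abs(spurious)))
print("median |r|, differenced:", np.median(np.abs(prewhitened)))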
Prewhitening

A white noise process is a continuous time series of random values with a constant mean and variance, normally and independently distributed, and nonautocorrelated. If, after modeling a time series, the residuals are practically white noise, then we say the series has been prewhitened. An established method for prewhitening time series is to apply an Autoregressive (AR) Integrative (I) Moving Average (MA) model (ARIMA) and retain the residuals [Box]. The full specification of an ARIMA model comprises the orders of each component, (p, d, q), where p is the number of preceding values in the autoregressive component, d is the degree of differencing,
and q is the number of preceding values in the moving average component. An ARIMA model with orders p, d, and q is a discrete-time linear equation with noise of the form:

\left(1 - \sum_{k=1}^{p} \phi_k L^k\right)(1 - L)^d X_t = \left(1 + \sum_{k=1}^{q} \theta_k L^k\right)\varepsilon_t

where L is the time lag operator, L x_t = x_{t-1}.

In ARIMA modeling, the I component is addressed first, followed by jointly addressing the AR and MA components. Most importantly, the ARIMA method requires the input time series to be: (1) equally spaced over time, (2) of sufficient length, (3) continuous (i.e., no missing values), and, specifically for the ARMA portion, (4) stationary in the second or weak sense, meaning the mean and variance remain constant over time and the autocovariance is only lag-dependent.

Prewhitening using ARIMA modeling takes three main steps. First, identify and select the model by detecting factors that influence the time series, such as nonstationarities or periodicities, and by identifying the AR and MA components (i.e., the model orders). Second, estimate the parameter values by using an estimation function to optimize the parameter values for the desired model. Third, evaluate the model by checking its adequacy, establishing that the series has been rendered stationary and nonautocorrelated. This time series modeling is iterative, successively refining the model until stationary and nonautocorrelated residuals are obtained. Overall, a good model serves three purposes: providing background information for further research on the process that generated the time series; enabling accurate forecasting of future values in the series; and yielding the stationary and nonautocorrelated residuals necessary to evaluate accurately the associations between time series, since such residuals are devoid of any dependencies stemming from within the series themselves.

Here, we implement two complementary tests to establish stationarity, which determines the value of the I(d) order. Using these stationary series, we use median correlation values at each lag of the autocorrelation (ACF) and partial autocorrelation (PACF) functions to identify a range of AR(p) and MA(q) orders to implement combinatorially. Then we utilize the Statsmodels package to find the method-solver combination that provides good metrics for long time series. Finally, we present a novel approach (the White Noise Test) to diagnostic checking of ARIMA modeling for long time series, which evaluates residual series based on stationarity and nonautocorrelation. Using our approach, an investigator can perform ARIMA modeling and evaluate candidate models with ease for large datasets and for datasets containing long time series.

Model Identification and Selection

There are several factors that can influence a value in a time series, which arise from previous values in the series, variability in these values, or nonstationarities (trend, drift, changing variance, or random walk). It is important to properly remove the effects of these factors by modeling the time series and taking the residuals. To identify the model orders for an ARIMA(p, d, q), the ACF and PACF are used.

First, nonstationarities need to be removed before ARMA modeling. A nonstationary process is identified by an ACF that does not tail away to zero quickly or cut off after a finite number of lags. If the time series is nonstationary, a first difference of the series is computed. This process is repeated until the time series is stationary, which determines the value of d (i.e., the value of d is the number of times the series is differenced to achieve stationarity). Two of the most frequently used tests for detecting nonstationarities are the augmented Dickey-Fuller (ADF) test [Said] and the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test [Kwiatkowski]. The ADF is a unit root test for the null hypothesis that a time series is I(1), while the KPSS is a stationarity test for the null hypothesis that a time series is I(0). Since these tests are complementary, we use them together to determine whether a series is stationary. In our case, a series is taken to be nonstationary if the ADF null hypothesis is accepted and the KPSS null hypothesis is rejected. We implement the ADF test using Statsmodels and the KPSS test using the Arch Python package, as sketched below.
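The following minimal sketch combines the two tests under the decision rule just described (ADF null accepted and KPSS null rejected implies nonstationarity). The function names, the 0.05 significance level, and the cap on the number of differences are our own illustrative choices; adfuller comes from Statsmodels and KPSS from the Arch package:

import numpy as np
from statsmodels.tsa.stattools import adfuller
from arch.unitroot import KPSS

def is_nonstationary(series, alpha=0.05):
    # ADF: the null hypothesis is a unit root, i.e., the series is I(1).
    adf_pvalue = adfuller(series)[1]
    # KPSS: the null hypothesis is stationarity, i.e., the series is I(0).
    kpss_pvalue = KPSS(series).pvalue
    # Nonstationary when the ADF null is accepted and the KPSS null is rejected.
    return adf_pvalue > alpha and kpss_pvalue < alpha

def select_d(series, max_d=2):
    # Difference until both tests agree the series is stationary;
    # the number of differences taken is the I(d) order.
    d, x = 0, np.asarray(series, dtype=float)
    while d < max_d and is_nonstationary(x):
        x = np.diff(x)
        d += 1
    return d, x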
Once nonstationarities have been removed, ARMA modeling can begin. To choose the p and q orders, the ACF and PACF of the stationary (differenced) series will show patterns from which a tentative ARMA model can be postulated. There are three main patterns. A pure MA(q) process will have an ACF that cuts off after q lags and a PACF that tails off with exponential or oscillating decay. A pure AR(p) process will have an ACF that tails off with exponential or oscillating decay and a PACF that cuts off after p lags. For a mixed ARMA(p, q) process, both the ACF and PACF will tail off with exponential or oscillating decay. Using these patterns, model selection begins with the minimum orders that achieve stationary and nonautocorrelated residuals.
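As an illustration of this identification step, the sketch below computes ACF and PACF profiles for a collection of stationary (differenced) channels, takes the median at each lag as described earlier, and flags the lags that exceed an approximate 95% band of 2/sqrt(N). The helper names, the band, and the lag count are illustrative assumptions rather than the exact thresholds used in the paper:

import numpy as np
from statsmodels.tsa.stattools import acf, pacf

def acf_pacf_profiles(channels, nlags=40):
    # Median ACF/PACF across channels of stationary (differenced) series;
    # channels is an iterable of 1-D arrays, one per time series.
    acfs = np.array([acf(x, nlags=nlags) for x in channels])
    pacfs = np.array([pacf(x, nlags=nlags) for x in channels])
    return np.median(acfs, axis=0), np.median(pacfs, axis=0)

def candidate_orders(channels, nlags=40):
    # Lags whose median correlation exceeds the approximate 95% band suggest
    # candidate MA orders (from the ACF) and AR orders (from the PACF).
    n = min(len(x) for x in channels)
    band = 2.0 / np.sqrt(n)
    med_acf, med_pacf = acf_pacf_profiles(channels, nlags)
    q_candidates = np.flatnonzero(np.abs(med_acf[1:]) > band) + 1
    p_candidates = np.flatnonzero(np.abs(med_pacf[1:]) > band) + 1
    return p_candidates, q_candidates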
Parameter Value Estimation

ARIMA modeling has been implemented in Python with the Statsmodels package [McKinney], [Seabold]. It includes parameter value estimation and model evaluation procedures. We import the Statsmodels and NumPy packages as:

import statsmodels.api as sm
import numpy as np

After the model orders have been selected, the model parameter values can be estimated with the sm.tsa.arima_model.ARIMA.fit() function to maximize the likelihood that these parameter values (i.e., coefficients) describe the data, as follows. First, initial estimates of the parameter values are used to get close to the desired parameter values. Second, optimization functions are applied to adjust the parameter values to maximize the likelihood by minimizing the negative log-likelihood function. If adequate initial parameter value estimates were selected, a local optimization algorithm will find the local log-likelihood minimum near the parameter value estimates, which will also be the global minimum.

In Statsmodels, default starting parameter value estimates are calculated using the Hannan-Rissanen method [Hannan], and these parameter values are checked for stationarity and
invertibility (these concepts are discussed in further detail in the next section). If method is set to css-mle, starting parameter values are estimated further with conditional sum of squares methods. However, parameter values estimated in this way are not guaranteed to be stationary; therefore, we advise specifying starting parameter values as an input variable (start_params) to ARIMA.fit(). A custom starting parameter value selection method may be built upon a copy of sm.tsa.ARMA._fit_start_params_hr, which forces stationarity and invertibility on the estimated start_params when necessary. For example:

# Fall back to flat starting values if the AR roots (stationarity) or the
# MA roots (invertibility) of the estimated start_params lie inside the unit circle.
if not np.all(np.abs(np.roots(np.r_[1, -start_params[k:k + p]])) < 1) or \
   not np.all(np.abs(np.roots(np.r_[1, start_params[k + p:]])) < 1):
    start_params = np.array(list(start_params[0:k]) + [1./(p+1)] * p + [1./(q+1)] * q)

In addition, the Hannan-Rissanen method uses an initial AR model with an order selected by minimizing the Bayesian Information Criterion (BIC); it then estimates the ARMA model using the residuals from that AR model. This initial AR model is required to be larger than max(p, q) of the desired ARIMA model, which is not guaranteed with an AR order selected by the BIC. We have implemented a method similar to Hannan-Rissanen, the long AR method, which is equivalent to Hannan-Rissanen except that the initial AR model is set to be large (AR = 300). This results in an initial AR model order that is guaranteed to be larger than max(p, q), and starting parameter value selection is more time efficient since fitting multiple AR model orders to optimize the BIC is not required.

To fit ARIMA models, Statsmodels has options for methods and solvers. The chosen method determines the type of likelihood used for estimation: mle is exact maximum likelihood estimation (MLE), css is conditional sum of squares (CSS) minimization, and css-mle first estimates the starting parameter values with CSS and then performs an MLE fit. The solver variable in ARIMA.fit() designates the optimizer from scipy.optimize used to minimize the negative log-likelihood function. The solvers nm (Nelder-Mead) and powell are the most time efficient because they do not require a score, gradient, or Hessian. The next fastest solvers, lbfgs (limited-memory Broyden-Fletcher-Goldfarb-Shanno), bfgs (Broyden-Fletcher-Goldfarb-Shanno), cg (conjugate gradient), and ncg (Newton conjugate gradient), require a score or gradient, but no Hessian. The newton (Newton-Raphson) solver requires a score, gradient, and Hessian. Lastly, a global solver, basinhopping, displaces parameter values randomly before minimizing with another local optimizer. For more information about these solvers, see sm.base.model.GenericLikelihoodModel.
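A minimal fitting sketch using the options just described is shown below. The simulated input and the small ARIMA(2, 1, 1) order are illustrative choices to keep the example fast; the paper's MEG models use ARIMA(30,1,3). The ARIMA class shown is the one referenced in the text (statsmodels.tsa.arima_model), available in Statsmodels releases contemporary with this paper; newer releases provide statsmodels.tsa.arima.model.ARIMA instead:

import numpy as np
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tsa.arima_process import arma_generate_sample

# Simulate ARMA(2, 1) increments and integrate once, so ARIMA(2, 1, 1) is appropriate.
ar = np.r_[1, -np.array([0.6, -0.3])]   # AR polynomial coefficients
ma = np.r_[1, np.array([0.4])]          # MA polynomial coefficients
increments = arma_generate_sample(ar, ma, nsample=5000)
series = np.cumsum(increments)

fit = ARIMA(series, order=(2, 1, 1)).fit(method="css-mle",  # CSS start values, then exact MLE
                                         solver="lbfgs",    # optimizer from scipy.optimize
                                         disp=0)            # suppress optimizer output

residuals = fit.resid   # prewhitened series, to be checked for whiteness
params = fit.params     # estimated AR and MA parameter values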
Model Evaluation

There are two components to evaluating an ARIMA model, namely, model stability and model adequacy. For the model to be stable, the roots of the characteristic equations

1 - \phi_1 L - \cdots - \phi_p L^p = 0

where \phi_i are the estimated AR parameter values and L is the time lag operator, and

1 + \theta_1 L + \cdots + \theta_q L^q = 0

where \theta_i are the estimated MA parameter values, should lie outside the unit circle, i.e., within the bounds of stationarity (for the p parameter values) and invertibility (for the q parameter values) [Pankratz]. For the model to be adequate, the residual time series should not be significantly different from white noise; in other words, the series should have constant mean and variance, and each value in the series should be uncorrelated with other realizations up to k lags. If either model stability or adequacy has not been established, then model identification and selection should be revised, and the diagnostic cycle continued, iteratively, until both are established.

Inspecting the p and q parameter values for being within the bounds of stationarity and invertibility checks model stability. Typically, this will be accomplished during parameter value estimation. Model adequacy is checked by examining the time-varying mean of the residuals (which should be close to zero), their variance (which should not differ appreciably along time), and their autocorrelation (which should not be different from chance). Finally, the ACF and PACF of the residuals should not contain more statistically significant terms than the number expected by chance. This number depends on the number of lags; for example, with k = 40 lags, one would expect 2 values (5% of 40) to exceed their standard error. Under the assumption that the process is white noise and when the length (N) of the series is long, the standard error of the sample autocorrelation (and partial autocorrelation) [Bartlett] approximates to:

\text{Standard Error} = 1 / \sqrt{N}

Several statistical tests are available to detect autocorrelation. Most notable is the Ljung-Box test [Ljung], which is applied to residuals to detect whether they exhibit autocorrelation; its test statistic is calculated for each of the h lags being tested. Another common test for autocorrelation is the Durbin-Watson test [Durbin]; however, unlike the Ljung-Box test, which is calculated for h lags, the Durbin-Watson test is calculated only for lag 1, so any autocorrelation beyond lag 1 will not be detected by this test. Similar to the Ljung-Box test is the Breusch-Godfrey Lagrange multiplier test [Breusch], [Godfrey], which also aims to detect autocorrelation up to the h lags tested. We compare our model evaluation, namely the White Noise Test, to both the Ljung-Box and Breusch-Godfrey tests.
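The sketch below illustrates these adequacy checks on a residual series: it counts ACF and PACF terms exceeding roughly twice the Bartlett standard error and summarizes the mean and variance over segments of the series. It is a simplified illustration of the checks described above, not the full White Noise Test of the next section; the threshold, lag count, segment count, and helper names are our own assumptions:

import numpy as np
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.stats.diagnostic import acorr_ljungbox

def whiteness_attributes(resid, nlags=40, nsegments=10):
    # Illustrative adequacy checks on ARIMA residuals.
    resid = np.asarray(resid, dtype=float)
    se = 1.0 / np.sqrt(len(resid))   # Bartlett approximation of the standard error

    # Lags 1..nlags whose ACF/PACF exceed roughly twice the standard error;
    # for white noise about 5% of lags are expected to do so by chance.
    n_acf = int(np.sum(np.abs(acf(resid, nlags=nlags)[1:]) > 2 * se))
    n_pacf = int(np.sum(np.abs(pacf(resid, nlags=nlags)[1:]) > 2 * se))

    # Time-varying mean and variance over equal segments (should stay roughly constant).
    segments = np.array_split(resid, nsegments)
    seg_means = [s.mean() for s in segments]
    seg_vars = [s.var() for s in segments]

    return {"acf_exceed": n_acf, "pacf_exceed": n_pacf,
            "segment_means": seg_means, "segment_vars": seg_vars}

# Ljung-Box statistic on the residuals for comparison; the return format
# (arrays vs. a DataFrame) depends on the Statsmodels version.
# print(acorr_ljungbox(residuals, lags=[40]))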
White Noise Test

The White Noise Test (Figure 1) calculates multiple attributes on the residuals. Together, the attributes characterize an individual residual series by its "whiteness". To change the degree of "whiteness", the thresholds in the red boxes of Figure 1 may be made more or less conservative.

Excluded data: Channels that could not be modeled with the given model order were excluded from further analysis. Additionally, channels with extreme values beyond a threshold of 5 per channel, calculated on the residuals for each model
TABLE 3: Attributes for the White Noise Test shown for incrementing model order combinations, listed as thresholded ACF and PACF (tACF, tPACF), constant mean (cMEAN) and constant variance (cVAR), and the number of unique channels across the attributes.

in the model order combination for the given statistic. Each bar shows different colors for each statistic and the relative contribution each statistic makes to the total sum for that model order combination. The Breusch-Godfrey, in place of the Ljung-Box, showed similar results. It can be seen that the Ljung-Box corresponds well to our ACF thresholding when the df equal the AR order but fails to identify autocorrelation using either of the suggested df. Finally, the Breusch-Godfrey and Ljung-Box statistics are compared in terms of the percent of residual series failing each statistic (Table 4).

MEG Dataset Evaluation

Finally, using ARIMA(30,1,3), we apply the White Noise Test procedure to the full MEG dataset. One channel at each stage of modeling is shown in Figure 6. Descriptive statistics on each of the attributes for the full MEG data are shown in Table 5, and the overall percent of channels removed per subject is shown in Figure 7. One subject had over 200 channels removed, likely due to errors within the recording, and was excluded from Table 5.

TABLE 5: Results of White Noise Test on the full dataset, with the steps listed as extreme values (xVAL), normality, thresholded ACF and PACF (tACF, tPACF), constant mean (cMEAN) and constant variance (cVAR), and the number of channels removed as a result.

Step              Min  Max  Median  Mean   Std Dev
xVAL              0    60   1       9.67   16.63
Normal            0    0    0       0.00   0.00
tACF              0    51   0       2.53   8.12
tPACF             0    0    0       0.00   0.00
cMEAN             0    8    0       0.20   1.15
cVAR              0    40   4       7.24   9.05
Channels Removed  1    85   10      20.55  21.34
Fig. 6: Raw, differenced, and ARIMA(30,1,3) series with corresponding ACF and PACF.
Conclusion

In this paper, we presented an expansion of the Box-Jenkins methodology for ARIMA modeling. First, during model identification and selection, we implement two complementary tests (KPSS and ADF) to establish stationarity. Using these stationary series, we use median correlation values at each lag of the ACF and PACF across 600 channels to identify a range of AR(p) and MA(q) orders to implement combinatorially. This methodology allows for examining multiple time series simultaneously to determine a valid model order for the majority of time series in a dataset. Second, during parameter value estimation, we utilize the Statsmodels package to find the method-solver combination that provides good metrics (model reliability, validity of residuals, and time efficiency) for long time series.

References

[Bartlett] Bartlett, M.S. 1946. "On the theoretical specification and sampling properties of autocorrelated time-series", Journal of the Royal Statistical Society, 8.1, 27-41.
[Box] Box, G. and Jenkins, G. 1976. "Time series analysis: forecasting and control", Holden Day, San Francisco, 2nd edition.
[Breusch] Breusch, T.S. 1978. "Testing for autocorrelation in dynamic linear models", Australian Economic Papers, 17, 334-355.
[Durbin] Durbin, J. and Watson, G.S. 1971. "Testing for serial correlation in least squares regression III", Biometrika, 58.1, 1-19.
[Godfrey] Godfrey, L.G. 1978. "Testing against general autoregressive and moving average error models when the regressors include lagged dependent variables", Econometrica, 46, 1293-1301.
[Hannan] Hannan, E.J. and Rissanen, J. 1982. "Recursive estimation of mixed autoregressive-moving average order", Biometrika, 69.1, 81-94.
[Kwiatkowski] Kwiatkowski, D., Phillips, P.C.B., Schmidt, P., Shin, Y. 1992. "Testing the null hypothesis of stationarity against the alternative of a unit root", Journal of Econometrics, 54, 159-178.
[Ljung] Ljung, G.M. and Box, G.E.P. 1978. "On a measure of a lack of fit in time series models", Biometrika, 65.2, 297-303.
[McKinney] McKinney, W., Perktold, J., Seabold, S. 2011. "Time series analysis in Python with statsmodels", Proceedings of the 10th Python in Science Conference, 96-102.
[Pankratz] Pankratz, A. 1991. "Forecasting with dynamic regression models", John Wiley and Sons, New York.
[Said] Said, S.E. and Dickey, D. 1984. "Testing for unit roots in autoregressive moving-average models with unknown order", Biometrika, 71, 599-607.
[Seabold] Seabold, S. and Perktold, J. 2010. "Statsmodels: econometric and statistical modeling with Python", Proceedings of the 9th Python in Science Conference, 57-61.
[Tsay] Tsay, R.S. 2005. "Analysis of Financial Time Series", John Wiley & Sons, Inc., Hoboken, NJ.
[Tukey] Tukey, J.W. 1977. "Exploratory data analysis", Addison-Wesley, Reading, MA.
[Yaffee] Yaffee, R.A. and McGee, M. 2000. "Introduction to time series analysis and forecasting: with applications of SAS and SPSS", Academic Press.