Stationary and non-stationary time series
G. P. Nason
Time series analysis is the study of data collected through time. The field is a vast one that pervades many areas of science and engineering, particularly statistics and signal processing: this short article can only be an advertisement.
Hence, the first thing to say is that there are several excellent texts
on time series analysis. Most statistical books concentrate on stationary
time series and some texts have good coverage of globally non-stationary
series such as those often used in financial time series. For a general, elemen-
tary introduction to time series analysis the author highly recommends the
book by Chatfield (2003). The core of Chatfield's book is a highly readable
account of various topics in time series including time series models, fore-
casting, time series in the frequency domain and spectrum estimation and
also linear systems. More recent editions contain useful, well-presented and
well-referenced information on important new research areas. Of course,
there are many other books: ones the author finds useful are Priestley
(1983), Diggle (1990), Brockwell and Davis (1991), and Hamilton (1994).
The book by Hannan (1960) is concise (but concentrated) and Pole et al.
(1994) is a good introduction to a Bayesian way of doing time series analysis.
There are undoubtedly many more books.
This article is a brief survey of several kinds of time series model and
analysis. Section 11.1 covers stationary time series which, loosely speaking,
are those whose statistical properties remain constant over time. Of course,
for many real applications the stationarity assumption is not valid. Generally speaking, one should not use methods designed for stationary series on non-stationary series, as you run the risk of obtaining completely misleading answers. A tricky question is: how can you know whether a time series is stationary or not? There are various tests for stationarity. As well as suffering
from all of the usual issues of statistical testing (what significance should I use? what power do I get?), tests of stationarity tend to test against particular alternatives embodying particular types of non-stationarity. For
example, test A might well be powerful at picking up non-stationarities of
type A but have no power at detecting those of type B. Section 11.2 briefly
considers such tests in the context of describing some non-stationary time
series models and also exhibits a technique for the analysis and modelling of
both a seismic and a medical time series. Section 11.3 lists some topics that
we do not cover in detail but are nevertheless important for the practical
use of time series analysis.
11.1 Stationary time series
Bearing this in mind, stationary models form the basis for a huge proportion
of time series analysis methods. As is true for a great deal of mathematics
we shall begin with very simple building blocks and then build structures
of increasing complexity. In time series analysis the basic building block is
the purely random process.
11.1.1 The purely random process
A purely random process, $\{\varepsilon_t\}$, is simply a sequence of independent, identically distributed random variables; here each $\varepsilon_t$ has zero mean and variance $\sigma^2$. Two further points are worth making. First, if the time index $t$ ranges over the integers then there is no problem, but important differences arise if $t$ can be a continuous variable (e.g. any real number in the interval $[0, 1]$).
The second point is that we considered the value of the time series at a given time $t$ to be a continuous random variable (that is, it can potentially take any real value). Many time series, especially those in volcanology, can take other kinds of values: for example, count values such as those recording the number of eruptions, or other kinds of discrete events. For the latter kind of discrete-valued time series the book by MacDonald and Zucchini (1997) is a useful reference. Note also that we use braces $\{\}$ to indicate the whole stochastic process but drop them when referring to a generic member, e.g. $\varepsilon_t$. In what follows, if we do not mention the limits then we assume the process indices range from $t = -\infty$ to $t = \infty$.
By definition it is immediate that the mean and the variance of the above purely random process are
$$E(\varepsilon_t) = 0, \qquad (11.1)$$
$$\mathrm{var}(\varepsilon_t) = \sigma^2, \qquad (11.2)$$
for all $t$. Third and higher order quantities can be defined but for a great deal of time series analysis they are largely ignored (it is amazing how only second-order quantities can occupy us).
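As a quick numerical check of (11.1) and (11.2), here is a minimal sketch in Python with NumPy (an illustrative choice of tools, not software used in the original analysis; $\sigma = 1.5$ is picked arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 1.5   # arbitrary illustrative standard deviation

# A purely random process: independent, identically distributed values
eps = rng.normal(loc=0.0, scale=sigma, size=100_000)

# Empirical checks of (11.1) and (11.2)
print(eps.mean())   # close to 0
print(eps.var())    # close to sigma**2 = 2.25
```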
11.1.2 Stationarity
Loosely speaking, a stationary process is one whose statistical properties do not change over time. More formally, a strictly stationary stochastic process is one where, given $t_1, \ldots, t_\ell$, the joint statistical distribution of $X_{t_1}, \ldots, X_{t_\ell}$ is the same as the joint statistical distribution of $X_{t_1+\tau}, \ldots, X_{t_\ell+\tau}$ for all $\ell$ and $\tau$. This is an extremely strong definition: it means that all moments of all degrees (expectations, variances, third order and higher) of the process, anywhere, are the same. It also means that the joint distribution of $(X_t, X_s)$ is the same as that of $(X_{t+r}, X_{s+r})$ and hence cannot depend on $s$ or $t$ but only on the distance between $s$ and $t$, i.e. $s - t$.
Since the definition of strict stationarity is generally too strict for everyday life, a weaker definition of second order or weak stationarity is usually used. Weak stationarity means that the mean and the variance of a stochastic process do not depend on $t$ (that is, they are constant) and the autocovariance between $X_t$ and $X_{t+\tau}$ can depend only on the lag $\tau$ ($\tau$ is an integer, and the quantities also need to be finite). Hence, for stationary processes $\{X_t\}$, the definition of autocovariance is
$$\gamma(\tau) = \mathrm{cov}(X_t, X_{t+\tau}), \qquad (11.3)$$
for integers $\tau$. It is vital to remember that, for the real world, the autocovariance of a stationary process is a model, albeit a useful one. Many actual processes are not stationary, as we will see in the next section. Having said this, much fun can be had with stationary stochastic processes!
One also routinely comes across the autocorrelation of a process, which is merely a version of the autocovariance normalized to values between $-1$ and $1$, and which commonly uses the Greek letter $\rho$ as its notation:
$$\rho(\tau) = \gamma(\tau)/\gamma(0). \qquad (11.4)$$
Moving average models. One simple way to build a stationary process from the purely random process is the moving average model of order $q$:
$$X_t = \sum_{i=0}^{q} \theta_i \varepsilon_{t-i}, \qquad (11.5)$$
where the $\theta_i$ are fixed coefficients, and the shorthand notation is MA($q$). Usually with a newly defined process
it is of interest to discover its statistical properties. For an MA($q$) process the mean is simple to find (since the expectation of a sum is the sum of the expectations):
$$E(X_t) = E\left(\sum_{i=0}^{q} \theta_i \varepsilon_{t-i}\right) = \sum_{i=0}^{q} \theta_i E(\varepsilon_{t-i}) = 0, \qquad (11.6)$$
because $E(\varepsilon_r) = 0$ for any $r$. A similar argument can be applied for the variance calculation:
$$\mathrm{var}(X_t) = \mathrm{var}\left(\sum_{i=0}^{q} \theta_i \varepsilon_{t-i}\right) = \sum_{i=0}^{q} \theta_i^2 \,\mathrm{var}(\varepsilon_{t-i}) = \sigma^2 \sum_{i=0}^{q} \theta_i^2, \qquad (11.7)$$
since $\mathrm{var}(\varepsilon_r) = \sigma^2$ for all $r$.
The autocovariance is slightly more tricky to work out:
$$\begin{aligned}
\gamma(\tau) &= \mathrm{cov}(X_t, X_{t+\tau}) & (11.8)\\
&= \mathrm{cov}\left(\sum_{i=0}^{q} \theta_i \varepsilon_{t-i}, \sum_{j=0}^{q} \theta_j \varepsilon_{t+\tau-j}\right) & (11.9)\\
&= \sum_{i=0}^{q} \sum_{j=0}^{q} \theta_i \theta_j \,\mathrm{cov}(\varepsilon_{t-i}, \varepsilon_{t+\tau-j}) & (11.10)\\
&= \sigma^2 \sum_{i=0}^{q} \sum_{j=0}^{q} \theta_i \theta_j \,\delta_{j,i+\tau}, & (11.11)
\end{aligned}$$
where $\delta_{u,v}$ is the Kronecker delta, which is 1 for $u = v$ and zero otherwise (this arises because of the independence of the $\varepsilon$ values; since $\delta_{j,i+\tau}$ is involved, only terms in the $j$ sum where $j = i+\tau$ survive). Hence continuing the summation gives
$$\gamma(\tau) = \sigma^2 \sum_{i=0}^{q-\tau} \theta_i \theta_{i+\tau}. \qquad (11.12)$$
In other words, the $j$ becomes $i+\tau$ and the index of summation ranges only up to $q-\tau$ since the largest admissible subscript, $i+\tau = q$, occurs for $i = q-\tau$.
The formula for the autocovariance of an MA($q$) process is fascinating: it is effectively the convolution of $\{\theta_i\}$ with itself (an autoconvolution). One of the most important features of an MA($q$) autocovariance is that it is zero for $\tau > q$. The reason for its importance is that when one is confronted with an actual time series $x_1, \ldots, x_n$ one can compute the sample autocovariance given by
$$c(\tau) = \sum_{i=1}^{n-\tau} (x_i - \bar{x})(x_{i+\tau} - \bar{x}), \qquad (11.13)$$
for $\tau = 0, \ldots, n-1$. The sample autocorrelation can be computed as $r(\tau) = c(\tau)/c(0)$. If, when one computes the sample autocovariance, it 'cuts off' at a certain lag $q$, i.e. it is effectively zero for lags of $q+1$ or higher, then one can postulate the MA($q$) model in (11.5) as the underlying probability model. There are other checks and tests that one can make, but comparison of the sample autocovariance with reference values, such as the model autocovariance given in (11.12), is a major first step in model identification.
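To make the identification step concrete, the sketch below simulates an MA(1) series and computes the sample autocorrelation $r(\tau) = c(\tau)/c(0)$ defined above. It assumes NumPy; the helper name `sample_acf` and the coefficient $\theta = 0.8$ are illustrative choices:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation r(tau) = c(tau)/c(0), following (11.13)."""
    d = x - x.mean()
    c0 = np.sum(d * d)
    return np.array([np.sum(d[: len(x) - tau] * d[tau:]) / c0
                     for tau in range(max_lag + 1)])

rng = np.random.default_rng(1)
n = 5000
eps = rng.normal(size=n + 1)
theta = 0.8                      # illustrative MA(1) coefficient
x = eps[1:] + theta * eps[:-1]   # X_t = eps_t + theta * eps_{t-1}

print(np.round(sample_acf(x, 5), 3))
# r(1) should be near theta/(1 + theta^2) ~ 0.49; r(2) onwards should be
# near zero, the MA(1) 'cut-off' at lag q = 1.
```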
Also, at this point one should question what one means by 'effectively zero'. The sample autocovariance is an empirical statistic calculated from the random sample at hand. If more data in the time series were collected, or another sample stretch were used, then the sample autocovariance would be different (although for long samples and stationary series the probability of a large difference should be very small). Hence sample autocovariances (and autocorrelations) are necessarily random quantities, and so 'is effectively zero' translates into a statistical hypothesis test of whether the true autocorrelation is zero or not.
Finally, whilst we are on the topic of sample autocovariances, notice that at the extremes of the range of $\tau$:
$$c(0) = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad (11.14)$$
$$c(n-1) = (x_1 - \bar{x})(x_n - \bar{x}). \qquad (11.15)$$
This is merely an illustration of the comment that the sample acf is a random quantity: some care and experience are required to prevent reading too much into it.
Autoregressive models. The other basic development from a purely random process is the autoregressive model which, as its name suggests, models a process where future values somehow depend on the recent past. More formally, $\{X_t\}$ follows an AR($p$) model if $X_t$ is a linear combination of its $p$ previous values plus a purely random disturbance; the simplest case, AR(1), is characterised by
$$X_t = \alpha X_{t-1} + \varepsilon_t. \qquad (11.17)$$
How can we answer the simple-sounding question: what is the expectation of $X_t$? The obvious answer is to dumbly apply the expectation operator $E$ to formula (11.17). However, knowing that $EX_t = \alpha EX_{t-1}$ does not get us very far, especially if it is assumed (or discovered) that $\{X_t\}$ is stationary, in which case we've got $\mu = \alpha\mu$! Another, more successful, approach is to recurse formula (11.17):
$$\begin{aligned}
X_t &= \alpha X_{t-1} + \varepsilon_t & (11.18)\\
&= \alpha(\alpha X_{t-2} + \varepsilon_{t-1}) + \varepsilon_t & (11.19)\\
&= \alpha^2 X_{t-2} + \alpha\varepsilon_{t-1} + \varepsilon_t & (11.20)\\
&= \alpha^2(\alpha X_{t-3} + \varepsilon_{t-2}) + \alpha\varepsilon_{t-1} + \varepsilon_t & (11.21)\\
&= \alpha^3 X_{t-3} + \alpha^2\varepsilon_{t-2} + \alpha\varepsilon_{t-1} + \varepsilon_t. & (11.22)
\end{aligned}$$
Continuing the recursion for ever suggests the MA($\infty$) representation
$$X_t = \sum_{i=0}^{\infty} \alpha^i \varepsilon_{t-i}. \qquad (11.23)$$
From this the mean is immediate,
$$EX_t = \sum_{i=0}^{\infty} \alpha^i E\varepsilon_{t-i} = 0, \qquad (11.24)$$
and, since the $\varepsilon_t$ are independent,
$$\mathrm{var}\, X_t = \mathrm{var}\left(\sum_{i=0}^{\infty} \alpha^i \varepsilon_{t-i}\right) \qquad (11.25)$$
$$= \sigma^2 \sum_{i=0}^{\infty} \alpha^{2i}. \qquad (11.26)$$
This latter sum is only finite if $|\alpha| < 1$, in which case basic knowledge about geometric sums gives
$$\mathrm{var}\, X_t = \sigma^2/(1 - \alpha^2). \qquad (11.27)$$
If $|\alpha| \geq 1$ then the sum in (11.26) does not converge and the process $X_t$ is not stationary (one can see this as the variance would increase with $t$). The case $\alpha = 1$ is an interesting one. Here the model is
$$X_t = X_{t-1} + \varepsilon_t \qquad (11.28)$$
and is known as a random walk, often mooted as a model for the stock market.
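The contrast between $|\alpha| < 1$ and $\alpha = 1$ is easy to see by simulation. A minimal sketch (NumPy; all parameter values are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)
reps, n = 2000, 500
eps = rng.normal(size=(reps, n))

def ar1_paths(alpha):
    """Simulate many independent paths of X_t = alpha * X_{t-1} + eps_t."""
    x = np.zeros((reps, n))
    for t in range(1, n):
        x[:, t] = alpha * x[:, t - 1] + eps[:, t]
    return x

stat = ar1_paths(0.9)   # |alpha| < 1: stationary
walk = ar1_paths(1.0)   # alpha = 1: random walk

# Variance across replications at times t = 100 and t = 499:
print(stat[:, 100].var(), stat[:, 499].var())  # both near 1/(1 - 0.81) ~ 5.3
print(walk[:, 100].var(), walk[:, 499].var())  # near 100 and 499: grows with t
```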
Another way of deriving the MA($\infty$) representation of an AR process is by introducing the shift operator, $B$, defined by
$$BX_t = X_{t-1}, \qquad (11.29)$$
valid for any process $\{X_t\}$. Then we can rewrite the AR model in (11.17) as
$$\begin{aligned}
X_t - \alpha X_{t-1} &= \varepsilon_t & (11.30)\\
(1 - \alpha B)X_t &= \varepsilon_t & (11.31)\\
X_t &= (1 - \alpha B)^{-1}\varepsilon_t & (11.32)\\
X_t &= (1 + \alpha B + \alpha^2 B^2 + \cdots)\varepsilon_t & (11.33)\\
X_t &= \varepsilon_t + \alpha\varepsilon_{t-1} + \alpha^2\varepsilon_{t-2} + \cdots, & (11.34)
\end{aligned}$$
recovering the representation (11.23).
Multiplying both sides of (11.17) by $X_{t-\tau}$ and taking expectations gives
$$E(X_t X_{t-\tau}) = \alpha E(X_{t-1}X_{t-\tau}) + E(\varepsilon_t X_{t-\tau}). \qquad (11.35)$$
Since $EX_t = 0$ the first terms on both sides of this equation are autocovariances of $X_t$: in other words $E(X_t X_{t-\tau}) = \mathrm{cov}(X_t, X_{t-\tau}) = \gamma(\tau)$ and similarly $E(X_{t-1}X_{t-\tau}) = \gamma(\tau - 1)$. Finally, note from the model formula (11.17) that $X_{t-\tau}$ will only contain $\varepsilon_i$ terms for $i \leq t-\tau$, i.e. $X_{t-\tau}$ only includes $\varepsilon$'s past relative to $t-\tau$. This means that $\varepsilon_t$ is independent of $X_{t-\tau}$ and hence $E(\varepsilon_t X_{t-\tau}) = E\varepsilon_t\, EX_{t-\tau} = 0$, since the purely random process here has zero mean. Thus (11.35) turns into
$$\gamma(\tau) = \alpha\gamma(\tau - 1), \qquad (11.36)$$
valid for $\tau > 0$; formula (11.27) gives $\gamma(0)$ and hence, using (11.36), one can obtain all values of $\gamma(\tau)$. Formula (11.36) is a simple example of a Yule-Walker equation; more complex versions can be used to obtain formulae for the autocovariances of more general AR($p$) processes.
For the AR(1) model we can divide both sides of (11.36) by $\gamma(0)$ to obtain an equivalent expression for the autocorrelation function:
$$\rho(\tau) = \alpha\rho(\tau - 1). \qquad (11.37)$$
Since $\rho(0) = 1$, iterating (11.37) gives
$$\rho(\tau) = \alpha^{|\tau|}. \qquad (11.38)$$
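Equation (11.38) suggests a very simple moment estimator: the lag-one sample autocorrelation $r(1)$ estimates $\alpha$ directly. A self-contained sketch (NumPy; $\alpha = 0.9$ chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(7)
n, alpha = 5000, 0.9
x = np.zeros(n)
eps = rng.normal(size=n)
for t in range(1, n):                 # simulate a stationary AR(1) series
    x[t] = alpha * x[t - 1] + eps[t]

d = x - x.mean()
r = np.array([np.sum(d[: n - k] * d[k:]) for k in range(5)]) / np.sum(d * d)

print(np.round(r, 3))                       # sample autocorrelations r(tau)
print(np.round(alpha ** np.arange(5), 3))   # theoretical rho(tau) = alpha**tau
# r[1] is a basic Yule-Walker estimate of alpha; it should be near 0.9.
```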
When fitting such models to data, questions naturally arise, such as: does the model I fitted fit the data well? Indeed, these sorts of questions lead to the Box-Jenkins procedure; see Box et al. (1994). Part of this procedure
involves studying the sample autocorrelation functions (and a related func-
tion called the partial autocorrelation function) to decide on the order of
any MA (or AR) terms. However, there are many other aspects such as re-
moving deterministic trends, removing outliers, and/or checking residuals.
Indeed, a full treatment is way beyond the length constraints of this article
so we refer the interested reader to Chatfield (2003) in the first instance.
where K, {Ai } and {i } are constants, and the {i } are independent random
variables each having the uniform distribution on [, ]. We emphasize
that (11.39) is but one model which has been found to be useful. There
are many other possibilities and randomness could, in principle, be added
to other quantities (for example, K, the number of frequencies could be
made random for different parts of the series). The key points for us with
this model is that Xt is comprised of sinusoidal waves where the wave with
frequency i has amplitude Ai . Here you have to imagine a time series being
comprised of several sine waves of different frequencies (indexed by i ) each
with different amplitudes. In general scientists are interested in analysing
time series with model (11.39) in mind and figuring out what Ai and i are.
The concept of building time series using oscillatory building blocks (sines) with varying amplitudes was born in this paper. Essentially, the rest of this short article deals with extensions and generalizations of this theme.
In general, there is no reason why only a finite number, $K$, of amplitudes need be involved. In fact it turns out that any discrete-time stationary process has a representation of the form
$$X_t = \int_{-\pi}^{\pi} A(\omega) \exp(i\omega t)\, dz(\omega), \qquad (11.40)$$
However, this is enough to explain that the autocovariance and the spectrum are linked by the standard Fourier transform, i.e.
$$f(\omega) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \gamma(k) e^{-i\omega k} \quad\text{and}\quad \gamma(k) = \int_{-\pi}^{\pi} f(\omega) e^{i\omega k}\, d\omega. \qquad (11.41)$$
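As a worked example of the first formula in (11.41) (a standard calculation, not taken from this chapter): for the AR(1) model (11.17) with $|\alpha| < 1$, equations (11.27) and (11.36) give $\gamma(k) = \sigma^2\alpha^{|k|}/(1-\alpha^2)$, and summing the two geometric series yields
$$f(\omega) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty} \frac{\sigma^2\alpha^{|k|}}{1-\alpha^2}\, e^{-i\omega k} = \frac{\sigma^2}{2\pi\left(1 - 2\alpha\cos\omega + \alpha^2\right)}.$$
So an AR(1) process with $\alpha > 0$ concentrates its power at low frequencies and one with $\alpha < 0$ at high frequencies, which is plausibly the qualitative difference between the two panels of Figure 11.3.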
The classical estimator of the spectrum is the periodogram. Unfortunately, as more data become available one can actually estimate more frequencies. In other words, the ratio of data available to estimate the spectral content at each frequency remains constant as more data become available (contrast this to estimation of, e.g., the mean of a sample, which only gets better as more data are collected). Technically, the periodogram lacks a property called consistency. There are various techniques to create a consistent spectral estimate
from the periodogram, which usually involve smoothing the periodogram by pooling information from nearby frequencies. However, most of these techniques introduce bias into what is an asymptotically unbiased quantity. So, as with most smoothing methods there is a tradeoff: too much smoothing and the estimate of the spectrum becomes biased, too little and the estimate becomes too variable. In particular, the spectral estimate plots in Figure 11.3 are very smooth since we have used the parametric AR method for estimation. Typical periodogram-based estimates are far noisier.
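The smoothing tradeoff is easy to experiment with. The sketch below (NumPy only; the moving-average smoother and its window length $m = 15$ are crude illustrative choices, not the parametric AR method used for Figure 11.3) computes a raw periodogram and a smoothed version:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1024
x = np.zeros(n)
eps = rng.normal(size=n)
for t in range(1, n):              # AR(1) series with a smooth 'red' spectrum
    x[t] = 0.9 * x[t - 1] + eps[t]

# Raw periodogram (up to a normalizing constant): squared modulus of the DFT
pgram = np.abs(np.fft.rfft(x - x.mean())) ** 2 / (2 * np.pi * n)

# Pool information across nearby frequencies with a simple moving average;
# a wider window means less variance but more bias
m = 15
smoothed = np.convolve(pgram, np.ones(m) / m, mode="same")

print(pgram[1:6].round(2))      # raw ordinates: very erratic
print(smoothed[1:6].round(2))   # smoothed ordinates: far more stable
```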
There are other important practical issues concerned with spectral estimation. The rate at which samples are taken for a time series puts an upper bound on the frequencies that can be estimated. If time series samples are taken at smaller and smaller intervals then higher frequencies can be observed. The upper bound is called the Nyquist frequency and it is half the sampling rate (the precise factor depends on the constants you assume in your definition of the Fourier transform). For example, sharp-eared humans can hear sounds up to a frequency of about 22 kHz, so the sampling frequency for a CD is about twice this: 44 kHz. Suppose now you fix your sampling rate. Then if the time series you observe contains frequencies higher than the Nyquist frequency then these higher frequencies cannot get sampled often enough and so they actually appear in the sample as lower frequencies. In
other words, the spectral information in the sampled series gets distorted.
This phenomenon is called aliasing and basically means that you need to
make sure your sampling rate is high enough to capture what you think
will be the highest frequencies in the signal you are recording (it also is
one reason why CD music played down the telephone sounds awful). Also,
the converse of the aliasing problem is that to capture really long, very low
frequency, cycles in a stationary time series you need enough time series
observations to see the whole cycle.
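Aliasing can be demonstrated in a few lines. In the sketch below (NumPy; the 10 Hz sampling rate and 9 Hz tone are arbitrary illustrative choices) a 9 Hz sine wave sampled at 10 Hz, i.e. well above the 5 Hz Nyquist limit, is indistinguishable from a (sign-flipped) 1 Hz sine wave:

```python
import numpy as np

fs = 10.0                      # sampling rate: 10 samples per second
t = np.arange(50) / fs         # five seconds of sample times
high = np.sin(2 * np.pi * 9.0 * t)    # 9 Hz: above the 5 Hz Nyquist frequency
low = -np.sin(2 * np.pi * 1.0 * t)    # a sign-flipped 1 Hz sine

# The sampled sequences coincide: the 9 Hz tone has aliased to 1 Hz
print(np.allclose(high, low))  # True
```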
Another effect is spectral leakage. Leakage occurs when there is a lack
of periodicity in the time series over the whole interval of observation (this
periodicity is assumed by the Fourier transform which is typically used
to compute spectral estimates). Many real series are not periodic in that
their first observation is almost never equal to their last! In other words,
the behaviour at the start of the series is not the same as at the end which
almost always occurs with real time series. However, this mismatch causes
the Fourier series to see a large discontinuity which is an extremely high
frequency feature. This high frequency feature then gets aliased and spreads
to low frequencies as noted in the previous paragraph. So, power seems to
leak to a wider range of frequencies and a leaked spectral estimate can
seem more spread out and vague when compared to the true spectrum.
The effects of leakage can be minimized by using a technique called tapering, which gently shrinks the values near the start and end of the series towards a common value so that the mismatch disappears.
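A minimal sketch of tapering (NumPy; the split-cosine-bell shape, the helper name `cosine_taper` and the 10% taper proportion are common but illustrative choices):

```python
import numpy as np

def cosine_taper(n, p=0.1):
    """Split-cosine-bell weights: smoothly shrink the first and last
    fraction p of a series towards zero, leaving the middle untouched."""
    w = np.ones(n)
    m = int(p * n)
    ramp = 0.5 * (1.0 - np.cos(np.pi * (np.arange(m) + 0.5) / m))
    w[:m] = ramp
    w[-m:] = ramp[::-1]
    return w

x = np.random.default_rng(0).normal(size=256)
xt = (x - x.mean()) * cosine_taper(len(x))
# xt now starts and ends near zero, so the periodic extension implicit in the
# Fourier transform has no large discontinuity, which reduces leakage.
print(xt[:3].round(3), xt[-3:].round(3))
```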
11.2 Non-stationary time series
11.2.1 Is it stationary?
As mentioned in the Introduction one should first test a time series to see if
it is stationary. Formal hypothesis tests tend to concentrate on testing one
kind of alternative but are often insensitive to other kinds (but, of course,
they are often very powerful for the phenomena that they are designed to
detect). For more information on these kinds of tests see Priestley (1983) or
Van Bellegem (2003). As in many areas of statistics one can achieve quite a
lot just through fairly simple plots. For example, one might look at a time
series plot to see whether the mean or the variance of the time series changes
over time. Another useful indication is to compute the autocovariance or
spectrum (or both) on two different parts of the time series (that themselves
seem stationary). If the two quantities from the different regions look very
different then this provides some evidence of non-stationarity. Additional
graphical procedures might be to look at some kind of time-frequency or
time-scale plot (as later) and see if this exhibits constancy over time or not.
If a test or a plot indicates non-stationarity in a particular way then that
non-stationarity can be modelled in a number of ways as described next.
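The 'two stretches' diagnostic mentioned above is straightforward to code. A sketch (NumPy; the changepoint and the parameter values are artificial, purely to show the idea):

```python
import numpy as np

def sample_acf(x, max_lag):
    d = x - x.mean()
    c0 = np.sum(d * d)
    return np.array([np.sum(d[: len(x) - k] * d[k:]) / c0
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(5)
n = 4000
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):    # AR(1) whose parameter switches sign halfway through
    alpha = 0.8 if t < n // 2 else -0.8
    x[t] = alpha * x[t - 1] + eps[t]

# Very different autocorrelations in the two halves: evidence of
# non-stationarity that a plot of the raw series might not make obvious
print(np.round(sample_acf(x[: n // 2], 3), 2))   # roughly 1, 0.8, 0.64, 0.51
print(np.round(sample_acf(x[n // 2 :], 3), 2))   # roughly 1, -0.8, 0.64, -0.51
```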
One can imagine that the process and parameters of such a globally non-stationary process were fixed some time ago (usually infinitely far in the past) and that the process itself then evolves, but the rule for evolution does not change. Another important example of such processes is the class of ARIMA processes, which are a generalization of ARMA processes. As an example, suppose we difference the random walk process and give it another name:
$$W_t = X_t - X_{t-1} = \varepsilon_t. \qquad (11.42)$$
Then $\{W_t\}$, the differenced series, is stationary even though $\{X_t\}$ itself is not; ARIMA modelling differences a series until it appears stationary and then fits an ARMA model to the result. A further departure from stationarity, popular in financial time series, is to let the variance of the process evolve. For example, take
$$X_t = \sigma_t \varepsilon_t, \qquad (11.43)$$
where the variance $\sigma_t^2$ depends on the recent past, as in the ARCH model
$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i X_{t-i}^2. \qquad (11.44)$$
This kind of model can explain all kinds of phenomena common in financial
time series (e.g. the autocorrelation of the series is insignificant, but the
autocorrelations of the absolute values of the series are not), and there are many more developments on this theme of providing a model for the variance too. Clearly, since the variance of this process, $\sigma_t^2$, changes through time, the process is not stationary. However, from (11.44) the parameters $\alpha_i$ stay fixed for all time. This assumption is surely only approximately true for a great many real time series.
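A sketch simulating a model of the form (11.43)-(11.44), here ARCH(1) with illustrative parameters, reproduces the stylized fact quoted above: negligible autocorrelation in $X_t$ itself but clear autocorrelation in $|X_t|$:

```python
import numpy as np

def sample_acf(x, max_lag):
    d = x - x.mean()
    c0 = np.sum(d * d)
    return np.array([np.sum(d[: len(x) - k] * d[k:]) / c0
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(11)
n = 20000
a0, a1 = 0.2, 0.5         # illustrative ARCH(1) parameters, a1 < 1
x = np.zeros(n)
for t in range(1, n):
    sigma_t = np.sqrt(a0 + a1 * x[t - 1] ** 2)   # variance driven by the past
    x[t] = sigma_t * rng.normal()

print(np.round(sample_acf(x, 3), 2))           # near zero at non-zero lags...
print(np.round(sample_acf(np.abs(x), 3), 2))   # ...but |X_t| clearly correlated
```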
A wavelet analogue of the Fourier representation (11.40) builds a time series out of wavelets:
$$X_t = \sum_{j}\sum_{k} w_{jk}\, \psi_{jk}(t)\, \xi_{jk}, \qquad (11.45)$$
where $\psi_{jk}(t)$ are wavelets at scale $j$ and location $k$, the $\xi_{jk}$ are simply independent, identically distributed Gaussian random variables, and $w_{jk}$ specifies the amplitude at scale $j$ and location $k$. Here the spectral quantity is called
the evolutionary wavelet spectrum (EWS), $S_j(k)$, which measures the power of oscillation in the time series operating at scale $2^j$ and location $k$; approximately, $S_j(k) \approx w_{jk}^2$. Many spectral methods just propose computing coefficients (whether they be Fourier or wavelet or other) and examining simple functions of these (usually the square) as spectral estimates. The advantage with models (11.40) and (11.45) is that we can compute statistical properties of our estimators (functions of coefficients), e.g. obtain expectations and variances and compute confidence intervals. A properly formulated model makes it clear whether certain kinds of phenomena can actually be estimated. For example, if your series evolves so fast that it cannot be estimated with a slow sampling rate then it is important to be able to know this. Model-based estimation provides the framework for this.
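As a flavour of how such an estimate can be built, the sketch below (NumPy only; a Haar wavelet and the raw 'square the coefficients' recipe, without the correction and smoothing steps of Nason et al. (2000)) computes squared non-decimated Haar wavelet coefficients of a series whose variance grows over time:

```python
import numpy as np

def haar_power(x, n_scales=4):
    """Squared non-decimated Haar wavelet coefficients: a raw estimate of
    time-localized power at each scale j = 1 (finest) to n_scales."""
    out = []
    for j in range(1, n_scales + 1):
        half = 2 ** (j - 1)
        h = np.concatenate([np.full(half, 1.0), np.full(half, -1.0)])
        h /= np.sqrt(2.0 ** j)              # unit-energy Haar filter at scale j
        d = np.convolve(x, h, mode="same")  # a coefficient at every location k
        out.append(d ** 2)
    return np.array(out)                    # shape: (n_scales, len(x))

rng = np.random.default_rng(2)
n = 1024
x = rng.normal(size=n) * np.linspace(0.5, 2.0, n)  # variance grows with time

S = haar_power(x)
# Average finest-scale power in the first and last quarters of the series:
print(S[0, : n // 4].mean(), S[0, -n // 4:].mean())  # much larger at the end
```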
to be oscillation at about 52 days that might reflect banded tremor and
there are also early power peaks in the 2d16 and 5d8 bands at about 8
days that might indicate other geophysical processes. Of course, this is a
preliminary analysis and much further work on interpretation and linking
to real events on the ground would need to be done.
11.3 Topics not covered in detail
Forecasting: Given a fitted model one can use it to forecast future values of a series. In general, for good forecasts one needs really good parameter estimates. For locally stationary processes this is doubly true; see Fryzlewicz et al. (2003). Multivariate systems: We have only studied
situations concerned with a single time series. In practice, this is unrealistic
as most experiments collect a multitude of data sets. Methods exist for
the modelling and interpretation of vector-valued multivariate time series.
For example, a basic quantity is the cross-correlation which measures the
association between different components of a vector time series at different
lags. Discrete-valued time series: From brief conversations with various
Earth Science colleagues it appears that many volcanological time series are
discrete-valued (for example, counts of things, state of things). A lot of cur-
rent time series work is for continuous-valued time series. However, this is
changing; for example, see MacDonald and Zucchini (1997). (We also have not mentioned time series that are measured continuously in time. At least some of the standard texts, e.g. Priestley (1983), consider such models and examples.)
Further reading
As mentioned in the introduction, my favourite time series book is Chatfield (2003). Diggle (1990) is also a nice introduction. The book by Priestley (1983) approaches the field through oscillation and spectral analysis but also has comprehensive coverage of time-domain concepts and models; it contains extremely well-written sections on the intuitive fundamentals of time series as well as a lot of well-explained mathematical theory. The book by
Hamilton (1994) is pretty well comprehensive and up-to-date but might be too
detailed for those seeking a quick introduction.
Acknowledgements
The author is grateful to Heidy Mader and Stuart Coles, co-organisers and helpers, for organising the very interesting Workshop in Statistics and Volcanology. It must have been very hard work: thank you. The author is grateful to Sofia Olhede of Imperial College for bringing the Thomson quote to his attention. The
author would like to thank P. Fleming, A. Sawczenko and J. Young of the Institute of Child Health, Royal Hospital for Sick Children, Bristol for supplying the
ECG data. The author would like to thank the Montserrat Volcano Observatory
for the RSAM time series and he would like to thank Willy Aspinall for supplying
this data and also for many helpful explanations concerning it. The author would
like to thank the referees and editors for suggesting many important and helpful things. This work was supported by EPSRC Advanced Research Fellowship GR/A01664.
Bibliography
[1] Box, G.E.P., Jenkins, G.M. & Reinsel, G.C. 1994. Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.
[2] Brockwell, P.J. & Davis, R.A. 1991. Time Series: Theory and Methods.
Springer.
[6] Fryzlewicz, P., Van Bellegem, S. & von Sachs, R. 2003. Forecasting non-stationary time series by wavelet process modelling. Annals of the Institute of Statistical Mathematics, 55, 737-764.
[7] Hamilton, J.D. 1994. Time series analysis. Princeton University Press,
Princeton, New Jersey.
[8] Hannan, E.J. 1960. Time series analysis. Chapman and Hall.
[9] MacDonald, I.L. & Zucchini, W. 1997. Hidden Markov and Other Models for Discrete-valued Time Series. Chapman and Hall/CRC.
[10] Nason, G.P. & von Sachs, R. 1999. Wavelets in time series analysis.
Philosophical Transactions: Mathematical, Physical and Engineering
Sciences, 357(1760), 2511-2526, doi:10.1098/rsta.1999.0445.
[11] Nason, G.P., von Sachs, R. & Kroisandt, G. 2000. Wavelet processes
and adaptive estimation of the evolutionary wavelet spectrum. Jour-
nal of the Royal Statistical Society: Series B (Statistical Methodology),
62(2), 271-292, doi:10.1111/1467-9868.00231.
[12] Nason, G.P., Sapatinas, T. & Sawczenko, A. 2001. Wavelet packet mod-
elling of infant sleep state using heart rate data. Sankhya, 63, 199-217.
[14] Pole, A., West, M. & Harrison, J. 1994. Applied Bayesian Forecasting and Time Series Analysis. Chapman and Hall/CRC.
[16] Priestley, M.B. 1983. Spectral Analysis and Time Series. Academic
Press.
[19] Torrence, C. & Compo, G.P. 1998. A practical guide to wavelet analy-
sis. Bulletin of the American Meteorological Society, 79, 61-78.
[20] Tsay, R.S. 2002. Analysis of Financial Time Series. Wiley, New York.
[Figure 11.1: plot residue; panel (a); axis labels: Xt and Lag (0 to 20).]
Figure 11.2: The plots are (a): realization of ar1pos, (b): realization of ar1neg, (c): autocorrelation of ar1neg, and (d): autocorrelation of ar1pos. Recall that Time is $t$, lag is $\tau$, and ACF stands for the autocorrelation function $\rho(\tau)$ in both cases.
[Figure 11.3: spectral estimate plots; panels (a) and (b); axis labels: spectrum (log scale) and frequency (0.0 to 0.5).]
Figure 11.4: RSAM count series (from Montserrat, from the short-period station MBLG, Long Ground, 16:43.50N, 62:09.74W, 287 m altitude). Series starts 31st July 2001 00:00, ends 8th October 2001 23:54. Sampling time: every 5 minutes.
[Figure 11.5: image; vertical axis Mid Period with bands 15m, 1d8, 5d8 and 21d8; horizontal axis Time (days), 0 to 50.]
Figure 11.5: EWS estimate, $\hat{S}_j(k)$, for the time series shown in Figure 11.4. Horizontal axis is time in days. Curved dashed lines indicate the cone of influence. Mid-period is explained in the text.
Figure 11.6: Heart rate recording of a 66 day old infant: the series is sampled
at 1/16 Hz and is recorded from 21:17:59 to 06:27:18; there are T = 2048
observations. Reproduced from Nason et al. (2000) with permission.
Figure 11.7: Estimate of the finest-scale EWS for the ECG data in Figure 11.6 (continuous curve, left axis) and sleep state (dashed line, right axis): 1, quiet sleep; 2, state between 1 and 3; 3, active sleep; 4, awake. Reproduced from Nason et al. (2000) with permission.