
Lecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications
Prof. Massimo Guidolin
20192 – Financial Econometrics
Winter/Spring 2020
Overview
 Moving average processes
 Autoregressive processes: moments and the Yule-Walker
equations
 Wold’s decomposition theorem
 Moments, ACFs and PACFs of AR and MA processes
 Mixed ARMA(p, q) processes
 Model selection: SACF and SPACF vs. information criteria
 Model specification tests
 Forecasting with ARMA models
 A few examples of applications
Moving Average Process

 MA(q) models are always stationary because they are finite linear combinations of white noise processes
o Therefore an MA(q) process has a constant mean, a constant variance, and autocovariances that differ from zero up to lag q but are zero afterwards
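For reference, the standard form of an MA(q) process and the moments just described, in conventional textbook notation:
$y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}, \qquad \epsilon_t \sim WN(0, \sigma^2)$
$E[y_t] = \mu, \qquad \mathrm{Var}(y_t) = (1 + \theta_1^2 + \cdots + \theta_q^2)\,\sigma^2, \qquad \gamma_h = 0 \ \text{for } h > q$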



Moving Average Process: Examples

o Simulations are based on
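As an illustrative stand-in for the simulations referenced above, a hypothetical MA(2) with coefficients 0.6 and 0.3 (not taken from the slides) shows the autocorrelation cutoff at lag q = 2:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 2000
eps = rng.standard_normal(T + 2)          # white noise shocks

# Hypothetical MA(2) coefficients, chosen for illustration only
theta1, theta2 = 0.6, 0.3
y = eps[2:] + theta1 * eps[1:-1] + theta2 * eps[:-2]

# Sample autocorrelations: non-zero up to lag 2, near zero afterwards
for h in range(1, 6):
    rho = np.corrcoef(y[h:], y[:-h])[0, 1]
    print(f"lag {h}: rho = {rho:+.3f}")
```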


Moving Average Process: Examples
[Figures: simulated MA processes]


Autoregressive Process
 An autoregressive (henceforth AR) process of order p is a process in which the series $y_t$ is a weighted sum of p past values of the series ($y_{t-1}, y_{t-2}, \ldots, y_{t-p}$) plus a white noise error term, $\epsilon_t$
o AR(p) models are simple univariate devices to capture the observed Markovian nature of financial and macroeconomic data, i.e., the fact that the series tends to be influenced by at most a finite number of its own past values, which is often also described as the series having a finite memory
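For completeness, the standard form of the AR(p) process just described:
$y_t = \phi_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \epsilon_t, \qquad \epsilon_t \sim WN(0, \sigma^2)$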



The Lag and Difference Operators
 The lag operator, generally denoted by L, shifts the time index of a variable regularly sampled over time backward by one unit
o Therefore, applying the lag operator to a generic variable $y_t$, we obtain the value of the variable at time t − 1, i.e., $L y_t = y_{t-1}$
o Equivalently, applying $L^k$ means lagging the variable k > 1 times, i.e., $L^k y_t = L^{k-1}(L y_t) = L^{k-1} y_{t-1} = L^{k-2}(L y_{t-1}) = \cdots = y_{t-k}$
 The difference operator, Δ, is used to express the difference between consecutive realizations of a time series, $\Delta y_t = y_t - y_{t-1}$
o With Δ we denote the first difference, with Δ² we denote the second-order difference, i.e., $\Delta^2 y_t = \Delta(\Delta y_t) = \Delta(y_t - y_{t-1}) = \Delta y_t - \Delta y_{t-1} = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}$, and so on
o Note that $\Delta^2 y_t \neq y_t - y_{t-2}$
o $\Delta y_t$ can also be rewritten using the lag operator, i.e., $\Delta y_t = (1 - L) y_t$
o More generally, we can write a difference of any order, $\Delta^k y_t$, as $\Delta^k y_t = (1 - L)^k y_t$, k ≥ 1
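A minimal sketch of how the two operators map onto standard data operations (pandas; the sample values are arbitrary):

```python
import pandas as pd

y = pd.Series([100.0, 102.0, 101.0, 105.0, 104.0])

lagged = y.shift(1)          # L y_t = y_{t-1}
d1 = y.diff(1)               # Δ y_t = (1 - L) y_t = y_t - y_{t-1}
d2 = y.diff(1).diff(1)       # Δ² y_t = y_t - 2 y_{t-1} + y_{t-2}

print(pd.DataFrame({"y_t": y, "L y_t": lagged, "Δ y_t": d1, "Δ² y_t": d2}))
```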
Stability and Stationarity of AR(p) Processes
 In the case of an AR(p), because it is a stochastic difference equation, it can be rewritten as
$y_t = \phi_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \epsilon_t$
or, more compactly, as $\phi(L) y_t = \phi_0 + \epsilon_t$, where $\phi(L)$ is a polynomial of order p, $\phi(L) \equiv 1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p$
 Replacing in the polynomial $\phi(L)$ the lag operator with a variable $\lambda$ and setting it equal to 0, i.e., $\phi(\lambda) = 0$, we obtain the characteristic equation associated with the difference equation $\phi(L) y_t = \phi_0 + \epsilon_t$
o A value of $\lambda$ which satisfies the polynomial equation is called a root
o A polynomial of degree p has p roots, often complex numbers
 If the absolute value of all the roots of the characteristic equation is higher than one, the process is said to be stable
 A stable process is always weakly stationary
o Even if stability and stationarity are conceptually different, stability conditions are commonly referred to as stationarity conditions
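To make the root condition operational, one can compute the roots of the characteristic polynomial numerically; a sketch for a hypothetical AR(2) (coefficients chosen only for illustration):

```python
import numpy as np

phi1, phi2 = 0.5, 0.3   # hypothetical AR(2) coefficients

# Characteristic equation: phi(λ) = 1 - phi1 λ - phi2 λ² = 0.
# np.roots takes coefficients from the highest power down.
roots = np.roots([-phi2, -phi1, 1.0])

print("roots:", roots)
print("stable/stationary:", np.all(np.abs(roots) > 1))   # True here
```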
Wold’s Decomposition Theorem

 An autoregressive process of order p with no constant and no other predetermined, fixed terms can be expressed as an infinite-order moving average process, MA(∞), and it is therefore linear:
$y_t = \mu + \epsilon_t + \psi_1 \epsilon_{t-1} + \psi_2 \epsilon_{t-2} + \cdots = \mu + \sum_{i=0}^{\infty} \psi_i \epsilon_{t-i}$ (with $\psi_0 = 1$)
 If the process is stationary, the sum $\sum_{i=0}^{\infty} \psi_i \epsilon_{t-i}$ will converge
 The (unconditional) mean of an AR(p) model is
$E[y_t] = \dfrac{\phi_0}{1 - \phi_1 - \phi_2 - \cdots - \phi_p}$
o A sufficient condition for the mean of an AR(p) process to exist and be finite is that the sum of the AR coefficients is less than one in absolute value, $|\phi_1 + \phi_2 + \cdots + \phi_p| < 1$, see next
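As an illustration of the theorem, statsmodels can compute the implied MA(∞) weights $\psi_i$ for a hypothetical AR(2) (coefficients illustrative):

```python
import numpy as np
from statsmodels.tsa.arima_process import arma2ma

# AR(2): y_t = 0.5 y_{t-1} + 0.3 y_{t-2} + eps_t
# statsmodels convention: coefficients of the AR polynomial 1 - 0.5L - 0.3L²
psi = arma2ma(ar=[1.0, -0.5, -0.3], ma=[1.0], lags=10)

print(np.round(psi, 4))   # ψ_i decline toward zero, so Σ ψ_i ε_{t-i} converges
```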
Moments and ACFs of an AR(p) Process
 The (unconditional) variance of an AR(p) process is computed from the Yule-Walker equations written in recursive form (see below)
o In the AR(2) case, for instance, we have
$\mathrm{Var}(y_t) = \dfrac{(1 - \phi_2)\,\sigma^2}{(1 + \phi_2)(1 - \phi_1 - \phi_2)(1 + \phi_1 - \phi_2)}$
o For AR(p) models, the characteristic polynomials are rather convoluted: it is infeasible to define simple restrictions on the AR coefficients that ensure covariance stationarity
o E.g., for AR(2), the conditions are $\phi_1 + \phi_2 < 1$, $\phi_2 - \phi_1 < 1$, $|\phi_2| < 1$
 The autocovariance and autocorrelation functions of AR(p) processes can be computed by solving a set of simultaneous equations known as the Yule-Walker equations
o It is a system of equations that we solve recursively to determine the ACF of the process, i.e., $\rho_h$ for h = 1, 2, …
o See the example concerning the AR(2) process given in the lectures and/or in the textbook
 For a stationary AR(p), the ACF will decay geometrically to zero
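For concreteness, the AR(2) instance of the Yule-Walker recursion referenced above reads:
$\rho_1 = \phi_1 + \phi_2 \rho_1 \;\Rightarrow\; \rho_1 = \dfrac{\phi_1}{1 - \phi_2}, \qquad \rho_2 = \phi_1 \rho_1 + \phi_2, \qquad \rho_h = \phi_1 \rho_{h-1} + \phi_2 \rho_{h-2} \ \ (h \ge 3)$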
ACF and PACF of AR(p) Process
 The SACF and SPACF are of primary importance to identify the lag order p of a process
[Figure: ACF and PACF patterns of simulated AR processes]
ACF and PACF of AR(p) and MA(q) Processes

 An AR(p) process is described by an ACF that may slowly tail off at infinity and a PACF that is zero for lags larger than p
 Conversely, the ACF of an MA(q) process cuts off after lag q, while the PACF of the process may slowly tail off at infinity
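A brief sketch verifying these patterns on simulated data (statsmodels; the AR(1) coefficient 0.7 is illustrative):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(0)
# AR(1) with phi = 0.7; statsmodels encodes the AR polynomial 1 - 0.7L
y = ArmaProcess(ar=[1.0, -0.7], ma=[1.0]).generate_sample(
    nsample=2000, distrvs=rng.standard_normal)

print("ACF :", np.round(acf(y, nlags=5), 2))    # tails off geometrically
print("PACF:", np.round(pacf(y, nlags=5), 2))   # ≈ 0 beyond lag p = 1
```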



ARMA(p,q) Processes
 In some applications, the empirical description of the dynamic structure of the data requires us to specify high-order AR or MA models, with many parameters
 To overcome this problem, the literature has introduced the class of autoregressive moving-average (ARMA) models, combinations of AR and MA models
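In the notation of the earlier slides, a standard statement of the combined model is:
$y_t = \phi_0 + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \epsilon_t + \sum_{j=1}^{q} \theta_j \, \epsilon_{t-j}, \qquad \epsilon_t \sim WN(0, \sigma^2)$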



ARMA(p,q) Processes
 We can also write the ARMA(p, q) process using the lag operator:
$\phi(L) y_t = \phi_0 + \theta(L) \epsilon_t$, where $\theta(L) \equiv 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q$
 The ARMA(p, q) model will have a stable solution (seen as a deterministic difference equation) and will be covariance stationary if the roots of the polynomial equation $\phi(\lambda) = 0$ lie outside the unit circle
 The statistical properties of an ARMA process will be a combination of those of its AR and MA components
 The unconditional expectation of an ARMA(p, q) is
$E[y_t] = \dfrac{\phi_0}{1 - \phi_1 - \phi_2 - \cdots - \phi_p}$
o An ARMA(p, q) process has the same mean as the corresponding ARMA(p, 0) or AR(p)
 The general variances and autocovariances can be found by solving the Yule-Walker equations, see the book
ARMA(p,q) Processes
 For a general ARMA(p, q) model, beginning with lag q, the values of $\rho_h$ will satisfy:
$\rho_h = \phi_1 \rho_{h-1} + \phi_2 \rho_{h-2} + \cdots + \phi_p \rho_{h-p}$
o After the qth lag, the ACF of an ARMA model is geometrically declining, similarly to a pure AR(p) model
 The PACF is useful for distinguishing between an AR(p) process and an ARMA(p, q) process
o While both have geometrically declining autocorrelation functions, the former will have a partial autocorrelation function which cuts off to zero after p lags, while the latter will have a partial autocorrelation function which declines geometrically



ARMA(p,q) Processes

o As one would expect of an ARMA process, both the ACF and the PACF decline geometrically: the ACF as a result of the AR part and the PACF as a result of the MA part
o However, as the coefficient of the MA part is quite small, the PACF becomes insignificant after only two lags; instead, the AR coefficient is higher (0.7), and thus the ACF dies away only after 9 lags, and rather slowly
Model Selection: SACF and SPACF
 A first strategy compares the sample ACF and PACF with the theoretical, population ACF and PACF and uses them to identify the order of the ARMA(p, q) model
[Figure: SACF and SPACF of US CPI inflation]
o The series appears to be of some ARMA type, but it remains quite difficult to determine its precise order (especially the MA order)
Model Selection: Information Criteria
 The alternative is to use information criteria (often shortened to IC)
 They essentially trade off the goodness of (in-sample) fit and the
parsimony of the model and provide a (cardinal, even if specific to
an estimation sample) summary measure
o We are interested in forecasting out-of-sample: using too many parameters we will end up fitting noise rather than the dependence structure in the data, reducing the predictive power of the model (overfitting)
 Information criteria combine, in rather simple mathematical formulations, two terms: a function of the sum of squared residuals (SSR), supplemented by a penalty for the loss of degrees of freedom from the number of parameters of the model
o Adding a new variable (or a lag of a shock or of the series itself) will
have two opposite effects on the information criteria: it will reduce
the residual sum of squares but increase the value of the penalty term
 The best performing (promising in out-of-sample terms) model will
be the one that minimizes the information criteria
Model Selection: Information Criteria
 Denoting by k the number of parameters and by T the sample size, the criteria take the standard forms
$\mathrm{AIC} = \ln(\hat{\sigma}^2) + \dfrac{2k}{T}, \qquad \mathrm{SBIC} = \ln(\hat{\sigma}^2) + \dfrac{k}{T}\ln T, \qquad \mathrm{HQIC} = \ln(\hat{\sigma}^2) + \dfrac{2k}{T}\ln(\ln T)$
 The SBIC is the one IC that imposes the strongest penalty (lnT) for
each additional parameter that is included in the model.
 The HQIC embodies a penalty that is somewhere in between the
one typical of AIC and the SBIC
o SBIC is a consistent criterion, i.e., it selects the true model asymptotically
o AIC asymptotically overestimates the order/complexity of a model with positive probability
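A minimal sketch of how such a criteria-based search can be run (statsmodels; the series is a placeholder and the (p, q) ranges are arbitrary):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

y = pd.Series(np.random.default_rng(1).standard_normal(300))  # placeholder series

ic = {}
for p in range(3):
    for q in range(3):
        res = ARIMA(y, order=(p, 0, q)).fit()
        ic[(p, q)] = (res.aic, res.bic, res.hqic)   # AIC, SBIC, HQIC

best = min(ic, key=lambda k: ic[k][1])              # minimize SBIC
print("(p, q) minimizing SBIC:", best)
```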
o It is not uncommon that different criteria lead to different models
o Using the guidance derived from the inspection of the correlogram, we believe that an ARMA model is more likely, given that the ACF does not show signs of geometric decay
o One could be inclined to conclude in favor of an ARMA(2,1) for the US monthly CPI inflation rate
Estimation Methods: OLS vs MLE
 The estimation of an AR(p) model is simple because it can be performed by (conditional) OLS
o Conditional on p starting values for the series
 When an MA(q) component is included, the estimation becomes
more complicated and requires Maximum Likelihood
o Please review Statistics prep-course + see the textbook
 However, this opposition is only apparent: conditional on the p
starting values, under the assumptions of a classical regression
model, OLS and MLE are identical for an AR(p)
o See 20191 for the classical linear regression model
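The equivalence is easy to verify numerically; a sketch comparing conditional OLS and MLE estimates on the same simulated AR(2) (coefficients illustrative):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
y = np.zeros(700)
for t in range(2, 700):                       # simulate AR(2), then drop burn-in
    y[t] = 0.5 * y[t - 1] + 0.2 * y[t - 2] + rng.standard_normal()
y = pd.Series(y[200:])

ols = AutoReg(y, lags=2).fit()                # conditional OLS
mle = ARIMA(y, order=(2, 0, 0)).fit()         # maximum likelihood

print(ols.params)                             # const, y.L1, y.L2
print(mle.params)                             # const, ar.L1, ar.L2, sigma2
```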
Estimation Methods: MLE
 The first step in deriving the MLE consists of defining the joint probability distribution of the observed data
 The joint density of the random variables in the sample may be written as a product of conditional densities, so that the log-likelihood function of an ARMA(p, q) process has the form
$\ell(\boldsymbol{\theta}) = \ln f(y_1; \boldsymbol{\theta}) + \sum_{t=2}^{T} \ln f(y_t \mid \mathcal{I}_{t-1}; \boldsymbol{\theta})$
o For instance, if $y_t$ has a joint and marginal normal pdf (which must derive from the fact that $\epsilon_t$ has it), then
$\ell(\boldsymbol{\theta}) = -\dfrac{T}{2}\ln(2\pi) - \dfrac{T}{2}\ln\sigma^2 - \sum_{t=1}^{T}\dfrac{(y_t - E[y_t \mid \mathcal{I}_{t-1}])^2}{2\sigma^2}$
o MLE can be applied to any parametric distribution even when different from the normal
 Under general conditions, the resulting estimators will then be
consistent and have an asymptotic normal distribution, which may
be used for inference
Example: ARMA(2,1) Model of US Inflation



Model Specification Tests
 If the model has been specified correctly, all the structure in the (mean of the) data ought to be captured and the residuals should not exhibit any predictable patterns
 Most diagnostic checks involve the analysis of the residuals
 ① An intuitive way to identify potential problems with an ARMA model is to plot the residuals or, better, the standardized residuals, i.e., $\hat{\epsilon}_t / \hat{\sigma}$
o If the residuals are normally distributed with zero mean and unit variance, then approximately 95% of the standardized residuals should fall in an interval of ±2 around zero
o It is also useful to plot the squared (standardized) residuals: if the model is correctly specified, such a plot should not display any clusters, i.e., the tendency of high (low) squared residuals to be followed by other high (low) squared standardized residuals
 ② A more formal way to test for normality of the residuals is the
Jarque-Bera test
Model Specification Tests: Jarque-Bera Test
o Because the normal distribution is symmetric, the third central moment, denoted by $\mu_3$, should be zero; and the fourth central moment, $\mu_4$, should satisfy $\mu_4 = 3\sigma^4$
o A typical index of asymmetry based on the third moment (skewness), that we denote by $S$, of the distribution of the residuals is
$S = \dfrac{\hat{\mu}_3}{\hat{\sigma}^3}$
o The most commonly employed index of tail thickness based on the fourth moment (excess kurtosis), denoted by $K$, is
$K = \dfrac{\hat{\mu}_4}{\hat{\sigma}^4} - 3$
o If the residuals were normal, $S$ and $K$ would have a zero-mean asymptotic normal distribution, with variances 6/T and 24/T, respectively
o The Jarque-Bera test concerns the composite null hypothesis $H_0$: $\mu_3 = 0$ and $\mu_4 = 3\sigma^4$ (equivalently, $S = 0$ and $K = 0$)



Model Specification Tests: Jarque-Bera Test
 Jarque and Bera prove that, because the standardized sample statistics
$\lambda_1 = \dfrac{S}{\sqrt{6/T}} \quad \text{and} \quad \lambda_2 = \dfrac{K}{\sqrt{24/T}}$
are asymptotically standard normally distributed, the null consists of a joint test that $\lambda_1$ and $\lambda_2$ are zero, tested as $H_0$: $\lambda_1^2 + \lambda_2^2 = 0$, where $\lambda_1^2 + \lambda_2^2 \sim \chi_2^2$ as T ⟶ ∞
 ③ Compute the sample autocorrelations of the residuals and perform tests of hypotheses to assess whether there is any linear dependence
o The same portmanteau tests based on the Q-statistic can be applied to test the null hypothesis that there is no autocorrelation at orders up to h
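Both checks are available in statsmodels; a minimal sketch on a placeholder residual series:

```python
import numpy as np
from statsmodels.stats.stattools import jarque_bera
from statsmodels.stats.diagnostic import acorr_ljungbox

resid = np.random.default_rng(3).standard_normal(500)   # stand-in for model residuals

jb, jb_pv, skew, kurt = jarque_bera(resid)               # JB ~ χ²(2) under H0
print(f"JB = {jb:.2f} (p = {jb_pv:.3f}), skewness = {skew:.3f}, kurtosis = {kurt:.3f}")

# Portmanteau (Ljung-Box) Q-test: no autocorrelation up to lag h = 10
print(acorr_ljungbox(resid, lags=[10]))
```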

ARMA(2,1) Model of US Inflation



Example: ARMA(2,1) Model of US Inflation
[Figures: residuals (left) and squared residuals (right) of the estimated ARMA(2,1) model]


Forecasting with ARMA
 In-sample forecasts are those generated with reference to the same
data that were used to estimate the parameters of the model
o The R-square of the model is a measure of in-sample goodness of fit
o Yet, ARMA models use the past of a series to explain its own behavior, so using the R-square to quantify the quality of a model faces limitations
 More interested in how well the model performs when it is used to
forecast out-of-sample, i.e., to predict the value of observations that
were not used to specify and estimate the model
 Forecasts can be one-step-ahead, $\hat{y}_t(1)$, or multi-step-ahead, $\hat{y}_t(h)$
 In order to evaluate the usefulness of a forecast, we need to specify a loss function that defines how concerned we are if our forecast were to be off relative to the realized value by a certain amount
 Convenient results obtain if one assumes a quadratic loss function, i.e., the minimization of
$E[(y_{t+h} - \hat{y}_t(h))^2 \mid \mathcal{I}_t]$
Forecasting with AR(p)
o This is known as the mean square forecast error (MSFE)
 It is possible to prove that the MSFE is minimized when $\hat{y}_t(h)$ is equal to $E[y_{t+h} \mid \mathcal{I}_t]$, where $\mathcal{I}_t$ is the information set available at time t
 In words, the conditional mean of $y_{t+h}$ given its past observations is the best estimator of $y_{t+h}$ in terms of MSFE
 In the case of an AR(p) model, we have:
$\hat{y}_t(h) = \phi_0 + \sum_{i=1}^{p} \phi_i \hat{y}_t(h - i)$
where $\hat{y}_t(h - i) = y_{t+h-i}$ when $h - i \le 0$
o For instance, $\hat{y}_t(1) = \phi_0 + \phi_1 y_t + \phi_2 y_{t-1} + \cdots + \phi_p y_{t-p+1}$
o The forecast error is $e_t(h) = y_{t+h} - \hat{y}_t(h)$
o The h-step forecast can be computed recursively, see the textbook/class notes
 For a stationary AR(p) model, $\hat{y}_t(h)$ converges to the mean $E[y_t]$ as h grows, the mean-reversion property
Forecasting with MA(q)
 Because the model has a memory limited to q periods only, the point forecasts converge to the mean quickly, and they are forced to do so when the forecast horizon exceeds q periods
o E.g., for an MA(2), $\hat{y}_t(1) = \mu + \theta_1 \epsilon_t + \theta_2 \epsilon_{t-1}$, because both shocks have been observed and are therefore known
o Because $\epsilon_{t+1}$ has not yet been observed at time t, and its expectation at time t is zero, then $\hat{y}_t(2) = \mu + \theta_2 \epsilon_t$
o By the same principle, $\hat{y}_t(3) = \mu$, because $\epsilon_{t+3}$, $\epsilon_{t+2}$, and $\epsilon_{t+1}$ are not known at time t
 By induction, the forecasts of an ARMA(p, q) model can be obtained from
$\hat{y}_t(h) = \phi_0 + \sum_{i=1}^{p} \phi_i \hat{y}_t(h - i) + \sum_{j=1}^{q} \theta_j E[\epsilon_{t+h-j} \mid \mathcal{I}_t]$
 How do we assess the forecasting accuracy of a model?
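A common answer is to hold out the final part of the sample and compare forecasts against realizations under MSFE loss; a sketch with an illustrative ARMA(2,1) on a placeholder series:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(4)
y = pd.Series(rng.standard_normal(300))       # placeholder stationary series

train, test = y[:250], y[250:]
res = ARIMA(train, order=(2, 0, 1)).fit()     # illustrative ARMA(2,1)

fcst = res.forecast(steps=len(test))          # multi-step-ahead point forecasts
msfe = np.mean((test.to_numpy() - fcst.to_numpy()) ** 2)
print(f"Out-of-sample MSFE over h = {len(test)} steps: {msfe:.3f}")
```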



Forecasting US CPI Inflation with ARMA Models



A Few Examples of Potential Applications
 To fit the time series dynamics of the earnings of company i = 1, 2,
…, N with ARMA(pi, qi) models, to compare the resulting forecasts
with earnings forecasts published by analysts
o Possibly, also compare the 90% forecast confidence intervals from ARMA(pi, qi) with the dispersion over time of analysts' forecasts
o Possibly, also price the stocks of each company using some DCF model and compare the result with the target prices published by the analysts
 To fit the time series dynamics of commodity future returns (for a
range of underlying assets) using ARMA(p, q) models to forecast
o Possibly, compare such forecasts with those produced by predictive
regressions that just use (or also use) commodity-specific information
o A predictive regression is a linear model to predict the conditional mean with structure $\hat{y}_{t+h} = \alpha + \sum_{j=1}^{J} \beta_j x_{j,t}$, where $x_{1,t}, x_{2,t}, \ldots, x_{J,t}$ are the commodity-specific variables
o Possibly, to try and understand why and when only the past of a series
helps to predict future returns or not (i.e., for which commodities)
A Few Examples of Potential Applications
 Given the mkt. ptf., use mean-variance portfolio theory (see 20135, part 1) and ARMA models to forecast the (conditional) risk premium and decide the optimal weight to be assigned to risky vs. riskless assets
o Also called the strategic asset allocation problem
o As you will surely recall, $\omega_t = (1/\lambda\hat{\sigma}^2) \times (\text{forecast of } r_{t+1} - r^f)$
o Similar/partly identical to a question of Homework 2 in 20135!
o Possibly compare with the performance results (say, Sharpe ratio) produced by the strategy $\omega_t = (1/\lambda\hat{\sigma}^2) \times (\text{hist. mean of } r - r^f)$, which results from an ARMA(0,0) process (i.e., white noise returns)
 After measuring which portion of a given policy or company
announcement represents news/unexpected information, measure
how long it takes for the news to be incorporated in the price
o Equivalent to testing the number of lags q in an ARMA(0, q) model
o Unclear what the finding of p > 0 could mean in an ARMA(p, q) case
o Related to standard and (too) popular event studies
