Thesis
KIEL
Semester: 6
Student ID: 1110282
First Supervisor: Prof. Dr. Markus Haas
Second Supervisor: Prof. Dr. Stephan Reitz
Master’s Thesis
for the Master’s Program
MSc Quantitative Finance
September 2019
Contents

List of Abbreviations
1 Introduction
3 Methodology
4 Model implementation
4.1 SETAR
4.2 LSTAR
4.3 SLFN
5 Forecast results
6 Conclusion
References
Appendix A Tables
Appendix B Figures
List of Tables

5.15 P-values of Giacomini-White test with quadratic loss, LSTAR against AR
A.5 Jarque-Bera test on return
A.8 Threshold grid search in the SETAR model, DJI time series
A.9 Threshold grid search in the SETAR model, NASDAQ time series
A.10 Threshold grid search in the SETAR model, NYSE time series
A.11 Threshold grid search in the SETAR model, S&P time series
A.21 Threshold grid search in the LSTAR model, DJI time series
A.22 Threshold grid search in the LSTAR model, NASDAQ time series
A.23 Threshold grid search in the LSTAR model, NYSE time series
A.24 Threshold grid search in the LSTAR model, S&P time series
A.27 LSTAR parameter estimate for the NYSE time series
A.33 Grid search for SLFN model estimation and selection for DJI return
A.34 Grid search for SLFN model estimation and selection for NASDAQ return
A.35 Grid search for SLFN model estimation and selection for NYSE return
A.36 Grid search for SLFN model estimation and selection for S&P return
A.43 SETAR detailed in-sample fit

List of Figures

B.18 SLFN NASDAQ return forecasts comparison
List of Abbreviations

S&P Standard and Poor's
STAR Smooth Transition Autoregressive
TAR Threshold Autoregressive
1. Introduction
Research on stock return predictability has long been a focal point for academics and finance practitioners, and for obvious reasons. On the practitioners' side, models that provide reliable return forecasts are vital for advising on and enhancing investment strategies. On the academic side, the topic of stock return predictability leads to the efficient market hypothesis. In fact, the ability to understand the nature of stock return predictability has major implications for tests of market efficiency, in the sense that such understanding is useful for building realistic asset pricing models that better explain return time series (Rapach & Zhou 2013, p. 330).
Predicting stock returns has proven to be a very difficult task. Return series inherently contain an unpredictable component, so that even the best forecasting model can only explain a small part of their behaviour. Timmermann (2018, p. 2) argues that competition among market participants implies that if a successful model for predicting returns is discovered, it will be readily adopted by traders, and the dissemination of the model will cause prices to move unpredictably. Nevertheless, some studies suggest that stock returns are to some extent predictable, either from their own history or from publicly available information. Rational asset pricing theory, for instance, suggests that stock return predictability can result from exposure to time-varying aggregate risk; a model that successfully captures this time-varying aggregate risk premium will therefore remain successful over time (Rapach & Zhou 2013, p. 330).
be autocorrelated. This time-varying property of volatility is also referred to as volatility clustering. Moreover, raw returns appear to display little or no autocorrelation, meaning that the linear relation between consecutive returns is very small, without excluding the possibility of a nonlinear relation. Likewise, the existence of frequent structural breaks and behavioural changes in financial time series is well documented (see Franses and van Dijk 2000, p. 5-19 for more details).
These stylized facts of returns have created a need for models that better reflect such features. While the theory of linear models is well established, linear models fail to capture the nonlinear characteristics of financial time series and do not produce reliable forecasts. We therefore consider nonlinear time series models, which are still the subject of intensive research, not because we believe that they deliver the desired out-of-sample forecast performance, but simply because they can capture some of the nonlinear patterns in returns and might provide improved out-of-sample forecasts relative to linear models. In addition, we consider artificial neural network models, which have proven to yield useful results in a range of applications across various fields.
The main aim of this thesis is to compare the predictive power of some of the most prominent nonlinear time series models and artificial neural network models in forecasting stock index returns. More explicitly, we examine whether there is a predictive gain or loss in using nonlinear models in lieu of a benchmark linear model (the autoregressive model, AR) when forecasting returns of four major stock indexes, namely the DJI, the NASDAQ, the NYSE and the S&P 500. Our attention is restricted to multi-step point forecasts. We use three different models, namely the self-exciting threshold autoregressive (SETAR) model, the logistic smooth transition autoregressive (LSTAR) model and the single hidden layer feedforward network (SLFN). We restrict our attention to past returns as the only explanatory variable in order to study return predictability from its own history. The work is structured as follows: section 2 is dedicated to the topic of forecasting with nonlinear models and forecast comparison, section 3, the methodology, discusses the different models used to generate forecasts, section 4 concerns model implementation, the results are presented in section 5, and section 6 concludes.
where F(x_t; θ) is the skeleton of the model under consideration. The one-step ahead forecast is obtained, as in the linear model, as

r̂_{t+1} = E(r_{t+1} | I_t) = F(x_{t+1}; θ). (2.2)

Equation 2.2 is hence an unbiased forecast of r_{t+1} given the information set I_t. The relevant information is contained in x_{t+1} = (1, r_t, r_{t−1}, …, r_{t−(p−1)})′.
Turning to longer forecast horizons, obtaining E(r_{t+h} | I_t) for h > 1 becomes more involved. We use the two-step-ahead case to illustrate the problem. We have

r̂_{t+2|t} = E(r_{t+2} | I_t) = E{ F(x^f_{t+2}; θ) + ε_{t+2} | I_t } = E{ F(x^f_{t+2}; θ) | I_t } (2.3)

where x^f_{t+2} = (1, r̂_{t+1|t} + ε_{t+1}, r_t, …, r_{t−(p−2)})′. The exact expression for 2.3 is

r̂_{t+2|t} = E{ F(x^f_{t+2}; θ) | I_t } = ∫_{−∞}^{∞} F(x^f_{t+2}; θ) φ(z) dz. (2.4)
• The naïve approach: this approach comes down to simply setting ε_{t+1} = 0 and using the skeleton, but it yields biased forecasts. We have

r̂^n_{t+2|t} = F(x^{fn}_{t+2}; θ) (2.5)

where x^{fn}_{t+2} = (1, r̂_{t+1|t}, r_t, …, r_{t−(p−2)})′.
In the context of time series forecasting, there has been a gradual shift from forecasting with linear models to forecasting with nonlinear models. A number of authors have studied whether using nonlinear models can improve upon forecasts obtained from linear models, and the results are quite mixed.
classes of models: linear autoregressions with and without unit root pre-test,
exponential smoothing, ANN and STAR models. Autoregressions with unit
root pre-test achieve the best overall performance. However, the study sug-
gests that this performance can be increased if forecast combination with other
methods is used.
Bradley and Jansen (2004) model stock returns and industrial production as nonlinear, regime-dependent variables. Various nonlinear models are used to generate out-of-sample forecasts of both variables, and the results are compared to forecasts from a linear model. The finding is that the linear model performs as well as or better than any of the nonlinear specifications for stock returns, while for industrial production two of the nonlinear specifications provide better results than the linear model.
Teräsvirta, van Dijk, and Medeiros (2005) provide an empirical study of whether careful modelling can improve the forecast accuracy of nonlinear models relative to linear ones. 47 monthly macroeconomic variables of the G7 economies are examined and three models are considered: the linear autoregressive model, the smooth transition autoregressive model and artificial neural networks. The findings are mixed for the ANN, in the sense that ANNs obtained using Bayesian regularization perform better than the AR, but only in the long term. On the other hand, the STAR model outperforms linear autoregressive models, demonstrating that careful modelling of the nonlinear model is necessary.
Lim and Hooy (2013) study the source and persistence of nonlinear predictability in the stock markets of the G7 countries. Evidence of local nonlinear predictability is detected by applying the BDS test to AR-filtered returns in rolling estimation windows. In order to identify the source of the nonlinear predictability, the BDS test is then applied to AR-GARCH-filtered returns in rolling windows. Even after taking conditional heteroskedasticity into account, evidence of nonlinear predictability emerges during some short time intervals in all markets, thus contradicting the weak form of the market efficiency hypothesis.
root mean squared forecast error (RMSFE) and the mean absolute forecast error (MAFE), respectively given by

RMSFE = √( P^{−1} Σ_{t=T+1}^{T+P} e_t² ) (2.8)

MAFE = P^{−1} Σ_{t=T+1}^{T+P} |e_t| (2.9)
where e_t is the forecast error. The better model is the one with the smaller loss. In order to make the comparison easy, we use an approach based on relative accuracy, which means that for each model we compute

R(r_t, r̂_t) = l(r_t, r̂_t) / L(r_t, r̂_t) (2.10)

where l(r_t, r̂_t) is the loss from a given nonlinear model and L(r_t, r̂_t) is the loss from the benchmark model. A value R(r_t, r̂_t) > 1 means that the AR model performs better than the compared nonlinear model, and vice versa.
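As an illustration, the loss measures 2.8 and 2.9 and the relative accuracy ratio 2.10 can be computed as follows. All forecast values are hypothetical, and the zero forecast merely stands in for the benchmark model:

```python
import numpy as np

def rmsfe(actual, forecast):
    """Root mean squared forecast error, equation (2.8)."""
    e = np.asarray(actual) - np.asarray(forecast)
    return np.sqrt(np.mean(e ** 2))

def mafe(actual, forecast):
    """Mean absolute forecast error, equation (2.9)."""
    e = np.asarray(actual) - np.asarray(forecast)
    return np.mean(np.abs(e))

actual = np.array([0.01, -0.02, 0.005, 0.0])
f_nonlinear = np.array([0.008, -0.015, 0.004, 0.001])  # toy nonlinear forecasts
f_ar = np.array([0.0, 0.0, 0.0, 0.0])                  # stand-in benchmark

# Relative accuracy, equation (2.10): loss(nonlinear) / loss(benchmark).
R = rmsfe(actual, f_nonlinear) / rmsfe(actual, f_ar)
# R > 1 would mean the AR benchmark wins under this loss.
```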
the forecaster at the time of the prediction, such as the estimation procedure to choose. Evaluating forecasting methods rather than models is important because all the elements of the method can affect forecast performance. The main reason why the GW test is used in this thesis, and its main advantage, is that it can be applied to compare forecasts from nested models. We should keep in mind, however, that the GW test assumes that forecasts are obtained using rolling window estimators, and this can lead to a substantial decrease in statistical power (Elliott and Timmermann 2016, p. 104). To the best of my knowledge, no published paper employs the GW test to compare forecasts from nonlinear models against forecasts from a benchmark nested linear model, and this research is probably the first to do so.
So far, the methods presented to assess forecast accuracy are based on measures of the distance between forecasts and realizations; put differently, they concentrate on the magnitude of forecast errors and can be considered quantitative measures of forecast accuracy. However, regime switching models, built around the idea of moving from one state of the world to another, may be better suited to predicting the direction of future movements of a time series. One way of capturing this idea is to use an evaluation criterion based on how often the sign of the return is correctly predicted. We refer to such a procedure as a qualitative measure of forecast accuracy and consider two market timing tests to evaluate how well the models predict return movements: the Directional Accuracy (DA) test of Pesaran and Timmermann (1992) and the Excess Profitability (EP) test of Anatolyev and Gerko (2005). Both tests are described below. Such methods of evaluating forecasts can be of interest to investors who care about the future direction of returns rather than the magnitude of their changes.
opposite.
H_0 : E[ L_{t+h}(Y_{t+h}, f̂_{m,t}) − L_{t+h}(Y_{t+h}, ĝ_{m,t}) | I_t ] ≡ E[ ΔL_{m,t+h} | I_t ] = 0 (2.11)
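As an illustration of this null hypothesis, a minimal unconditional version of the test, with an instrument set containing only a constant, collapses to a t-test that the mean loss differential is zero. This is only a sketch: the full conditional GW test adds further instruments such as lagged loss differentials, and uses a HAC variance estimator for horizons h > 1:

```python
import numpy as np
from scipy import stats

def gw_test_unconditional(loss_a, loss_b):
    """Simplified, unconditional Giacomini-White-style test (instruments = {1}):
    a t-test that E[dL] = 0, where dL is the loss differential. The full
    conditional test would regress dL on a richer instrument set and use a
    HAC variance for multi-step forecasts."""
    d = np.asarray(loss_a) - np.asarray(loss_b)
    n = len(d)
    t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=n - 1))
    return t_stat, p_value

# Toy losses with a loss differential that averages exactly to zero.
loss_a = [1.0, 2.0, 3.0, 4.0]
loss_b = [2.0, 1.0, 4.0, 3.0]
t_stat, p_value = gw_test_unconditional(loss_a, loss_b)
```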
where P̂ is the proportion of times that the sign of the true withheld value is correctly predicted, P̂_* is the estimate of the probability of correctly predicting the sign of the true withheld value, and V(·) is a consistent estimate of the sample variance. Note that if all the signs of the true withheld values, or all the signs of the forecasts, are the same, the PT statistic is undefined. For more details, we refer to Pesaran & Timmermann (1992).
EP = (A_T − B_T) / √(V̂_EP) ∼ N(0, 1) asymptotically (2.15)

where B_T = ( (1/T) Σ_{t=1}^{T} sign(r̂_t) ) ( (1/T) Σ_{t=1}^{T} r_t ), A_T is the expected one-period return of the trading strategy that buys the stock when the predicted return is positive and sells it when the predicted return is negative, and V̂_EP is the estimate of the variance of A_T − B_T (see Anatolyev & Gerko (2005) for more details).
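The ingredients A_T and B_T of the EP statistic can be sketched as follows. The variance estimate V̂_EP is omitted, and all numbers are invented toy values:

```python
import numpy as np

def excess_profitability(actual, forecast):
    """Sketch of the Anatolyev-Gerko EP ingredients: A_T is the mean return
    of the sign-trading strategy (buy on positive forecast, sell on negative),
    B_T is its expected value under no predictability (equation (2.15)).
    The studentizing variance V_EP is deliberately omitted here."""
    r = np.asarray(actual)
    s = np.sign(forecast)
    A_T = np.mean(s * r)             # trading-strategy average return
    B_T = np.mean(s) * np.mean(r)    # benchmark term
    return A_T - B_T

actual = np.array([0.01, -0.02, 0.03])     # realized returns (toy)
forecast = np.array([0.005, -0.01, 0.002]) # forecasts with correct signs
ep = excess_profitability(actual, forecast)
```

A positive value indicates that trading on the forecast signs earns more than the no-predictability benchmark; the full test divides by √(V̂_EP) to obtain the asymptotically standard normal statistic.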
3. Methodology
This section reviews the different models whose forecasts are studied. For each model, three parts are discussed: model representation, model estimation and model selection. The logarithmic return r_t is employed, calculated as

r_t = ln(P_t) − ln(P_{t−1}) (3.1)
where Pt is the stock price at time t. Throughout this thesis, the stock index
return will often simply be referred to as return.
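As a minimal illustration, equation 3.1 applied to a toy price series:

```python
import numpy as np

# Equation (3.1): r_t = ln(P_t) - ln(P_{t-1}), on an invented price series.
prices = np.array([100.0, 101.0, 99.5, 100.5])
returns = np.diff(np.log(prices))  # one fewer observation than prices
```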
where φ_j′ = (φ_{0,j}, φ_{1,j}, …, φ_{p,j}) and x_t = (1, r_{t−1}, …, r_{t−p})′. From this it is clear that estimators of the parameter φ = (φ_1′, φ_2′)′ in the two-regime switching model can be obtained by CLS as

φ̂(c) = ( Σ_{t=1}^{n} x_t(c) x_t(c)′ )^{−1} ( Σ_{t=1}^{n} x_t(c) r_t ) (3.4)
for a given threshold value c. Moreover, x_t(c) = (x_t′ I[r_{t−d} ≤ c], x_t′ I[r_{t−d} > c])′ (see Franses and van Dijk (2000, p. 84) for more details). The threshold value c is chosen so that the residual variance is minimized, i.e.

ĉ = argmin_{c ∈ C} σ̂²(c) (3.5)

with C being the set of all allowable threshold values. C should be chosen such that each regime contains enough observations; a popular choice is to leave at least 15% of the observations in each regime. Chan (1993) shows that this procedure produces a consistent estimate of c. Equation 3.4 then becomes

φ̂(ĉ) = ( Σ_{t=1}^{n} x_t(ĉ) x_t(ĉ)′ )^{−1} ( Σ_{t=1}^{n} x_t(ĉ) r_t ) (3.6)
G(r_{t−1}; γ, c) = 1 / ( 1 + exp(−γ [r_{t−1} − c]) ) (3.8)

In the LSTAR model, the focus lies in estimating the parameter vector θ = (φ_1′, φ_2′, γ, c)′. The estimation is done by nonlinear least squares (NLS), i.e.

θ̂ = argmin_θ Σ_{t=1}^{n} [y_t − F(x_t; θ)]². (3.9)
where x_t(γ, c) = (x_t′ (1 − G(y_{t−1}; γ, c)), x_t′ G(y_{t−1}; γ, c))′. In order to find the optimal estimates of γ and c, we perform a two-dimensional grid search over different combinations of γ and c and select the pair of estimates for which the residual variance is minimized. For the threshold variable, the delay is set to d = 1, for the same reason as in the SETAR model, and it is incremented by one whenever the algorithm for obtaining estimates of γ and c does not converge.
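The two-dimensional grid search can be sketched as follows, exploiting the fact that conditional on (γ, c) the LSTAR model is linear in φ, so the residual variance at each grid point is available by OLS. The grid values and data below are placeholders:

```python
import numpy as np

def lstar_grid_nls(r, gammas, cs):
    """Two-dimensional grid search for (gamma, c) in an LSTAR(1) sketch:
    for every grid point, compute the logistic transition (equation (3.8)),
    concentrate out phi by OLS, and keep the pair minimising the
    residual variance."""
    y, lag = r[1:], r[:-1]
    best = None
    for g in gammas:
        for c in cs:
            G = 1.0 / (1.0 + np.exp(-g * (lag - c)))  # transition function
            X = np.column_stack([(1 - G), (1 - G) * lag, G, G * lag])
            phi, *_ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ phi
            s2 = resid @ resid / len(y)
            if best is None or s2 < best[0]:
                best = (s2, g, c)
    return best  # (residual variance, gamma, c)

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 300)  # placeholder data; in practice, a return series
s2, g_hat, c_hat = lstar_grid_nls(x, gammas=[1, 5, 10], cs=[-0.5, 0.0, 0.5])
```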
For both the SETAR and the LSTAR model, we assume that the residuals are standard normally distributed, so that the estimates can be interpreted as maximum likelihood (ML) estimates (Franses and van Dijk 2000, p. 84 and p. 90). When it comes to lag order selection, since our aim is forecasting, we use the Akaike information criterion (AIC). This approach is preferred over the existing alternative of studying the ACF and the PACF because it takes into account lags that are jointly significant. Given upper bounds p_1 and p_2 for the number of lags in each regime and given the set C, the selected lag order is the one that minimizes the AIC. An obvious drawback of this approach is that it is computationally demanding, as the model has to be estimated for many different combinations of p_1 and p_2. Lastly, the BFGS algorithm is applied to estimate both models.
The model described by 3.11 consists of three different layers. First, an input layer consisting of input units, which are multiplied by connection strengths γ_{i0}. The second layer is the hidden layer, composed of q hidden units with activation functions G(·). The third, the output layer, is in our case the response variable in equation 3.11.
r_t = g(x_t, ξ) + η_t (3.12)

where g(x_t, ξ) is a continuous function, it can be shown that 3.11 can approximate any function g(x_t, ξ) to any desired degree of accuracy, provided the number of hidden units is sufficiently large. Mathematically, rewriting 3.11 as

r_t = F(x_t; θ) + ε_t (3.13)

it can be proved that for any continuous function g(x_t, ξ), every compact subset K of R^K, and every δ > 0, there is an ANN F(x_t; θ) such that (for references and details see Franses and van Dijk (2000, p. 208), Cybenko (1989, p. 308-312), Hornik, Stinchcombe, and White (1990, p. 556-557)).
Consequently, the SLFN can be used to approximate any nonlinear relationship between r_t and its lagged values. Parameter estimation can be accomplished by minimizing the residual sum of squares,

θ̂ = argmin_θ Σ_{t=1}^{n} [y_t − F(x_t; θ)]² (3.16)
Any conventional nonlinear least squares algorithm can be used to solve 3.16 and obtain estimates of θ. While it is common to solve 3.16 by residual backpropagation, that approach requires carefully choosing a stopping rule for training and can therefore easily lead to overfitting; we therefore stick to the BFGS algorithm, as with the TAR models. It is also important to point out that ANN models are usually thought of as approximation models rather than models that capture the underlying data generating process. Consequently, ANN models are inherently misspecified (Franses and van Dijk 2000, p. 217-218). Another drawback of ANN models is the risk of overfitting: by increasing the number of hidden units in the hidden layer, it is possible to obtain an almost perfect in-sample fit, but a perfect in-sample fit does not guarantee improved out-of-sample forecast performance. For model selection, no universal rules currently exist for choosing the most appropriate model in practical applications. We come back to this issue later on.
Finally, for the activation function G(·), we use the logistic function given by

G(z_t; γ) = 1 / ( 1 + exp(−γ z_t) ) (3.17)
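A hypothetical Python sketch of an SLFN with logistic hidden units, fitted by minimizing the residual sum of squares 3.16 with BFGS as described above. The weight packing and the simulated data are invented for illustration, with γ absorbed into the hidden-layer weights:

```python
import numpy as np
from scipy.optimize import minimize

def slfn_forecast(theta, X, q):
    """Single-hidden-layer feedforward network with logistic activations
    (equation (3.17) with gamma absorbed into the weights). theta packs
    q + 1 output weights followed by q * (p + 1) hidden-unit weights."""
    p = X.shape[1]
    beta = theta[: q + 1]                       # output bias + weights
    W = theta[q + 1 :].reshape(q, p + 1)        # hidden-unit weights
    Z = np.column_stack([np.ones(len(X)), X])   # add intercept to inputs
    H = 1.0 / (1.0 + np.exp(-Z @ W.T))          # logistic hidden units
    return beta[0] + H @ beta[1:]

def fit_slfn(X, y, q=2, seed=0):
    """Minimise the residual sum of squares (equation (3.16)) with BFGS,
    the approach used in the thesis instead of backpropagation."""
    rng = np.random.default_rng(seed)
    theta0 = rng.normal(0, 0.1, (q + 1) + q * (X.shape[1] + 1))
    obj = lambda th: np.sum((y - slfn_forecast(th, X, q)) ** 2)
    return minimize(obj, theta0, method="BFGS").x

# Toy nonlinear data; in practice X would hold lagged returns.
rng = np.random.default_rng(3)
X = rng.normal(0, 1, (200, 2))
y = np.tanh(X[:, 0]) - 0.5 * X[:, 1] + rng.normal(0, 0.1, 200)
theta_hat = fit_slfn(X, y, q=2)
fitted = slfn_forecast(theta_hat, X, 2)
```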
4. Model implementation
in this case is 19 days) is calculated for each stock index. The return data are divided into two sets: the first, from 01/2000 to 12/2017, is used for model specification and estimation, and the second, covering the forecast period of January 2018, is used to evaluate the predictive accuracy of the models. Figure 4.1 shows plots of the different return series, in which the autocorrelated volatility (volatility clustering) can be observed.
and employ the BDS test for this purpose. This test uses the correlation integral to analyse the spatial dependence of the series by embedding the observed data in m-dimensional space. The results of the test are reported in tables A.1 to A.4 in the appendix. For all four time series, the test strongly rejects the null hypothesis of linearity for all combinations of the embedding dimension m and epsilon. The next important property investigated is stationarity, tested through the augmented Dickey-Fuller test and the Phillips-Perron test. Testing stationarity is relevant to our work because regressions involving nonstationary data are spurious and make no sense; nonstationarity implies a permanent deviation from equilibrium, which is hard to interpret economically. P-values for both tests are reported in table A.6 of the appendix, and the null of nonstationarity is rejected at the 5% significance level.
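The core regression behind the augmented Dickey-Fuller test can be sketched as follows. This is a simplified version without augmentation lags; the critical values of the statistic are nonstandard and omitted here (roughly −2.86 at the 5% level with a constant):

```python
import numpy as np

def dickey_fuller_tstat(y):
    """t-statistic of rho in the regression dy_t = alpha + rho * y_{t-1} + e_t,
    the core of the (augmented) Dickey-Fuller unit root test. Strongly
    negative values speak against a unit root, i.e. for stationarity."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    b, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ b
    s2 = resid @ resid / (len(dy) - 2)           # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)            # OLS coefficient covariance
    return b[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(4)
returns = rng.normal(0, 0.01, 500)   # stationary, return-like series
walk = np.cumsum(returns)            # random walk (unit root), price-like
t_stationary = dickey_fuller_tstat(returns)
t_walk = dickey_fuller_tstat(walk)
```

The stationary series yields a strongly negative statistic, while the random walk stays near the unit-root region, mirroring the distinction between prices and returns.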
Table 4.1 reports some descriptive statistics of the data. The kurtosis measure in particular suggests excess kurtosis, and the estimated skewness reflects the asymmetry discussed earlier, stemming from the fact that large negative returns occur more often than large positive returns. The null hypothesis of normality was tested against the alternative of nonnormality through the Jarque-Bera test, which uses the fact that a normally distributed random variable has zero skewness and a kurtosis equal to 3. The null hypothesis of normality is rejected at the 5% significance level, meaning that the return series are not normally distributed. The results of the Jarque-Bera test are reported in table A.5 in the appendix. Moreover, Q-Q plots and histograms comparing the empirical distribution of returns to the normal distribution, visualizing the nonnormality, are reported in figures B.1 and B.2 in the appendix. The ACF and PACF are shown in figures B.7 and B.8 in the appendix. The immediate observation is the smoothly decaying autocorrelations, a property that can be well captured by autoregressive models. After examining these statistical properties of the data, we now turn to the implementation of the different models.
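The Jarque-Bera logic can be illustrated with scipy on simulated data, where Student-t draws mimic the fat tails typical of return series:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
normal_sample = rng.normal(0, 1, 2000)           # zero skew, kurtosis near 3
heavy_tailed = rng.standard_t(df=3, size=2000)   # fat tails, return-like

# Jarque-Bera jointly tests skewness = 0 and kurtosis = 3.
jb_n, p_n = stats.jarque_bera(normal_sample)
jb_t, p_t = stats.jarque_bera(heavy_tailed)
# The heavy-tailed sample produces a far larger statistic and a tiny p-value.
```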
4.1 SETAR
Put differently, we test the null of linearity against the alternative of two-regime SETAR nonlinearity. The main issue in testing for SETAR-type nonlinearity springs from a nuisance parameter that is unidentified under the null hypothesis: the SETAR model contains an extra parameter, the threshold, which is not restricted under the null hypothesis and which is not present in the linear model. Thus, the asymptotic distribution of the test statistic tends to be non-standard, with no analytical expression available, as conventional statistical theory cannot be applied. Chan (1991) defines a likelihood ratio to test the restriction in the null hypothesis. Using this threshold nonlinearity test, the null hypothesis of linearity is strongly rejected at the 5% significance level for all return time series (see table A.7 in the appendix). We now proceed to the next stage of estimating the SETAR model, following the method outlined in the methodology section.
In order to estimate the model, we first need to specify the maximum autoregressive order for both regimes. A high maximum order would be beneficial in the sense that it provides a wider scope for model selection. However, this comes at a considerable computational cost, since the estimation of a SETAR model involves grid approximations to find the optimal threshold c. Therefore, the maximum order for both regimes is set to 4. The algorithm employed to estimate the model searches a wide range of possible threshold values, keeping at least 15% of the observations in each regime as suggested by Chan (1993). The AIC is used to select the most appropriate model. The results of the grid search for all return series are reported in tables A.8 to A.11 in the appendix, and plots of the grid searches are reported in figure B.3 in the appendix. In figure 4.2, we show regime-switching plots of all return time series. The SETAR model is estimated via CLS and the results are reported in tables A.12 to A.15 in the appendix.
Figure 4.2: Regime-switching plots of the different return time series (columnwise: DJI, NASDAQ, NYSE, S&P) in the SETAR model
4.2 LSTAR
The first step in modelling with the LSTAR consists of testing for the existence of LSTAR nonlinearity in the data. Again, the main issue in this procedure is the unidentified nuisance parameter problem encountered in the case of the SETAR model, which in the LSTAR model stems from the fact that the threshold c and the parameter γ in the logistic function are not identified under the null hypothesis. Luukkonen, Saikkonen and Teräsvirta (1988) circumvent this problem by using a third-order Taylor expansion of the logistic function around γ = 0; the logic behind setting γ = 0 is that for γ = 0 the LSTAR model collapses to an AR model. The auxiliary regression can be written as

where X_t = (1, y_{t−1}, y_{t−2}, …, y_{t−p}), z_t is the threshold variable and the β_i are functions of the original model parameters. Thus, the null of linearity against the alternative of LSTAR nonlinearity can be written as
H_0 : γ = 0
H_1 : γ ≠ 0 (4.4)
autoregressive order is 4. The results of the grid search for all return series are reported in tables A.21 to A.24 in the appendix. In figure 4.3, we present regime-switching plots of all return time series. The LSTAR model is estimated via NLS and the results are reported in tables A.25 to A.28 in the appendix.
Figure 4.3: Regime-switching plots of the different return time series (columnwise: DJI, NASDAQ, NYSE, S&P) in the LSTAR model
none could be validated by the Eitrheim-Teräsvirta test for remaining nonlinearity, suggesting that considering additional regimes would be beneficial in terms of exploiting the nonlinearity in the data. Results of the test are reported in tables A.29 to A.32.
4.3 SLFN
For model specification, as pointed out before, there are no universally accepted schemes. Two decisions have to be made, namely the number of input units and the number of neurons. For the number of input units, we argue that return time series display day-of-the-week seasonality and therefore set it equal to five, each input unit representing a trading day of the week (Saturday and Sunday are excluded). The next step is to decide on the number of neurons. For the fixed number of input units, 5 lags in this case, the SLFN was estimated one hundred times, with the number of neurons varying from 1 to 100. The estimated models were subsequently tested with Teräsvirta's neural network test for neglected nonlinearity, and the null hypothesis of linearity in mean, against the alternative of nonlinearity in mean, was strongly rejected at the 5% significance level for every specification of the SLFN, with all 100 p-values arbitrarily close to zero. Results of this experiment are reported in table A.38 in the appendix. Such results do not come as a surprise if we recall from Franses and van Dijk (2000, p. 217-218) that ANNs are inherently misspecified. Teräsvirta's neural network test for neglected nonlinearity uses the same trick as the LSTAR linearity test to circumvent the nuisance parameter problem: a Taylor expansion of the activation function (more details on the test are provided in Teräsvirta, Lin, and Granger (1993)). This approach would have allowed us to select the model which most strongly fails to reject the null of linearity in mean (highest p-value). Unfortunately, all models are misspecified according to this test. The Jarque-Bera test also shows that the residuals are not normally distributed (table A.37).
5. Forecast results
As noted previously, multi-step forecasts in nonlinear time series models are obtained via four different methods: naïve, Monte Carlo, bootstrap and block bootstrap. We also recall that we have defined quantitative measures of fit, which include R², adjusted R², the RMSFE, the MAFE and the GW test, and qualitative measures of fit, which include the directional accuracy test and the excess profitability test. R² and adjusted R² measure in-sample fit, while the others measure out-of-sample fit. In total, 52 forecasts were generated: 16 forecasts through each nonlinear model and 4 forecasts through the AR model.
First, the SETAR model was fitted to the data and forecasts were obtained. R² and adjusted R² in the AR model were greater than in the SETAR model for all return time series, suggesting that the AR model explains the variation in returns better than the SETAR model; these results can be found in table 5.1. Turning to the out-of-sample fit, the SETAR model uniformly dominates the AR model in the naïve approach, both in terms of RMSFE and MAFE.
For the bootstrap, the SETAR dominates again, except for the MAFE of the DJI return. In the block bootstrap, there is no clear winner, as the results are mixed. On the other hand, the AR model provides better forecasts than the SETAR model in terms of MAFE in the Monte Carlo approach, while the results are mixed for the RMSFE. The GW test with a quadratic loss function fails to reject the null of equal predictive ability of all the SETAR forecasts compared to the forecasts from the AR model at the 5% significance level. The same result is found with the absolute loss function, with the only exception being the naïve forecast of the NYSE return. This suggests that the gain or loss in precision from forecasting returns with the SETAR model rather than the AR model is not significant. The results of the relative loss are reported in tables 5.2 to 5.5 and the results of the GW test in tables 5.6 and 5.7.
Table 5.1: The in-sample fit differential between the SETAR and AR models suggests that the AR model explains a bigger share of the variance in returns than the SETAR model (SETAR-AR)
Table 5.2: Relative out-of-sample fit (loss from SETAR-naïve/loss from AR)
Table 5.3: Relative out-of-sample fit (loss from SETAR-bootstrap/loss from AR)
Table 5.4: Relative out-of-sample fit (loss from SETAR-block bootstrap/loss from AR)
Table 5.5: Relative out-of-sample fit (loss from SETAR-Monte Carlo/loss from AR)
Table 5.6: P-values of Giacomini-White test with quadratic loss, SETAR against AR
Table 5.7: P-values of Giacomini-White test with absolute loss, SETAR against AR
Table 5.9: P-values of excess profitability test on SETAR and AR forecasts
Turning to the LSTAR model, the results are quite mixed for the in-sample fit. The AR model performs better than the LSTAR model in explaining the variation in the DJI and NASDAQ returns for both R² and adjusted R², while the LSTAR model dominates for the NYSE and S&P returns. In-sample fit results are provided in table 5.10. Coming to the out-of-sample forecasts,
forecasts from the AR model uniformly dominate the LSTAR naïve approach in terms of both RMSFE and MAFE. The results obtained with the bootstrap favour the LSTAR in terms of MAFE, while there is no winner in terms of RMSFE. In the block bootstrap and Monte Carlo approaches, there is also no winner. The first impression from these quantitative measures thus seems to slightly favour the AR model in general. Most importantly, however, the GW test using a quadratic loss function tells us that the loss differentials between the forecasts from the two models are not significant at the 5% significance level. The same result is found for the absolute loss function, with the only exception being the naïve forecast of the DJI. Again, the conclusion is similar to the SETAR case: overall, the loss differential between the forecasts from the two models is not significant. Results for the RMSFE and MAFE can be found in tables 5.11 to 5.14, and the p-values of the GW test are reported in tables 5.15 and 5.16 for the quadratic and absolute loss functions respectively.
Table 5.10: For the in-sample fit differential between the LSTAR and AR models, the results are rather mixed
Table 5.11: Relative out-of-sample fit (loss from LSTAR-naïve/loss from AR)
Table 5.12: Relative out-of-sample fit (loss from LSTAR-bootstrap/loss from AR)
Table 5.13: Relative out-of-sample fit (loss from LSTAR-block bootstrap/loss from AR)
Table 5.14: Relative out-of-sample fit (loss from LSTAR-Monte Carlo/loss from AR)
Table 5.15: P-values of Giacomini-White test with quadratic loss, LSTAR against AR
Table 5.16: P-values of Giacomini-White test with absolute loss, LSTAR against AR
Qualitative measures of out-of-sample fit tell a similar story. The
directional accuracy test fails to reject the null of independence between
actual and predicted returns for all the data at the 5% significance
level, meaning that the SLFN also fails to predict the future signs of
returns. The null of conditional independence is likewise not rejected in
the excess profitability test, meaning that no profit can be generated
solely from the signs of the generated forecasts. P-values of the
directional accuracy and the excess profitability test are reported in
tables 5.26 and 5.27 respectively.
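The directional accuracy test referred to above is the nonparametric test of Pesaran and Timmermann (1992). A minimal Python sketch of its statistic, on hypothetical simulated data (the thesis computations were done in R), is:

```python
import numpy as np
from scipy import stats

def pesaran_timmermann(actual, forecast):
    """Pesaran-Timmermann (1992) directional accuracy test.
    Under the null, actual and predicted signs are independent;
    the statistic is asymptotically standard normal."""
    y = np.asarray(actual)
    f = np.asarray(forecast)
    n = y.size
    p_hat = np.mean(np.sign(y) == np.sign(f))   # observed hit rate
    py = np.mean(y > 0)
    pf = np.mean(f > 0)
    p_star = py * pf + (1 - py) * (1 - pf)      # hit rate under independence
    var_hat = p_star * (1 - p_star) / n
    var_star = ((2 * py - 1) ** 2 * pf * (1 - pf)
                + (2 * pf - 1) ** 2 * py * (1 - py)
                + 4 * py * pf * (1 - py) * (1 - pf) / n) / n
    # Note: var_hat - var_star can be non-positive in degenerate cases
    # (e.g. all signs identical); typical return data avoid this.
    stat = (p_hat - p_star) / np.sqrt(var_hat - var_star)
    p_value = stats.norm.sf(stat)  # one-sided: accuracy above chance
    return stat, p_value
```

A forecast that tracks the actual series closely produces a large positive statistic, while sign-independent forecasts leave the null unrejected, which is the outcome reported above.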
Table 5.19: In-sample fit differential between the SLFN and AR models; the results are
rather mixed
Table 5.20: Relative out-of-sample fit (loss from SLFN-naïve/loss from AR)
Table 5.21: Relative out-of-sample fit (loss from SLFN-bootstrap/loss from AR)
Table 5.22: Relative out-of-sample fit (loss from SLFN-block bootstrap/loss from AR)
Table 5.23: Relative out-of-sample fit (loss from SLFN-Monte Carlo/loss from AR)
Table 5.24: P-values of Giacomini-White test with quadratic loss, SLFN against AR
Table 5.25: P-values of Giacomini-White test with absolute loss, SLFN against AR
In summary, for the in-sample fit, the AR model did better than the SETAR
model in explaining volatility in the return time series, while against
the LSTAR and SLFN models the results are rather mixed. Turning to the
out-of-sample fit, and most importantly, the GW test suggests that overall
none of the nonlinear models forecasts returns better or worse than the
AR model, as the null of equal predictive ability could not be rejected in
more than 90% of the cases. All relative measures of out-of-sample
performance are also very close to 1, already hinting at equal predictive
accuracy. Furthermore, the directional accuracy and excess profit tests
suggest that future return signs cannot be forecast reliably and that no
profit can be derived by basing investment decisions solely on the
predicted return signs. These results are in line with those of other
authors who have studied the same topics, such as Bradley and Jansen
(2004) and Ferrara, Marcellino, and Mogliani (2015), among others.
A few reasons can be identified to explain the failure of the nonlinear
models to forecast returns or to outperform the linear model:
• The third reason may be that two regimes alone are not enough to
capture a significant portion of the nonlinearity. In fact, the test
of remaining nonlinearity in the LSTAR model rejected the null of
two-regime adequacy, suggesting that nonlinearity remains in the
residuals. The same result was found for the test of neglected
nonlinearity in the SLFN.
• The fourth reason, and arguably the most important one, is the
efficient market hypothesis, which in its strong form holds that the
market is informationally efficient, so that no predictability can be
exploited to generate risk-adjusted profit.
might for instance be interested in knowing how much of the volatility in
returns is explained by a given model. For this purpose, we provide
R-squared and adjusted R-squared values for the different models and time
series in tables A.43 to A.45 in the appendix, and individual RMSFE and
MAFE values for the different forecast methods are reported in tables
A.46 to A.57 in the appendix. Lastly, plots of the in-sample fit for all
the models are reported in figures B.21 to B.24.
A few suggestions can be taken into account to improve the current
results. First, applying rolling-window estimation could be beneficial,
as it mitigates the effect of parameter change in the time series
(Teräsvirta, van Dijk, & Medeiros, 2005, p. 772). Furthermore, a forecast
combination approach could lead to better results, as the literature
extensively shows that combining forecasts delivers better performance
than relying on a single best forecast; this is in line with the
recommendation of Stock and Watson (1999). Moreover, since two regimes
were found to exploit the nonlinearity in returns poorly, particularly in
the LSTAR model, using more than two regimes could prove beneficial.
Applying Bayesian methods might also be appealing, as they provide a
coherent framework for handling model instability, model uncertainty and
parameter estimation error (Teräsvirta, 2018, p. 13). Lastly, considering
additional explanatory variables, such as the price-earnings ratio,
aggregate output, the dividend-price ratio, interest rates and the
dividend pay-out ratio, in order to exploit any information they might
contain, could be helpful as well.
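The first two suggestions can be illustrated together. The sketch below (Python, with a hypothetical AR(1) forecaster and a naive mean benchmark; it is not the thesis's R code) rolls a fixed-length estimation window through the sample and averages the one-step forecasts of several models with equal weights:

```python
import numpy as np

def ar1_forecast(window):
    """One-step forecast from an AR(1) fitted by OLS on `window`."""
    x, y = window[:-1], window[1:]
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0] + beta[1] * window[-1]

def mean_forecast(window):
    """Naive benchmark: the window mean."""
    return float(np.mean(window))

def rolling_combined(series, window_size, models):
    """Rolling-window one-step forecasts from each model, combined
    with equal weights in the spirit of Stock and Watson (1999)."""
    series = np.asarray(series, dtype=float)
    forecasts = []
    for t in range(window_size, series.size):
        window = series[t - window_size:t]   # most recent observations only
        individual = [m(window) for m in models]
        forecasts.append(np.mean(individual))  # equal-weight combination
    return np.array(forecasts)
```

Forecast errors for evaluation would then be `series[window_size:] - rolling_combined(series, window_size, [ar1_forecast, mean_forecast])`; re-estimating on each window is what lets the parameters drift with the series.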
6. Conclusion
linear models, and compare the resulting forecasts to assess whether there
is a predictive gain in using nonlinear models in lieu of linear ones. To
this end, we generated forecasts from the SETAR, LSTAR and SLFN models and
compared them to forecasts generated from the AR model.
We came to the conclusion that the nonlinear models, just like the AR
model, fail to reliably forecast stock index returns. In fact, through
extensive testing with the GW test, we could not find evidence of a
significant loss differential between the nonlinear models and the AR
model. Moreover, using the directional accuracy test, we found that
overall none of the models could forecast the sign of future returns, and
the excess profit test led to the conclusion that no profit can be derived
from an investment strategy that buys stocks whose predicted sign is
positive and sells stocks whose predicted sign is negative. Furthermore,
all the models explain only a small fraction of return volatility. We
provided some reasons that could explain these findings and discussed some
potential ways to improve the current results.
Having conducted this research, we are well aware that we did not discuss
several important nonlinear models which could be used to generate
out-of-sample forecasts, such as Markov-switching models and time-varying
smooth transition models, among others. We also refrained from treating
seasonality rigorously; it was only used as an argument for choosing the
number of input units in the SLFN. Finally, multivariate nonlinear models
were not considered, as research on this topic is very recent and not yet
well developed. Constructing multivariate nonlinear models is, however, an
important direction for future research. Other interesting future studies
could be:
so to speak the standard normal distribution, will from time to time
yield values that exceed conventional critical values and lead to
rejection of the zero-mean hypothesis. This data snooping problem can be
circumvented by using the test for superior predictive ability of
Hansen (2005).
• to examine interval forecasts and density forecasts, which could
provide more information than analysing point forecasts alone. This
could be achieved by methods such as the conditional coverage test of
Christoffersen (1998), which concerns the percentage of observations
that fall within the 95% forecast confidence intervals, and the density
forecast likelihood ratio test of Berkowitz (2001), among others.
• and lastly, to track model change in order to deal with model or
parameter instability, for instance by allowing time-varying slope
coefficients in the different models (Teräsvirta 2018, p. 12).
The explosion in the use of nonlinear models in recent years has been
obvious. ANN models constitute a very powerful class of machine learning
models and will probably remain a key forecasting tool in the coming
decades. The threshold principle, for its part, is expected to make
worthwhile contributions to time series analysis over the next years,
especially in nonstationary nonlinear modelling, panel time series
modelling and spatio-temporal series modelling, among others (Tong 2011).
References
Diebold, F. X., & Mariano, R. S. (1995). Comparing predictive accuracy.
Journal of Business and Economic Statistics, 13 (3), 253–263.
Eitrheim, Ø., & Teräsvirta, T. (1996). Testing the adequacy of smooth tran-
sition autoregressive models. Journal of Econometrics, 74 (1), 59–75.
Elliott, G., & Timmermann, A. (2016). Forecasting in economics and finance.
Annual Review of Economics, 8 , 81–110.
Ferrara, L., Marcellino, M., & Mogliani, M. (2015). Macroeconomic forecasting
during the great recession: The return of non-linearity? International
Journal of Forecasting, 31 (3), 664–679.
Franses, P. H., & Van Dijk, D. (2000). Non-linear time series models in
empirical finance. Cambridge University Press.
Giacomini, R., & Rossi, B. (2010). Forecast comparisons in unstable environ-
ments. Journal of Applied Econometrics, 25 (4), 595–620.
Giacomini, R., & White, H. (2006). Tests of conditional predictive ability.
Econometrica, 74 (6), 1545–1578.
Hall, P., Horowitz, J. L., & Jing, B.-Y. (1995). On blocking rules for the
bootstrap with dependent data. Biometrika, 82 (3), 561–574.
Hansen, P. R. (2005). A test for superior predictive ability. Journal of Business
& Economic Statistics, 23 (4), 365–380.
Hornik, K., Stinchcombe, M., & White, H. (1990). Universal approximation
of an unknown mapping and its derivatives using multilayer feedforward
networks. Neural Networks, 3 (5), 551–560.
Lim, K.-P., & Hooy, C.-W. (2013). Non-linear predictability in G7 stock index
returns. The Manchester School , 81 (4), 620–637.
Lundbergh, S., & Teräsvirta, T. (2002). Forecasting with smooth transition
autoregressive models. A companion to economic forecasting, 485–509.
Luukkonen, R., Saikkonen, P., & Teräsvirta, T. (1988). Testing linearity
against smooth transition autoregressive models. Biometrika, 75 (3),
491–499.
Mandelbrot, B. (1963). New methods in statistical economics. Journal of
Political Economy, 71 (5), 421–440.
Marcellino, M. (2004). Forecasting EMU macroeconomic variables. Interna-
tional Journal of Forecasting, 20 (2), 359–372.
Newey, W. K., & West, K. D. (1987). A simple, positive semi-definite,
heteroskedasticity and autocorrelation consistent covariance matrix.
Econometrica, 55 (3), 703–708.
Pesaran, M. H., & Timmermann, A. (1992). A simple nonparametric test
of predictive performance. Journal of Business & Economic Statistics,
10 (4), 461–465.
Rapach, D., & Zhou, G. (2013). Forecasting stock returns. In Handbook of
economic forecasting (Vol. 2, pp. 328–383). Elsevier.
Sarantis, N. (1999). Modeling non-linearities in real effective exchange
rates. Journal of International Money and Finance, 18 (1), 27–45.
Teräsvirta, T. (2018). Nonlinear models in macroeconometrics. In Oxford
research encyclopedia of economics and finance.
Teräsvirta, T., & Anderson, H. M. (1992). Characterizing nonlinearities in
business cycles using smooth transition autoregressive models. Journal
of Applied Econometrics, 7 (S1), S119–S136.
Teräsvirta, T., Lin, C.-F., & Granger, C. W. (1993). Power of the neural
network linearity test. Journal of Time Series Analysis, 14 (2), 209–220.
Teräsvirta, T., Tjøstheim, D., & Granger, C. W. J. (2010). Modelling
nonlinear economic time series. Oxford University Press.
Teräsvirta, T., Van Dijk, D., & Medeiros, M. C. (2005). Linear models,
smooth transition autoregressions, and neural networks for forecasting
macroeconomic time series: A re-examination. International Journal of
Forecasting, 21 (4), 755–774.
Timmermann, A. (2018). Forecasting methods in finance. Annual Review of
Financial Economics, 10 , 449–479.
Tong, H. (1990). Non-linear time series: a dynamical system approach. Oxford
University Press.
Tong, H. (2011). Threshold models in time series analysis—30 years on.
Statistics and its Interface, 4 (2), 107–118.
Zivot, E., & Wang, J. (2007). Modeling financial time series with S-PLUS.
Springer.
A. Tables
Table A.1: BDS test on the DJI return time series; the null of linearity is strongly rejected
at the 5% significance level for all combinations of the embedding dimension m and epsilon
Table A.2: BDS test on the NASDAQ return time series; the null of linearity is strongly
rejected at the 5% significance level for all combinations of the embedding dimension m and epsilon
Table A.3: BDS test on the NYSE return time series; the null of linearity is strongly rejected
at the 5% significance level for all combinations of the embedding dimension m and epsilon
Table A.4: BDS test on the S&P return time series; the null of linearity is strongly rejected
at the 5% significance level for all combinations of the embedding dimension m and epsilon
Table A.5: Jarque-Bera test on return; the null of normality is rejected at the 5% significance
level, suggesting that returns are not normally distributed
Table A.6: ADF and PP test p-values. The null of nonstationarity is rejected at the 5%
significance level in both tests, leading to the conclusion that the data are stationary
Table A.7: The null of linearity is rejected at the 5% significance level, suggesting the
presence of SETAR nonlinearity
Table A.8: Threshold grid search in the SETAR model, DJI time series
Table A.9: Threshold grid search in the SETAR model, NASDAQ time series
Table A.10: Threshold grid search in the SETAR model, NYSE time series
Table A.11: Threshold grid search in the SETAR model, S&P time series
Table A.12: SETAR parameter estimates for the DJI time series
Table A.13: SETAR parameter estimates for the NASDAQ time series
Table A.14: SETAR parameter estimates for the NYSE time series
Table A.15: SETAR parameter estimates for the S&P time series
Table A.16: The Jarque-Bera test on SETAR residuals suggests that the residuals are not
normally distributed
Table A.17: The LSTAR nonlinearity test on the DJI time series strongly rejects the null
hypothesis of linearity, implying the presence of LSTAR nonlinearity
Table A.18: The LSTAR nonlinearity test on NASDAQ time series strongly rejects the null
hypothesis of linearity, implying the presence of LSTAR nonlinearity
Table A.19: The LSTAR nonlinearity test on NYSE time series strongly rejects the null
hypothesis of linearity, implying the presence of LSTAR nonlinearity
Table A.20: The LSTAR nonlinearity test on S&P time series strongly rejects the null
hypothesis of linearity, implying the presence of LSTAR nonlinearity
Table A.21: Threshold grid search in the LSTAR model, DJI time series
Table A.22: Threshold grid search in the LSTAR model, NASDAQ time series
Table A.23: Threshold grid search in the LSTAR model, NYSE time series
Table A.24: Threshold grid search in the LSTAR model, S&P time series
Table A.25: LSTAR parameter estimates for the DJI time series
Table A.26: LSTAR parameter estimates for the NASDAQ time series
Table A.27: LSTAR parameter estimates for the NYSE time series
Table A.29: The test of remaining nonlinearity on the LSTAR DJI residuals strongly rejects
the null hypothesis that two regimes are adequate, in favor of a 3-regime LSTAR model,
implying the presence of remaining nonlinearity
Table A.30: The test of remaining nonlinearity on the LSTAR NASDAQ residuals strongly
rejects the null hypothesis that two regimes are adequate, in favor of a 3-regime LSTAR
model, implying the presence of remaining nonlinearity
Table A.31: The test of remaining nonlinearity on the LSTAR NYSE residuals strongly
rejects the null hypothesis that two regimes are adequate, in favor of a 3-regime LSTAR
model, implying the presence of remaining nonlinearity
Table A.32: The test of remaining nonlinearity on the LSTAR S&P residuals strongly
rejects the null hypothesis that two regimes are adequate, in favor of a 3-regime LSTAR
model, implying the presence of remaining nonlinearity
Table A.33: Grid search for SLFN model estimation and selection for DJI return
Table A.34: Grid search for SLFN model estimation and selection for NASDAQ return
Table A.35: Grid search for SLFN model estimation and selection for NYSE return
Table A.36: Grid search for SLFN model estimation and selection for S&P return
Table A.37: Jarque-Bera test on SLFN residuals; normality is strongly rejected at the 5%
significance level
Table A.38: Teräsvirta’s neural network test for neglected nonlinearity suggests that none
of the 100 specifications is correct
Table A.39: AR parameter estimates for DJI return, Ljung-Box test on residuals and AIC
Table A.40: AR parameter estimates for NASDAQ return, Ljung-Box test on residuals and
AIC
Table A.41: AR parameter estimates for NYSE return, Ljung-Box test on residuals and AIC
Table A.42: AR parameter estimates for S&P return, Ljung-Box test on residuals and AIC
Table A.43: SETAR detailed in-sample fit. AR in-sample fit and in-sample fit differentials
are also reported in order to ease comparison
Table A.44: LSTAR detailed in-sample fit. AR in-sample fit and in-sample fit differentials
are also reported in order to ease comparison
Table A.45: ANN detailed in-sample fit. AR in-sample fit and in-sample fit differentials
are also reported in order to ease comparison
Table A.46: SETAR, naïve: detailed out-of-sample fit. RMSFE and MAFE, together with
their values relative to the AR's, are reported to ease comparison
Table A.47: SETAR, bootstrap: detailed out-of-sample fit. RMSFE and MAFE, together
with their values relative to the AR's, are reported to ease comparison
Table A.48: SETAR, block bootstrap: detailed out-of-sample fit. RMSFE and MAFE,
together with their values relative to the AR's, are reported to ease comparison
Table A.49: SETAR, Monte Carlo: detailed out-of-sample fit. RMSFE and MAFE, together
with their values relative to the AR's, are reported to ease comparison
Table A.50: LSTAR, naïve: detailed out-of-sample fit. RMSFE and MAFE, together with
their values relative to the AR's, are reported to ease comparison
Table A.51: LSTAR, bootstrap: detailed out-of-sample fit. RMSFE and MAFE, together
with their values relative to the AR's, are reported to ease comparison
Table A.52: LSTAR, block bootstrap: detailed out-of-sample fit. RMSFE and MAFE,
together with their values relative to the AR's, are reported to ease comparison
Table A.53: LSTAR, Monte Carlo: detailed out-of-sample fit. RMSFE and MAFE, together
with their values relative to the AR's, are reported to ease comparison
Table A.54: SLFN, naïve: detailed out-of-sample fit. RMSFE and MAFE, together with
their values relative to the AR's, are reported to ease comparison
Table A.55: SLFN, bootstrap: detailed out-of-sample fit. RMSFE and MAFE, together
with their values relative to the AR's, are reported to ease comparison
Table A.56: SLFN, block bootstrap: detailed out-of-sample fit. RMSFE and MAFE,
together with their values relative to the AR's, are reported to ease comparison
Table A.57: SLFN, Monte Carlo: detailed out-of-sample fit. RMSFE and MAFE, together
with their values relative to the AR's, are reported to ease comparison
B. Figures
Figure B.3: SETAR grid search. Columnwise: DJI, NASDAQ, NYSE and S&P
Figure B.4: Q-Q plots of SETAR residuals suggest that residuals are not normally distributed
Figure B.5: ACF and PACF of DJI return. The time series displays a smoothly declining
PACF, a property that is well captured by autoregressive models
Figure B.6: ACF and PACF of NASDAQ return. The time series displays a smoothly
declining PACF, a property that is well captured by autoregressive models
Figure B.7: ACF and PACF of NYSE return. The time series displays a smoothly declining
PACF, a property that is well captured by autoregressive models
Figure B.8: ACF and PACF of S&P return. The time series displays a smoothly declining
PACF, a property that is well captured by autoregressive models
Figure B.9: Comparison of DJI return forecasts from SETAR and AR to the true realised
values. Neither model provides reliable forecasts
Figure B.10: Comparison of NASDAQ return forecasts from SETAR and AR to the true
realised values. Neither model provides reliable forecasts
Figure B.11: Comparison of NYSE return forecasts from SETAR and AR to the true realised
values. Neither model provides reliable forecasts
Figure B.12: Comparison of S&P return forecasts from SETAR and AR to the true realised
values. Neither model provides reliable forecasts
Figure B.13: Comparison of DJI return forecasts from LSTAR and AR to the true realised
values. Neither model provides reliable forecasts
Figure B.14: Comparison of NASDAQ return forecasts from LSTAR and AR to the true
realised values. Neither model provides reliable forecasts
Figure B.15: Comparison of NYSE return forecasts from LSTAR and AR to the true realised
values. Neither model provides reliable forecasts
Figure B.16: Comparison of S&P return forecasts from LSTAR and AR to the true realised
values. Neither model provides reliable forecasts
Figure B.17: Comparison of DJI return forecasts from SLFN and AR to the true realised
values. Neither model provides reliable forecasts
Figure B.18: Comparison of NASDAQ return forecasts from SLFN and AR to the true
realised values. Neither model provides reliable forecasts
Figure B.19: Comparison of NYSE return forecasts from SLFN and AR to the true realised
values. Neither model provides reliable forecasts
Figure B.20: Comparison of S&P return forecasts from SLFN and AR to the true realised
values. Neither model provides reliable forecasts
Figure B.22: Plots of SETAR in-sample fit
Figure B.24: Plots of ANN in-sample fit
C. Information on R Codes
Before running the code, it is crucially important to first install the
Rmarkdown library and to set the working directory in the first code
chunk of each R file. Data are provided in a folder called "data" which
contains 8 Excel files: 4 files containing the data used for model
estimation for each stock index return, and 4 files used for
out-of-sample forecast comparison.
• General: contains code for general return time series properties,
including stationarity tests, the BDS test, the ACF, the PACF and so on.
Libraries used here: knitr, quantmod, tseries, psych, stats
For each nonlinear model, the main steps taken after loading the data and
computing returns roughly follow this order:
• estimating the nonlinear model and picking the setting for which the
AIC is minimal,
• predicting returns,
• comparing forecasts.
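The first step, selecting the specification with minimum AIC, can be sketched as follows. The thesis code is in R; this Python sketch with a simple AR-order grid is purely illustrative of the selection logic:

```python
import numpy as np

def gaussian_aic(rss, n, k):
    """AIC for a Gaussian regression with residual sum of squares
    `rss`, `n` observations and `k` estimated parameters."""
    return n * np.log(rss / n) + 2 * k

def select_ar_order(y, max_p=5):
    """Grid-search the AR order, keeping the fit with minimum AIC."""
    y = np.asarray(y, dtype=float)
    best_p, best_aic = None, np.inf
    for p in range(1, max_p + 1):
        target = y[p:]
        # Design matrix: intercept plus lags y[t-1], ..., y[t-p].
        X = np.column_stack([np.ones(y.size - p)]
                            + [y[p - i:y.size - i] for i in range(1, p + 1)])
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        rss = float(np.sum((target - X @ beta) ** 2))
        a = gaussian_aic(rss, target.size, p + 1)
        if a < best_aic:
            best_p, best_aic = p, a
    return best_p, best_aic
```

In the thesis the grid instead ranges over thresholds (SETAR/LSTAR) or network settings (SLFN), but the pattern of fitting each candidate and keeping the minimum-AIC one is the same.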
Table C.1 below provides an example of what the code and the results in
the HTML files look like.
D. Declaration of Authorship
Kiel, 30.09.2019