Hidden Markov & SVM


Analysis of Hidden Markov Models and Support

Vector Machines in Financial Applications

Satish Rao
Jerry Hong

Electrical Engineering and Computer Sciences


University of California at Berkeley

Technical Report No. UCB/EECS-2010-63


http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-63.html

May 12, 2010


Copyright 2010, by the author(s).
All rights reserved.

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission.

Acknowledgement

We thank Simlio LLC and all its team members for providing many useful
market tools to allow us to graph and display market trends with ease. We
also thank Professor Peter Bartlett for providing many useful lectures about
statistical graphical models, such as exponential families and maximum
likelihood, and for helping us get started on this topic. Furthermore,
Professor Satish Rao played a crucial role in helping us determine what
feature sets to try as well as suggesting different trading algorithms.
Satish's ideas were instrumental in developing the manipulation detection
section of this paper.
Analysis of Hidden Markov Models and Support Vector
Machines in Financial Applications
Jerry Hong
University of California, Berkeley
Soda Hall, 2599 Hearst Ave
Berkeley, CA 94720-1776
[email protected]

ABSTRACT

This paper presents two approaches to helping investors make better decisions. First, we discuss conventional methods for forecasting stock prices and movements, such as the Efficient Market Hypothesis and technical indicators. We will show that these methods are inadequate, and thus we need to rethink the issue. Afterwards, we discuss using artificial intelligence, specifically Hidden Markov Models and Support Vector Machines, to help investors gather and process enormous amounts of data so that they can make informed decisions. We leverage the Simlio* engine to train both the HMM and the SVM on past datasets and use them to predict future stock movements. The results are encouraging, and they warrant future research on using AI for market forecasts.

* Simlio LLC is a startup co-founded by Jerry Hong. It is currently a web-based stock research platform that lets users draw graphs with ease and perform intensive formula calculations to see how well an idea would profit over time.

1. INTRODUCTION

In much of traditional finance theory and modeling, we assume that information is symmetric among the agents. We generalize the market as a perfectly competitive world where individuals are price-takers, which presents an over-simplistic representation of the financial market. The conventional theories that we focus on in this paper include the EMH (Efficient Market Hypothesis) and technical indicators such as the SMA (Simple Moving Average) and MACD (Moving Average Convergence / Divergence) [1]. These three tools are used to help model the world of finance and assist investors in predicting future events in the market. For instance, technical indicators are often used by stock traders to predict future prices from historical trends.

These conventional tools have offered much insight into the workings of the financial market. However, they provide only a macro-simplification that does not always reflect how the real market works. These tools have limitations that prevent them from modeling the market in a more focused, micro manner. One of the major issues is that many conventional finance theories only take in so many factors. This limited scope prevents us from accurately modeling the real market, which has a countably infinite number of patterns. We need a model that can constantly adapt to the dynamic nature of the market. Technical indicators can only help an investor so much before the different combinations and patterns cause the investor to question whether any formula actually works consistently.

This is where AI models such as HMMs and SVMs come into play. Using these tools, we can achieve a more realistic micro-representation of the market while overcoming the limitations of the earlier techniques. They allow us to model many different historical patterns and predict how they change through time. In this paper, we will first review conventional models and explore their benefits and problems. Focusing on their shortcomings, we will see how HMMs and SVMs can potentially overcome these weaknesses while maintaining the advantages of classical finance theory[2].

1.1 Introduction to Conventional Approaches

1.1.1 Efficient Market Hypothesis

The EMH comes in three different forms: weak, semi-strong, and strong. All of them claim in some way that financial markets are informationally efficient, so using past/historical data will not help predict future prices because current prices already take that information into account. The semi-strong and strong forms go a step further and claim that prediction is futile even if one knows public information and private information, respectively.

This theory has been rather controversial for the last few decades because history offers some examples where it holds and others that suggest the EMH cannot always be true. One example that supports the EMH is the public announcement of the performance of value and small stocks vs. growth and large stocks (see Figures 1 and 2). Before the announcement, many investment firms had used this idea to profit above the norm for a number of years. Essentially, it was better to buy small/value stocks than large/growth stocks because they had more potential and were undervalued. So before the 1980s, if you bought these types of stocks, you were more likely to receive a higher profit. However, after 1980, a PhD student discovered that small stocks were being undervalued and that many public investors did not pay much attention to them. He decided to publish his findings, and as a result this turned into public information. As one can see from the figures, it is no longer feasible to just buy small stocks to profit greatly: as the EMH predicts, the market has taken that factor into account, and the information is no longer usable to gain an edge.

http://www.econ.berkeley.edu/~szeidl/ec136/lecture16.pdf
Figure 1: Value and Size Effect from 1927-2005

http://www.econ.berkeley.edu/~szeidl/ec136/lecture16.pdf
Figure 2: Value and Size Effect after 1980

Fortunately for investors, there are many shortcomings to this theory. For one, the empirical evidence for whether this theory holds has been mixed. For instance, many papers have been published showing that low P/E ratio stocks seem to provide greater returns[3]. Another example is that today's loser stocks are usually much more undervalued than the winner stocks, so they get less attention. Historically, the loser stocks have yielded higher average returns than the winner stocks of the time; hence, this becomes an endless cycle. If the EMH held, these instances should not have happened, and there would be no point in trying to beat the market[4].

However, it is very difficult to determine whether or not the EMH holds. Thus, we need new methods to help decide whether it is possible to predict the market. We need a versatile tool that can take multiple factors into account, such as combinations of historical data, news, people's behaviors, etc. One potential way of doing so is to use HMMs and SVMs.

1.1.2 Technical Indicators

Another conventional method in financial economics is to use technical indicators to predict stock movements and prices. Investors use numerous indicators; some of the most common are the EMA (Exponential Moving Average), the MACD (Moving Average Convergence / Divergence), and Bollinger Bands. We will point out one of the most common indicators that many investors use to help them pinpoint relative resistance points.

In Figure 3, we used a stock investment tool called Simlio to draw the 50-day and 200-day EMA for Apple stock from September 2008 to February 2009. Investors pay very close attention to the signals marked by the red circles in the figure. When the EMA 50 crosses below the EMA 200, it usually signals a future downtrend. As the figure shows, Apple stock fell from $140, at the time the signal was triggered, to around $80. Thus, investors who heeded the signal and got out of the market or shorted the stock were much better off than those who stayed in the market.

Another signal that investors pay close attention to is the EMA 50 when prices are below it. This line usually acts as a resistance point: unless a stock has regained its momentum and health, its price will not rise above this line. From December 2008 to February 2009, Apple's stock price hit the resistance line again and again, but it was not able to break the barrier formed by the EMA 50.
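The EMA 50 / EMA 200 crossover signal discussed above can be sketched in a few lines. This is a minimal illustration, not the Simlio implementation: the function names `ema` and `bearish_crossovers` are hypothetical, the smoothing factor 2 / (span + 1) and the choice to seed the average with the first price are common conventions assumed here, and the price series is synthetic rather than Apple data.

```python
def ema(prices, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [prices[0]]  # assumption: seed the average with the first price
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out


def bearish_crossovers(prices, fast=50, slow=200):
    """Indices where the fast EMA moves from at-or-above to below the slow EMA."""
    f, s = ema(prices, fast), ema(prices, slow)
    return [t for t in range(1, len(prices))
            if f[t - 1] >= s[t - 1] and f[t] < s[t]]


# Synthetic series (not market data): a steady rise followed by a steady fall.
prices = [100 + t for t in range(300)] + [400 - 2 * t for t in range(150)]
signals = bearish_crossovers(prices)
print("fast EMA crossed below slow EMA at indices:", signals)
```

On a series like this the signal only fires some time after the fall begins, which illustrates why a crossover is a lagging indicator.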

Courtesy of Simlio (www.simlio.com)
Figure 3: EMA 50 (Green) and EMA 200 (Blue) for Apple

Courtesy of Simlio (www.simlio.com)
Figure 4: MACD (26, 12, 9) for Apple
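For readers unfamiliar with the (26, 12, 9) parameters in the caption of Figure 4, here is a hedged sketch of how an MACD line and its signal line are conventionally computed: the MACD line is the 12-period EMA minus the 26-period EMA, and the signal line is a 9-period EMA of the MACD line. The function names are hypothetical, the EMA seeding is an assumption, and the prices are synthetic. How Simlio computes or colors these lines is not specified in the source.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]  # assumption: seed with the first value
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(prices, slow=26, fast=12, signal=9):
    """Return (macd_line, signal_line) for the given periods."""
    fast_ema, slow_ema = ema(prices, fast), ema(prices, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    return macd_line, ema(macd_line, signal)


def crossings(macd_line, signal_line):
    """List (index, 'up' | 'down') events where the MACD line crosses the signal line."""
    events = []
    for t in range(1, len(macd_line)):
        prev = macd_line[t - 1] - signal_line[t - 1]
        cur = macd_line[t] - signal_line[t]
        if prev <= 0 < cur:
            events.append((t, "up"))
        elif prev >= 0 > cur:
            events.append((t, "down"))
    return events


# Synthetic series (not market data): 60 rising days, then 60 falling days.
prices = list(range(100, 160)) + list(range(160, 100, -1))
macd_line, signal_line = macd(prices)
events = crossings(macd_line, signal_line)
print(events)
```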

Another indicator that many investors use is the MACD [5]. As a matter of fact, the MACD used to work very well when hedge funds kept this concept private. However, once the idea behind the indicator was published, it started working less efficiently, probably due to the EMH. Nevertheless, it is still very useful in predicting up and down trends. Looking at Apple's stock prices again, we label four of the many instances when the MACD lines crossed each other. In Figure 4, when the orange line goes below the green line, it signals a downtrend in the near future. When the orange line crosses above the green line, it signals an uptrend in the near future. Although this indicator is not perfect, the four labeled cases do a very good job of predicting the future at the given times. Again, investors who used this indicator may have been in a better position to trade than those who did no research.

Although technical indicators appear to be very helpful, they have many shortcomings. First, most indicators only analyze historical prices and do not take any other factor into account. Thus, they are limited in the kind of information they can take as input. This becomes problematic because they do not help explain why some stocks are in a downtrend/uptrend. For instance, when the mortgage crisis hit the economy around a year ago, there were certain companies that investors should have been wary about. Financial companies that made huge margin bets on mortgage deals (such as Lehman Brothers), or even commercial banks that generously gave very low interest rates on loans (such as Washington Mutual), were definitely ones that investors should have thought twice about before purchasing any of their stock. Investors who did their news research and pieced the information together either shorted these stocks or avoided them. Those who purchased the stocks suffered greatly as both stocks dropped to essentially 0. Technical indicators alone would not have told investors to back away from these stocks, but other factors and information in the economy should have been more than sufficient to make investors suspicious.

This is where statistical learning methods, such as HMMs and SVMs, can make a significant difference. They allow us to model the market based on many different kinds of information. For instance, they can model how historical prices, fundamentals, and current news affect stock prices. They eliminate the weakness of being limited to analyzing historical prices, and they can be exploited to take other factors in the economy into account. The section below discusses the potential of applying HMMs and SVMs to forecasting the market.

1.2 Introduction to Modern Approaches

1.2.1 Hidden Markov Models

The Hidden Markov Model (HMM) is a statistical model that is often used in pattern recognition applications, such as speech, handwriting, and bioinformatics[6]. The user first needs to decide how many hidden states are possible for each unobserved state. Moreover, the initial starting probability of each hidden state must be specified. Afterwards, the HMM needs to be trained on a set of data where we have a set of possible observation emissions for each unobserved state.

Courtesy of Wikipedia
Figure 5: States of HMM

We follow notation similar to that in Hassan and Nath's paper[7]:

N = number of states in the model
M = number of distinct observation symbols per state
T = length of the observation sequence
O = observation sequence
Q = state sequence $q_1, q_2, \ldots, q_T$ in the Markov model
$A = \{a_{ij}\}$ (transition matrix), where $a_{ij}$ represents the transition probability from state i to state j
$B = \{b_j(O_t)\}$ (observation emission matrix), where $b_j(O_t)$ represents the probability of observing $O_t$ at state j
$\pi = \{\pi_i\}$ (the prior probability)
$\lambda = (A, B, \pi)$ (the overall HMM model)

Using HMMs, we can somewhat accurately answer the following three questions[7]:

1. Given the model $\lambda$, what is $P(O \mid \lambda)$, where $O = O_1, O_2, \ldots, O_T$?
2. Given the observation sequence O and a model $\lambda$, what is the best/most likely state sequence $q_1, q_2, \ldots, q_T$?
3. Given the observation sequence O and a space of models found by varying the model parameters, what is the best model?

We will use the forward-backward algorithm to compute $P(O \mid \lambda)$ and the Viterbi algorithm to answer #2. As for #3, we will use the Baum-Welch algorithm to train the HMM for the best parameters and test it on a dataset.

The Baum-Welch algorithm is a special case of the EM (Expectation-Maximization) algorithm[9], allowing us to find the best parameters for the model $\lambda$. The EM algorithm is an iterative method used to find maximum likelihood estimates of parameters when there is hidden data[10]. Each iteration of the EM algorithm has two steps: the E step and the M step. In the E step, we use conditional expectation to best estimate the missing data given the observed features and the most recent model. In the M step, we maximize the likelihood function assuming that we have the missing data. One great property of the EM algorithm is that the likelihood never decreases from one iteration to the next, so the procedure converges, although possibly only to a local maximum.

Here is a quick derivation of the EM algorithm[10].

$$L(\theta) = \ln P(X \mid \theta)$$

$L(\theta)$ is the log likelihood function of $\theta$, and X is a random vector. Our goal is to find the $\theta$ that maximizes $P(X \mid \theta)$. At each step of the iteration, we want to improve $L(\theta)$. Recall that ln(x) is a strictly increasing function. Thus, at each
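iteration, we want our new $L(\theta)$ to be greater than the old one:

$$L(\theta) > L(\theta_{\text{current}})$$

$$L(\theta) - L(\theta_{\text{current}}) = \ln P(X \mid \theta) - \ln P(X \mid \theta_{\text{current}})$$

Now to make things interesting, we introduce a hidden random vector Z, whose given realization is denoted z. $P(X \mid \theta)$ is now:

$$P(X \mid \theta) = \sum_z P(X \mid z, \theta)\, P(z \mid \theta)$$

Substituting this into the difference and applying Jensen's inequality (ln is concave) gives:

$$
\begin{aligned}
L(\theta) - L(\theta_{\text{current}})
&= \ln\Big(\sum_z P(X \mid z, \theta)\, P(z \mid \theta)\Big) - \ln P(X \mid \theta_{\text{current}}) \\
&= \ln\Big(\sum_z P(X \mid z, \theta)\, P(z \mid \theta)\, \frac{P(z \mid X, \theta_{\text{current}})}{P(z \mid X, \theta_{\text{current}})}\Big) - \ln P(X \mid \theta_{\text{current}}) \\
&= \ln\Big(\sum_z P(z \mid X, \theta_{\text{current}})\, \frac{P(X \mid z, \theta)\, P(z \mid \theta)}{P(z \mid X, \theta_{\text{current}})}\Big) - \ln P(X \mid \theta_{\text{current}}) \\
&\ge \sum_z P(z \mid X, \theta_{\text{current}}) \ln \frac{P(X \mid z, \theta)\, P(z \mid \theta)}{P(z \mid X, \theta_{\text{current}})} - \ln P(X \mid \theta_{\text{current}}) \\
&= \sum_z P(z \mid X, \theta_{\text{current}}) \ln \frac{P(X \mid z, \theta)\, P(z \mid \theta)}{P(z \mid X, \theta_{\text{current}})\, P(X \mid \theta_{\text{current}})} \\
&= \Delta(\theta \mid \theta_{\text{current}})
\end{aligned}
$$

The second-to-last step uses $\sum_z P(z \mid X, \theta_{\text{current}}) = 1$. Thus, we now have:

$$L(\theta) \ge L(\theta_{\text{current}}) + \Delta(\theta \mid \theta_{\text{current}})$$

$$l(\theta \mid \theta_{\text{current}}) = L(\theta_{\text{current}}) + \Delta(\theta \mid \theta_{\text{current}}), \quad \text{where } L(\theta) \ge l(\theta \mid \theta_{\text{current}})$$

Our objective is to figure out what values of $\theta$ would maximize $L(\theta)$. In order to do that, we can maximize the lower bound $l(\theta \mid \theta_{\text{current}})$ instead. Dropping the terms that do not depend on $\theta$:

$$
\begin{aligned}
\theta_{\text{current}+1}
&= \arg\max_\theta \big\{ l(\theta \mid \theta_{\text{current}}) \big\} \\
&= \arg\max_\theta \Big\{ L(\theta_{\text{current}}) + \sum_z P(z \mid X, \theta_{\text{current}}) \ln \frac{P(X \mid z, \theta)\, P(z \mid \theta)}{P(z \mid X, \theta_{\text{current}})\, P(X \mid \theta_{\text{current}})} \Big\} \\
&= \arg\max_\theta \Big\{ \sum_z P(z \mid X, \theta_{\text{current}}) \ln \big( P(X \mid z, \theta)\, P(z \mid \theta) \big) \Big\} \\
&= \arg\max_\theta \Big\{ \sum_z P(z \mid X, \theta_{\text{current}}) \ln \Big( \frac{P(X, z, \theta)}{P(z, \theta)} \cdot \frac{P(z, \theta)}{P(\theta)} \Big) \Big\} \\
&= \arg\max_\theta \Big\{ \sum_z P(z \mid X, \theta_{\text{current}}) \ln P(X, z \mid \theta) \Big\} \\
&= \arg\max_\theta \Big\{ E_{z \mid X, \theta_{\text{current}}} \big[ \ln P(X, z \mid \theta) \big] \Big\}
\end{aligned}
$$

The E step is used to determine:

$$E_{z \mid X, \theta_{\text{current}}} \big[ \ln P(X, z \mid \theta) \big]$$

The M step is used to maximize the above expression with respect to $\theta$. We continue the EM iterations until the log likelihood stops improving. Note that the log likelihood cannot decrease from one iteration to the next: $\Delta(\theta_{\text{current}} \mid \theta_{\text{current}}) = 0$, so $l(\theta_{\text{current}} \mid \theta_{\text{current}}) = L(\theta_{\text{current}})$, and because $L(\theta) \ge l(\theta \mid \theta_{\text{current}})$, the maximizer of l satisfies $L(\theta_{\text{current}+1}) \ge L(\theta_{\text{current}})$.

The Baum-Welch algorithm is a particular version of the generalized EM algorithm. It uses the forward-backward algorithm with two additional auxiliary variables[11]. Using the notation from Hassan and Nath's paper noted above, the Baum-Welch re-estimation formulas aim at adjusting the parameters of the model $\lambda = (A, B, \pi)$ so that we achieve the maximum value of $P(O \mid \lambda)$. Before we use the Baum-Welch algorithm to improve our parameters, an initial HMM must be constructed. We could potentially use the K-means algorithm, or we could find some other way of guessing the initial values[12]. Let us introduce two new quantities:

$$\gamma_t(i) = P(q_t = i \mid O, \lambda) = \frac{P(q_t = i, O \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, \beta_t(i)}{P(O \mid \lambda)}$$

$\gamma_t(i)$ is the probability that we are in state i at time t given the observation sequence O and the model $\lambda$. Furthermore,

$$\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda) = \frac{P(q_t = i, q_{t+1} = j, O \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$$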
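To make the two auxiliary quantities concrete, the following is a minimal sketch of the forward-backward computation for a small discrete HMM. It is not the Jahmm implementation used in the experiments; the function names and the two-state toy parameters are hypothetical, and the recursion is left unscaled, which would underflow on long sequences.

```python
def forward(A, B, pi, obs):
    """alpha[t][i] = P(O_1..O_t, state i at time t | model)."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha


def backward(A, B, obs):
    """beta[t][i] = P(O_{t+1}..O_T | state i at time t, model)."""
    N = len(A)
    beta = [[1.0] * N]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta


def posteriors(A, B, pi, obs):
    """Return gamma[t][i], xi[t][i][j] and P(O | model)."""
    N, T = len(pi), len(obs)
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    p_obs = sum(alpha[-1])
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    return gamma, xi, p_obs


# Toy two-state model with two observation symbols (illustrative values only).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
pi = [0.6, 0.4]
obs = [0, 1, 1, 0]
gamma, xi, p_obs = posteriors(A, B, pi, obs)
```

A useful sanity check is that each row of `gamma` sums to 1 and that summing `xi[t][i][j]` over j recovers `gamma[t][i]`.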
$\xi_t(i, j)$ is defined as the probability of being in state i at time t and transitioning to state j at time t+1, given the observation sequence O and the model $\lambda$. The factors of the simplified equation can be read as follows:

$\alpha_t(i)$ = the observation sequence $O_1, \ldots, O_t$, ending in state i at time t
$a_{ij}$ = transitioning from state i to state j
$b_j(O_{t+1})$ = seeing observation $O_{t+1}$ at state j
$\beta_{t+1}(j)$ = the remaining observation sequence $O_{t+2}$ to $O_T$

We also note that:

1. $\sum_{t=1}^{T-1} \gamma_t(i)$ = expected number of transitions from state i
2. $\sum_{t=1}^{T-1} \xi_t(i, j)$ = expected number of transitions from state i to state j

Then, the Baum-Welch re-estimation (update) equations are as follows:

$$\bar{\pi}_i = \gamma_1(i), \quad 1 \le i \le N$$

$$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$

$$\bar{b}_j(k) = \frac{\sum_{t=1,\; O_t = k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$

After finding the best parameters for the model $\lambda$, we use the Viterbi algorithm to find the most likely state sequence given a set of observations. This also gives us the current most likely state, which we can use to predict the future state; in our case, that is either the future price or the price movement. To demonstrate this concept with equations[13]:

$$Q^* = q^*_{1:T} = \arg\max_{q_{1:T}} P(q_{1:T} \mid o_{1:T}) = \arg\max_{q_{1:T}} P(q_{1:T}, o_{1:T})$$

Let $m_t[q_t]$ store the probability of the most likely state sequence that ends in state $q_t$ at time t:

$$
\begin{aligned}
m_t[q_t] &= \max_{q_{1:t-1}} P(q_{1:t-1}, q_t, o_{1:t}) \\
&= \max_{q_{1:t-1}} P(q_{1:t-1}, o_{1:t-1})\, P(q_t \mid q_{t-1})\, P(o_t \mid q_t) \\
&= P(o_t \mid q_t) \max_{q_{t-1}} P(q_t \mid q_{t-1}) \max_{q_{1:t-2}} P(q_{1:t-1}, o_{1:t-1}) \\
&= P(o_t \mid q_t) \max_{q_{t-1}} P(q_t \mid q_{t-1})\, m_{t-1}[q_{t-1}]
\end{aligned}
$$

Now that we have methods to train the HMM for the best parameters and to predict the most likely current state, we can apply this modern approach to model financial time series, namely stocks. We can vary our inputs by supplying different combinations of technical indicators and news. The HMM is trained on historical prices and data. The hidden states can be either discrete or continuous. For discrete states, we might want the HMM to predict how much a particular stock will move tomorrow: big drop, small drop, neutral, small increase, or big increase. If we use a continuous hidden state, we would most likely be predicting tomorrow's closing price. We will use HMMs to learn and locate patterns from the past and apply them to today's stock price behavior, which essentially answers the question of what tomorrow's predicted price is: $P(q_{t+1} \mid q_t)$.

However, there are issues with using HMMs to model financial time series data[8]. HMMs have had much success in models that are not sensitive to the concept of time, such as language and video processing. However, once the sequence of observed data becomes important, there is an extra factor, the time dimension, that HMMs cannot really account for. Moreover, financial data has an even more distinctive property: current data is more important than past data because old patterns and trends may no longer apply. Thus, we will also look into a different machine learning approach and compare the two results.
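The Viterbi recursion for $m_t[q_t]$ in Section 1.2.1 can be sketched as follows. This is an illustrative implementation for a discrete HMM rather than the library code used in the experiments; the function name `viterbi` and the toy two-state model with three observation symbols are hypothetical, and probabilities are multiplied directly instead of using log space.

```python
def viterbi(A, B, pi, obs):
    """Return (most likely state path, joint probability of that path and obs)."""
    N = len(pi)
    m = [[pi[i] * B[i][obs[0]] for i in range(N)]]  # m_1[q] = pi_q * b_q(o_1)
    back = []                                       # back-pointers per time step
    for o in obs[1:]:
        prev = m[-1]
        row, ptr = [], []
        for j in range(N):
            # Best predecessor state for landing in state j at this step.
            best_i = max(range(N), key=lambda i: prev[i] * A[i][j])
            row.append(B[j][o] * prev[best_i] * A[best_i][j])
            ptr.append(best_i)
        m.append(row)
        back.append(ptr)
    # Trace the back-pointers from the best final state.
    state = max(range(N), key=lambda i: m[-1][i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, max(m[-1])


# Toy model (illustrative values only): 2 hidden states, 3 observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
path, prob = viterbi(A, B, pi, [0, 1, 2])
print(path, prob)  # path is [0, 0, 1]
```

The last entry of `path` is the "current most likely state" that the text proposes to use for predicting the next state.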
1.2.2 Support Vector Machines

The support vector machine (SVM) is a data classification technique that has recently been shown to outperform other machine learning techniques when applied to stock market forecasting[8].

Similarly to HMMs, given a set of training examples, an SVM builds a model. Each training instance is marked as belonging to one of two categories. The SVM attempts to separate the instances into those two categories with a (p-1)-dimensional hyperplane, where p is the size of each data instance. This model can then be used on a new data instance to predict which category it falls into. The maximum margin hyperplane can be represented as[14]:

$$y(x) = b + \sum_i \alpha_i\, y_i\, K(x(i), x)$$

Vector x is a test example and $y_i$ is the class value of the training example x(i). In the equation, the parameters of the hyperplane are b and $\alpha_i$: b is a real constant, and the $\alpha_i$ are non-negative real constants. The function K(x(i), x) is a kernel function, and SVMs are powerful in the sense that one can substitute different kernel functions. The four basic kernel functions are[15]:

1. Linear: $K(x_i, x_j) = x_i^T x_j$
2. Polynomial: $K(x_i, x_j) = (\gamma\, x_i^T x_j + r)^d$, $\gamma > 0$
3. Radial (RBF): $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$, $\gamma > 0$
4. Sigmoid: $K(x_i, x_j) = \tanh(\gamma\, x_i^T x_j + r)$

The classifier can be constructed as follows[17]:

$$w^T \varphi(x(i)) + b \ge 1, \quad \text{if } y_i = 1$$

$$w^T \varphi(x(i)) + b \le -1, \quad \text{if } y_i = -1$$

or, equivalently,

$$y_i \big[ w^T \varphi(x(i)) + b \big] \ge 1, \quad i = 1, \ldots, N$$

where $\varphi(\cdot)$ is a nonlinear function that maps the given inputs into some higher dimensional space. In case we cannot find the separating hyperplane in this space, we introduce additional slack variables $\xi_i$, where i = 1, ..., N. We then solve this minimization problem:

$$\min_{w, b, \xi} J(w, \xi) = \frac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i$$

$$\text{s.t. } y_i \big[ w^T \varphi(x(i)) + b \big] \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, N$$

The solution to the above problem is the optimal separating hyperplane.

For this paper, we focused primarily on the radial kernel because K. Kim's paper has shown that, with slight tuning of the RBF kernel, the SVM can give superior results versus other types of kernels when forecasting stock market prices[14]. The RBF kernel can handle nonlinear relations between class labels and attributes by nonlinearly mapping samples into a higher dimensional space if required, thus giving us extra flexibility in case our feature set cannot be separated linearly.

Moreover, SVMs are very promising because they reduce the danger of overfitting. With other machine learning techniques, such as neural networks and possibly even HMMs, we risk training the model too well, so that it starts to overfit our training data. When that happens, we get significantly poorer results when we provide testing data to the trained model. Furthermore, since we are using the RBF kernel, we only have to tune 2 parameters: C and $\gamma$. That is relatively little work compared to other machine learning methods.

Thus, given the advantages of SVMs, we will train one on a few years' worth of data and use it to forecast stock prices/movements. We will also compare the results with our HMM experiments.
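As a concrete companion to the kernel list and the hyperplane formula in Section 1.2.2, here is a small sketch of the four basic kernels and of the decision value $y(x) = b + \sum_i \alpha_i y_i K(x(i), x)$. The function names are hypothetical, and the support vectors, multipliers and bias in the usage lines are made-up values chosen only to show the sign of the output; a real SVM would obtain them by solving the optimization problem (for example with LibSVM).

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, r=0.0, d=3):
    return (gamma * dot(x, y) + r) ** d


def rbf_kernel(x, y, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def sigmoid_kernel(x, y, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(x, y) + r)


def decision_value(x, support_vectors, labels, alphas, b, kernel):
    """y(x) = b + sum_i alpha_i * y_i * K(x(i), x); its sign is the predicted class."""
    return b + sum(a * y * kernel(sv, x)
                   for sv, y, a in zip(support_vectors, labels, alphas))


# Made-up model: one support vector per class, equal multipliers, zero bias.
svs, ys, alphas, bias = [[0.0, 0.0], [2.0, 2.0]], [-1, 1], [1.0, 1.0], 0.0
near_positive = decision_value([2.0, 2.0], svs, ys, alphas, bias, rbf_kernel)
near_negative = decision_value([0.0, 0.0], svs, ys, alphas, bias, rbf_kernel)
```

With the RBF kernel, a point is pulled toward the class of the support vectors closest to it, and larger values of gamma make that influence more local.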
2. APPLYING MODERN APPROACHES AND THEIR RESULTS

Forecasting financial time series is difficult because there is no single fixed model that explains the changes in prices all the time. To account for the dynamic pattern changes, we need a tool that can adapt to new situations. The Hidden Markov Model (HMM) with a Gaussian mixture at each state has the potential to tackle such a problem[8]. We will use a Markov chain to represent stock movement. Using this model, we are able to make predictions that answer questions such as "What is the probability of seeing a big price drop tomorrow given today's state and observations?" We will show that the results are somewhat promising; however, due to the limitations of HMMs mentioned above, we also looked into SVMs in hopes of getting better results.

2.1.1 Hidden Markov Model with K-means

In this experiment, we used the Simlio engine to provide the data we need to pass into the HMM. We initially used lagged profits as the sole observation in forecasting future price movements. We ran 11 experiments (10 stocks and 1 index), and in each experiment we used the k-means learner algorithm in Jahmm's library to cluster the data points into one of 5 hidden states: big price movement up, small price movement up, no movement, small price movement down, big price movement down[18].

Courtesy of Simlio (www.simlio.com)
Figure 6: Pot (Sept 09 - Nov 09)

We trained the HMM on daily lagged profits from the beginning of 2004 to mid-September 2009. We then tested and trained on the data from the last 30 days to calculate how accurately our HMM forecasts price movements. The results are as follows:

ticker    prediction accuracy
pot       0.5172
aapl      0.5172
gs        0.5517
mos       0.5517
ibm       0.5172
msft      0.7586
gg        0.5862
bac       0.5862
goog      0.5517
c         0.5172
sp_500    0.5172

Table 1: Results from HMM using K-means learner
(Trained from 01/2004-09/2009 / Tested from 09/2009 - 11/2009)

Although the results are very promising, we must note a major issue with this approach. Since we use the k-means learner algorithm on the lagged profits, it just so happened that the 5 hidden states it chose were the 5 that we wanted. If we were to add additional observation features, such as EMA, MACD, etc., we would not be able to control what the 5 states are. Because of this issue, we started using the Baum-Welch algorithm.

2.1.2 Hidden Markov Model with Baum-Welch and Viterbi algorithms

In order to leverage the full power of the Simlio engine and HMMs to potentially develop more meaningful predictions, we decided to move away from the k-means learner. In our second experiment, we trained the HMM with the Baum-Welch algorithm and then used the Viterbi algorithm to detect our current most likely state in order to check how accurate our model is. Again, we used lagged profit as the sole input.

For this experiment, we mimic the idea of a dynamic training pool presented in Y. Zhang's paper[8]. Essentially, our training window is always the same constant size, and we shift the training pool across time. We chose this approach because we want to add each new test instance to the training data, one at a time, after we have checked whether the prediction for that day was accurate. This way, the training pool always includes our most recent data, and the model will hopefully always detect new patterns and put the same emphasis on them, since the training size is constant.

Figure 7: Dynamic Training Pool

After training the HMM on data from 01/01/2004 to 180 days ago, we started testing the model on the most recent 180 days, one day at a time. After every test, we popped off the oldest training instance and appended the most recently tested instance, keeping the training pool the same size. Then we retrained and tested again. We did so until we finished testing the prediction accuracy on the last 180 days.

Table 2 summarizes the results of our experiment. As we can see, the average prediction accuracy is roughly 53%. Several factors could account for why the accuracy is so low:

1. We only used lagged profits as our inputs.
2. It has no concept of recent news and how it could potentially affect the market.
3. Trends of stocks may go beyond just one day. A stock's movement up or down yesterday is probably not an indicator of what it would do tomorrow.

ticker    prediction accuracy
pot       0.5167
aapl      0.5611
gs        0.5
mos       0.55
ibm       0.5222
msft      0.5778
gg        0.5556
bac       0.5167
goog      0.5517
c         0.5222
sp_500    0.5172

Table 2: Results from HMM using Baum-Welch and Viterbi algorithms
Tested and retrained for last 180 days

Given that our results are mediocre, we decided to look into combining the k-means learner with Baum-Welch and Viterbi.

2.1.3 Hidden Markov Model with K-means, Baum-Welch, and Viterbi algorithms

One issue that we ran into in the second experiment is that the Baum-Welch algorithm requires us to set an initial matrix. Starting from this matrix, the Baum-Welch algorithm runs EM iterations to find the best parameters for the model. However, that initial matrix is not always trivial to set up, and a poor initial matrix could mean that the Baum-Welch algorithm converges to a poor local maximum. Thus, we decided to use the k-means learner to initialize the matrix for us before running the Baum-Welch algorithm.
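The dynamic training pool of Section 2.1.2 (test one day, drop the oldest training instance, append the tested instance, retrain) can be sketched as below. This is only an outline of the evaluation loop: `majority_predictor` is a hypothetical stand-in for "retrain the HMM and predict the next movement", and the label sequence in the usage lines is made up.

```python
from collections import Counter, deque


def majority_predictor(window):
    """Placeholder model: predict the most frequent label in the training pool."""
    return Counter(window).most_common(1)[0][0]


def walk_forward_accuracy(labels, test_days, fit_predict=majority_predictor):
    """Fraction of correct one-step-ahead predictions over the last `test_days`."""
    split = len(labels) - test_days
    pool = deque(labels[:split], maxlen=split)  # fixed-size training pool
    correct = 0
    for actual in labels[split:]:
        if fit_predict(list(pool)) == actual:   # "retrain" on the pool, then predict
            correct += 1
        pool.append(actual)                     # maxlen evicts the oldest instance
    return correct / test_days


# Made-up movement labels: 1 = up, 0 = down.
labels = [1, 1, 1, 0, 0, 0, 0]
accuracy = walk_forward_accuracy(labels, test_days=4)
print(accuracy)
```

Using a `deque` with `maxlen` keeps the pool size constant automatically, which mirrors the "pop the oldest, append the newest" step described in the text.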
[Chart: Real vs. Predicted Closing Prices for Potash (2004-2009). Series: Real Closing Price (Pot), Predicted Closing Price (Pot).]
Figure 8a: HMM's Predicted Closing Prices vs. Real Prices for Potash
(01/01/2004 - 10/31/2009)

[Chart: Real vs. Predicted Closing Price for Google (2004-2009). Series: Real Closing Price (Goog), Predicted Closing Price (Goog).]
Figure 8b: HMM's Predicted Closing Prices vs. Real Prices for Google
(01/01/2004 - 10/31/2009)

Using Viterbi's algorithm on the data, we were able to show what the HMM's predictions are for the prices from 2004-2009. We plotted the predictions in the figures above.

Figure 8a is for a stock called Potash and Figure 8b is for Google. For both stocks, we let the K-means algorithm find around 10+ states. Afterwards, we used the Baum-Welch algorithm to train on the HLOC (high, low, open, and close) prices, assuming a Gaussian distribution, and to find the best parameters for the model. Lastly, we validated the accuracy on the training data by using Viterbi's algorithm at each step. From the graphs above, the HMM was able to predict the stock movement decently. This warrants more research in this field, and it makes the statement that the market is always efficient somewhat suspect.

2.1.4 Support Vector Machines with Technical Indicators

Although our HMM results were relatively impressive, we did more research, and researchers have suggested that SVMs outperform HMMs and other machine learning methods by a significant margin. We decided to use the libraries JavaML[19] and LibSVM[20] to investigate that claim.

In order to train the SVM with a radial kernel, we first researched what parameters to use. Based on K. Kim's paper, the best C is 78 and the best gamma is 25[14]. With these two parameters, Kyoung-jae Kim was able to build a model that predicts test data with 57.83% accuracy on average. In order to replicate similar results, we leveraged the Simlio engine to give us the prices and technical indicators of any stock we want.

In this experiment, we trained the SVM on the following features:
1. EMA7
2. EMA50
3. EMA200
4. MACD
5. RSI
6. ADX
7. Lag profits
8. High
9. Low
10. Closing price > EMA200

We trained the SVM starting from 01/01/2004. While training, we classify each instance as a buy or sell signal. We stopped training X days before 11/1/2009, where X is varied from 30 to 180 (1 month to 6 months). We then tested our trained model on our test data, which is the X days we left out of training. Initially, we did not retrain after testing each test instance. We show the results of several experiments. We tried the linear kernel first, and then we changed to the RBF kernel. We also tried tweaking the parameters in other ways besides what was suggested by Kyoung-jae Kim.

C = 1, gamma = 0, kernel = linear

ticker    prediction accuracy
pot       0.5333
aapl      0.5
gs        0.5333
mos       0.5667
ibm       0.5
msft      0.5333
gg        0.6333
bac       0.6333
goog      0.5667
c         0.5
sp_500    0.5667

Table 3: Results from SVM with Linear kernel
Test data: Last 30 days

C        100
gamma    78
kernel   rbf

ticker   prediction accuracy
pot      0.5667
aapl     0.5333
gs       0.6
mos      0.5
ibm      0.6333
msft     0.8333
gg       0.5333
bac      0.5667
goog     0.5333
c        0.5333
sp_500   0.5333
Table 4: Results from SVM with RBF kernel 1
Test data: Last 30 days

C        78
gamma    100
kernel   rbf

ticker   prediction accuracy
pot      0.5667
aapl     0.5333
gs       0.6
mos      0.6
ibm      0.5667
msft     0.8
gg       0.5333
bac      0.5667
goog     0.5333
c        0.5
sp_500   0.5333
Table 5: Results from SVM with RBF kernel 2
Test data: Last 30 days

C        78
gamma    25
kernel   rbf

ticker   prediction accuracy
pot      0.5333
aapl     0.5
gs       0.6333
mos      0.5333
ibm      0.6333
msft     0.6333
gg       0.6333
bac      0.5667
goog     0.6667
c        0.5333
sp_500   0.5333
Table 6: Results from SVM with RBF kernel 3
Test data: Last 30 days

Table 3 shows the results of using a plain linear kernel. With that result, we can already see that SVMs may have more potential than HMMs.

When we switched our kernel to RBF, we started getting better and more consistent results. Tables 4 and 5 differ only because the parameters are slightly changed. Table 4 turns out to be slightly better than Table 5 on average. For Microsoft, it even reaches an 83.33% prediction accuracy over the last 30 days when the SVM predicted whether to buy or sell the next day.

We use Kim's suggested parameters in Table 6, and we get a rather consistent prediction accuracy, roughly 58% on average with most stocks between 53% and 67%. This is promising, but we wanted to make sure that this was not just a coincidence, since we only tested the model on 30 days. Taking Professor Rao's advice, we looked into the situation further and started testing on 180 days (6 months) of data.
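
To make the "coincidence" concern concrete, the following sketch computes how often a pure coin-flip guesser would reach a given number of correct calls in 30 days. It is an illustration rather than part of the original experiments, and it assumes that days are independent and that a random guess is right half the time; the name `chance_tail` is hypothetical.

```python
from math import comb


def chance_tail(n_days, n_correct, p=0.5):
    """P(at least n_correct hits in n_days) for a guesser right with probability p."""
    return sum(
        comb(n_days, k) * p**k * (1 - p) ** (n_days - k)
        for k in range(n_correct, n_days + 1)
    )


# Accuracies taken from the 30-day tables, converted back to hit counts.
for accuracy in (0.5333, 0.6, 0.8333):
    hits = round(accuracy * 30)
    print(f"{hits}/30 correct: tail probability {chance_tail(30, hits):.4f}")
```

Under these assumptions, 18 correct calls out of 30 (the 60% level) happen by chance about 18% of the time, so a single ticker at that level is weak evidence on its own, whereas 25 out of 30 is far less likely to be luck. Testing eleven tickers at once also raises the odds that at least one of them looks good by chance.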

Recall that for the experiments above, we did not retrain our model after running through the tests each day for 30 days. This was probably fine for one month because the period is short. However, since we are now testing on 180 days' worth of data, new patterns and trends could arise during that period, and it might be significant to retrain the model to take them into account when we test the later half of the testing data. We ran both experiments, one without retraining and one with retraining.

C        78
gamma    25
kernel   rbf

ticker   prediction accuracy
pot      0.5028
aapl     0.5251
gs       0.5196
mos      0.5251
ibm      0.5307
msft     0.5363
gg       0.5698
bac      0.5363
goog     0.5978
c        0.5195
sp_500   0.5028
Table 7: Results from SVM with RBF kernel 3
Test data: Last 180 days. No retraining

C        78
gamma    25
kernel   rbf

ticker   prediction accuracy
pot      0.5667
aapl     0.6
gs       0.5667
mos      0.5333
ibm      0.5333
msft     0.6666
gg       0.6
bac      0.5
goog     0.5
c        0.5
sp_500   0.6
Table 8: Results from SVM with RBF kernel 3
Test data: Last 180 days. With retraining

Comparing Tables 7 and 8, we see that in general, the model without retraining averages about 53%, close to a coin toss. The experiment with retraining performs better for 8 of the 11 tickers. More importantly, the prediction accuracy for the S&P500 is as high as 60%, which warrants further research.

2.1.5 Support Vector Machines with Technical Indicators and News

We were able to add a module to the Simlio engine that crawls the web and parses out the news articles for any stock on every day. We started to incorporate news as an additional feature for the SVM. However, due to time constraints, we decided to keep it simple and just use the number of news articles as a feature. Our hunch was that if something important happens to a company (whether bad or good), there will be a lot of news about it the previous day. Knowing this, perhaps the SVM can detect when there could be a relatively big jump in prices the next day.

Based on our results from the experiment, it seems that there could be some potential in our idea. In most cases, the retrained version outperforms the other by a few percentage points, which could be enough to be significant.
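
The two evaluation modes compared in Tables 7 and 8 (train once, or retrain as each test day passes) can be sketched as follows. This is a minimal illustration, not the JavaML/LibSVM pipeline used in the experiments: `fit_majority` is a hypothetical stand-in that simply predicts the most common label seen so far, and any function that maps training samples to a predictor could be passed in its place.

```python
from collections import Counter


def fit_majority(train):
    """Stand-in for SVM training: always predict the most frequent label so far."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def evaluate(samples, n_test, fit, retrain):
    """samples: chronological list of (features, label); the last n_test are tested."""
    train, test = samples[:-n_test], samples[-n_test:]
    model = fit(train)
    history = list(train)
    hits = 0
    for x, y in test:
        hits += int(model(x) == y)    # predict first, using only past data
        if retrain:
            history.append((x, y))    # then reveal the true label and refit
            model = fit(history)
    return hits / len(test)


# Made-up series whose behaviour changes after the training window ends.
labels = ["sell"] * 3 + ["buy"] * 6
samples = [(day, label) for day, label in enumerate(labels)]
print(evaluate(samples, n_test=6, fit=fit_majority, retrain=False))
print(evaluate(samples, n_test=6, fit=fit_majority, retrain=True))
```

The prediction for a day is always made before that day's label is added to the history, so retraining never looks ahead; it only lets the model adapt once a regime change has become visible in past data.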

C        78
gamma    25
kernel   rbf

ticker   No News   News
pot      0.5333    0.6
aapl     0.5       0.5333
gs       0.6333    0.6667
mos      0.5333    0.5333
ibm      0.6333    0.5
msft     0.6333    0.6667
gg       0.6333    0.5333
bac      0.5667    0.5
goog     0.6667    0.6
c        0.5333    0.6333
sp_500   0.5333    0.6667
Table 9: Results from SVM without news and with news (RBF kernel 3)
Test data: Last 30 days. Without retraining

C        78
gamma    100
kernel   rbf

ticker   No News   News
pot      0.5667    0.7
aapl     0.5333    0.6667
gs       0.6       0.5333
mos      0.6       0.6
ibm      0.5667    0.5333
msft     0.8       0.6333
gg       0.5333    0.6333
bac      0.5667    0.6333
goog     0.5333    0.5667
c        0.5       0.5667
sp_500   0.5333    0.7
Table 10: Results from SVM without news and with news (RBF kernel 2)
Test data: Last 30 days. Without retraining

Comparing the results of using news as a feature for the SVM in Tables 9 and 10, we see that using news does change our prediction accuracy. In most cases, it made the SVM accuracy higher. With the parameters that Kim suggested (Table 9), accuracy rose for 6 of the 11 tickers, by about 3 percentage points or more, and fell for 4. Notably, the accuracy of predicting the S&P500 index increased to 67%.

We ran some further tests by tweaking more parameters. We found that if we deviate from Kim's parameters and use C=78 and gamma=100, the average prediction accuracy increases further. Using these parameters, the SVM has a 70% rate of accurately predicting whether to buy or sell the S&P500 index each day over the last 30 days.

3. MANIPULATION DETECTION

Historically, there have been many cases of stock investors trying to artificially influence stock prices. In this final section of the paper, we investigate how abnormal changes in volume could suggest a manipulator in the market, and use that knowledge with our SVM engine to see if there is a way to "ride the wave" with a manipulator to gain some additional profit.

3.1 Introduction to Manipulation Theories
3.1.1 The Rational Model

Allen and Gale[21] classify manipulation into three types: information-based, action-based, and trade-based. The first two are now regulated by the SEC, so the chances of them happening, as with insider trading, have been significantly reduced. However, every now and then, we still see cases like Enron.

On the flip side, trade-based manipulations are much harder to detect due to the difficulty in distinguishing a large trader from a manipulator. Both of them will probably have enough capital to buy/short large volumes of any particular stock, so
to a regular investor, it is uncertain whether or not there is a manipulator in the market.

Allen and Gale's model assumes that the regular traders are rational and that there is information asymmetry because large traders (whether manipulators or not) have access to private information. Under these conditions, there are 3 steps:

Figure 9: Allen and Gale's Model[21]

At time t=0, there is no large trader in the market. At time t=1, there is some probability that an informed large trader enters and some probability that a manipulator enters. Since the manipulator has the resources to mimic a large trader, the dotted circle represents that, from the common investor's perspective, it is uncertain whether or not the large trader is a manipulator. Then, at t=2, the manipulator will take profits, and t=3 is when the true value of the stock is revealed. If the manipulator was in the market before, the price of the stock will drop back down to its fundamental value $V_L$. If there is good news, we will see that the true value of the stock is actually high, $V_H$.

The model is used by Allen and Gale to prove that under the conditions stated above, there can exist a pooling equilibrium in which the manipulator can successfully mimic a large trader and thus achieve a positive profit. The proof of this claim is left for the reader to read in their paper.

3.1.2 The Behavioral Model

Complementing Allen and Gale's model, Mei, Wu and Zhou[22] incorporated behavioral studies into the model. They did so because there are a large number of cases where asset prices deviate from their fundamental values in ways that are difficult to explain using the rational model. Thus, they investigated this situation and used the fact that people do not always act rationally to try to explain anomalies in the previous model.

They used various empirical studies to show how manipulators can exploit the irrational traders' behavioral biases. For instance, using Jegadeesh and Titman's[24] report, which suggests that investors can make substantial abnormal profits by selling past losers and/or buying past winners, Mei, Wu and Zhou suggest that manipulators can exploit this behavioral bias to cash in some profit. Since these investors are momentum traders, they will buy fewer shares in a down market. Furthermore, when these traders experience a downward market, they are more inclined to keep the shares instead of selling them, in the hope that the market will bounce back up. Incorporating this behavioral bias into the model makes it more realistic and thus more applicable in real world situations.

This new model utilizes a couple of key assumptions:

1) Time begins at t=0 and ends at t=T. Behavioral traders enter the market at the beginning of each time period and they are price takers. Each has a probability $q_1$ of buying a share if $P_t > P_{t-1}$, and $q_2$ if not. Due to the behavioral bias, we assume $q_1 > q_2$. Behavioral traders like to take immediate profits, so the moment $P_t > P_0$, they will always sell their shares. Also, if $P_t \ge P_{t+k}$ ($k$ =
   number of days he has kept his shares), there will be a probability $q_3 < 1$ that he takes a loss and liquidates all his shares at time t+k. Moreover, at t=0, the price of the stock is at its fundamental value.
2) The manipulator enters the market at t=1 without any prior shares.
3) Arbitrageurs enter at t=1. Shares are traded based on recent price movements. Arbitrageurs keep the market fluid because if prices increased, they will sell some shares to take profits. If prices decreased, they will buy more shares. They will submit the following order at time t:
   $D_{a,t} = -a(P_t - P_{t-1}) = -a\,\Delta P_t$
   where $a > 0$ is the arbitrage parameter.
4) A manipulator tends to move the asset price by $\delta > 0$ for $t_u$ consecutive days. He will also start liquidating his shares at time $t = t_u + 1$ until $T-1$, when he sells all his shares. Therefore, we let $t_d = T - 1 - t_u$ be the length of time it takes for the manipulator to liquidate all his shares.
5) The market ends at time T and investors receive the wealth accumulated from liquidating their positions.

To help see the assumptions more clearly, let us look at a chart shown in their report:

Figure 10: Typical Manipulated Price Fluctuations
(Parameters: $t_u = 6$, $t_d = 3$, $a = 0.1$, $q_1 = 0.8$, $q_2 = 0.4$, $q_3 = 0$)

In Figure 10, we see that $t_u = 6$, which means the manipulator is moving the price of the shares up for 6 consecutive days by purchasing shares. $t_d = 3$ means that it takes 3 days for the manipulator to liquidate all his shares and take a profit at the high price. Lastly, $T = 10$ means that the manipulation ends at day 10, when the investors have realized that the fundamental price is actually at its initial value and that the sudden jump in price was due to a manipulator in the market.

Using the assumptions to solve the model, Mei, Wu and Zhou focused on two key propositions:

1) By the end of $t_u$, the manipulator has accumulated $N = a\,t_u\,\delta$ shares with an average cost of $P_0 + \delta(t_u + 1)/2$ per share. We get this from assumptions 1-3 because the manipulator buys $a\delta$ shares at price $P_0 + \delta t$ on each day $t = 1, \dots, t_u$. Recall that the manipulator tends to move the price by $\delta$, and in order to do so, he submits an order of $a\delta$ shares each time.
   IMPLICATION: If the arbitrage parameter is large, the manipulator must be very wealthy to move the market. If $a \to \infty$, there are no limits to arbitrage and thus there is no way for the manipulator to move the market.
2) Assuming $q_3 = 0$ (i.e., behavioral traders are extremely unwilling to take losses, see assumption #1), manipulators can sell at the high price of $P_3 = P_0 + \delta t_u$ from $t = t_u + 1$ to $t = T - 1$ (i.e., the period of $t_d$). Therefore the profit becomes the number of shares * (price sold - average price bought):

   $$\pi = N\left[(P_0 + \delta t_u) - \left(P_0 + \frac{\delta (t_u + 1)}{2}\right)\right]$$

   After simplifying, we get:

   $$\pi = N \cdot \frac{\delta (t_u - 1)}{2} = a\,(\delta t_u) \cdot \frac{\delta (t_u - 1)}{2}$$
   IMPLICATION: From t=1 to $t = t_u$, the price increases by $\delta$ in each period. Past behavioral traders sell $q_1$ shares in each period to take profits because the price went up, while new behavioral traders buy $q_1$ shares. Arbitrageurs will sell $a\delta$ shares (from assumption #3), and thus for the manipulator to move the price up, he has to buy $a\delta$ shares in each time period up to $t = t_u$. Furthermore, from $t = t_u + 1$ to $t = T - 1$, past behavioral traders have already bought shares at the highest price at $t_u$. Thus, they will not sell because $q_3 = 0$. The price stays constant, so arbitrageurs will not trade either. Thus, new behavioral traders in these periods will buy $q_2$ shares from the manipulators while the manipulators take profits. By t=T, the manipulators will have sold all their shares and the price of the stock will return to its fundamental value.

To summarize, this model shows that under a few assumptions, the manipulator will be able to mimic a large trader successfully and make some decent profits. This model goes beyond Allen and Gale's model because it factors in empirical studies of how people behave in the market, mainly that many investors are momentum traders and thus like to buy when the market is going up. Moreover, these behavioral traders try to avoid selling their shares at a loss, since they all hope that the down market will recover soon.

3.1.3 The Information-Seeker Model

Aggarwal and Wu[23] extended Allen and Gale's framework by considering what happens when a manipulator can trade in the presence of other rational investors who are able to seek information about the fundamental value of a stock. In other words, what happens if information seekers, who are knowledgeable about the market, enter the market too? Will the manipulators still be able to consistently make profits? The answer is: yes.

In this particular model, we have three types of investors:

1) Informed Party (I)
   a. Truthful large traders (T)
   b. Manipulators (M)
2) Information Seekers (Arbitrageurs) ($A_i$)
3) Uninformed Traders (U)

The N information seekers all observe past prices and volumes, and they are susceptible to rumors. Moreover, they have no access to fundamental information. Lastly, the uninformed traders provide liquidity to the market, so whenever someone wants to make a trade, we can be certain that the transaction will happen.

Before we discuss the model, we will state a couple of key terms and assumptions:

Market Price: $P(Q) = a + bQ$, where
   a = price of stock if no one buys any shares
   b = slope of supply curve
   Q = quantity demanded

Total Shares Outstanding: $(V_H - a)/b$, where
   $V_H$ = price of stock if someone buys all the shares

t=0: All shares are held by uninformed traders
t=1: Informed trader may enter the market:
   Prob(M) = probability that a manipulator enters
   Prob(T) = probability that a truthful trader enters = Prob($V_H$), the probability that the value is high
   Prob(~I && $V_L$) = 1 - Prob(T) - Prob(M)
   a = unconditional expected value of final cash flows = Prob($V_H$) $V_H$ + (1 - Prob($V_H$)) $V_L$
t=2: Information seekers can buy/sell shares
t=3: Fundamental stock price is revealed to be $V_H$ or $V_L$. The cost of holding shares of stock at this period is k.
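
The key terms above can be tied together with a short numerical sketch. It assumes the linear price schedule P(Q) = a + bQ as stated, with a set to the unconditional expected value; the function names and all numbers are hypothetical, and the formula for total shares outstanding is inferred here by inverting the price schedule at V_H, since that is how V_H is defined in the text.

```python
def expected_value(prob_high, v_high, v_low):
    """Unconditional expected final cash flow: the intercept a of the price schedule."""
    return prob_high * v_high + (1 - prob_high) * v_low


def market_price(a, b, quantity):
    """Linear price schedule P(Q) = a + b * Q."""
    return a + b * quantity


def shares_outstanding(a, b, v_high):
    """Quantity at which the price reaches v_high (assumes b > 0)."""
    return (v_high - a) / b


a = expected_value(prob_high=0.25, v_high=120.0, v_low=80.0)   # made-up values
total = shares_outstanding(a, b=0.5, v_high=120.0)
print(a, total, market_price(a, 0.5, total))
```

With these inputs the intercept lies between V_L and V_H, and buying every outstanding share pushes the price exactly to V_H, which is the sense in which V_H is "the price of the stock if someone buys all the shares".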

Now consider a model with manipulators, information seekers, and uninformed traders in the economy. $A_i$'s posterior belief that the purchaser of the shares at t=1 is a manipulator is:

[Equation not recoverable from the source.]

Each arbitrageur ($A_i$) will solve the following problem at t=2:

[Equation only partly recoverable: $A_i$ chooses a quantity to maximize expected profit, weighting the payoff under $V_H$ by one minus the posterior belief and the payoff under $V_L$ by the posterior belief, with shares bought along the price schedule $a + bQ$.]

Imposing symmetry on the total shares outstanding, we calculate the following for the information seekers:

[Equations not recoverable from the source: the individual demand, the aggregate demand $Q_2$, and the resulting expected profit.]

The informed party (either M or T) will solve the following at t=1:

$P_2\,q_1 - (a + b q_1)\,q_1$

[The solution for $q_1$ and the resulting price and profit are not recoverable from the source.]

Note that in order for this pooling equilibrium to be sustainable, the truthful trader's actions must not deviate from those of the manipulator. Thus, the truthful trader must prefer to sell shares at t=2 rather than hold them until t=3, since that is what the manipulator will do (take profits). The value of holding shares at t=3 is $V_H - k$, where k is the cost of holding shares given in the above assumptions. So, the incentive compatibility condition is:

[Condition not recoverable from the source.]

Using a = Prob($V_H$) $V_H$ + (1 - Prob($V_H$)) $V_L$ and the posterior belief above, the pooling equilibrium is sustainable if:

[Condition not recoverable from the source.]

IMPLICATIONS:
1) Posterior probability of a manipulator: The more likely it is that the purchaser at t=1 is a manipulator, the less likely pooling becomes.
2) Probability of a truthful trader: The more likely it is that a truthful large trader enters, the easier it is to sustain pooling.
3) Holding cost k: The higher the cost of holding the shares at t=3, the more likely the truthful trader will pool with the manipulator.
4) Dispersion ($V_H - V_L$): The greater the dispersion, the more valuable it is for the truthful trader to wait until t=3, hence decreasing the chance of pooling.
5) Number of information seekers N: The more information seekers there are, the better the price is at t=2. Thus, both manipulators and truthful traders will be tempted to take profits at that time, increasing the chances of pooling.

Through this model, Aggarwal and Wu have shown that even with information seekers in the economy, the manipulator can still make a profit. As long as the pooling equilibrium is sustainable, it will be difficult for traders to distinguish a truthful large trader from a manipulator.
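
As a numerical companion to the behavioral model of Section 3.1.2, the sketch below adds up the manipulator's purchases day by day and compares the result with a closed form. It relies only on the stated assumptions: each day the manipulator buys (arbitrage parameter × price step) shares at the fundamental price plus that day's cumulative step, and with q3 = 0 he sells everything at the peak. The values a = 0.1 and t_u = 6 come from Figure 10; the price step of 1 and the starting price of 100 are made up, and the closed form a·step²·t_u·(t_u − 1)/2 is derived here from those assumptions rather than quoted from the paper.

```python
from math import isclose


def manipulation_profit(a, step, t_up, p0):
    """Return (shares bought, average cost per share, profit) for the run-up."""
    shares_per_day = a * step                  # order size needed to move the price by `step`
    total_shares = 0.0
    total_cost = 0.0
    for t in range(1, t_up + 1):
        total_shares += shares_per_day
        total_cost += shares_per_day * (p0 + t * step)   # price paid on day t
    sell_price = p0 + step * t_up              # price holds at the peak while he sells
    profit = total_shares * sell_price - total_cost
    return total_shares, total_cost / total_shares, profit


shares, avg_cost, profit = manipulation_profit(a=0.1, step=1.0, t_up=6, p0=100.0)
closed_form = 0.1 * 1.0**2 * 6 * (6 - 1) / 2
print(shares, avg_cost, profit, closed_form)
print(isclose(profit, closed_form))
```

Because the profit grows with the square of the price step and roughly with the square of the run-up length, a longer or steeper run-up pays disproportionately more, but it also requires buying more shares against the arbitrageurs, which is the wealth constraint noted in the first proposition.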

3.2 Attempt to Detect Abnormal Volumes
3.2.1 Concept

The theoretical models are only practical if they can be useful in real life situations. Thus, we leveraged the Simlio engine to help us check how often manipulation cases have actually happened in recent years.

Instead of checking the legitimacy of all the models, we decided to combine ideas from all 3 frameworks. To check whether there had been any manipulation, we created the following rules:

1) First, we check to see if there are any abnormal volumes. We define abnormal volume to be:
   Volume today > 200-day moving average of volume
2) For all days with abnormal volume, we check what the prices were for the X previous days. According to the above models, today is the day the manipulators entered, hence the abnormal volume. Thus, the prices on the previous days are the fundamental prices of the stock.
3) We also check to see if prices have increased today. This is because as manipulators buy a lot of shares, they also move the prices up. For the purpose of this experiment, we assume that as long as the price keeps increasing, the manipulators are still buying. The moment the prices start falling is when the manipulators begin to sell.
4) Lastly, we check to see if the prices fall back to their fundamental price. If so, we have found a potential period of time where this particular stock has been manipulated. If not, then most likely a truthful trader had entered and thus the price stays at $V_H$.

To see an example of how this experiment works:

Detection   Date       Volume   Price   Notes
0           20090911   61935    89.8    t=0
0           20090914   63620    88.75
1           20090915   113153   93.85   t=1
0           20090916   100348   94.18
0           20090917   104319   96.51
0           20090918   61286    97.14   t=2
1           20090921   124496   93.09
0           20090922   72236    94.3
0           20090923   59021    92.92
0           20090924   54149    90.86   t=3
Table 11: A Manipulation Case Found Through Experiment for Stock: Potash

Table 11 shows an example of a potential manipulation case that the Simlio engine has detected for the stock Potash. From Sept 11, 2009 to Sept 24, 2009, we see that the price of Potash went from $89.8 to a high of $97.14 and dropped back down to $90.86. Analyzing the notes column:

1) At t=0, no large traders, or only uninformed traders, are in the market.
2) At t=1, there is a volume spike. We interpret this as a large trader having entered the market.
3) At t=2, the price of the stock has reached its peak. A truthful large trader will keep his shares until t=3, hence maintaining the high price. A manipulator will start selling now, causing the price to decrease.
4) At t=3, we see that the price has essentially returned to its fundamental value. Thus, we conclude that this temporary price spike is due to a manipulator in the market.

The Simlio engine allows us to give it any ticker symbol, and it will report all potential manipulation cases like the one shown in Table 11. In the next section, we will analyze our results.
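
The four rules above can be sketched in a few lines of Python. This is an illustration of the idea, not the Simlio engine's implementation: the function name `find_candidates`, the use of the preceding days (excluding today) for the moving average, the `lookback` length standing in for "X previous days", and the 2% `tolerance` for "falls back to the fundamental price" are all assumptions.

```python
def find_candidates(days, window=200, lookback=5, tolerance=0.02):
    """days: chronological list of (date, volume, close) tuples."""
    cases = []
    for i in range(max(window, lookback), len(days)):
        date, volume, close = days[i]
        # Rule 1: today's volume exceeds the moving average of the previous `window` days.
        avg_volume = sum(d[1] for d in days[i - window:i]) / window
        if volume <= avg_volume:
            continue
        # Rule 2: treat the average close of the previous `lookback` days as fundamental.
        baseline = sum(d[2] for d in days[i - lookback:i]) / lookback
        # Rule 3: the price must have risen today.
        if close <= days[i - 1][2]:
            continue
        # Rule 4: follow the price until it falls back to (roughly) the baseline.
        peak = i
        for j in range(i + 1, len(days)):
            if days[j][2] > days[peak][2]:
                peak = j
            if days[j][2] <= baseline * (1 + tolerance):
                cases.append({"entry": date, "peak": days[peak][0], "exit": days[j][0]})
                break
    return cases


# Rows of Table 11. The excerpt is far too short for a 200-day average,
# so a 2-day window is used purely to exercise the code.
potash = [
    ("20090911", 61935, 89.8), ("20090914", 63620, 88.75),
    ("20090915", 113153, 93.85), ("20090916", 100348, 94.18),
    ("20090917", 104319, 96.51), ("20090918", 61286, 97.14),
    ("20090921", 124496, 93.09), ("20090922", 72236, 94.3),
    ("20090923", 59021, 92.92), ("20090924", 54149, 90.86),
]
for case in find_candidates(potash, window=2, lookback=2):
    print(case)
```

A spike that never returns to the baseline produces no case, which corresponds to the truthful-trader outcome in rule 4. With such a short window the sketch flags more days than the engine's detection column does, so the thresholds would need tuning against real 200-day histories.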

3.2.2 Results and Model Confirmation

We decided to run the automatic manipulation detector on the stocks below, since we used them for our HMM and SVM experiments too. In our first attempt, we have two filters.

The first filter says that the manipulator must be a large trader; hence, he will buy a lot of shares, causing an abnormal spike in volume. This will cause the price to increase, and the first filter says that this price increase must be at least $2.

The second filter says that the manipulator only successfully manipulates the market if he profits from this scheme. So the second filter is set to a profit of $2 per share.

If both filters are met, the Simlio engine marks the period of time as a possible case of manipulation activity. Using these two filters, we get the following results:

Stock    Cases   Year Range
goog     6       2004 to 2009
pot      4       1996 to 2009
aapl     9       1996 to 2009
gs       9       1999 to 2009
mos      6       1996 to 2009
msft     2       1996 to 2009
gg       5       1996 to 2009
wfc      1       1996 to 2009
bac      3       1996 to 2009
sp_500   44      1996 to 2009
Table 12: Manipulation opportunities over the years (profit of over $2 per share)

As one can see from Table 12, profitable manipulation cases do happen quite often over the years. Considering that each manipulation case allows the manipulator to profit $2/share and that he probably buys thousands of shares at a time, this can be quite a profitable scheme.

We decided to run another experiment with the same two filters, but with different values. The first filter is the same: when the manipulator purchases stocks, it must be at a high enough volume that the price of the stock increases by at least $2. We changed the second filter so that the profit must be at least $4/share instead of $2/share. With these parameters, we get the following results:

Stock    Cases   Year Range
goog     6       2004 to 2009
pot      3       1996 to 2009
aapl     7       1996 to 2009
gs       2       1999 to 2009
mos      1       1996 to 2009
msft     0       1996 to 2009
gg       1       1996 to 2009
wfc      1       1996 to 2009
bac      1       1996 to 2009
sp_500   37      1996 to 2009
Table 13: Significant manipulation opportunities over the years (profit of over $4 per share)

Comparing Table 12 to Table 13, we see that there are fewer cases in our second experiment, but in each one of these potential manipulation cases, the manipulator will be able to profit $4/share. Assuming that the manipulator buys 100,000 shares per case and that he only plays the S&P500 index, he averages around 3 manipulation cases per year with a profit of:

3 cases * 100,000 shares/case * 4 dollars/share = $1,200,000 annually

Those numbers are quite conservative considering that the manipulator is a large trader. Moreover, if we factor in that the manipulator can make an average profit of $1.2M annually on the S&P500 alone, and that there are thousands of other stocks and indices the manipulators can play with, we can easily project the manipulators' profits to be in the hundreds of millions annually.
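
The back-of-the-envelope estimate above can be reproduced directly. The sketch assumes the year range in Table 13 is inclusive (1996 through 2009, i.e. 14 years); the function names are hypothetical.

```python
def cases_per_year(n_cases, first_year, last_year):
    """Average number of detected cases per calendar year, range inclusive."""
    return n_cases / (last_year - first_year + 1)


def annual_profit(cases, shares_per_case, profit_per_share):
    """Profit per year in dollars for a fixed number of cases and position size."""
    return cases * shares_per_case * profit_per_share


rate = cases_per_year(37, 1996, 2009)      # S&P500 row of Table 13
print(round(rate, 2))                      # the text rounds this up to 3
print(annual_profit(3, 100_000, 4))        # dollars per year
```

The estimate is linear in each input, so it is only as good as the assumed position size of 100,000 shares and the assumption that every detected case yields exactly the $4 filter threshold.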

3.3 Incorporation with SVM Engine
3.3.1 Adding the Abnormal Volume Detection Feature

Now that we have analyzed a couple of manipulation models and the Simlio engine can tell us when there are abnormal volume spikes, we decided to add this into our SVM. Before we examine the results, we will first discuss how we train the SVM and what we test it on.

Since the Simlio engine reports a 1 when there is an abnormal volume spike and a 0 otherwise, we figured that this would work well as a feature for the SVM. Thus, besides the 10 features we used in our SVM experiment discussed in Section 2.1.4, we also added an abnormal volume feature to the list. We hope the feature will help the SVM better detect when to buy/sell stocks, since we know that if we have successfully detected a manipulation case, the price of the stock will shoot up soon after the abnormal volume detection. Afterwards, it will return to its fundamental price if there was a manipulator in the market, or the price will stay high if the large trader was a truthful one. Using this additional feature, perhaps the SVM will be able to learn from past manipulation cases. For our experiments, we train the SVM on a couple of years' worth of data and test the model on the most recent 6 months of data.

After that experiment, we also incorporate the news feature to see if using both features helps the SVM even more. Although the idea sounds promising, we must mention one issue that we can foresee. All our SVMs are trained to tell the user whether to buy or sell stocks the next day. However, the abnormal volume detection feature does not necessarily tell the investor when to buy stocks the next day. When there is a 1, we know that a large trader may have entered. All the other days will return a 0 even if a large trader has just entered. In other words, because our manipulation detection only suggests what will happen over a period of time instead of tomorrow, the SVM may get confused by the feature. Let us see what the results are.

3.3.2 The Results / Analysis

C        78
gamma    100
kernel   rbf

ticker   VolDetect W/O News   VolDetect W/ News
pot      0.52                 0.51
aapl     0.54                 0.57
gs       0.53                 0.53
mos      0.51                 0.54
ibm      0.51                 0.54
msft     0.5                  0.5
gg       0.55                 0.51
bac      0.54                 0.54
goog     0.52                 0.53
c        0.51                 0.51
sp_500   0.54                 0.54
Table 14: Results from SVM using volume detection, without news and with news (RBF kernel 2)
Test data: Last 180 days. Without retraining

Table 14 shows the accuracy results of testing our most recent 180 days of data on the models our SVM trained. As we can see in the left column, adding the volume detection feature without news produces mediocre results. The average accuracy is around 52%. In the right column, if we add both features, we get a slightly better average accuracy of around 53%. However, neither set of results surpasses what we got earlier in Table 10. We suspect that the reason is that the abnormal volume detection is not a great feature to use in our SVM. We have developed our SVM to predict what investors should do the next day. However, the manipulation detection models that we surveyed in the previous section span multiple days. Moreover, whenever the detector reports a 1, we know that there has been an abnormal volume spike. However, when it reports a 0, there are mixed interpretations. For example, it could be 0 because nothing special has happened. It could also be 0 in the days after an abnormal spike because even though the price increases over the next few days, the volume
does not deviate enough from the 200-day moving average, so it still reports a 0. Thus, due to the multiple interpretations of the zeros, the SVM might have gotten confused about what to predict for the next day. Nonetheless, this was still an interesting experiment because if our SVM models were different, for example if instead of predicting what to do the next day they forecast the general trend in prices for the next few days, then perhaps this feature would have been of great significance. Due to our time constraints, we were unable to explore this further, but the experiments that we have done do show that this is worthy of future research.

4. CONCLUSION

Traditional financial models answer many questions we may have about the financial markets. However, many suffer from over-simplistic assumptions that do not truly reflect the real state of the financial market. The two main problems associated with the conventional approaches described in this paper are symmetric information and limited input factors. With these two problems, the traditional approaches can only provide a macro-representation of the market.

Artificial Intelligence is seen as a plausible approach to financial modeling that overcomes the two flaws of traditional models. There has been an increase in interest in financial prediction markets, where many hedge funds have been using models similar to those discussed in this paper to make quick, informed decisions for high frequency trading. Although no model discussed so far is close to predicting stock movements perfectly, the results shown in this paper warrant more research. If the EMH were true, there should be no way for an investor to gain an edge by using technical indicators and news. However, as we see from the results (especially with news), we were able to train the SVM to predict the S&P500 with up to 70% accuracy.

5. OPEN QUESTIONS / FUTURE WORK

1) Instead of just using the number of documents on Google News for a particular stock on a specific day, we can modify the TF-IDF algorithm to give us the important terms for a particular stock and use those as features in our SVMs. We have already implemented a prototype of this, but due to time constraints, we were unable to finish this portion. We believe there is much potential in this area because R. Cooley's paper demonstrated that using an SVM with TF-IDF resulted in a 96.75% accuracy in classifying news stories[16]. After classifying them, we could potentially pass them in as features to the SVM, and that might enhance the ability of our SVM to accurately forecast stock movements.
2) For the HMM model, we assumed that the observations or inputs follow a Gaussian distribution. However, the technical indicators or even the stock prices may not follow such a distribution. We should plug in other distributions to see if they would help the HMM better forecast stock movements.
3) For the SVM approach, we used the RBF kernel because papers have suggested that its results are superior to those of other basic kernels. However, there are so many possible parameters to tweak that it is not impossible that some other kernel might allow the SVM to perform better when forecasting stock movements.
4) Code up a random walk algorithm and compare that result to the results we got from using HMMs and SVMs. Is it possible that we got lucky and just so happened to get promising results? More research will need to be done before we can confidently conclude anything.
5) Instead of training SVM models to predict stock movements for the next day, train them to predict general trends for the next few days. With that, we can try to add in the abnormal volume detection feature and see if that would help the SVM better predict future stock trends.
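
For readers unfamiliar with the TF-IDF weighting mentioned in item 1, the following is a minimal sketch of the standard, unmodified scheme: term frequency within a document multiplied by the logarithm of the inverse document frequency. It is not the authors' prototype, the headlines are made up, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "potash raises fertilizer prices".split(),
    "potash shares fall".split(),
    "potash announces dividend".split(),
]
for doc_weights in tf_idf(docs):
    print(sorted(doc_weights.items(), key=lambda kv: -kv[1]))
```

A term that appears in every article, such as the company name here, gets weight zero, while terms that are specific to one article stand out. Those per-term weights are the kind of values that could be appended to the SVM feature vector in place of a raw article count.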

6. ACKNOWLEDGMENTS

We thank Simlio LLC and all its team members for providing many useful market tools to allow us to graph and display market trends with ease. We also thank Professor Peter Bartlett for providing many useful lectures about statistical graphical models, such as exponential families and maximum likelihood, and for helping us get started on this topic. Furthermore, Professor Satish Rao played a crucial role in helping us determine what feature sets to try as well as suggesting different trading algorithms. Satish's ideas were instrumental in developing the manipulation detection section of this paper.

7. REFERENCES
[1] F. Allen and S. Morris, Finance Applications of Game Theory, Game Theory and Business Applications, Sept 1998
[2] G. S. Atsalakis and K. P. Valavanis, Surveying Stock Market Forecasting Techniques Part II: Soft Computing Methods, Expert Systems with Applications, 2009
[3] C. Truong, Value Investing Using Price Earnings Ratio in New Zealand, University of Auckland, Business Review Volume 11 No. 1, 2009
[4] B. G. Malkiel, The Efficient Market Hypothesis and Its Critics, Princeton University, CEPS Working Paper No. 91, April 2003
[5] Wikipedia on MACD, http://en.wikipedia.org/wiki/MACD
[6] Wikipedia on HMM, http://en.wikipedia.org/wiki/Hidden_Markov_model
[7] R. Hassan and B. Nath, Stock Market Forecasting Using Hidden Markov Model: A New Approach, IEEE, 2005
[8] Y. Zhang, Prediction of Financial Time Series with Hidden Markov Models, Simon Fraser University, May 2004
[9] http://en.wikipedia.org/wiki/Baum-Welch_algorithm
[10] S. Borman, The Expectation Maximization Algorithm: A Short Tutorial, http://www.isi.edu/natural-language/teaching/cs562/2009/readings/B06.pdf, 2004-2009
[11] M. Jordan, Graphical Models, University of California, Berkeley, Ch12
[12] R. Dugad and U. B. Desai, A Tutorial on Hidden Markov Models, Technical Report No: SPANN-96.1, May 1996
[13] D. Klein, Lecture 21, http://inst.eecs.berkeley.edu/~cs188/fa09/slides/FA09%20cs188%20lecture%2021%20--%20speech%20%282PP%29.pdf, Fall 2009
[14] K. Kim, Financial Time Series Forecasting Using Support Vector Machines, Elsevier, March 2003
[15] C. Hsu, C. Chang, C. Lin, A Practical Guide to Support Vector Classification, National Taiwan University, 2003
[16] R. Cooley, Classification of News Stories Using Support Vector Machines, University of Minnesota, April 1999
[17] Z. Hua, Y. Wang, X. Xu, B. Zhang, L. Liang, Predicting Corporate Financial Distress Based on Integration of Support Vector Machine and Logistic Regression, Expert Systems with Applications, 2007
[18] Jahmm, http://code.google.com/p/jahmm/
[19] JavaML, http://java-ml.sourceforge.net/
[20] LibSVM, http://java-ml.sourceforge.net/
[21] F. Allen and D. Gale, Stock-Price Manipulation, The Review of Financial Studies, Volume 5 No. 3, 1992
[22] J. Mei, G. Wu, and C. Zhou, Behavior Based Manipulation: Theory and Prosecution Evidence, JEL Classifications: G12, G18, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=457880, April 2004
[23] R. K. Aggarwal and G. Wu, Stock Market Manipulations, Journal of Business, 2006
[24] N. Jegadeesh and S. Titman, Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency, The Journal of Finance, 1993
[25] R. J. Shiller, From Efficient Market Theory to Behavioral Finance, Cowles Foundation Discussion Paper, 2002
