Estimating Stochastic Volatility Via Filtering For The Micro-Movement of Asset Prices
Assistant Professor
Department of Mathematics and Statistics,
University of Missouri at Kansas City
November 4, 2002
I am grateful to Thomas G. Kurtz for his inspiration and for insightful and productive conversations.
The major part of the paper was done when the author visited Professor Michael Kouritzin at the University of
Alberta during the summer of 2002. I thank Mike for the hospitality and many productive conversations.
I thank participants at the Workshop on Filtering Theory and Applications (2002) held at the University of
Alberta, the IMS Annual Meeting/4th International Probability Symposium (2002) held at Banff, Canada,
and seminar participants at the University of Alberta, the University of Kansas, and the University of Missouri at Kansas
City for helpful comments. This work is partially supported by a Faculty Research Grant of the University of
Missouri at Kansas City.
Department of Mathematics and Statistics, University of Missouri at Kansas City, Kansas City, MO
64110, USA. Tel No: (816) 235 5850 and e-mail: [email protected]
ABSTRACT
Under the general framework of Zeng (2002), a unified approach via filtering is developed to estimate stochastic volatility for micro-movement models. The key feature of the models is that they can be transformed into filtering problems with counting process observations. In order to obtain trade-by-trade, real-time Bayes estimates of stochastic volatility, the Markov chain approximation method is applied to the filtering equation to construct consistent recursive algorithms, which compute the joint posterior. To illustrate the approach, the recursive algorithm is constructed in detail for a jumping stochastic volatility micro-movement model. Simulation results show that the Bayes estimates of stochastic volatility capture the movement of volatility. Trade-by-trade stochastic volatility estimates for a Microsoft transaction data set are obtained, and they provide strong affirmative evidence that volatility changes even more dramatically at the trade-by-trade level.
KEY WORDS: Filtering, counting process, Markov chain approximation, Stochastic Volatility, High-
frequency data
1 Introduction
Stochastic volatility is well documented for asset prices in both macro-movement and micro-movement (Bollerslev, Chou & Kroner (1992)). Macro-movement refers to daily, weekly, and monthly closing price behavior, while micro-movement refers to transactional (trade-by-trade) price behavior. There is a strong connection, as well as striking distinctions, between the macro- and micro-movements. The strong connection is observed through the identity of the overall shapes of both, because the macro-movement is an equally-spaced time series drawn from the micro-movement data. Their striking distinctions are mainly due to financial noise. In macro-movement, the impact of noise is small and is usually neglected. In micro-movement, however, the impact of noise is substantial and noise must be modeled explicitly. If the noise is ignored, then the impact of noise is transferred to volatility, and the volatility estimates are substantially inflated. This is documented by Gottlieb & Kalay (1985), Ball (1988) and Cho & Frees (1988) for discrete noise and further in Harris (1990) and Zeng (1999 and 2002) for discrete plus other types of noise.
Economically, asset price is distinguished from its intrinsic value, and their distinction is also noise. Noise, as contrasted with information, is well documented in the market microstructure literature. Black (1986) noted that noise renders our observations of value imperfect by creating a wedge between price and intrinsic value, and provides an impetus for trading in financial markets. Hasbrouck (1996) pointed out that information has a long-term, permanent impact while noise has a short-term, transitory impact on price. Three important types of noise have been identified and extensively studied: discrete, clustering and non-clustering. First, intraday prices move discretely (tick by tick), resulting in discrete noise. Second, because prices are not distributed evenly on all ticks, but gather more on some ticks such as integers and halves, price clustering is obtained. Harris (1991) confirms that this phenomenon is remarkably persistent through time, across assets, and across market structures. Third, the non-clustering noise includes other unspecified noise, and the outliers in prices are one piece of evidence for the existence of non-clustering noise.
In Zeng (2002), a novel, economically well-grounded, partially-observed micro-movement model for asset price is proposed to bridge the gap between the macro- and micro-movements caused by noise. The most prominent feature of the proposed model is that it can be formulated as a filtering problem with counting process observations. This connects the model to the filtering literature, which has found great success in engineering and networking. Under this framework, the observables are the whole sample paths of the counting processes, which contain the complete information of price and trading time. Then, the continuous-time likelihoods and posterior, built upon the sample paths, not only exist, but also are uniquely characterized by the unnormalized, Duncan-Mortensen-Zakai (DMZ)-like and the normalized, Kushner-Stratonovich (KS) (or Fujisaki-Kallianpur-Kunita)-like filtering equations, respectively.
Transaction (or tick) data are discrete in value, irregularly spaced in time and extremely large in size. Despite recent advances in statistics and econometrics, obtaining reliable parameter estimates for even simple, non-stochastic volatility, micro-movement models is extremely challenging. Zeng (2002) develops continuous-time Bayes estimation via filtering with efficient algorithms for the parameter estimation of the micro-movement model. That represents a significant advance in the estimation for micro-movement models, also because the continuous-time likelihoods and posterior are utilized as the foundation for statistical inference. This foundation is informationally better than those provided by the discrete-time likelihoods and posterior, which merely make use of a discrete-time subset of the sample paths. In Zeng (2002), however, only the parameters of a simple model with GBM as value process are estimated.
In this paper, first, a class of stochastic volatility micro-movement models is developed from the macro-movement models by incorporating the three types of noise summarized above. A new jumping stochastic volatility (JSV) micro-movement model, stemming from geometric Brownian motion (GBM), is proposed and studied (later, it is called the JSV-GBM micro-movement model). Second, a unified approach, Bayes parameter estimation via filtering, is developed for the micro-movement models, especially for estimating stochastic volatility. Stochastic volatility models are more realistic and more interesting but more difficult to estimate than the simple model with GBM as value process, where the parameters, the signal of interest, are fixed. In a stochastic volatility model, estimation becomes a real filtering problem: the stochastic volatility, the signal of interest, changes over time and the stock prices are the observations corrupted by discrete types of noise.
The JSV-GBM model is employed to demonstrate the effectiveness of estimating stochastic volatility using Bayes estimation via filtering. To illustrate the approach, JSV-GBM's consistent recursive algorithm, which approximates the normalized filtering equation and calculates the joint posterior, is constructed in detail. Simulation results show that the Bayes estimates of stochastic volatility are close to the true volatilities and are able to capture the movement of volatility. Trade-by-trade volatility estimates for an actual transaction data set are computed, and they confirm that the volatility changes even more dramatically in micro-movement.
The rest of the article proceeds as follows. Section 2 presents the class of micro-movement models with stochastic volatility. Section 3 develops the unified approach, Bayes estimation via filtering, for the proposed models, and the recursive algorithm of JSV-GBM is constructed in detail as an illustration of the approach. Section 4 presents simulation and estimation results based on actual data. Concluding remarks and future work are offered in Section 5.
2 The Stochastic Volatility Micro-movement Models
The micro-movement model is built upon the macro-movement model, or, economically, the price
is built upon the intrinsic value by incorporating noise.
Suppose that the intrinsic value process $X$ cannot be observed directly, but can be partially observed through the price process, $Y$. $X$ lives in a continuous state space while $Y$ lives in a discrete state space given by the multiples of the minimum price variation, a tick, which is assumed to be $1/M$. The combination of $(X, Y)$ provides a natural partially-observed framework for the micro-movement process. So, we start with stochastic volatility models for the value process.
2.1 Stochastic Volatility Models for Value Process
Suppose $\theta(t)$ is a vector of parameters in the model. $(X, Y)$ is augmented to $(\theta, X, Y)$ in preparation for parameter estimation, and the following assumption is invoked on $(\theta, X)$ as in Zeng (2002).
Assumption 2.1 $(\theta, X)$ is the solution of a martingale problem for a generator $A$ such that
$$M_f(t) = f(\theta(t), X(t)) - \int_0^t A f(\theta(s), X(s))\,ds$$
is an $\mathcal{F}_t^{\theta,X}$-martingale, where $\mathcal{F}_t^{\theta,X}$ is the $\sigma$-algebra generated by $(\theta(s), X(s))_{0 \le s \le t}$.
Remark 2.1 This is a very general assumption including many stochastic volatility models and
even more. Three examples are given below.
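Example 2.1 Hull & White (1987)'s model in SDE form is
$$\frac{dX(t)}{X(t)} = \mu\,dt + \sigma(t)\,dW(t), \qquad \frac{dV(t)}{V(t)} = \nu\,dt + \xi\,dB(t),$$
where $V(t) = \sigma^2(t)$ and $W(t)$ and $B(t)$ are Wiener processes with correlation coefficient $\rho$. Its generator is
$$A f(v, x) = \mu x \frac{\partial f}{\partial x}(v, x) + \frac{1}{2} v x^2 \frac{\partial^2 f}{\partial x^2}(v, x) + \nu v \frac{\partial f}{\partial v}(v, x) + \frac{1}{2} \xi^2 v^2 \frac{\partial^2 f}{\partial v^2}(v, x) + \rho \xi x v^{3/2} \frac{\partial^2 f}{\partial x \partial v}(v, x).$$
The parameters of this model are $(\mu, \sigma(t), \nu, \xi, \rho)$. If $\nu$ is allowed to be a nice function that can depend on $V(t)$, then this model can include another stochastic volatility model, the limiting diffusion model of the AR(1)-EARCH model (Nelson (1990)).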
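Example 2.2 Heston (1993)'s model in SDE form is
$$\frac{dX(t)}{X(t)} = \mu\,dt + \sigma(t)\,dW(t), \qquad dV(t) = (\nu - \varpi V(t))\,dt + \xi \sigma(t)\,dB(t),$$
where $V(t)$, $W(t)$ and $B(t)$ are defined in Example 2.1. Its generator is
$$A f(v, x) = \mu x \frac{\partial f}{\partial x}(v, x) + \frac{1}{2} v x^2 \frac{\partial^2 f}{\partial x^2}(v, x) + (\nu - \varpi v) \frac{\partial f}{\partial v}(v, x) + \frac{1}{2} \xi^2 v \frac{\partial^2 f}{\partial v^2}(v, x) + \rho \xi v x \frac{\partial^2 f}{\partial x \partial v}(v, x).$$
This model has parameters $(\mu, \sigma(t), \nu, \varpi, \xi, \rho)$.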
The following example presents a new JSV-GBM model for value process. Its micro-movement
version is developed in Section 2.2 and further in Section 3.2.
Example 2.3 A jumping stochastic volatility model in SDE form is
$$\frac{dX(t)}{X(t)} = \mu\,dt + \sigma(t)\,dW(t), \qquad d\sigma(t) = \big(J_{N(t)} - \sigma(t-)\big)\,dN(t),$$
where $N(t)$ is a Poisson process with intensity $\lambda$, and $\{J_i\}$ is a sequence of i.i.d. random variables with density $g(z)$, assumed to be independent of $W(t)$ and $N(t)$. When the $i$th Poisson event happens at time $t_i$, the volatility changes from $\sigma(t_{i-1})$, which is $J_{i-1}$, to $J_i$, because $\sigma(t_i) = (J_i - \sigma(t_{i-1})) + \sigma(t_{i-1})$. Then, the volatility remains the same until the next Poisson event occurs. Its generator is
$$A f(\sigma, x) = \mu x \frac{\partial f}{\partial x}(\sigma, x) + \frac{1}{2} \sigma^2 x^2 \frac{\partial^2 f}{\partial x^2}(\sigma, x) + \lambda \int \big(f(z, x) - f(\sigma, x)\big) g(z)\,dz.$$
The parameters of this model are $(\mu, \sigma, \lambda)$ and may include parameters in $g(z)$, if any. This model is closely related to the asset price models with Markov modulated volatilities studied by Elliot, Malcolm & Tsoi (2002).
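To make Example 2.3 concrete, the following Python sketch simulates a JSV-GBM value path on a fixed time grid. It is a minimal sketch under stated assumptions, not the simulation code used in this paper: the function name `simulate_jsv_gbm`, the grid step `dt`, and the uniform jump density on `[sig_lo, sig_hi]` are illustrative choices, and at most one volatility jump is allowed per step, which is reasonable only when `lam * dt` is small.

```python
import math
import random


def simulate_jsv_gbm(x0, mu, lam, sig_lo, sig_hi, n_steps, dt, seed=0):
    """Simulate (X, sigma) for a jumping-stochastic-volatility GBM on a grid.

    Assumptions (illustrative, not from the paper): jump values J_i are
    Uniform[sig_lo, sig_hi]; at most one volatility jump per grid step.
    """
    rng = random.Random(seed)
    x = x0
    sigma = rng.uniform(sig_lo, sig_hi)  # initial volatility J_0
    xs, sigmas = [x], [sigma]
    p_jump = 1.0 - math.exp(-lam * dt)  # P(at least one Poisson event in dt)
    for _ in range(n_steps):
        # Between volatility jumps X is a GBM, so the log-step is exact.
        z = rng.gauss(0.0, 1.0)
        x *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        # On a Poisson event the volatility is replaced by a fresh draw J_i.
        if rng.random() < p_jump:
            sigma = rng.uniform(sig_lo, sig_hi)
        xs.append(x)
        sigmas.append(sigma)
    return xs, sigmas
```

The volatility path returned here is piecewise constant, which is the feature that distinguishes this model from the continuous-path volatility of Examples 2.1 and 2.2.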
After specifying the value process $X(t)$, there are two equivalent methods to build the price process from $X(t)$. The first constructs $Y$ from $X$ by incorporating noises. The second formulates $(X, Y)$ as a filtering problem with counting process observations. The former approach is intuitive, while the latter approach is important for parameter estimation.
2.2 Building Price Process by Construction
Since prices can only be observed at irregularly-spaced trading times, there are two steps in constructing $Y$ from $X$. First, determine trading times $t_1, t_2, \ldots, t_i, \ldots$, which are assumed to be modeled by a conditional Poisson process. The rate of trading activity is described by an intensity function, denoted by $a(\theta(t), X(t), t)$. Second, $Y(t_i)$, the price at time $t_i$, is constructed from $X(t_i)$ via a random function $F(\cdot)$. That is, $Y(t_i) = F(X(t_i))$, where $y = F(x)$ is a random function with the transition probability $p(y|x)$.
Under this construction, the observable price is produced from the value process by combining noises when a trade occurs. Information affects $X(t)$, the value of an asset, and has a permanent influence on the price, while noise affects $F(x)$, the random transition function, and only has a transient impact on price.
This article focuses on estimating this class of micro-movement stochastic volatility models: the value process, $X(t)$, is a continuous-time stochastic volatility model which can be formulated as a martingale problem, the trading times $t_1, t_2, \ldots, t_i, \ldots$ are driven by a conditional Poisson process, and the price at time $t_i$ is $Y(t_i) = F(X(t_i))$. The advantage of these models is that they not only have stochastic volatility, but also can capture the micro-movement features created by noise.
Below, a simple $F(x)$ is built to accommodate the three types of noise that are well documented in the financial literature: discrete noise, clustering noise, and non-clustering noise.
To simplify notation, fix $i$ and set $x = X(t_i)$, $y = Y(t_i)$, and $y' = Y'(t_i) = R[X(t_i) + V_i, \frac{1}{M}]$, where $V_i$ is defined as the non-clustering noise. Instead of directly specifying $p(y|x)$ of $F(x)$, we move the value $x$ to the price $y$ in three steps.
Step 1: Incorporate discrete noise by rounding off $x$ to its closest tick, $R[x, \frac{1}{M}]$. Without other noises, trades should occur at this tick, which is closest to the stock value.
Step 2: Incorporate non-clustering noise by adding $V$; $y' = R[x + V, \frac{1}{M}]$, where $V$ is the non-clustering noise of trade $i$ at time $t_i$.
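We assume that $\{V_i\}$ are independent of the value process, and that they are i.i.d. with a doubly geometric distribution:
$$P\{V = v\} = \begin{cases} (1 - \rho) & \text{if } v = 0, \\ \frac{1}{2}(1 - \rho)\rho^{M|v|} & \text{if } v = \pm\frac{1}{M}, \pm\frac{2}{M}, \ldots \end{cases}$$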
We pick the doubly geometric distribution because it is unimodal, symmetric and bell shaped, with the implication that a trading price at a tick closer to the stock value is more likely to occur and trading prices at the same distance from the stock value have the same chances.
Non-clustering noise is intended to explain three discreteness-related sample characteristics of transactional data that cannot be explained by the rounding model. First, the non-clustering noise considerably increases the probability of successive price changes that are more than a tick. Second, it allows the prices of trades occurring within the same second to differ, and the difference can be two or more ticks. Finally, it produces outliers.
Step 3: Incorporate clustering noise by biasing $y'$ [...] $\}$ and serially independent given the sequence $\{y'_i\}$.
To be consistent with the data analyzed in Section 4, we construct a simple random biasing function only for the tick of 1/8 dollar. It can easily be generalized to other ticks such as 1/100 dollar. Although the trading tick has been converted to the decimal system since 2000, this does not change the validity of the proposed model.
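Steps 1 and 2 of the construction can be sketched in Python as follows. This is an illustrative sketch only: the helper names (`round_to_tick`, `doubly_geometric_pmf`, `sample_noise_ticks`, `noisy_price`) are hypothetical, the tick is taken as 1/8 dollar (M = 8), and the clustering step (Step 3) is deliberately left out. The noise is sampled by drawing a geometric number of ticks and then a random sign, which reproduces the doubly geometric probabilities.

```python
import math
import random


def round_to_tick(x, M=8):
    """Step 1: round the value x to the closest multiple of 1/M."""
    return math.floor(x * M + 0.5) / M


def doubly_geometric_pmf(k, rho):
    """P{V = k/M} for an integer number of ticks k (k may be negative)."""
    if k == 0:
        return 1.0 - rho
    return 0.5 * (1.0 - rho) * rho ** abs(k)


def sample_noise_ticks(rho, rng):
    """Draw V in ticks: |k| is geometric, then the sign is fair."""
    k = 0
    while rng.random() < rho:  # P(|k| = n) = (1 - rho) * rho**n
        k += 1
    if k == 0:
        return 0
    return k if rng.random() < 0.5 else -k


def noisy_price(x, rho, rng, M=8):
    """Step 2: y' = R[x + V, 1/M] with doubly geometric noise V."""
    v = sample_noise_ticks(rho, rng) / M
    return round_to_tick(x + v, M)
```

For example, `noisy_price(50.07, 0.2, random.Random(0))` returns a multiple of 1/8 near 50.125; larger values of `rho` make prices several ticks away from the value more likely.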
The data to be analyzed has this clustering phenomenon: integers and halves are most likely
and have about the same frequencies; odd quarters are the second most likely and have about
the same frequencies; and odd eighths are least likely and have about the same frequencies. To
generate such clustering, a random biasing function is constructed based on the following rule: if
the fractional part of y
$$\vec{Y}(t) = \begin{pmatrix} N_1\big(\int_0^t \lambda_1(\theta(s), X(s), s)\,ds\big) \\ N_2\big(\int_0^t \lambda_2(\theta(s), X(s), s)\,ds\big) \\ \vdots \\ N_n\big(\int_0^t \lambda_n(\theta(s), X(s), s)\,ds\big) \end{pmatrix}, \tag{1}$$
where $Y_k(t) = N_k\big(\int_0^t \lambda_k(\theta(s), X(s), s)\,ds\big)$ is the counting process recording the cumulative number of trades that have occurred at the $k$th price level (denoted by $y_k$) up to time $t$.
Under this representation, $(\theta(t), X(t))$ becomes the signal process, which cannot be observed directly, and $\vec{Y}(t)$ becomes the observation process, which is corrupted by noise. Hence, $(\theta, X, \vec{Y})$ is framed as a filtering problem with counting process observations.
Four mild assumptions are invoked so that the general construction is equivalent to the counting process observations in Equation (1). The equivalence ensures that Bayes estimation based on the latter specification can be applied to the former.
Assumption 2.2 The $N_k$'s are unit Poisson processes under the physical measure $P$.
Assumption 2.3 $(\theta, X), N_1, N_2, \ldots, N_n$ are independent under the physical measure $P$.
Remark 2.2 These two assumptions imply that there exists a reference measure $Q$ and that after a suitable change of measure to $Q$, $(\theta, X), Y_1, \ldots, Y_n$ will become independent, and $Y_1, Y_2, \ldots, Y_n$ will become unit Poisson processes. These two assumptions are general since a large class of counting processes can be transformed into this setup by the technique of change of measure (see Bremaud (1981), page 165).
Assumption 2.4 The intensity is $\lambda_k(\theta, x, t) = a(\theta, x, t)\,p(y_k|x)$, where $a(\theta, x, t)$ is the total intensity at time $t$ and $p(y_k|x)$ is the transition probability from $x$ to $y_k$, the $k$th price level.
Remark 2.3 This assumption imposes a desirable structure for the intensities of the model. It means that the total trading intensity determines the overall rate of trade occurrence at time $t$ and $p(y_k|x)$ determines the proportional intensity of trade at the price level $y_k$ when the value is $x$.
The structure of intensities guarantees the equivalence of the above two approaches, because the conditional finite dimensional distributions of the price $Y$ given $\{X(s) : 0 < s < t\}$ can be shown to be identical in these two setups.
A final technical assumption ensures the uniqueness of the filtering equation.
Assumption 2.5 The total intensity, $a(\theta, x, t)$, is uniformly bounded from below and above; i.e., there exist constants, $C_1$ and $C_2$, such that $C_1 < a(\theta, x, t) \le C_2$ for all $t > 0$.
3 Bayesian Estimation via Filtering
The Bayes estimate, which is the posterior mean, is the least mean square error (MSE) estimate. The normalized filtering equation characterizes how the posterior evolves. The core of Bayesian estimation via filtering is to construct an algorithm to compute the conditional distribution, which becomes a posterior after a prior is assigned. The algorithm, which is based on the filtering equation, is naturally recursive with every trade. One basic requirement for the recursive algorithm is consistency, namely, the conditional distribution computed by the recursive algorithm converges to the true one determined by the filtering equation. This is guaranteed by a theorem on the convergence of conditional expectation.
In this section, we first review two main results: the filtering equation and the theorem on the convergence of conditional expectation. Then, we further prepare JSV-GBM as an illustration for the approach. Finally, we construct a consistent recursive algorithm for JSV-GBM.
3.1 Review
The following two results are from Zeng (2002).
3.1.1 Filtering Equation
Before the filtering equation is presented, some terms are defined below. Let $\mathcal{F}_t^{\vec{Y}} = \sigma\{\vec{Y}(s) \mid 0 \le s \le t\}$ be the $\sigma$-algebra generated by the observed sample path of $\vec{Y}$. $\mathcal{F}_t^{\vec{Y}}$ is all the available information up to time $t$.
Definition 3.1 Let $\pi_t$ be the conditional distribution of $(\theta(t), X(t))$ given $\mathcal{F}_t^{\vec{Y}}$.
Remark 3.1 Assuming priors on $(\theta(0), X(0))$, $\pi_t$ becomes the joint posterior of $(\theta(t), X(t))$.
Definition 3.2 Let $\pi(f, t) = E^P[f(\theta(t), X(t)) \mid \mathcal{F}_t^{\vec{Y}}] = \int f(\theta, x)\,\pi_t(d\theta, dx)$ be the conditional expectation of $f(\theta(t), X(t))$ given $\mathcal{F}_t^{\vec{Y}}$.
Theorem 3.1 Suppose that $(\theta, X)$ satisfies Assumption 2.1, and $\vec{Y}$ is defined in Equation (1) with Assumptions 2.2 - 2.5. Then, for every $t > 0$ and every $f$ in the domain of generator $A$, $\pi(f, t)$ is the unique solution of the SDE, the normalized filtering equation,
$$\pi(f, t) = \pi(f, 0) + \int_0^t \big[\pi(Af, s) - \pi(fa, s) + \pi(f, s)\pi(a, s)\big]\,ds + \sum_{k=1}^n \int_0^t \left[\frac{\pi(f a p_k, s-)}{\pi(a p_k, s-)} - \pi(f, s-)\right] dY_k(s). \tag{2}$$
Remark 3.2 When the trading intensity is deterministic, $a(\theta(t), X(t), t) = a(t)$, the normalized filtering equation simplifies to
$$\pi(f, t) = \pi(f, 0) + \int_0^t \pi(Af, s)\,ds + \sum_{k=1}^n \int_0^t \left[\frac{\pi(f p_k, s-)}{\pi(p_k, s-)} - \pi(f, s-)\right] dY_k(s). \tag{3}$$
Note that $a(t)$ disappears in Equation (3). This reduces the computation greatly in the Bayesian parameter estimation. Hence, this convenient case is assumed in JSV-GBM. The tradeoff is that the relationship between trading intensity and other parameters (such as volatility) is excluded.
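The cancellation of the deterministic intensity can be checked directly. The short derivation below is a sketch added for clarity; it uses only the linearity of the conditional expectation and the fact that a deterministic $a(s)$ can be pulled out of it.

```latex
% With a(\theta(s), X(s), s) = a(s) deterministic:
\begin{align*}
\pi(fa, s) &= a(s)\,\pi(f, s), \qquad \pi(a, s) = a(s), \\
-\pi(fa, s) + \pi(f, s)\,\pi(a, s) &= -a(s)\,\pi(f, s) + a(s)\,\pi(f, s) = 0, \\
\frac{\pi(f a p_k, s-)}{\pi(a p_k, s-)}
  &= \frac{a(s-)\,\pi(f p_k, s-)}{a(s-)\,\pi(p_k, s-)}
   = \frac{\pi(f p_k, s-)}{\pi(p_k, s-)}.
\end{align*}
```

The drift term of the normalized filtering equation therefore reduces to the generator term alone, and the jump term no longer involves the intensity.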
Let the trading times be $t_1, t_2, \ldots$; then Equation (3) can be written in two parts. The first is called the propagation equation, describing the evolution without trades, and the second is called the updating equation, describing the updating when a trade occurs. The propagation equation has no random component and is written as
$$\pi(f, t_{i+1}-) = \pi(f, t_i) + \int_{t_i}^{t_{i+1}-} \pi(Af, s)\,ds.$$
This implies that when there are no trades, the posterior evolves deterministically.
Assume the price at time $t_{i+1}$ occurs at the $k$th price level; then the updating equation is
$$\pi(f, t_{i+1}) = \frac{\pi(f p_k, t_{i+1}-)}{\pi(p_k, t_{i+1}-)}.$$
It is random because the price level is random.
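On a finite lattice, the updating equation is simply Bayes' rule: the pre-trade posterior is multiplied by the transition probability of the observed price level and renormalized. The following Python sketch illustrates this; the function name `update_posterior` and the list-based representation are assumptions made here for illustration.

```python
def update_posterior(prior, likelihood):
    """Bayes update at a trade.

    prior[i]      : posterior mass just before the trade at lattice point i
    likelihood[i] : p(y_k | x_i), probability of the observed price level
    Returns the normalized posterior just after the trade.
    """
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    if total <= 0.0:
        raise ValueError("observed price has zero probability under the prior")
    return [j / total for j in joint]
```

For instance, `update_posterior([0.5, 0.5], [0.2, 0.6])` moves mass toward the second lattice point, where the observed price is three times as likely.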
3.1.2 A Convergence Theorem on Conditional Expectation
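Suppose the state space of $(\theta, X)$ is discretized with $\epsilon_i$ as the length between lattices in the $i$th component of $\theta$ and $\epsilon_x$ as that of $X$. Let $\epsilon_\theta = (\epsilon_1, \ldots, \epsilon_p)$. Then, $(\theta_{\epsilon_\theta}, X_{\epsilon_x})$, an approximation for $(\theta, X)$, can be constructed. Define
$$\vec{Y}_\epsilon(t) = \begin{pmatrix} N_1\big(\int_0^t \lambda_1(\theta_{\epsilon_\theta}(s), X_{\epsilon_x}(s), s)\,ds\big) \\ N_2\big(\int_0^t \lambda_2(\theta_{\epsilon_\theta}(s), X_{\epsilon_x}(s), s)\,ds\big) \\ \vdots \\ N_n\big(\int_0^t \lambda_n(\theta_{\epsilon_\theta}(s), X_{\epsilon_x}(s), s)\,ds\big) \end{pmatrix}, \tag{4}$$
where $\epsilon = \max(\epsilon_x, \|\epsilon_\theta\|)$, and define $\mathcal{F}_t^{\vec{Y}_\epsilon} = \sigma(\vec{Y}_\epsilon(s), 0 \le s \le t)$. We use $X_\epsilon \Rightarrow X$ to mean that $X_\epsilon$ converges weakly to $X$ [...].
Theorem 3.2 Suppose that $(\theta, X, \vec{Y})$ is on $(\Omega, \mathcal{F}, P)$ and $(\theta_{\epsilon_\theta}, X_{\epsilon_x}, \vec{Y}_\epsilon)$ is on $(\Omega_\epsilon, \mathcal{F}_\epsilon, P_\epsilon)$ [...]. If $(\theta_{\epsilon_\theta}, X_{\epsilon_x}) \Rightarrow (\theta, X)$ as $\epsilon = \max\{\epsilon_x, \|\epsilon_\theta\|\} \to 0$, then
(i) $\vec{Y}_\epsilon \Rightarrow \vec{Y}$ as $\epsilon \to 0$; and
(ii) $E^{P_\epsilon}[F(\theta_{\epsilon_\theta}(t), X_{\epsilon_x}(t)) \mid \mathcal{F}_t^{\vec{Y}_\epsilon}] \Rightarrow E^P[F(\theta(t), X(t)) \mid \mathcal{F}_t^{\vec{Y}}]$ as $\epsilon \to 0$ for function $F$ in the domain of the generator $A$.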
Remark 3.3 This theorem provides a recipe for constructing a consistent recursive algorithm. It guarantees that as long as $(\theta_{\epsilon_\theta}, X_{\epsilon_x})$ is an approximation of $(\theta, X)$, $\vec{Y}_\epsilon$ is an approximation of [...] $(\theta_{\epsilon_\theta}, X_{\epsilon_x})$ is an approximation of $(\theta, X)$ given the observed sample paths of the counting processes. When we take $F$ as an appropriate indicator function, $E^{P_\epsilon}[F(\theta_{\epsilon_\theta}(t), X_{\epsilon_x}(t)) \mid \mathcal{F}_t^{\vec{Y}_\epsilon}]$ becomes the conditional probability mass function for $(\theta_{\epsilon_\theta}, X_{\epsilon_x})$. Theorem 3.2, then, implies that the conditional probability mass function is an approximation to the conditional distribution of $(\theta, X)$. The recursive algorithm to be constructed computes such a conditional probability mass function, which becomes a posterior after a prior is assigned.
3.2 Further Preparation of JSV-GBM
The value process is the JSV-GBM model in Example 2.3. This model builds upon GBM as the models of Hull & White (1987) and Heston (1993) do. Their difference is in the volatility dynamics. In Hull & White (1987) and Heston (1993), the path of stochastic volatility is continuous. However, in JSV-GBM, the volatility jumps according to a Poisson process.
To reduce computation, we further assume that the i.i.d. sequence of jumps, $\{J_i\}$, is uniformly distributed on a range, $[\alpha_\sigma, \beta_\sigma]$.
Accommodating other parameters, the generator becomes
$$A f(\mu, \sigma, \lambda, \rho, x) = \mu x \frac{\partial f}{\partial x}(\mu, \sigma, \lambda, \rho, x) + \frac{1}{2} \sigma^2 x^2 \frac{\partial^2 f}{\partial x^2}(\mu, \sigma, \lambda, \rho, x) + \lambda \int_{\alpha_\sigma}^{\beta_\sigma} \big(f(\mu, z, \lambda, \rho, x) - f(\mu, \sigma, \lambda, \rho, x)\big) \frac{1}{\beta_\sigma - \alpha_\sigma}\,dz. \tag{5}$$
There are two reasons to choose this model for illustration. First, the model is built upon GBM, the standard model in much of mathematical finance, with stochastic volatility. Second, the stochastic volatility jumps, and the generator of the model involves both diffusion and jump generators, the two major types of generator. Thus, how to build the approximate Markov chain and the recursive algorithms for both diffusion and jump processes is illustrated in one example.
There are six parameters in the model: $(\mu, \sigma(t), \lambda, \rho, \alpha, \beta)$. The two clustering parameters $(\alpha, \beta)$ can be estimated by the method of relative frequency. The estimation of stochastic volatility as well as the other parameters is done by Bayes estimation via filtering.
3.3 Construction of the Recursive Algorithm
For the nonlinear filtering problem, Kushner (see Chapter 12 of Kushner & Dupuis (1994)) develops approximation methods based on replacing the signal process by a finite state Markov chain that approximates the signal equally spaced in time. Although trades occur irregularly spaced in time, the same idea can be applied here.
Theorem 3.1 provides the optimum filter and Theorem 3.2 provides a recipe for constructing a consistent recursive algorithm as an approximate optimum filter. The recursive algorithm in this section is constructed for Equation (3), which is the case when the trading intensity is deterministic. There are three steps in the process of deriving the recursive algorithm.
Step 1: Construct a Markov chain $(\theta_{\epsilon_\theta}(t), X_{\epsilon_x}(t))$ as an approximation to $(\theta(t), X(t))$. Here, $\theta(t) = (\mu, \sigma(t), \lambda, \rho)$, all the parameters to be estimated by filtering.
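First, we discretize the parameter spaces of $\mu$, $\sigma$, $\lambda$, $\rho$ and $X$. Suppose there are $n_\mu + 1$, $n_\sigma + 1$, $n_\lambda + 1$, $n_\rho + 1$ and $n_x + 1$ lattices in the discretized spaces of $\mu$, $\sigma$, $\lambda$, $\rho$ and $X$ respectively. For example, the discretization for $\mu$ is $\mu: [\alpha_\mu, \beta_\mu] \to \{\alpha_\mu, \alpha_\mu + \epsilon_\mu, \alpha_\mu + 2\epsilon_\mu, \ldots, \alpha_\mu + j\epsilon_\mu, \ldots, \alpha_\mu + n_\mu\epsilon_\mu\}$, where $\alpha_\mu + n_\mu\epsilon_\mu = \beta_\mu$ and the number of lattices is $n_\mu + 1$. Define $\mu_j = \alpha_\mu + j\epsilon_\mu$, $\sigma_k = \alpha_\sigma + k\epsilon_\sigma$, $\lambda_l = \alpha_\lambda + l\epsilon_\lambda$, $\rho_m = \alpha_\rho + m\epsilon_\rho$, and $x_w = x_w(t) = \alpha_x + w\epsilon_x$. Let $\epsilon = \max(\epsilon_\mu, \epsilon_\sigma, \epsilon_\lambda, \epsilon_\rho, \epsilon_x)$. The discretized space of $(\mu, \lambda, \rho)$ is a natural approximation for the parameter space. A Markov chain, however, is the natural approximation for the stochastic process $(\sigma(t), X(t))$.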
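The lattice construction described above can be sketched in a few lines of Python. The helper name `lattice` and the example ranges are hypothetical; the sketch only shows how `n + 1` equally spaced lattice points and the spacing are obtained from a range and a lattice count.

```python
def lattice(alpha, beta, n):
    """Return the n + 1 lattice points alpha + j * eps, j = 0..n, and eps.

    The spacing eps is chosen so that the last point equals beta
    (up to floating-point rounding).
    """
    if n < 1 or beta <= alpha:
        raise ValueError("need n >= 1 and beta > alpha")
    eps = (beta - alpha) / n
    points = [alpha + j * eps for j in range(n + 1)]
    return points, eps
```

One lattice would be built per parameter (and one for the value), and the overall mesh size is the largest of the individual spacings.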
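Second, it is observed that constructing an approximate Markov chain can be reduced to constructing a Markov chain generator, $A_\epsilon$, such that $A_\epsilon \to A$ as $\epsilon \to 0$. $A_\epsilon$ is constructed as follows:
$$\begin{aligned}
A_\epsilon f(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w)
&= \mu_j x_w \left[\frac{f(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w + \epsilon_x) - f(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w - \epsilon_x)}{2\epsilon_x}\right] \\
&\quad + \frac{1}{2} \sigma_k^2 x_w^2 \left[\frac{f(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w + \epsilon_x) + f(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w - \epsilon_x) - 2 f(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w)}{\epsilon_x^2}\right] \\
&\quad + \lambda_l \sum_{i=0}^{n_\sigma} \big(f(\mu_j, \sigma_i, \lambda_l, \rho_m, x_w) - f(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w)\big) \frac{1}{n_\sigma + 1} \\
&= a(\mu_j, \sigma_k, x_w)\big(f(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w + \epsilon_x) - f(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w)\big) \\
&\quad + b(\mu_j, \sigma_k, x_w)\big(f(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w - \epsilon_x) - f(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w)\big) \\
&\quad + \lambda_l \big(\bar{f}(\mu_j, \lambda_l, \rho_m, x_w) - f(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w)\big),
\end{aligned} \tag{6}$$
where
$$a(\mu_j, \sigma_k, x_w) = \frac{1}{2}\left(\frac{\sigma_k^2 x_w^2}{\epsilon_x^2} + \frac{\mu_j x_w}{\epsilon_x}\right), \qquad b(\mu_j, \sigma_k, x_w) = \frac{1}{2}\left(\frac{\sigma_k^2 x_w^2}{\epsilon_x^2} - \frac{\mu_j x_w}{\epsilon_x}\right),$$
and
$$\bar{f}(\mu_j, \lambda_l, \rho_m, x_w) = \frac{1}{n_\sigma + 1} \sum_{i=0}^{n_\sigma} f(\mu_j, \sigma_i, \lambda_l, \rho_m, x_w).$$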
Remark 3.4 $a(\mu_j, \sigma_k, x_w)$ is the birth rate and $b(\mu_j, \sigma_k, x_w)$ is the death rate, both of which should always be nonnegative. If, for some values of $x$, $\mu$, and $\sigma$ in their ranges, one of the rates becomes negative, we can always make it nonnegative by making $\epsilon_x$ smaller. And $\bar{f}(\mu_j, \lambda_l, \rho_m, x_w)$ is the mean of $f$ over $\sigma$.
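The nonnegativity requirement on the birth and death rates can be checked numerically. The Python sketch below evaluates the two rates and the largest spacing that keeps both nonnegative; the function names are hypothetical, and the bound `sigma**2 * x / abs(mu)` follows from requiring that the diffusion part dominate the drift part, assuming a positive value `x`.

```python
import math


def birth_death_rates(mu, sigma, x, eps_x):
    """Birth rate a and death rate b of the approximating chain at value x."""
    diffusion = (sigma * x / eps_x) ** 2  # sigma^2 x^2 / eps_x^2
    drift = mu * x / eps_x                # mu x / eps_x
    return 0.5 * (diffusion + drift), 0.5 * (diffusion - drift)


def max_eps_x(mu, sigma, x):
    """Largest eps_x for which both rates are nonnegative (x > 0 assumed)."""
    if mu == 0.0:
        return math.inf
    return sigma ** 2 * x / abs(mu)
```

With a drift as small as the per-second values used for transaction data, the bound is far larger than one tick, so a tick-sized spacing would typically satisfy it.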
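Clearly, $A_\epsilon \to A$ as $\epsilon \to 0$ [...] $\vec{Y}_\epsilon(t)$ be Equation (4), depending upon whether the driving process is $(\theta(t), X(t))$ or $(\theta_{\epsilon_\theta}(t), X_{\epsilon_x}(t))$. By Theorem 3.2, when $\epsilon$ is small, the recursive algorithm computes the posterior for the approximate model $(\theta_{\epsilon_\theta}, X_{\epsilon_x}, \vec{Y}_\epsilon)$ [...] $(\theta_{\epsilon_\theta}, X_{\epsilon_x})$, $A$ by $A_\epsilon$, $\vec{Y}$ by $\vec{Y}_\epsilon$ [...] to replace $P$, then Assumptions 2.1 - 2.5 also hold for $(\theta_{\epsilon_\theta}, X_{\epsilon_x}, \vec{Y}_\epsilon)$. Let $(\theta_\epsilon, X_\epsilon)$ denote $(\theta_{\epsilon_\theta}(t), X_{\epsilon_x}(t))$. To present the filtering equation for the approximate model, the discretized approximations of $\pi_t$ and $\pi(f, t)$ in Definitions 3.1 and 3.2 are defined as follows: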
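Definition 3.3 Let $\pi_{\epsilon,t}$ be the conditional probability mass of $(\theta_\epsilon, X_\epsilon(t))$ given $\mathcal{F}_t^{\vec{Y}_\epsilon}$.
Definition 3.4 Let
$$\pi_\epsilon(f, t) = E^{P_\epsilon}[f(\theta_\epsilon, X_\epsilon(t)) \mid \mathcal{F}_t^{\vec{Y}_\epsilon}] = \sum_{\mu, \sigma, \lambda, \rho, x} f(\mu, \sigma, \lambda, \rho, x)\,\pi_{\epsilon,t}(\mu, \sigma, \lambda, \rho, x),$$
where $(\mu, \sigma, \lambda, \rho, x)$ covers all lattices in the discretized state space, be the conditional expectation of $f(\theta_\epsilon, X_\epsilon(t))$ given $\mathcal{F}_t^{\vec{Y}_\epsilon}$.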
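Applying Theorem 3.1, we obtain the analogous filtering equation for the approximate model:
$$\pi_\epsilon(f, t) = \pi_\epsilon(f, 0) + \int_0^t \pi_\epsilon(A_\epsilon f, s)\,ds + \sum_{k=1}^n \int_0^t \left[\frac{\pi_\epsilon(f p_k, s-)}{\pi_\epsilon(p_k, s-)} - \pi_\epsilon(f, s-)\right] dY_{\epsilon,k}(s), \tag{7}$$
where $Y_{\epsilon,k}$ is the $k$th component of $\vec{Y}_\epsilon$.
Similarly, the above filtering equation can be separated into the propagation equation:
$$\pi_\epsilon(f, t_{i+1}-) = \pi_\epsilon(f, t_i) + \int_{t_i}^{t_{i+1}-} \pi_\epsilon(A_\epsilon f, s)\,ds, \tag{8}$$
and the updating equation (assuming that a trade at the $k$th price level occurs at time $t_{i+1}$):
$$\pi_\epsilon(f, t_{i+1}) = \frac{\pi_\epsilon(f p_k, t_{i+1}-)}{\pi_\epsilon(p_k, t_{i+1}-)}. \tag{9}$$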
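Step 3: Convert Equations (8) and (9) to the recursive algorithm. First, we define the approximate posterior that the recursive algorithm computes.
Definition 3.5 The posterior of the approximate model, $(\theta_\epsilon(t), X_\epsilon(t), \vec{Y}_\epsilon(t))$, at time $t$ is denoted by
$$p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t) = P_\epsilon\{\mu_\epsilon = \mu_j, \sigma_\epsilon(t) = \sigma_k, \lambda_\epsilon = \lambda_l, \rho_\epsilon = \rho_m, X_{\epsilon_x}(t) = x_w \mid \mathcal{F}_t^{\vec{Y}_\epsilon}\}.$$
The core of the conversion is to take $f$ as the following indicator function:
$$I_{\{\mu_\epsilon = \mu_j,\ \sigma_\epsilon(t) = \sigma_k,\ \lambda_\epsilon = \lambda_l,\ \rho_\epsilon = \rho_m,\ X_\epsilon = x_w\}}(\theta_\epsilon, X_\epsilon) \overset{\text{def}}{=} I(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w). \tag{10}$$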
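Then, the following facts emerge:
$$\pi_\epsilon\big(I(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w), t\big) = p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t),$$
and
$$\pi_\epsilon\big(a(\theta_\epsilon, X_\epsilon)\,I(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w + \epsilon_x), t\big) = a(\mu_j, \sigma_k, x_{w-1})\,p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_{w-1}; t).$$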
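Along with three similar results, we obtain
$$\begin{aligned}
\pi_\epsilon(A_\epsilon I, t) &= a(\mu_j, \sigma_k, x_{w-1})\,p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_{w-1}; t) \\
&\quad + b(\mu_j, \sigma_k, x_{w+1})\,p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_{w+1}; t) \\
&\quad - \big(a(\mu_j, \sigma_k, x_w) + b(\mu_j, \sigma_k, x_w)\big)\,p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t) \\
&\quad + \lambda_l \big(\bar{p}_\epsilon(\mu_j, \lambda_l, \rho_m, x_w; t) - p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t)\big),
\end{aligned}$$
where
$$\bar{p}_\epsilon(\mu_j, \lambda_l, \rho_m, x_w; t) = \frac{1}{n_\sigma + 1} \sum_{i=0}^{n_\sigma} p_\epsilon(\mu_j, \sigma_i, \lambda_l, \rho_m, x_w; t).$$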
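Therefore, the propagation equation in Equation (8) becomes
$$\begin{aligned}
p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t_{i+1}-) &= p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t_i) \\
&\quad + \int_{t_i}^{t_{i+1}-} \Big[ a(\mu_j, \sigma_k, x_{w-1})\,p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_{w-1}; t) \\
&\qquad - \big(a(\mu_j, \sigma_k, x_w) + b(\mu_j, \sigma_k, x_w)\big)\,p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t) \\
&\qquad + b(\mu_j, \sigma_k, x_{w+1})\,p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_{w+1}; t) \\
&\qquad + \lambda_l \big(\bar{p}_\epsilon(\mu_j, \lambda_l, \rho_m, x_w; t) - p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t)\big) \Big]\,dt.
\end{aligned} \tag{11}$$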
If a trade at the $k$th price level occurs at time $t_{i+1}$, the updating Equation (9) can be written as
$$p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t_{i+1}) = \frac{p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t_{i+1}-)\,p(y_k | x_w, \rho_m)}{\sum_{j', k', l', m', w'} p_\epsilon(\mu_{j'}, \sigma_{k'}, \lambda_{l'}, \rho_{m'}, x_{w'}; t_{i+1}-)\,p(y_k | x_{w'}, \rho_{m'})}, \tag{12}$$
where the summation is over the latticized spaces of $\mu$, $\sigma$, $\lambda$, $\rho$ and $X(t_{i+1})$. Note that $p(y_k | x_w, \rho_m)$, the transition probability from $x_w$ to $y_k$, is specified in Equation (15) in Appendix A.
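Next, we make Equation (11) a recursive algorithm. Equation (11) is deterministic, and the Euler scheme is employed for approximation. After excluding the probability-zero event that two or more jumps occur at the same time, there are two possible cases for the inter-trading time. Case 1: if $t_{i+1} - t_i \le LL$, the length controller, then we can approximate $p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t_{i+1}-)$ as
$$\begin{aligned}
p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t_{i+1}-) &\approx p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t_i) \\
&\quad + \Big[ a(\mu_j, \sigma_k, x_{w-1})\,p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_{w-1}; t_i) \\
&\qquad - \big(a(\mu_j, \sigma_k, x_w) + b(\mu_j, \sigma_k, x_w)\big)\,p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t_i) \\
&\qquad + b(\mu_j, \sigma_k, x_{w+1})\,p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_{w+1}; t_i) \\
&\qquad + \lambda_l \big(\bar{p}_\epsilon(\mu_j, \lambda_l, \rho_m, x_w; t_i) - p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t_i)\big) \Big] (t_{i+1} - t_i).
\end{aligned} \tag{13}$$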
Case 2: if $t_{i+1} - t_i > LL$, then we can choose a fine partition $\{t_{i,0} = t_i, t_{i,1}, \ldots, t_{i,n} = t_{i+1}\}$ of $[t_i, t_{i+1}]$ such that $\max_j |t_{i,j+1} - t_{i,j}| < LL$ and then approximate $p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t_{i+1}-)$ by repeatedly applying the recursive algorithm given by Equation (13) from $t_{i,0}$ to $t_{i,1}$, then $t_{i,2}$, $\ldots$, until $t_{i,n} = t_{i+1}$.
Equations (12) and (13) constitute the recursive algorithm we employ to calculate the approximate posterior at time $t_{i+1}$ for $(\mu, \sigma(t_{i+1}), \lambda, \rho, X(t_{i+1}))$ based on the posterior at time $t_i$.
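The propagation and updating steps can be sketched in Python on a reduced lattice. This is a simplified illustration under stated assumptions, not the algorithm as implemented for the paper: the drift `mu` and jump intensity `lam` are treated as known constants so that the posterior lives only on a (volatility, value) lattice, the likelihood vector `lik` stands in for the transition probabilities of the observed price level, and reflecting boundaries in the value direction are an assumption made here so that total mass is conserved. All function names are hypothetical.

```python
def rates(mu, sigma, x, eps_x):
    """Birth and death rates a, b of the approximating chain in x."""
    diffusion = (sigma * x / eps_x) ** 2
    drift = mu * x / eps_x
    return 0.5 * (diffusion + drift), 0.5 * (diffusion - drift)


def propagate(p, sigmas, xs, mu, lam, dt):
    """One Euler step of the propagation equation on a (sigma, x) lattice.

    p[k][w] is the posterior mass at (sigmas[k], xs[w]). Assumption made
    here: reflecting boundaries in x, so that total mass is conserved.
    """
    n_k, n_w = len(sigmas), len(xs)
    eps_x = xs[1] - xs[0]
    a = [[0.0] * n_w for _ in range(n_k)]
    b = [[0.0] * n_w for _ in range(n_k)]
    for k in range(n_k):
        for w in range(n_w):
            a_kw, b_kw = rates(mu, sigmas[k], xs[w], eps_x)
            a[k][w] = a_kw if w < n_w - 1 else 0.0  # no move above the grid
            b[k][w] = b_kw if w > 0 else 0.0        # no move below the grid
    new = [[0.0] * n_w for _ in range(n_k)]
    for w in range(n_w):
        p_bar = sum(p[k][w] for k in range(n_k)) / n_k  # mean over sigma
        for k in range(n_k):
            inflow = 0.0
            if w > 0:
                inflow += a[k][w - 1] * p[k][w - 1]
            if w < n_w - 1:
                inflow += b[k][w + 1] * p[k][w + 1]
            outflow = (a[k][w] + b[k][w]) * p[k][w]
            jump = lam * (p_bar - p[k][w])
            new[k][w] = p[k][w] + dt * (inflow - outflow + jump)
    return new


def update(p, lik):
    """Bayes update at a trade; lik[w] stands for p(y_k | x_w)."""
    joint = [[p_kw * lik[w] for w, p_kw in enumerate(row)] for row in p]
    total = sum(sum(row) for row in joint)
    return [[v / total for v in row] for row in joint]


def sigma_estimate(p, sigmas):
    """Bayes estimate of volatility: the marginal posterior mean."""
    return sum(s * sum(row) for s, row in zip(sigmas, p))
```

A trade-by-trade filter would call `propagate` over each inter-trade interval (sub-stepping when the interval exceeds the length controller) and then `update` with the likelihood of the observed price. The Euler step keeps the masses nonnegative only when `dt` times the total outflow rate stays below one, which is the practical reason for a length controller.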
Finally, we choose a reasonable prior. We assume independence between $X(0)$ and $(\mu, \sigma(0), \lambda, \rho)$. Set $P\{X(0) = Y(t_1)\} = 1$, where $Y(t_1)$ is the first trade price of a data set, because they are very close. If there is no special information on $(\mu, \sigma(0), \lambda, \rho)$ available, we may simply assign uniform distributions to the latticized state space of $(\mu, \sigma(0), \lambda, \rho)$ and obtain the prior at $t = 0$ as
$$p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; 0) = \begin{cases} \dfrac{1}{(1 + n_\mu)(1 + n_\sigma)(1 + n_\lambda)(1 + n_\rho)} & \text{if } x_w = Y(t_1), \\ 0 & \text{otherwise}. \end{cases}$$
3.3.1 Consistency of the Recursive Algorithm
There are two approximations in the construction of the recursive algorithm. The first is to approximate the integral in the propagation equation (11) by the Euler scheme, whose convergence is well known. The second is more important. It is the approximation of the filtering Equation (3) (the optimum filter) by the filtering Equation (7) of the approximate model (the approximate optimum filter).
Since (
, v
v
) (
(
j
,
k
,
m
, v
l
; t). According to Theorem 3.2, $p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t)$ converges to an expression related to the true posterior (see below). In order to identify this expression, we first define some neighborhoods.
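Define the neighborhood of $\mu_j$ as $N_{\mu_j} = \{\mu : \mu_j - 0.5\,\epsilon_\mu \le \mu < \mu_j + 0.5\,\epsilon_\mu\}$, where $\epsilon_\mu$ is the lattice spacing for $\mu$, and define $N_{\sigma_k}$, $N_{\lambda_l}$, $N_{\rho_m}$, and $N_{x_w}$ similarly. Let
$$
p(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t) = E\Big[ I_{\{\mu \in N_{\mu_j},\ \sigma(t) \in N_{\sigma_k},\ \lambda \in N_{\lambda_l},\ \rho \in N_{\rho_m},\ X(t) \in N_{x_w}\}}\big(\mu, \sigma(t), \lambda, \rho, X(t)\big) \,\Big|\, \mathcal{F}^{Y}_{t} \Big].
$$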
Note that the indicator function in the definition of $p(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t)$ above becomes the indicator function defined in Equation (10) in the approximate model. Therefore, Theorem 3.2 implies
$$
p_\epsilon(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t) \to p(\mu_j, \sigma_k, \lambda_l, \rho_m, x_w; t) \quad \text{as } \epsilon \to 0.
$$
4 Simulation and Real Data Examples
The consistent (or robust) recursive algorithm for computing the joint posterior and the Bayes estimates (the marginal posterior means) is extensively tested and validated on simulated data. One simulation example is provided to demonstrate that the Bayes estimates for stochastic volatility capture the movement of volatility quickly and are close to the true values. Then, the recursive algorithm is applied in detail to two months of transaction prices of Microsoft, and the stochastic volatility estimates can be obtained in real time.
4.1 Simulation Study
Extensive simulation studies are done and one of them is reported below.
4.1.1 Simulated Data and its Micro-movement Features
In the following example, these parameters are picked for the jumping stochastic volatility model: $\mu = 4.5 \times 10^{-8}$, corresponding to an annualized expected return of 27.38% with an annualization factor of 260 days and 6.5 business hours per day; and $\lambda = 3.75 \times 10^{-4}$, which means one change of volatility every $1/(3.75 \times 10^{-4}) = 2666.67$ seconds on average. The range of volatility is [0.00004, 0.0004], corresponding to an annualized range of [9.866%, 98.66%]. Since $a(t)$ has no impact on estimation or on the noise, the trading intensity is assumed to be constant: $a(t) = 0.9$ for all $t > 0$ (i.e., one trade about every 1/0.9 = 1.11 seconds). For the noise parameters, in order to show that the model can produce the micro-movement features of actual transaction price data, I choose values close to those of the Microsoft data set, which is discussed in Section 4.2. Let $\alpha = 0.2$, $\beta = 0.4$, and $\rho = 0.2$. Using these parameters, 90,000 observations are simulated.
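The annualized figures quoted above can be checked with a few lines of arithmetic. The sketch assumes that the parameters are expressed per second of trading time (as the 2666.67-second figure suggests), that the drift scales linearly with time, and that the volatility scales with the square root of time. The function names are illustrative only.

```python
import math

# 260 trading days x 6.5 business hours x 3600 seconds = 6,084,000 seconds
SECONDS_PER_YEAR = 260 * 6.5 * 3600

def annualize_drift(mu_per_second):
    """Drift scales linearly with the length of the time interval."""
    return mu_per_second * SECONDS_PER_YEAR

def annualize_volatility(sigma_per_second):
    """Volatility scales with the square root of the time interval."""
    return sigma_per_second * math.sqrt(SECONDS_PER_YEAR)

print(f"{annualize_drift(4.5e-8):.2%}")        # 27.38%
print(f"{annualize_volatility(0.00004):.3%}")  # 9.866%
print(f"{annualize_volatility(0.0004):.2%}")   # 98.66%
print(f"{1 / 3.75e-4:.2f} seconds")            # 2666.67 seconds
```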
Figure 1 has two pairs of density histograms of price changes and of the fractional parts of price. The upper pair is produced from the simulated data and the lower pair from the actual Microsoft data. Their similarity shows that, by incorporating discrete, non-clustering, and clustering noises, the proposed model possesses the micro-movement features of the actual prices.
4.1.2 Bayes Estimates for the Simulated Data
A Fortran program for the recursive algorithm is constructed to calculate, at each trading time $t_i$, the joint posterior of $(\mu, \sigma(t), \lambda, \rho, X(t))$, their marginal posteriors, their Bayes estimates, and their standard errors, respectively.
The Bayes estimates of the time-invariant parameters $(\mu, \lambda, \rho)$ converge to their true values, and the two-SE bounds become smaller and smaller and go to zero, as in the case of GBM in Zeng (2002). Hence, only the final Bayes estimates, their SEs, and the true values are presented in Table 1. The true values are close to the Bayes estimates and all lie within the two-SE bounds.
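To make the quantities in this paragraph concrete, the sketch below computes a marginal posterior, its mean (the Bayes estimate), and its standard deviation from a joint posterior stored on a lattice. It is not the Fortran program mentioned above; treating the reported standard error as the posterior standard deviation is an assumption, and the names and data layout are hypothetical.

```python
import math

def marginal(joint, axis):
    """Sum a joint posterior {index tuple: probability} down to one coordinate."""
    result = {}
    for index, prob in joint.items():
        result[index[axis]] = result.get(index[axis], 0.0) + prob
    return result

def bayes_estimate(joint, axis, grid):
    """Marginal posterior mean and standard deviation for one coordinate.

    grid[i] is the lattice value attached to index i of that coordinate.
    """
    marg = marginal(joint, axis)
    mean = sum(grid[i] * prob for i, prob in marg.items())
    variance = sum((grid[i] - mean) ** 2 * prob for i, prob in marg.items())
    return mean, math.sqrt(variance)
```

For example, with a joint posterior that puts equal mass on a two-point lattice {0.1, 0.3} for one coordinate, `bayes_estimate` returns a mean of 0.2 and a standard deviation of 0.1, up to rounding.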
Estimating stochastic volatility is the focus of this paper. Figure 2 shows how the Bayes estimates of volatility evolve in comparison with the time-varying true values of volatility, $\sigma(t)$, for all 90,000 observations. Overall, we see that the Bayes estimates of volatility are close to their true values. As the true volatility changes, the Bayes estimates catch up with the movement quickly. Figure 3 presents them for about the last 25,000 observations. We observe that most of the true values of stochastic volatility are within the two-SE bounds.
4.2 An Application to Real Data
The tested recursive algorithm is applied to two months (January and February 1994, 40 business days) of transaction data for Microsoft.
4.2.1 The Data
The data are extracted from the Trade and Quote (TAQ) database distributed by NYSE. We apply standard procedures to filter the data, with one important exception. Previous studies cannot handle multiple trades at a given point in time, and they exclude trades with zero duration. The autoregressive conditional duration model proposed by Engle & Russell (1998) is one such example. The present method can handle such cases, and we keep all zero durations. The final sample has 49,937 observations.
Based on the relative frequencies of the fractional parts of the price for Microsoft, we may use the method of relative frequencies, a variant of the method of moments, to estimate $\alpha = .2414$ and $\beta = .3502$. The empirical distribution of trade waiting times does not support a pure exponential distribution for duration, but it does support a mixed exponential distribution, which is consistent with the assumption that the total trading intensity, $a(t)$, can depend on time. The mean duration is 18.63 seconds with a standard error of 30.01 seconds.
4.2.2 Bayes Estimates for Volatility: JSV-GBM vs. GBM
The recursive algorithm for JSV-GBM is applied to the transaction data of Microsoft. For comparison, the recursive algorithm for GBM is applied to the same data set to obtain Bayes estimates.
Table 2 presents the final annualized (with annual factor 260) Bayes estimates and their standard errors for $\mu$, $\lambda$, and $\rho$ for JSV-GBM and for GBM. For JSV-GBM, the jump intensity, $\lambda$, is 5176.79, which means there are 5176.79 changes in volatility annually, or about 20 per day. It has been observed that intraday volatility is U-shaped (O'Hara (1995)): volatility is higher at the opening and closing of the market. So volatility changes at least twice a day. The large $\lambda$ is consistent with this observation and indicates that volatility changes even more frequently. The parameter for non-clustering noise, $\rho$, is reduced significantly in JSV-GBM, probably because more price variation is explained by stochastic volatility.
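The daily figure follows directly from the annual factor of 260. As a further back-of-the-envelope step, assuming the 6.5-hour trading day used in Section 4.1.1, the same estimate implies roughly one volatility jump every 20 minutes:

```latex
\frac{5176.79}{260} \approx 19.91 \ \text{jumps per day},
\qquad
\frac{6.5 \times 3600\ \text{s}}{19.91} \approx 1175\ \text{s} \approx 19.6\ \text{minutes between jumps}.
```

By comparison, the simulated example in Section 4.1.1 used one jump every 2666.67 seconds on average.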
Figure 4 shows how the Bayes estimates of volatility for JSV-GBM and GBM evolve and demonstrates how different the volatility estimates are for the two models. In GBM, the volatility is assumed to be constant, and its estimates tend to be stable and fail to capture the movement of volatility. In JSV-GBM, the stochastic volatility feature in the Microsoft data is clearly demonstrated. This is also confirmed by the model selection approach of Bayes factor via filtering developed in Zeng & Kouritzin (2002). To avoid crowding the picture, Figure 5 presents only the last 5,000 Bayes estimates of volatility for JSV-GBM and their estimated two-SE bounds. We can see that the volatility estimates vary greatly. Sometimes the estimate moves continuously, and sometimes it jumps, just as the model suggests. Overall, the smallest volatility estimate is 5.424% annually, and the largest is 72.58%.
5 Conclusions
A unified Bayesian estimation via filtering approach is developed for estimating stochastic volatility for a class of micro-movement models, which capture the impact of noise at the micro level. The class of models has an important feature: it can be formulated as a filtering problem with counting process observations. Under this formulation, the whole sample paths are observable, and the complete tick data information is used in Bayes parameter estimation via filtering. A consistent recursive algorithm is developed to compute the Bayes estimates for the parameters in the model, especially the stochastic volatility. Simulation studies show that Bayes estimates for time-invariant parameters are consistent, and that Bayes estimates for stochastic volatility are close to their true values and capture the movement of volatility quickly. The recursive algorithm is fast and feasible for large data sets, and its recursive structure allows quick and easy updating. The recursive algorithm is applied to Microsoft's transaction data; we obtain Bayes estimates and provide strong affirmative evidence that volatility changes even more dramatically at the trade-by-trade level.
The model and its Bayes estimation via filtering can be extended to a jump-diffusion process for the value process and to other kinds of noise, according to the sample characteristics of the data. The models and the Bayes estimation can be applied to other asset markets, such as exchange rates and commodity prices. They can also be applied to assess the quality of a security market and to compare information flows and noises in different periods and different markets.
A Appendix A
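To formulate the biasing rule, we first define a classifying function $r(\cdot)$,
$$
r(y) =
\begin{cases}
3 & \text{if the fractional part of } y \text{ is an odd eighth,} \\
2 & \text{if the fractional part of } y \text{ is an odd quarter,} \\
1 & \text{if the fractional part of } y \text{ is a half or zero.}
\end{cases}
\qquad (14)
$$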
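The biasing rules specify the transition probabilities from $x'$ to $y$, $p(y \mid x')$: $p(x' \mid x') = 1$ if $r(x') = 1$ or 2; $p(x' \mid x') = 1 - \alpha - \beta$ if $r(x') = 3$; $p(x' + \tfrac{1}{8} \mid x') = \alpha$ if $x' - \mathrm{floor}(x') = \tfrac{1}{8}$ or $\tfrac{5}{8}$; $p(x' - \tfrac{1}{8} \mid x') = \alpha$ if $x' - \mathrm{floor}(x') = \tfrac{3}{8}$ or $\tfrac{7}{8}$; $p(x' - \tfrac{1}{8} \mid x') = \beta$ if $x' - \mathrm{floor}(x') = \tfrac{1}{8}$ or $\tfrac{5}{8}$; $p(x' + \tfrac{1}{8} \mid x') = \beta$ if $x' - \mathrm{floor}(x') = \tfrac{3}{8}$ or $\tfrac{7}{8}$. Note that the function $\mathrm{floor}(x)$ gives the integer part of $x$.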
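Then $p(y \mid x)$, the transition probability, can be computed through $p(y \mid x) = \sum_{x'} p(y \mid x')\, p(x' \mid x)$, where $p(y \mid x')$ is given by the biasing rules and $p(x' \mid x) = P\{V = (x' - R[x, \tfrac{1}{8}])\}$. Suppose $D = 8\,|y - R[x, \tfrac{1}{8}]|$. Then $p(y \mid x)$ is
$$
p(y \mid x) =
\begin{cases}
(1-\rho)(1-\alpha-\beta) & \text{if } r(y) = 3 \text{ and } D = 0, \\
\tfrac{1}{2}(1-\rho)(1-\alpha-\beta)\rho^{D} & \text{if } r(y) = 3 \text{ and } D > 0, \\
(1-\rho)(1+\alpha\rho) & \text{if } r(y) = 2 \text{ and } D = 0, \\
\tfrac{1}{2}(1-\rho)[\rho + \alpha(2+\rho^{2})] & \text{if } r(y) = 2 \text{ and } D = 1, \\
\tfrac{1}{2}(1-\rho)\rho^{D-1}[\rho + \alpha(1+\rho^{2})] & \text{if } r(y) = 2 \text{ and } D \ge 2, \\
(1-\rho)(1+\beta\rho) & \text{if } r(y) = 1 \text{ and } D = 0, \\
\tfrac{1}{2}(1-\rho)[\rho + \beta(2+\rho^{2})] & \text{if } r(y) = 1 \text{ and } D = 1, \\
\tfrac{1}{2}(1-\rho)\rho^{D-1}[\rho + \beta(1+\rho^{2})] & \text{if } r(y) = 1 \text{ and } D \ge 2.
\end{cases}
\qquad (15)
$$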
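A small sketch of Equations (14) and (15) may help in checking the case analysis. It assumes a tick size of 1/8, reads $\rho$ as the non-clustering noise parameter and $\alpha$, $\beta$ as the probabilities of moving an odd-eighth price to the nearest odd quarter and to the nearest half or integer, and takes $D$ to be the distance in ticks between $y$ and the rounded value of $x$. These readings, and the names `classify` and `transition_prob`, are assumptions made for the illustration.

```python
from fractions import Fraction

TICK = Fraction(1, 8)

def classify(y):
    """r(y) of Equation (14) for a price y (a Fraction on the 1/8 grid)."""
    eighths = int(y / TICK) % 8        # fractional part measured in eighths
    if eighths % 2 == 1:
        return 3                       # odd eighth
    if eighths % 4 == 2:
        return 2                       # odd quarter
    return 1                           # half or integer

def transition_prob(y, x, rho, alpha, beta):
    """p(y | x) following the case analysis of Equation (15)."""
    x_rounded = round(x / TICK) * TICK       # R[x, 1/8]
    d = int(abs(y - x_rounded) / TICK)       # D = 8 |y - R[x, 1/8]|
    kind = classify(y)
    if kind == 3:
        stay = (1 - rho) * (1 - alpha - beta)
        return stay if d == 0 else 0.5 * stay * rho ** d
    c = alpha if kind == 2 else beta         # clustering weight for this class
    if d == 0:
        return (1 - rho) * (1 + c * rho)
    if d == 1:
        return 0.5 * (1 - rho) * (rho + c * (2 + rho ** 2))
    return 0.5 * (1 - rho) * rho ** (d - 1) * (rho + c * (1 + rho ** 2))
```

A useful sanity check is that, for any fixed $x$, the probabilities summed over all $y$ on the grid add up to one; with $\rho = 0.2$ the tail beyond a few dozen ticks is negligible.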
References
Ball, C. (1988), Estimation bias induced by discrete security prices, Journal of Finance 43, 841-865.
Black, F. (1986), Noise, Journal of Finance 41, 529-543.
Bollerslev, T., Chou, R. & Kroner, K. (1992), ARCH modeling in finance: A review of the theory and empirical evidence, Journal of Econometrics 52, 5-59.
Brémaud, P. (1981), Point Processes and Queues: Martingale Dynamics, Springer-Verlag, New York.
Cho, D. & Frees, E. W. (1988), Estimating the volatility of discrete stock prices, Journal of Finance 43, 451-466.
Elliott, R., Malcolm, W. & Tsoi, A. H. (2002), Robust parameter estimation for asset price models with Markov modulated volatilities. To appear in Journal of Economic Dynamics and Control.
Engle, R. & Russell, J. (1998), Autoregressive conditional duration: A new model for irregularly spaced transaction data, Econometrica 66, 1127-1162.
Gottlieb, G. & Kalay, A. (1985), Implications of the discreteness of observed stock prices, Journal of Finance 40, 135-153.
Harris, L. (1990), Estimation of stock price variances and serial covariances from discrete observations, Journal of Financial and Quantitative Analysis 25, 291-306.
Harris, L. (1991), Stock price clustering and discreteness, Review of Financial Studies 4, 389-415.
Hasbrouck, J. (1996), Modeling market microstructure time series, in G. Maddala & C. Rao, eds, Handbook of Statistics, Vol. 14, Elsevier, pp. 647-692.
Heston, S. (1993), A closed-form solution for options with stochastic volatility with applications to bond and currency options, The Review of Financial Studies 6, 327-343.
Hull, J. & White, A. (1987), The pricing of options on assets with stochastic volatilities, Journal of Finance 42, 281-300.
Kushner, H. & Dupuis, P. (1994), Numerical Methods for Stochastic Control Problems in Continuous Time, Springer-Verlag, New York.
Nelson, D. B. (1990), ARCH models as diffusion approximations, Journal of Econometrics 45, 7-39.
O'Hara, M. (1995), Market Microstructure Theory, Blackwell, Oxford.
Zeng, Y. (2002), A partially-observed model for micro-movement of asset prices with Bayes estimation via filtering. To appear in Mathematical Finance.
Zeng, Y. & Kouritzin, M. (2002), Bayesian model selection via filtering: Application to partially-observed micro-movement models of asset prices.
Table 1: Bayes Estimates for Simulated Data
Parameter    True Value    Bayes Est.    Standard Error
$\mu$        4.5e-8        5.555e-7      3.874e-7
$\lambda$    3.75e-4       4.733e-4      7.149e-5
$\rho$       0.20          0.2018        0.0013
Table 2: Bayes Estimates for Transaction Data of MSFT, Jan. and Feb. 1994
Model           $\mu$       $\lambda$    $\rho$
GBM with JSV    12.89%      5176.79      0.2070
                (24.91%)    (392.72)     (0.0024)
GBM only        23.01%      NA           0.2226
                (83.27%)    NA           (0.0017)
Bayes estimates of $\mu$ and $\lambda$ are annualized with annual factor 260.
Standard errors are in parentheses.