Fast strong approximation Monte-Carlo schemes for stochastic volatility models


Christian Kahl

Peter Jäckel

First version: 28th September 2005


This version: 22nd May 2006
Abstract
Numerical integration methods for stochastic volatility models in financial markets are discussed.
We concentrate on two classes of stochastic volatility models where the volatility is either directly
given by a mean-reverting CEV process or as a transformed Ornstein-Uhlenbeck process. For the
latter, we introduce a new model based on a simple hyperbolic transformation. Various numerical
methods for integrating mean-reverting CEV processes are analysed and compared with respect
to positivity preservation and efficiency. Moreover, we develop a simple and robust integration
scheme for the two-dimensional system using the strong convergence behaviour as an indicator for
the approximation quality. This method, which we refer to as the IJK (4.47) scheme, is applicable
to all types of stochastic volatility models and can be employed as a drop-in replacement for the
standard log-Euler procedure.
Acknowledgment: The authors thank Vladimir Piterbarg and an anonymous referee for helpful
comments and suggestions.
1 Introduction
Numerical integration schemes for differential equations have been around nearly as long as the
formalism of calculus itself. In 1768, Euler devised his famous stepping method [Eul68], and this
scheme has remained the fallback procedure in many applications where all else fails as well as the
benchmark in terms of overall reliability and robustness any new algorithm must compete with. Many
schemes have been invented since, and for most engineering purposes involving the numerical
integration of ordinary or partial differential equations there are nowadays a variety of approaches
available.
With the advent of formal stochastic calculus in the 1920s and the subsequent application to real
world problems came the need for numerical integration of dynamical equations subject to an external
force of random nature. Again, Euler's method came to the rescue, first suggested in this context by
Maruyama [Mar55] whence it is also sometimes referred to as the Euler-Maruyama scheme [KP99].
An area where the calculus of stochastic differential equations became particularly popular is the
mathematics of financial markets, more specifically the modelling of financial movements for the
purpose of pricing and risk-managing derivative contracts.
Most of the early applications of stochastic calculus to finance focussed on approaches that
permitted closed form solutions, the most famous example probably being the Nobel prize winning
article by Fischer Black and Myron Scholes [BS73]. With increasing computer power, researchers and
practitioners began to explore avenues that necessitated semi-analytical evaluations or even required
fully numerical treatment.

Christian Kahl: Quantitative Analytics Group, ABN AMRO, 250 Bishopsgate, London EC2M 4AA, UK, and
Department of Mathematics, University of Wuppertal, Gaußstraße 20, Wuppertal, D-42119, Germany
Peter Jäckel: Global head of credit, hybrid, commodity, and inflation derivative analytics, ABN AMRO,
250 Bishopsgate, London EC2M 4AA, UK

A particularly challenging modelling approach involves the coupling of two stochastic differential
equations whereby the diffusion term of the first equation is explicitly perturbed by the dynamics of
the second equation: stochastic volatility models. These became of interest to financial practitioners
when it was realised that in some markets deterministic volatility models do not represent the dynamics
sufficiently. Alas, the first publications on stochastic volatility models [Sco87, Wig87, HW88] were
ahead of their time: the required computer power to use these models in a simulation framework was
simply not available, and analytical solutions could not be found. One of the first articles that provided
semi-analytical solutions was published by Stein and Stein [SS91]. An unfortunate feature of that model
was that it did not give enough flexibility to represent observable market prices, i.e. it did not provide
enough degrees of freedom for calibration. In 1993, Heston [Hes93] published the first model that
allowed for a reasonable amount of calibration freedom whilst permitting semi-analytical solutions.
Various other stochastic volatility models have been published since, and computer speed has increased
significantly. However, despite the fact that at the time of this writing computer power makes fully
numerical treatment of stochastic volatility a real possibility, comparatively little research has been
done on the subject of efficient methods for the numerical integration of these models. In this article,
we present and discuss some techniques that help to make the use of fully numerically integrated
stochastic volatility models a viable alternative to semi-analytic solutions, despite the fact that major
advances on the efficient implementation of Heston's model have been made [KJ05]. In section 2, we
present the specific stochastic volatility models that we subsequently use in our demonstrations of
numerical integration methods, and discuss some of their features in the context of financial markets
modelling. In section 3, we elaborate on specific methods suitable for the volatility process in
isolation. Next, in section 4, we discuss techniques that accelerate the convergence of the numerical
integration of the combined system of stochastic volatility and the directly observable financial market
variable, both with respect to the discretisation refinement required and with respect to CPU time
consumed. This is followed by the presentation of numerical results in section 5. Finally, we conclude.
2 Some stochastic volatility models
We consider stochastic volatility models of the form
$$ dS_t = \mu S_t \, dt + V_t^{p} S_t \, dW_t \tag{2.1} $$
where $S$ describes the underlying financially observable variable and $V$, depending on the
coefficient $p$ given by the specific model, represents either instantaneous variance ($p = 1/2$) or
instantaneous volatility ($p = 1$).
As for the specific processes for instantaneous variance or volatility, we distinguish two different
kinds. The first kind is the supposition of a given stochastic differential equation directly applied to
the instantaneous variance process. Since instantaneous variance must never be negative for the
underlying financial variable to remain on the real axis, we specifically focus on a process for variance
of the form [Cox75, CR76, Bec80, AA00, CKLS92]
$$ dV_t = \kappa(\theta - V_t)\,dt + \alpha V_t^{q}\,dZ_t , \qquad V_{t_0} = V_0 , \tag{2.2} $$
with $\kappa, \theta, \alpha, q > 0$, and $p = 1/2$ in equation (2.1). We assume the driving processes
$W_t$ and $Z_t$ to be correlated Brownian motions satisfying $dW_t \, dZ_t = \rho \, dt$.
The second kind of stochastic volatility model we consider is given by a deterministic transformation
$$ \sigma_t = \sigma_0 f(y_t) , \qquad f : \mathbb{R} \to \mathbb{R}_+ , \tag{2.3} $$
with $f(\cdot)$ being strictly monotonic and differentiable, of a standard Ornstein-Uhlenbeck process
$$ dy_t = -\kappa y_t \, dt + \alpha\sqrt{2\kappa}\, dZ_t , \qquad y_{t_0} = y_0 , \tag{2.4} $$
setting $V_t = \sigma_t$ and $p = 1$ in equation (2.1). The transformation $f(\cdot)$ is chosen to
ensure that $\sigma \ge 0$ for the following reason. It is, in principle, possible to argue that
instantaneous volatility is undefined with respect to its sign. However, when volatility and the process
it is driving are correlated, a change of sign in the volatility process implies a sudden change of sign
in effective correlation, which in turn implies a reversal of the conditional forward Black implied
volatility skew, and the latter is a rather undesirable feature to have for reasons of economic realism.
As a consequence of this train of thought, we exclude the Stein & Stein / Schöbel & Zhu model
[SS91, SZ99] which is encompassed above by setting $f(y) = y$.
In order to obtain a better understanding of the different ways to simulate the respective stochastic
volatility model we first give some analytical properties of the different approaches.
2.1 The mean-reverting CEV process
By mean-reverting CEV process we mean the family of processes described by the stochastic differential
equation (2.2). Heston's model, for instance, is given by $q = 1/2$ with $p = 1/2$ in the process for the
underlying (2.1). The family of processes described by (2.2) has also been used for the modelling of
interest rates [CKLS92].
For the special case $q = 1/2$, i.e. for the Heston variance process, the stochastic differential equation
is also known as the Cox-Ingersoll-Ross model [CIR85]. In that case, the transition density is known
analytically as
$$ p(t_0, t, V_{t_0}, V_t) = \chi^2_d(\nu V_t, \xi) \tag{2.5} $$
with
$$ \nu = \frac{4\kappa}{\alpha^2 \left(1 - e^{-\kappa \Delta t}\right)} \tag{2.6} $$
$$ \xi = \frac{4\kappa \, e^{-\kappa \Delta t}}{\alpha^2 \left(1 - e^{-\kappa \Delta t}\right)} \, V_{t_0} \tag{2.7} $$
$$ \Delta t = t - t_0 \tag{2.8} $$
$$ d = \frac{4\theta\kappa}{\alpha^2} \tag{2.9} $$
where $\chi^2_d(x, \xi)$ denotes the noncentral chi-square density of variable $x$ with $d$ degrees of
freedom and non-centrality parameter $\xi$. Broadie and Kaya used this transition density for the
Monte-Carlo simulation of European options [BK04].
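As a hedged illustration of how (2.5) to (2.9) can be turned into an exact sampling step: since $\nu V_t$ follows a noncentral chi-square law with $d$ degrees of freedom and non-centrality $\xi$, one draw of $V_t$ is a noncentral chi-square variate divided by $\nu$. The sketch below assumes NumPy; the function name `sample_cir_exact` and all parameter values are illustrative assumptions and are not taken from [BK04].

```python
import numpy as np


def sample_cir_exact(v0, kappa, theta, alpha, dt, size, rng):
    """Draw V_{t0+dt} given V_{t0} = v0 for the case q = 1/2 of (2.2)."""
    decay = np.exp(-kappa * dt)
    nu = 4.0 * kappa / (alpha**2 * (1.0 - decay))  # scale factor, eq. (2.6)
    xi = nu * decay * v0                           # non-centrality, eq. (2.7)
    d = 4.0 * theta * kappa / alpha**2             # degrees of freedom, eq. (2.9)
    # nu * V_t is noncentral chi-square, so the variates are divided by nu.
    return rng.noncentral_chisquare(d, xi, size) / nu


# Illustrative parameters (assumptions, not from the paper).
rng = np.random.default_rng(42)
kappa, theta, alpha, v0, dt = 1.0, 0.04, 0.3, 0.09, 0.5
samples = sample_cir_exact(v0, kappa, theta, alpha, dt, 200_000, rng)
expected_mean = theta + (v0 - theta) * np.exp(-kappa * dt)
print(samples.mean(), expected_mean)
```

The sample mean can be compared with the analytical conditional mean $(V_{t_0} - \theta) e^{-\kappa \Delta t} + \theta$ as a basic sanity check.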
With $q = 1$, equation (2.2) turns into a stochastic differential equation which is affine in the drift
and linear in the diffusion, also known as the Brennan-Schwartz model [BS80]. To the best of our
knowledge, there are no closed form explicit solutions for this equation allowing for a fully analytical
expression, despite its apparent simplicity. A formal solution for equations of the form
$$ dX_t = \left(a_1(t) X_t + a_2(t)\right) dt + \left(b_1(t) X_t + b_2(t)\right) dW_t , \qquad X_{t_0} = X_0 , \tag{2.10} $$
is described in [KP99, Chap. 4.2 eq. (2.9)] as
$$ X_t = \Phi_{t_0,t} \left( X_0 + \int_{t_0}^{t} \left(a_2(s) - b_1(s) b_2(s)\right) \Phi_{t_0,s}^{-1} \, ds + \int_{t_0}^{t} b_2(s) \, \Phi_{t_0,s}^{-1} \, dW_s \right) \tag{2.11} $$
with $\Phi_{t_0,t}$ given by [KP99, Chap. 4.2 eq. (2.7)]
$$ \Phi_{t_0,t} = \exp\left( \int_{t_0}^{t} \left( a_1(s) - \tfrac{1}{2} b_1^2(s) \right) ds + \int_{t_0}^{t} b_1(s) \, dW_s \right) . \tag{2.12} $$
Applying this to equation (2.2) with $a_1(t) = -\kappa$, $a_2(t) = \kappa\theta$, $b_1(t) = \alpha$ and
$b_2(t) = 0$ leads to
$$ \Phi_{t_0,t} = e^{-\left(\kappa + \frac{1}{2}\alpha^2\right)(t - t_0) + \alpha (W_t - W_{t_0})} \tag{2.13} $$
as well as
$$ X_t = e^{-\left(\kappa + \frac{1}{2}\alpha^2\right)(t - t_0) + \alpha (W_t - W_{t_0})} \left( X_0 + \kappa\theta \int_{t_0}^{t} e^{\left(\kappa + \frac{1}{2}\alpha^2\right)(s - t_0) - \alpha (W_s - W_{t_0})} \, ds \right) . \tag{2.14} $$
The functional form of solution (2.14) is somewhat reminiscent of the payoff function of a continuously
monitored Asian option in a standard Black-Scholes framework, and thus it may be possible to derive
the Laplace transform of the distribution of $X_t$ analytically following the lead given by Geman and
Yor [GY93]. However, whilst this is noteworthy in its own right, it is unlikely to aid in the development
of fast and efficient numerical integration schemes for Monte Carlo simulations, especially if the
ultimate aim is to use the process $X$ to drive the diffusion coefficient in a second stochastic
differential equation.
Beyond the cases $q = 0$, $q = 1/2$, and $q = 1$, as far as we know, there are no analytical or
semi-analytical solutions. Nevertheless, we are able to discuss the boundary behaviour solely based on
our knowledge of the drift and diffusion terms:
1. 0 is an attainable boundary for $0 < q < 1/2$ and for $q = 1/2$ if $\kappa\theta < \alpha^2/2$
2. 0 is unattainable for $q > 1/2$
3. $\infty$ is unattainable for all $q > 0$.
These statements can be confirmed by the aid of Feller's boundary classification which can be found
in [KT81]. The stationary distribution of this process can be calculated as (see [AP04, Prop. 2.4])
$$ \pi(y) = C(q)^{-1} \, y^{-2q} \, e^{M(y,q)} , \qquad C(q) = \int_0^{\infty} y^{-2q} \, e^{M(y,q)} \, dy \tag{2.15} $$
with the auxiliary function $M(y, q)$ given by
1. $q = 1/2$
$$ M(y, q) = \frac{2\kappa}{\alpha^2} \left( \theta \ln(y) - y \right) \tag{2.16} $$
2. $q = 1$
$$ M(y, q) = \frac{2\kappa}{\alpha^2} \left( -\theta / y - \ln(y) \right) \tag{2.17} $$
3. $0 < q < 1/2$ and $1/2 < q < 1$
$$ M(y, q) = \frac{2\kappa}{\alpha^2} \left( \frac{\theta \, y^{1-2q}}{1 - 2q} - \frac{y^{2-2q}}{2 - 2q} \right) . \tag{2.18} $$
The above equations can be derived from the Fokker-Planck equation which leads to an ordinary
differential equation of Bernoulli type. The first moment of the process (2.2) is given by
$$ E[V_t] = (V_{t_0} - \theta) \, e^{-\kappa \Delta t} + \theta . \tag{2.19} $$
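We can also calculate the second moment for $q = 1/2$ or $q = 1$ (taking $t_0 = 0$):
$$ E\left[V_t^2\right] = \begin{cases} e^{-2\kappa t} \left[ V_0^2 + \dfrac{\left(e^{\kappa t} - 1\right)\left(2 V_0 + \left(e^{\kappa t} - 1\right)\theta\right)\left(\alpha^2 + 2\kappa\theta\right)}{2\kappa} \right] & \text{for } q = 1/2 \\[2ex] e^{(\alpha^2 - 2\kappa) t} \, V_0^2 + \dfrac{2\kappa\theta \, e^{-2\kappa t} \left[ e^{2\kappa t} \theta \left(\kappa - \alpha^2\right) + e^{\kappa t} (V_0 - \theta)\left(2\kappa - \alpha^2\right) + e^{\alpha^2 t} \left( V_0 \left(\alpha^2 - 2\kappa\right) + \kappa\theta \right) \right]}{\alpha^4 - 3\alpha^2\kappa + 2\kappa^2} & \text{for } q = 1 . \end{cases} \tag{2.20} $$
Both expressions reduce to $V_0^2$ at $t = 0$. In the case $q = 1$, the second moment contains the
factor $e^{(\alpha^2 - 2\kappa) t}$. This means that for $\alpha^2 > 2\kappa$, which is typically
required in order to calibrate to the market observable strongly pronounced implied-Black-volatility
skew, the variance of volatility grows unbounded, despite the fact that the model appears to be
mean-reverting. For long dated options, this is a rather undesirable feature to have. On the other hand,
in the case $q = 1/2$, for $\alpha^2 > 2\kappa\theta$, instantaneous variance can attain zero, which is
also undesirable for economic reasons. In addition to that, for the modelling of path dependent
derivatives, the model (2.2) requires the use of numerical integration schemes that preserve the
analytical properties of the variance process such as to remain on the real axis, or to simply stay
positive. In the next section, we discuss alternatives for the generation of the stochastic volatility
process that make the integration of volatility itself practically trivial.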
2.2 Transformed Ornstein-Uhlenbeck
The origin of this process goes back to Uhlenbeck and Ornstein's publication [UO30] in which they
describe the velocity of a particle that moves in an environment with friction. Doob [Doo42] first
treated this process purely mathematically and expressed it in terms of a stochastic differential equation.
In modern financial mathematics, the use of Ornstein-Uhlenbeck processes is almost commonplace.
The attractive features of an Ornstein-Uhlenbeck process are that, whilst it provides a certain degree of
flexibility over its auto-correlation structure, it still allows for the full analytical treatment of a
standard Gaussian process.
In this article, we chose the formulation (2.4) to describe the Ornstein-Uhlenbeck process since we
prefer a parametrisation that permits complete separation between the mean reversion speed and the
variance of the limiting or stationary distribution of the process. The solution of (2.4) is
$$ y_t = e^{-\kappa t} \left( y_0 + \int_0^t e^{\kappa u} \, \alpha\sqrt{2\kappa} \, dZ_u \right) \tag{2.21} $$
with initial time $t_0 = 0$. In other words, the stochastic process at time $t$ is Gaussian with
$$ y_t \sim \mathcal{N}\left( y_0 e^{-\kappa t} , \; \alpha^2 \left( 1 - e^{-2\kappa t} \right) \right) \tag{2.22} $$
and thus the stationary distribution is Gaussian with variance $\alpha^2$: a change in parameter
$\kappa$ requires no rescaling of $\alpha$ if we wish to hold the long-term uncertainty in the process
unchanged. It is straightforward to extend the above results to the case when $\kappa(t)$ and
$\alpha(t)$ are functions of time [Jäc02]. Since the variance of the driving Ornstein-Uhlenbeck process
is the main criterion that determines the uncertainty in volatility for the financial underlying process,
all further considerations are primarily expressed in terms of
$$ \eta(t) := \alpha \sqrt{1 - e^{-2\kappa t}} . \tag{2.23} $$
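Because the conditional law (2.22) is known in closed form, the Ornstein-Uhlenbeck process can be advanced without discretisation error for any step length. The following is a minimal sketch of such an exact stepping rule, assuming NumPy; the function name `ou_exact_terminal` and the parameter values are illustrative assumptions.

```python
import numpy as np


def ou_exact_terminal(y0, kappa, alpha, dt, n_steps, n_paths, rng):
    """Advance the process (2.4) over n_steps exact steps of length dt."""
    decay = np.exp(-kappa * dt)
    step_std = alpha * np.sqrt(1.0 - decay**2)  # eta(dt) from eq. (2.23)
    y = np.full(n_paths, float(y0))
    for _ in range(n_steps):
        # Conditional law (2.22), applied from the current state.
        y = y * decay + step_std * rng.standard_normal(n_paths)
    return y


# Illustrative parameters (assumptions, not from the paper).
rng = np.random.default_rng(7)
y0, kappa, alpha, dt, n_steps = 0.5, 2.0, 0.4, 0.25, 4
y_T = ou_exact_terminal(y0, kappa, alpha, dt, n_steps, 200_000, rng)
T = dt * n_steps
mean_T = y0 * np.exp(-kappa * T)
var_T = alpha**2 * (1.0 - np.exp(-2.0 * kappa * T))
print(y_T.mean(), mean_T, y_T.var(), var_T)
```

Composing several exact steps reproduces the mean and variance of (2.22) at the terminal time, which is what the printed values compare.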
There are fundamental differences between the requirements in the financial modelling of underlying
asset prices, and the modelling of instantaneous stochastic volatility, or indeed any other not directly
market-observable quantity. For reasons of financial consistency, we frequently have to abide by
no-arbitrage rules that impose a specific functional form for the instantaneous drift of the underlying.
In contrast, the modelling of stochastic volatility is typically more governed by long-term realism and
structural similarity to real-world dynamics, and no externally given drift conditions apply. No-arbitrage
arguments and their implied instantaneous drift conditions are omnipresent in financial arguments, and
as a consequence, most practitioners have become used to thinking of stochastic processes exclusively
in terms of an explicit stochastic differential equation. However, when there are no explicitly given
conditions on the instantaneous drift, it is, in fact, preferable to model a stochastic process in the most
analytically convenient form available. In other words, when preferences as to the attainable domain
of the process are to be considered, it is in practice much more intuitive to start with a simple process
of full analytical tractability, and to transform its domain to the target domain by virtue of a simple
analytical function. For the modelling of stochastic volatility, this means that we utilise the flexible yet
tractable nature of the Ornstein-Uhlenbeck process (2.4) in combination with a strictly monotonic and
differentiable mapping function $f : \mathbb{R} \to \mathbb{R}_+$.
One simple analytical transformation we consider is the exponential function, and the resulting
stochastic volatility model was first proposed in [Sco87, equation (7)]. The model is intuitively very
appealing: for any future point in time, volatility has a lognormal distribution which is a very
comfortable distribution for practitioners in financial mathematics. Alas, though, recent research [AP04]
has cast a shadow on this model's analytical features. It appears that, in its full continuous formulation,
the log-normal volatility model can give rise to unlimited higher moments of the underlying financial
asset. However, as has been discussed and demonstrated at great length for the very similar phenomenon
of infinite futures returns when short rates are driven by a lognormal process [HW93, SS94, SS97a, SS97b],
this problem vanishes as soon as the continuous process model is replaced by its discretised
approximation, which is why lognormal volatility models remain numerically tractable in applications.
Still, in order to avoid this problem altogether, we introduce an alternative to the exponential
transformation function which is given by a simple hyperbolic form.
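In the following, we refer to
$$ f_{\exp}(y) := e^{y} , \qquad \sigma_{\exp}(y) := \sigma_0 \, f_{\exp}(y) \tag{2.24} $$
as the exponential volatility transformation also known as Scott's model [Sco87], and to
$$ f_{\mathrm{hyp}}(y) := y + \sqrt{y^2 + 1} , \qquad \sigma_{\mathrm{hyp}}(y) := \sigma_0 \, f_{\mathrm{hyp}}(y) \tag{2.25} $$
as the hyperbolic volatility transformation. The densities of the exponential and hyperbolic volatilities
are given by
$$ \psi_{\exp}(\sigma_{\exp}, \sigma_0, \eta) = \frac{\varphi\left( f_{\exp}^{-1}(\sigma_{\exp}/\sigma_0), \eta \right)}{d\sigma_{\exp}/dy} \tag{2.26} $$
$$ \psi_{\mathrm{hyp}}(\sigma_{\mathrm{hyp}}, \sigma_0, \eta) = \frac{\varphi\left( f_{\mathrm{hyp}}^{-1}(\sigma_{\mathrm{hyp}}/\sigma_0), \eta \right)}{d\sigma_{\mathrm{hyp}}/dy} \tag{2.27} $$
with
$$ f_{\exp}^{-1}(\sigma/\sigma_0) = \ln(\sigma/\sigma_0) \tag{2.28} $$
$$ f_{\mathrm{hyp}}^{-1}(\sigma/\sigma_0) = \left( \sigma/\sigma_0 - \sigma_0/\sigma \right) / 2 \tag{2.29} $$
$$ d\sigma_{\exp}/dy = \sigma_{\exp} \tag{2.30} $$
$$ d\sigma_{\mathrm{hyp}}/dy = \frac{2 \sigma_0 \, \sigma_{\mathrm{hyp}}^2}{\sigma_0^2 + \sigma_{\mathrm{hyp}}^2} \tag{2.31} $$
and
$$ \varphi(y, \eta) := \frac{e^{-\frac{1}{2} \left( y/\eta \right)^2}}{\eta \sqrt{2\pi}} . \tag{2.32} $$
The hyperbolic transformation has been chosen to match the exponential form as closely as possible near
the origin, and only to differ significantly in the regions of lower probability given for $|y/\eta| > 1$.
The functional forms of the exponential and hyperbolic transformation are shown in comparison in figure 1.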
2.2.1 Exponential vs. Hyperbolic transformation
In figure 2, we compare the densities of the Ornstein-Uhlenbeck process transformed with (2.24)
and (2.25) given by equations (2.26) and (2.27). At first glance on a linear scale, we see a reasonable
similarity between the two distributions. However, on a logarithmic scale, the differences in the
tails of the distributions become clear: the hyperbolic transformation has significantly lower probability
for both very low values as well as for large values.
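The comparison of the two densities can be reproduced numerically. The sketch below, assuming NumPy, implements the transformations (2.24) and (2.25) and the densities (2.26) and (2.27), checks that both densities carry unit probability mass, and evaluates their ratio in the upper tail. The helper names, the integration grid and the evaluation point are illustrative assumptions.

```python
import numpy as np


def f_exp(y):
    return np.exp(y)


def f_hyp(y):
    return y + np.sqrt(y * y + 1.0)


def f_hyp_inv(x):
    return 0.5 * (x - 1.0 / x)  # eq. (2.29) with x = sigma / sigma0


def gaussian_density(y, eta):
    return np.exp(-0.5 * (y / eta) ** 2) / (eta * np.sqrt(2.0 * np.pi))


def density_exp(sigma, sigma0, eta):
    # eq. (2.26): Gaussian density of y divided by d sigma / dy = sigma
    return gaussian_density(np.log(sigma / sigma0), eta) / sigma


def density_hyp(sigma, sigma0, eta):
    # eq. (2.27) with the derivative (2.31)
    dsigma_dy = 2.0 * sigma0 * sigma**2 / (sigma0**2 + sigma**2)
    return gaussian_density(f_hyp_inv(sigma / sigma0), eta) / dsigma_dy


def trapezoid(values, grid):
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))


sigma0, eta = 0.25, 0.5  # the values quoted for figure 2
grid = np.linspace(1e-4, 5.0, 500_001)
mass_exp = trapezoid(density_exp(grid, sigma0, eta), grid)
mass_hyp = trapezoid(density_hyp(grid, sigma0, eta), grid)
tail_ratio = density_hyp(2.0, sigma0, eta) / density_exp(2.0, sigma0, eta)
y_check = np.linspace(-2.0, 2.0, 9)
print(mass_exp, mass_hyp, tail_ratio)
```

A tail ratio far below one at a volatility level of 200% reflects the thinner upper tail of the hyperbolic model described above.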
[Figure 1: plot of $f_{\exp}(y)$ and $f_{\mathrm{hyp}}(y)$ for $y$ between -2 and 2.]
Figure 1: The exponential and hyperbolic transformation functions.
[Figure 2: two panels showing $\psi_{\exp}(\sigma, \sigma_0, \eta)$ and $\psi_{\mathrm{hyp}}(\sigma, \sigma_0, \eta)$ as functions of $\sigma$, one on a linear scale ($\sigma$ from 0% to 100%) and one on a logarithmic scale ($\sigma$ from 0% to 200%).]
Figure 2: Densities of instantaneous volatility using the exponential and the hyperbolic transformation
of the driving Ornstein-Uhlenbeck process for $\sigma_0 = 25\%$ and $\eta = 1/2$. Note the distinctly
different tails of the distributions.
Returning to the analytical form of the density functions (2.26) and (2.27), it is interesting to note
that, given that $y = f^{-1}(\sigma/\sigma_0)$ is Gaussian, the volatility distribution implied by the
inverse hyperbolic transformation (2.29) is nearly Gaussian for large values of instantaneous volatility
$\sigma \gg \sigma_0$ since we have
$$ \sigma_{\mathrm{hyp}} \approx 2 \sigma_0 \, y \qquad \text{for } \sigma_{\mathrm{hyp}} \gg \sigma_0 . \tag{2.33} $$
This feature is particularly desirable since it ensures that the tail of the volatility distribution at the
higher end is as thin as the Gaussian process itself, and thus no moment explosions are to be feared for
the underlying. Conversely, for small values of instantaneous volatility $\sigma \ll \sigma_0$, the
volatility distribution implied by the hyperbolic volatility model is nearly inverse Gaussian because of
$$ \sigma_{\mathrm{hyp}} \approx -\frac{\sigma_0}{2 y} \qquad \text{for } \sigma_{\mathrm{hyp}} \ll \sigma_0 . \tag{2.34} $$
In a certain sense, the hyperbolic model can be seen as a blend of an inverse Gaussian model at the
lower end of the distribution, and a Gaussian density at the upper end, with their respectively thin tails.
In contrast, exponential volatility is simply lognormally distributed, which in turn gives rise to distinctly
fatter tails than the normal (at the high end) or inverse normal (at the low end) density.
From the basis of our complete analytical understanding of both the exponential and the hyperbolic
volatility process, we can use Itô's lemma to derive their respective stochastic differential equations.
For the exponential transformation (2.24) we obtain Scott's original SDE [Sco87, equation (7)]
$$ d\sigma = \kappa \sigma \left[ \alpha^2 - \ln(\sigma/\sigma_0) \right] dt + \alpha \sqrt{2\kappa} \, \sigma \, dZ . \tag{2.35} $$
It is remarkable to see the difference in the stochastic differential equation we obtain for the hyperbolic
volatility process (2.25):
$$ d\sigma = -\kappa \sigma \, \frac{\sigma^6 + \sigma^4 \sigma_0^2 - (8\alpha^2 + 1) \, \sigma^2 \sigma_0^4 - \sigma_0^6}{\left( \sigma^2 + \sigma_0^2 \right)^3} \, dt + \alpha \sqrt{8\kappa} \, \frac{\sigma_0 \, \sigma^2}{\sigma^2 + \sigma_0^2} \, dZ . \tag{2.36} $$
The complexity of the explicit form (2.36) of the hyperbolic volatility process may help to explain
why it has, to the best of our knowledge, not been considered before. As we know, though, the
resulting process is of remarkable simplicity and very easy to simulate directly, whilst overcoming some
of the weaknesses of the (calibrated) CIR/Heston process (namely that zero is attainable), as well as
the moment divergence when volatility or variance is driven by lognormal volatility as incurred by the
Brennan-Schwartz process for volatility and Scott's model.
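The moments of the exponential transformation function are
$$ E\left[ (f_{\exp}(y))^m \right] = e^{\frac{1}{2} m^2 \eta^2} . \tag{2.37} $$
For the hyperbolic transformation we obtain the general solution (for non-integer $n$, with integer
orders understood as limits)
$$ E\left[ (f_{\mathrm{hyp}}(y))^n \right] = \frac{1}{\sqrt{4\pi}} \left[ 2^{\frac{3}{2} n} \, \eta^{n} \, \Gamma\left( \tfrac{1+n}{2} \right) {}_1F_1\left( -\tfrac{n}{2}, 1 - n, \tfrac{1}{2\eta^2} \right) + 2^{-\frac{3}{2} n} \, \eta^{-n} \, \Gamma\left( \tfrac{1-n}{2} \right) {}_1F_1\left( \tfrac{n}{2}, 1 + n, \tfrac{1}{2\eta^2} \right) \right] \tag{2.38} $$
in terms of Kummer's hypergeometric function ${}_1F_1$. The first two moments, specifically, are given by
$$ E\left[ (f_{\mathrm{hyp}}(y))^1 \right] = \sqrt{2} \, \eta \; U\left( -\tfrac{1}{2}, 0, \tfrac{1}{2\eta^2} \right) \tag{2.39} $$
$$ E\left[ (f_{\mathrm{hyp}}(y))^2 \right] = 1 + 2\eta^2 \tag{2.40} $$
where $U(a, b, z)$ is the logarithmic confluent hypergeometric function. More revealing than the closed
form for the moments of the respective transformation functions is an analysis based on their Taylor
expansions
$$ f_{\exp}(y) = 1 + y + \frac{y^2}{2} + O\left( y^3 \right) \tag{2.41} $$
$$ f_{\mathrm{hyp}}(y) = 1 + y + \frac{y^2}{2} + O\left( y^4 \right) . \tag{2.42} $$
Thus, for both of these functions,
$$ (f_{\cdot}(y))^n = 1 + n \, y + \left[ \binom{n}{2} + \frac{n}{2} \right] y^2 + O\left( y^3 \right) . \tag{2.43} $$
Since $y$ is normally distributed with mean 0 and variance $\eta^2$, and since all odd moments of the
Gaussian distribution vanish, this means that for both the exponential and the hyperbolic transformation
we have
$$ E\left[ (f_{\cdot}(y))^n \right] = 1 + \left[ \binom{n}{2} + \frac{n}{2} \right] \eta^2 + O\left( \eta^4 \right) . \tag{2.44} $$
The implication of (2.44) is that all moments of the exponential and the hyperbolic transformation
function agree up to order $O(\eta^3)$. We show an example for this in figure 3. As $\eta$ increases, the
moments of the exponential function grow faster by a term of order $O(\eta^4)$.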
[Figure 3: surface plot of $E[f_{\exp}(y)^n]$ and $E[f_{\mathrm{hyp}}(y)^n]$ for $y \sim \mathcal{N}(0, \eta^2)$ as functions of $n$ and $\eta$.]
Figure 3: Comparison of the moments of the exponential and the hyperbolic transformation functions. Note
that the floor level is exactly 1.
3 Numerical integration of mean-reverting CEV processes
The numerical integration of the coupled stochastic volatility system (2.1) and (2.2) is composed of two
different parts. First, we have to find an appropriate method for the approximation of the stochastic
volatility process itself, and secondly we need to handle the dynamics of the financial underlying (2.1)
whose diffusion part is affected by the stochasticity of volatility.
Since the volatility process does not explicitly depend on the underlying, we can treat it separately.
In order to retain numerical stability and to achieve good convergence properties, it is desirable for the
numerical integration scheme of the volatility or variance process to preserve structural features such
as positivity. For the exponentially or hyperbolically transformed Ornstein-Uhlenbeck process, this is
trivially taken care of by the transformation function itself. For the mean-reverting CEV process (2.2),
however, the design of a positivity preserving scheme is a task in its own right. The simplest approach,
for instance, namely the explicit Euler scheme
$$ X_{n+1} = X_n + \kappa(\theta - X_n) \Delta t_n + \alpha X_n^{q} \, \Delta W_n , \tag{3.1} $$
fails to preserve positivity. The same deficiency is exhibited by the standard Milstein and the Milstein+
scheme whose formulae we give in appendices A.1 and A.2, respectively. The Balanced Implicit Method
(BIM) as introduced by Milstein, Platen and Schurz [MPS98], however,
$$ X_{n+1} = X_n + \kappa(\theta - X_n) \Delta t_n + \alpha X_n^{q} \, \Delta W_n + C(X_n)(X_n - X_{n+1}) \tag{3.2} $$
$$ C(X_n) = c_0^{\mathrm{BIM}}(X_n) \, \Delta t_n + c_1^{\mathrm{BIM}}(X_n) \, |\Delta W_n| \tag{3.3} $$
with control functions
$$ c_0^{\mathrm{BIM}}(x) = \kappa \tag{3.4} $$
$$ c_1^{\mathrm{BIM}}(x) = \alpha \, x^{q-1} \tag{3.5} $$
is able to preserve positivity as is shown in [Sch96]. Alas, this scheme only achieves the same strong
order of convergence as the Euler scheme, i.e. $1/2$. This means, whilst the Balanced Implicit Method
helps to overcome the problem of spurious negative values for variance, it does not increase the
convergence speed. In fact, when a step size is chosen such that for the specific set of parameters at
hand the explicit Euler scheme is usable¹, the Balanced Implicit Method often has worse convergence
properties than the explicit Euler method. This feature of the Balanced Implicit Method is typically
caused by the fact that the use of the weight function $c_1^{\mathrm{BIM}}$ effectively increases the
unknown coefficient dominating the leading error terms.
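To make the difference between (3.1) and (3.2) concrete, the sketch below implements one step of each scheme, assuming NumPy. Since (3.2) is linear in $X_{n+1}$, it can be solved explicitly by dividing the Euler increment by $1 + C(X_n)$. The function names, parameter values and the single large negative Brownian increment are illustrative assumptions.

```python
import numpy as np


def euler_step(x, kappa, theta, alpha, q, dt, dw):
    """Explicit Euler step (3.1)."""
    return x + kappa * (theta - x) * dt + alpha * x**q * dw


def bim_step(x, kappa, theta, alpha, q, dt, dw):
    """Balanced Implicit step (3.2) with the control functions (3.4), (3.5)."""
    c = kappa * dt + alpha * x ** (q - 1.0) * np.abs(dw)  # C(X_n), eq. (3.3)
    increment = kappa * (theta - x) * dt + alpha * x**q * dw
    # Solving (3.2) for X_{n+1} damps the Euler increment by 1 / (1 + C).
    return x + increment / (1.0 + c)


# Illustrative parameters (assumptions, not from the paper).
kappa, theta, alpha, q, dt = 1.0, 0.04, 0.5, 0.5, 0.25

# One step with a Brownian increment of minus two standard deviations.
single_euler = euler_step(0.04, kappa, theta, alpha, q, dt, -1.0)
single_bim = bim_step(0.04, kappa, theta, alpha, q, dt, -1.0)

# A batch of BIM paths started at the mean reversion level.
rng = np.random.default_rng(1)
x_bim = np.full(10_000, theta)
for _ in range(40):
    dw = np.sqrt(dt) * rng.standard_normal(x_bim.size)
    x_bim = bim_step(x_bim, kappa, theta, alpha, q, dt, dw)
print(single_euler, single_bim, x_bim.min())
```

With these values the Euler step lands below zero whereas the balanced step stays positive, at the price of a damped increment.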
Another scheme that has been shown to preserve positivity for certain parameter ranges is the adaptive
Milstein scheme [Kah04] with suitable stepsize $\Delta\tau_n$ and $z \sim \mathcal{N}(0, 1)$
$$ X_{n+1} = X_n + \kappa(\theta - X_n) \Delta\tau_n + \alpha X_n^{q} \sqrt{\Delta\tau_n} \, z + \frac{1}{2} \alpha^2 q X_n^{2q-1} \Delta\tau_n \left( z^2 - 1 \right) . \tag{3.6} $$
Unfortunately, this scheme requires adaptive resampling and thus necessitates the use of a pseudo-random
number pipeline which in turn disables or hinders a whole host of independently available convergence
enhancement techniques such as low-discrepancy numbers, importance sampling, stratification,
latin-hypercube methods, etc. An advanced method that obviates the use of pseudo-random number
pipelines is based on the combination of the Milstein scheme with the idea of balancing: the
Balanced Milstein Method (BMM)
$$ X_{n+1} = X_n + \kappa(\theta - X_n) \Delta t_n + \alpha X_n^{q} \, \Delta W_n + \frac{1}{2} \alpha^2 q X_n^{2q-1} \left( \Delta W_n^2 - \Delta t_n \right) + D(X_n)(X_n - X_{n+1}) , \tag{3.7} $$
$$ D(X_n) = d_0^{\mathrm{BMM}}(X_n) \, \Delta t_n + d_1^{\mathrm{BMM}}(X_n) \left( \Delta W_n^2 - \Delta t_n \right) . \tag{3.8} $$
As in the Balanced Implicit Method we can control the integration steps by using weighting functions
$d_0^{\mathrm{BMM}}(\cdot)$ and $d_1^{\mathrm{BMM}}(\cdot)$. The choice of these weighting functions
strongly depends on the structure of the SDE. It can be shown (see [KS05, Theorem 5.9]) that the BMM
preserves positivity for the mean-reverting CEV model (2.2) with the following choice
$$ d_0^{\mathrm{BMM}}(x) = \Theta \kappa + \frac{1}{2} \alpha^2 q \, |x|^{2q-2} , \tag{3.9} $$
$$ d_1^{\mathrm{BMM}}(x) = 0 . \tag{3.10} $$
The parameter $\Theta \in [0, 1]$ provides some freedom for improved convergence speed but it has to be
chosen such that
$$ \Delta t_n < \frac{2q - 1}{2 q \kappa (1 - \Theta)} . \tag{3.11} $$
It is always safe to choose $\Theta = 1$, though, for improved performance, we used $\Theta = 1/2$
whenever this choice was possible².
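A hedged sketch of one Balanced Milstein step follows, assuming NumPy and the weighting functions (3.9) and (3.10). As for the Balanced Implicit Method, (3.7) is linear in $X_{n+1}$ and can be solved by dividing the Milstein increment by $1 + D(X_n)$. The names `bmm_step`, `bmm_max_step` and `relax` (standing for the relaxation parameter in (3.9)) and all parameter values are illustrative assumptions.

```python
import numpy as np


def bmm_step(x, kappa, theta, alpha, q, dt, dw, relax=1.0):
    """One Balanced Milstein step (3.7) with the weights (3.9) and (3.10)."""
    milstein = 0.5 * alpha**2 * q * x ** (2.0 * q - 1.0) * (dw**2 - dt)
    increment = kappa * (theta - x) * dt + alpha * x**q * dw + milstein
    d = (relax * kappa + 0.5 * alpha**2 * q * np.abs(x) ** (2.0 * q - 2.0)) * dt
    # (3.7) is linear in X_{n+1}; with d1 = 0 it solves to the line below.
    return x + increment / (1.0 + d)


def bmm_max_step(kappa, q, relax):
    """Step size bound (3.11); no restriction when relax equals 1."""
    if relax >= 1.0:
        return np.inf
    return (2.0 * q - 1.0) / (2.0 * q * kappa * (1.0 - relax))


def simulate(q, relax, dt, n_steps, n_paths, seed):
    # Illustrative parameters (assumptions, not from the paper).
    kappa, theta, alpha = 1.0, 0.04, 0.5
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, theta)
    for _ in range(n_steps):
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        x = bmm_step(x, kappa, theta, alpha, q, dt, dw, relax)
    return x


x_heston = simulate(q=0.5, relax=1.0, dt=0.25, n_steps=40, n_paths=10_000, seed=2)
x_bs = simulate(q=1.0, relax=0.5, dt=0.25, n_steps=40, n_paths=10_000, seed=3)
print(x_heston.min(), x_bs.min(), bmm_max_step(1.0, 1.0, 0.5))
```

The second run uses a step of 0.25, which lies below the bound (3.11) for the chosen values, so positivity is expected in both cases.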
The above mentioned integration methods, namely the standard explicit Euler scheme, the Balanced
Implicit Method, the Balanced Milstein Method, and the adaptive Milstein scheme, deal with
the stochastic differential equation in its original form (2.2). Another approach to integrate (2.2) whilst
preserving positivity is to transform the stochastic differential equation to logarithmic coordinates
using Itô's lemma as suggested by Andersen and Brotherton-Ratcliffe [ABR01]. Applying this to the
mean-reverting CEV process leads to
$$ d \ln V_t = \frac{2\kappa(\theta - V_t) - \alpha^2 V_t^{2q-1}}{2 V_t} \, dt + \alpha V_t^{q-1} \, dZ_t \tag{3.12} $$
which can be solved by the aid of a simple Euler scheme. The major drawback with this approach is
that, whilst the Euler scheme applied to the transformed stochastic differential equation (3.12) preserves
positivity, it is also likely to become unstable for suitable time steps [ABR01]. These instabilities are a
direct consequence of the divergence of both the drift and the diffusion terms near zero.

¹ For most schemes, spurious negative values incurred as an undesirable side effect of the numerical
method disappear as the step size $\Delta t$ is decreased. For a negative variance to appear at any one
step, the drawn normal variate generating the step typically has to exceed a certain threshold. This
threshold tends to grow as step size decreases. Thus, with decreasing step size, eventually, the threshold
exceeds the maximum standard normal random number attainable on the finite representation computer
system used.
² For $q = 1/2$ the numerator becomes zero. Despite this, positivity can be preserved with
$d_0^{\mathrm{BMM}} = \kappa$.

For that reason Andersen and Brotherton-Ratcliffe suggested a moment matched log-normal approximation
$$ V_{n+1} = \left( \theta + (V_n - \theta) \, e^{-\kappa \Delta t_n} \right) e^{-\frac{1}{2} \Gamma_n^2 + \Gamma_n z} , \tag{3.13} $$
$$ \Gamma_n = \sqrt{ \ln\left( 1 + \frac{\frac{1}{2} \alpha^2 V_n^{2q} \kappa^{-1} \left( 1 - e^{-2\kappa \Delta t_n} \right)}{\left( \theta + (V_n - \theta) \, e^{-\kappa \Delta t_n} \right)^2} \right) } \tag{3.14} $$
with $z \sim \mathcal{N}(0, 1)$. We will refer to this integration scheme as moment matched Log-Euler in
the following. This method is at its most effective for the Brennan-Schwartz model (3.23) as we can see
in figure 7 (B) since for $q = 1$ the logarithmic transformation leads to an additive diffusion term
in (3.12). However, even in that case, it is outperformed by the bespoke method we call Pathwise Adapted
Linearisation which is explained in section 3.1, as well as the Balanced Milstein method (3.7). For the
Heston case, where the stochastic volatility is given by the Cox-Ingersoll-Ross equation with $q = 1/2$
which is shown in figures 5 and 6, the moment matched log-Euler method has practically no convergence
advantage over straightforward explicit Euler integration as long as the size of $\alpha$ is reasonably
small. Contrarily, the approximation quality of all integration schemes is decisively reduced when dealing
with large $\alpha$ as we can see in figure 5 (B). Making matters even worse, one can observe that schemes
of Milstein type are losing their strong convergence order of 1. The explanation for this behaviour is
rather simple: the Milstein method is not even guaranteed to converge at all for the mean-reverting
CEV process (2.2)! Having a closer look at the diffusion $b(x) = \alpha x^{q}$, we recognize that for
$q < 1$ this function is not continuously differentiable on $\mathbb{R}$, which is necessary for the
application of stochastic Taylor expansion techniques. Nonetheless, as long as the stochastic process is
analytically positive, i.e. $x > 0$, there exists a local stochastic Taylor expansion preserving strong
convergence of the Milstein method. However, when zero is attainable, the discontinuity of the first
derivative of the diffusion $b(x)$ reduces the strong convergence order to $1/2$.
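The moment matched log-normal step (3.13) and (3.14) can be sketched as follows, assuming NumPy. The step draws a lognormal variate whose mean equals the conditional mean of the variance process and whose variance equals the frozen-coefficient approximation appearing in (3.14). The function name and parameter values are illustrative assumptions; the CEV exponent is written as `q` in line with (2.2).

```python
import numpy as np


def moment_matched_step(v, kappa, theta, alpha, q, dt, z):
    """Moment matched log-normal step, eqs. (3.13) and (3.14)."""
    mean = theta + (v - theta) * np.exp(-kappa * dt)
    variance = (0.5 * alpha**2 * v ** (2.0 * q)
                * (1.0 - np.exp(-2.0 * kappa * dt)) / kappa)
    gamma_sq = np.log1p(variance / mean**2)  # square of eq. (3.14)
    return mean * np.exp(-0.5 * gamma_sq + np.sqrt(gamma_sq) * z)


# Illustrative parameters (assumptions, not from the paper).
rng = np.random.default_rng(3)
v, kappa, theta, alpha, q, dt = 0.04, 1.0, 0.09, 0.4, 0.5, 0.25
z = rng.standard_normal(200_000)
v_next = moment_matched_step(v, kappa, theta, alpha, q, dt, z)
target_mean = theta + (v - theta) * np.exp(-kappa * dt)
target_var = (0.5 * alpha**2 * v ** (2.0 * q)
              * (1.0 - np.exp(-2.0 * kappa * dt)) / kappa)
print(v_next.mean(), target_mean, v_next.var(), target_var)
```

All draws are positive by construction, and the sample mean and variance can be compared with the two matched targets.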
In figures 5, 6, and 7 we present examples for the convergence behaviour of the different methods in
comparison. For the standard Milstein (A.5) and the Milstein+ scheme (A.15), for some of the parameter
configurations, it was necessary to floor the simulated variance values at zero since those schemes do
not preserve positivity by construction.
The depicted strong approximation convergence measure is given by the $L_2$ norm of the difference
between the simulated terminal value, and the terminal value of the reference calculation, averaged over
all $M$ paths, i.e.
$$ \sqrt{ \frac{1}{M} \sum_{i=1}^{M} \left( X_i^{(n_{\mathrm{steps}})}(T) - X_i^{(n_{\mathrm{reference}})}(T) \right)^2 } . \tag{3.15} $$
This quantity is shown as a function of average CPU time per path. This was done because the ultimate criterion for the choice of any integration method in applications is the cost of accuracy in terms of calculation time, since calculation time directly translates into the amount of required hardware for large scale computations such as overnight risk reports, or into user downtime when interactive valuations are needed. This does, of course, make the results dependent on the used hardware$^3$, not only in absolute terms but also in relative terms since different processor models require different numbers of CPU clock cycles for all the involved basic floating point operations. Nevertheless, the pathwise error as a function of average CPU time is probably the most significant criterion for the quality of any integration method. Examples for this consideration are the fact that in figure 6 the nominal advantage of the moment matched Log-Euler is almost precisely offset by the additional calculation time it requires compared to the Euler scheme, and also that in figure 7 (B) the relative performance of the Balanced Milstein Method is comparable to that of the scheme denoted as Pathwise Adapted Linearisation which is explained in section 3.1.1.
$^3$ Throughout this article, all calculations shown were carried out on a processor from the Intel Pentium series (Family 6, Model 9, Stepping 5, Brand id 6, CPU frequency 1700 MHz).
The curves in figures 5, 6 and 7 have been constructed by repeated simulation with increased numbers of steps in the Brownian bridge Wiener path generation in powers of two from 1 to 128:
\[
n_{\text{steps}} \in \{1, 2, 4, 8, 16, 32, 64, 128\} . \tag{3.16}
\]
The reference solution was always computed with $2^{15}$ steps. The number generation mechanism used was the Sobol algorithm [Jäc02] throughout apart from figure 6 (B) where we also show the results from using the Mersenne Twister [MN98] in comparison. Note that the results are fairly insensitive to the choice of number generator. In addition to the methods discussed above, we also included the results from bespoke schemes denoted as Pathwise Adapted Linearisation. These schemes are carefully adapted to the respective equation and we introduce them in the following section.
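The error measure (3.15) and the idea of comparing coarse and fine discretisations along the same Wiener path can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the floored explicit Euler scheme stands in for any of the schemes under test, and plain pseudo-random increments summed over blocks replace the Brownian bridge and Sobol construction used for the figures.

```python
import math
import random


def strong_error(coarse_terminal, reference_terminal):
    """Root-mean-square difference of terminal values over all paths, cf. (3.15)."""
    m = len(coarse_terminal)
    total = sum((x - y) ** 2 for x, y in zip(coarse_terminal, reference_terminal))
    return math.sqrt(total / m)


def coarsen(increments, factor):
    """Sum blocks of fine Wiener increments so that both grids share one path."""
    return [sum(increments[i:i + factor]) for i in range(0, len(increments), factor)]


def euler_terminal(increments, dt, x0, kappa, theta, alpha, q):
    """Explicit Euler for dX = kappa (theta - X) dt + alpha X^q dZ, floored at zero."""
    x = x0
    for dz in increments:
        x = max(x + kappa * (theta - x) * dt + alpha * x ** q * dz, 0.0)
    return x


if __name__ == "__main__":
    # Illustrative sizes only (assumption); the paper uses 2^15 reference steps.
    rng = random.Random(1)
    n_ref, n_coarse, n_paths, horizon = 64, 8, 200, 1.0
    dt_ref = horizon / n_ref
    coarse, reference = [], []
    for _ in range(n_paths):
        fine = [rng.gauss(0.0, math.sqrt(dt_ref)) for _ in range(n_ref)]
        reference.append(euler_terminal(fine, dt_ref, 0.0625, 1.0, 0.0625, 0.2, 0.5))
        block = coarsen(fine, n_ref // n_coarse)
        coarse.append(euler_terminal(block, horizon / n_coarse, 0.0625, 1.0, 0.0625, 0.2, 0.5))
    print(strong_error(coarse, reference))
```

Plotting this quantity against measured CPU time per path, rather than against the number of steps, gives the kind of curve shown in the figures.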
3.1 Pathwise approximations for specific cases
Yet another approach for the numerical integration of stochastic differential equations of the form
\[
dX = a(X)\,dt + b(X)\,dZ \tag{3.17}
\]
as it is the case for (2.2) is to apply Doss's [Dos77] method of constructing pathwise solutions, first used in the context of numerical integration schemes by Pardoux and Talay [PT85]. The formal derivation of Doss's pathwise solution can be found in [KS91, pages 295–296].
In practice, Doss's method can hardly ever be applied directly since it is essentially just an existence theorem that states that any process for which there is a unique strong solution can be seen as a transformation of the solution to an ordinary differential equation with a stochastic inhomogeneity, i.e. a solution of the form
\[
X = f(Y, Z) \quad\text{with boundary condition}\quad f(Y, Z_0) = Y \tag{3.18}
\]
with
\[
dY = g(Y, Z)\,dt \tag{3.19}
\]
implying
\[
X_0 = Y_0 \tag{3.20}
\]
whereby the functions $f$ and $g$ can be derived constructively from the stochastic differential equation for $X$:
\[
\partial_Y f(Y, Z) = \exp\left(\int_{Z_0}^{Z} b'(f(Y,z))\,dz\right) \tag{3.21}
\]
\[
g(Y, Z) = \left[a(f(Y,Z)) - \tfrac{1}{2}\, b(f(Y,Z))\, b'(f(Y,Z))\right] \exp\left(-\int_{Z_0}^{Z} b'(f(Y,z))\,dz\right) . \tag{3.22}
\]
Even though one can rarely use Doss's method in its full analyticity, one can often devise a powerful bespoke approximate discretisation scheme for the stochastic differential equation at hand based on Doss's pathwise existence theorem by the aid of some simple approximative assumptions without the need to go through the Doss formalism itself.
3.1.1 Pathwise approximation of the Brennan-Schwartz SDE
For $q = 1$, the mean-reverting CEV process (2.2) becomes
\[
dX = \kappa(\theta - X)\,dt + \alpha X\,dZ . \tag{3.23}
\]
Assuming
\[
\kappa > 0 ,\quad \theta > 0 ,\quad \alpha > 0 ,\quad\text{and}\quad X(0) > 0 , \tag{3.24}
\]
we must have
\[
X_t \ge 0 \quad\text{for all } t > 0 . \tag{3.25}
\]
Using equation (3.21), we obtain
\[
f(Y, Z) = Y e^{\alpha Z} , \tag{3.26}
\]
and by the aid of (3.22), we have
\[
dY = \left[\kappa\theta\, e^{-\alpha Z} - \left(\kappa + \tfrac{1}{2}\alpha^2\right) Y\right] dt . \tag{3.27}
\]
We cannot solve this equation directly. Also, a directly applied explicit Euler scheme would permit $Y$ to cross over to the negative half of the real axis and thus $X = f(Y, Z) = Y e^{\alpha Z}$ would leave the domain of (3.23). What's more, an explicit Euler scheme applied to equation (3.27) would mean that, within the scheme, we interpret $Z_t$ as a piecewise constant function. Not surprisingly, it turns out below that we can do better than that!
Recall that, for the given time discretisation, we explicitly construct the Wiener process values $Z(t_i)$ and thus, for the purpose of numerical integration of equation (3.23), they are known along any one given path. If we now approximate $Z_t$ as a piecewise linear function in between the known values at $t_n$ and $t_{n+1}$, i.e.
\[
Z_t \simeq \beta_n t + \gamma_n \quad\text{for } t \in [t_n, t_{n+1}] \tag{3.28}
\]
with
\[
\gamma_n = Z(t_n) - \beta_n t_n \quad\text{and}\quad \beta_n = \frac{Z(t_{n+1}) - Z(t_n)}{t_{n+1} - t_n} , \tag{3.29}
\]
then we have the approximating ordinary differential equation
\[
d\tilde{Y} = \left[\kappa\theta\, e^{-\alpha(\beta_n t + \gamma_n)} - \left(\kappa + \tfrac{1}{2}\alpha^2\right)\tilde{Y}\right] dt . \tag{3.30}
\]
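Using the abbreviations
\[
\delta_n := \kappa + \tfrac{1}{2}\alpha^2 - \alpha\beta_n , \quad \Delta t_n := t_{n+1} - t_n , \quad\text{and}\quad Z_{n+1} := Z(t_{n+1})
\]
we can write the solution to equation (3.30) as
\[
\tilde{Y}_{n+1} = \tilde{Y}_n\, e^{-\left(\kappa + \frac{1}{2}\alpha^2\right)\Delta t_n} + \frac{\kappa\theta}{\delta_n}\, e^{-\alpha Z_{n+1}} \left(1 - e^{-\delta_n \Delta t_n}\right) , \tag{3.31}
\]
which gives us
\[
\tilde{X}_{n+1} = \tilde{X}_n\, e^{-\delta_n \Delta t_n} + \frac{\kappa\theta}{\delta_n}\left(1 - e^{-\delta_n \Delta t_n}\right) . \tag{3.32}
\]
This scheme is unconditionally stable. We refer to it as Pathwise Adapted Linearisation in the following. Apart from its stability, this scheme has the additional desirable property that, in the limits $\theta \to 0$ and/or $\kappa \to 0$, i.e. in the limit of equation (3.23) resembling a standard geometric Brownian motion, it is free of any approximation. Since in practice $\theta$ and/or $\kappa$ tend to be not too large, the scheme's proximity to exactness translates into a remarkable accuracy when used in applications.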
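A single step of the scheme (3.32) can be sketched as below. The function name and argument order are illustrative assumptions, and the separate branch for a vanishing $\delta_n \Delta t_n$ is an implementation detail added here to avoid a division by zero; it is not discussed in the text.

```python
import math


def pal_brennan_schwartz_step(x, dz, dt, kappa, theta, alpha):
    """One Pathwise Adapted Linearisation step, cf. (3.32), for
    dX = kappa (theta - X) dt + alpha X dZ, given the Wiener increment dz."""
    beta = dz / dt                                   # slope of the linearised path, (3.29)
    delta = kappa + 0.5 * alpha ** 2 - alpha * beta  # abbreviation delta_n
    decay = math.exp(-delta * dt)
    if abs(delta * dt) < 1e-12:
        weight = dt                                  # limit of (1 - exp(-delta dt)) / delta
    else:
        weight = -math.expm1(-delta * dt) / delta    # positive for either sign of delta
    return x * decay + kappa * theta * weight
```

For theta = 0 the step reduces to x * exp(-(kappa + alpha^2 / 2) * dt + alpha * dz), the exact geometric Brownian motion update, which is the exactness property mentioned above. Since both summands are non-negative for positive inputs, the step cannot produce negative values.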
It is interesting to note that a similar approach based on replacing the term $dZ$ directly in the stochastic differential equation
\[
dX = \kappa(\theta - X)\,dt + \alpha X\,dZ \tag{3.33}
\]
by a linear approximation $dZ \to \beta\,dt$ gives rise to a scheme that does not converge in the limit $\Delta t \to 0$ as first observed by Wong and Zakai [WZ65]. However, if we make the same replacement in the Milstein scheme and drop terms of order $O(dt^2)$ and higher, which for (3.23) means
\begin{align}
\Delta X &\approx \kappa(\theta - X)\Delta t + \alpha X \Delta Z + \tfrac{1}{2}\alpha^2 X \left(\Delta Z^2 - \Delta t\right) \tag{3.34}\\
\Delta X &\approx \kappa(\theta - X)\Delta t + \alpha X \beta \Delta t + \tfrac{1}{2}\alpha^2 X \left(\beta^2 \Delta t^2 - \Delta t\right) \tag{3.35}\\
\frac{dX}{dt} &\approx \kappa(\theta - X) - \tfrac{1}{2}\alpha^2 X + \alpha\beta X , \tag{3.36}
\end{align}
and integrate, we arrive at exactly the same scheme (3.32) as if we had gone through the full Doss formalism. The reason for this is that the lowest order scheme that includes explicitly all terms that are individually in expectation of order $dt$ is the Milstein scheme, not the Euler scheme, and the difference terms are crucial to preserve strong convergence when we introduce piecewise linearisation of the discretised Wiener process.
3.1.2 Pathwise approximation of the Cox-Ingersoll-Ross / Heston SDE
The special case $q = 1/2$ of (2.2) represents the stochastic differential equation of the variance process in the Heston model [Hes93], as well as the short rate process in the Cox-Ingersoll-Ross model [CIR85]
\[
dV = \kappa(\theta - V)\,dt + \alpha\sqrt{V}\,dZ . \tag{3.37}
\]
In this case, an explicit solution of the Doss formalism (3.21) is not obvious. However, by conditioning on one specific path in $Z$ we can bypass this difficulty by directly approximating $Z_t$ as a piecewise linear function in between the known values as given in equations (3.28) and (3.29). Using the resulting dependency $dZ = \beta_n\,dt$ in the Milstein scheme applied to (3.37)
\[
dV \approx \kappa(\theta - V)\,dt + \alpha\sqrt{V}\,dZ + \tfrac{1}{4}\alpha^2\left(dZ^2 - dt\right) , \tag{3.38}
\]
i.e.
\[
dV \approx \kappa(\theta - V)\,dt + \alpha\sqrt{V}\,\beta_n\,dt + \tfrac{1}{4}\alpha^2\left(\beta_n^2\,dt^2 - dt\right) , \tag{3.39}
\]
and dropping terms of order $dt^2$, we obtain the approximate ordinary differential equation
\[
\frac{dV}{dt} \approx \kappa(\theta - V) - \tfrac{1}{4}\alpha^2 + \alpha\beta_n\sqrt{V} \tag{3.40}
\]
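which has the implicit solution
\[
t - t_n = T(V_t, \beta_n) - T(V_{t_n}, \beta_n) \tag{3.41}
\]
with
\[
T(v, \beta) := \frac{2\alpha\beta}{\kappa\sqrt{\alpha^2\beta^2 + 4\kappa^2\tilde{\theta}}}\, \operatorname{atanh}\left(\frac{2\kappa\sqrt{v} - \alpha\beta}{\sqrt{\alpha^2\beta^2 + 4\kappa^2\tilde{\theta}}}\right) - \frac{1}{\kappa}\ln\left(\kappa(v - \theta) - \alpha\beta\sqrt{v} + \tfrac{1}{4}\alpha^2\right) , \tag{3.42}
\]
where $\tilde{\theta}$ is defined in (3.45).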
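Equation (3.42) can be solved numerically comparatively readily since we know that, given $\beta_n$, over the time step from $t_n$ to $t_{n+1}$, $V_t$ will move monotonically, and that for all $\Delta t_n := (t_{n+1} - t_n)$ we have
\[
V_{t_{n+1}} > \left(\frac{\alpha|\beta_n|}{2\kappa} - \sqrt{\left(\frac{\alpha\beta_n}{2\kappa}\right)^2 + \theta - \frac{\alpha^2}{4\kappa}}\,\right)^2 \tag{3.43}
\]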
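which can be shown by setting the argument of the logarithm in the right hand side of equation (3.42) to zero. Alternatively, an inverse series expansion can be derived. Up to order $O(\Delta t_n^4)$, we find
\begin{align}
V_{n+1} = V_n + \left(\kappa(\tilde{\theta} - V_n) + \alpha\beta_n\sqrt{V_n}\right)\Delta t_n \Bigg[ 1 &+ \frac{\alpha\beta_n - 2\kappa\sqrt{V_n}}{4\sqrt{V_n}}\,\Delta t_n + \frac{\kappa\left(V_n\left(4\kappa\sqrt{V_n} - 3\alpha\beta_n\right) - \alpha\beta_n\tilde{\theta}\right)}{24\sqrt{V_n}^{\,3}}\,\Delta t_n^2 \tag{3.44}\\
&+ \frac{\kappa\left(3\alpha\beta_n\kappa\tilde{\theta}^2 + V_n^2\left(7\alpha\beta_n\kappa - 8\kappa^2\sqrt{V_n}\right) + 2\alpha\beta_n\tilde{\theta}\sqrt{V_n}\left(\alpha\beta_n + \kappa\sqrt{V_n}\right)\right)}{192\sqrt{V_n}^{\,5}}\,\Delta t_n^3 \Bigg] + O(\Delta t_n^5) \nonumber
\end{align}
with
\[
\tilde{\theta} := \theta - \frac{\alpha^2}{4\kappa} . \tag{3.45}
\]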
The shape of the curves generated by (3.42) and its 4th order inverse expansion (3.44) is shown in figure 4 where values for $\beta$ directly represent the standard normal deviation equivalent of the drawn Gaussian random number. In the following, we denote the expansion (3.44) as Pathwise Adapted Linearisation Quartic, and its second order truncation
\[
V_{n+1} = V_n + \left(\kappa(\tilde{\theta} - V_n) + \alpha\beta_n\sqrt{V_n}\right)\Delta t_n \left[1 + \frac{\alpha\beta_n - 2\kappa\sqrt{V_n}}{4\sqrt{V_n}}\,\Delta t_n\right] + O(\Delta t_n^3) \tag{3.46}
\]
as Pathwise Adapted Linearisation Quadratic. We only show results for expansions of even order for reasons of numerical stability since all odd order expansions can reach zero, which is undesirable. For small values of $\alpha$ as in figure 5 (A) both schemes are remarkably effective. Unfortunately, these schemes are inappropriate for large values of $\alpha$ due to numerical instabilities.
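The two truncations can be checked numerically against the ordinary differential equation (3.40) they approximate. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the series coefficients are those of (3.44) with $\tilde{\theta}$ from (3.45), and a finely sub-stepped classical Runge-Kutta integration serves as a stand-in reference instead of inverting (3.41).

```python
import math


def cir_ode_rhs(v, beta, kappa, theta, alpha):
    """Right hand side of the approximate ODE (3.40), written with theta_tilde (3.45)."""
    theta_tilde = theta - alpha ** 2 / (4.0 * kappa)
    return kappa * (theta_tilde - v) + alpha * beta * math.sqrt(v)


def pal_series_step(v, beta, dt, kappa, theta, alpha, order=4):
    """Series step: order=2 gives the Quadratic scheme (3.46), order=4 the Quartic (3.44)."""
    th = theta - alpha ** 2 / (4.0 * kappa)
    s = math.sqrt(v)
    ab = alpha * beta
    f = kappa * (th - v) + ab * s
    bracket = 1.0 + (ab - 2.0 * kappa * s) / (4.0 * s) * dt
    if order >= 4:
        c2 = kappa * (v * (4.0 * kappa * s - 3.0 * ab) - ab * th) / (24.0 * s ** 3)
        c3 = kappa * (3.0 * ab * kappa * th ** 2
                      + v ** 2 * (7.0 * ab * kappa - 8.0 * kappa ** 2 * s)
                      + 2.0 * ab * th * s * (ab + kappa * s)) / (192.0 * s ** 5)
        bracket += c2 * dt ** 2 + c3 * dt ** 3
    return v + f * dt * bracket


def rk4_reference(v, beta, dt, kappa, theta, alpha, substeps=1000):
    """Reference solution of (3.40) over one step by classical RK4 sub-stepping."""
    h = dt / substeps
    for _ in range(substeps):
        k1 = cir_ode_rhs(v, beta, kappa, theta, alpha)
        k2 = cir_ode_rhs(v + 0.5 * h * k1, beta, kappa, theta, alpha)
        k3 = cir_ode_rhs(v + 0.5 * h * k2, beta, kappa, theta, alpha)
        k4 = cir_ode_rhs(v + h * k3, beta, kappa, theta, alpha)
        v += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return v
```

Here beta is the slope $\beta_n$ of (3.29), i.e. the Wiener increment divided by the step size. Because the coefficients carry inverse powers of $\sqrt{V_n}$, the series deteriorates when $V_n$ is close to zero, which fits the instabilities reported for large $\alpha$.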
[Figure 4: plot of $\sigma(t)$ (vertical axis, 0% to 45%) against $t$ (horizontal axis, 0 to 1) for $\beta = 3, 2, 1, 0, -1, -2, -3$, each curve shown together with its 4th order expansion.]
Figure 4: Approximation (3.42) and its quartic expansion (3.44) for the CIR/Heston volatility process for $\sigma(0) = \sqrt{V(0)} = 20\%$, $\theta = V(0)$, $\alpha = 20\%$, $\kappa = 1$ over a unit time step for different levels of the variate $\beta = Z(1) - Z(0)$.
4 Approximation of stochastic volatility models
In this section, we discuss the numerical treatment of the full two-dimensional stochastic volatility model. Irrespective of the volatility or variance process, the dynamics of the financial underlying are given by equation (2.1). As for the stochasticity of volatility/variance, both the transformed Ornstein-Uhlenbeck process as well as the mean-reverting CEV process (2.2) can be cast in the form
\[
dV_t = a(V_t)\,dt + b(V_t)\,dZ_t . \tag{4.1}
\]
For the mean-reverting CEV process, the functional forms for $a$ and $b$ are directly given. For the exponentially and hyperbolically transformed Ornstein-Uhlenbeck process, they can be obtained from (2.35) and (2.36), respectively.
In logarithmic coordinates, the process equation for the financial underlying is given by
\[
\ln S_t = \ln S_{t_0} + \int_{t_0}^{t} \mu(s)\,ds - \frac{1}{2}\int_{t_0}^{t} V_s^{2p}\,ds + \int_{t_0}^{t} V_s^{p}\,dW_s . \tag{4.2}
\]
[Figure 5, panels (A) and (B): log-log plots of the strong convergence measure against CPU time for the schemes Euler, Milstein, Milstein+, Balanced Implicit Method, Balanced Milstein Method, Moment matched log-Euler, and Pathwise Adapted Linearisation Quadratic; panel (A) additionally shows Pathwise Adapted Linearisation Quartic.]
Figure 5: Strong convergence measured by expression (3.15) as a function of CPU time [in msec] averaged over 32767 paths for the mean reverting CEV model (2.2) for $q = 1/2$, $\kappa = 1$, $V_0 = \theta = 1/16$, $T = 1$, $c_0^{\text{BIM}} = 1$, $c_1^{\text{BIM}} = \alpha/\sqrt{x}$, $d_0^{\text{BMM}} = \kappa$, $d_1^{\text{BMM}} = 0$. The number generator was the Sobol method. (A): $\alpha = 0.2$, $2\kappa\theta - \alpha^2 = 0.085$; zero is unattainable. (B): $\alpha = 0.8$, $2\kappa\theta - \alpha^2 = -0.515$; zero is attainable.
[Figure 6, panels (A) and (B): log-log plots of the strong convergence measure against CPU time for the schemes Euler, Milstein, Milstein+, Balanced Implicit Method, Balanced Milstein Method, Moment matched log-Euler, Pathwise Adapted Linearisation Quadratic, and Pathwise Adapted Linearisation Quartic.]
Figure 6: Strong convergence measured by expression (3.15) as a function of CPU time [in msec] averaged over 32767 paths for the mean reverting CEV model (2.2) for $q = 1/2$, $\kappa = 1$, $V_0 = \theta = 1/16$, $\alpha = 0.5$, $2\kappa\theta - \alpha^2 = -0.125$, zero is attainable, $T = 1$, $c_0^{\text{BIM}} = 1$, $c_1^{\text{BIM}} = \alpha/\sqrt{x}$, $d_0^{\text{BMM}} = \kappa$, $d_1^{\text{BMM}} = 0$. The number generator method was (A) Sobol's and (B) the Mersenne Twister.
The easiest approach for the numerical integration of (4.2) is the Euler-Maruyama scheme
\[
\ln S_{t_{n+1}} = \ln S_{t_n} + \mu\,\Delta t_n - \frac{1}{2} V_{t_n}^{2p}\,\Delta t_n + V_{t_n}^{p}\,\Delta W_n . \tag{4.3}
\]
This scheme has strong convergence order $1/2$, is very easy to implement, and will be our benchmark for all other methods discussed in the following.
An alternative is of course the two-dimensional Milstein scheme (see Appendix A.3) which has strong convergence order 1. It requires the simulation of the double Wiener integral
\[
\tilde{I}_{(2,1)}(t_0, t) = \int_{t_0}^{t}\int_{t_0}^{s} d\tilde{W}_2(u)\,d\tilde{W}_1(s) \tag{4.4}
\]
[Figure 7, panels (A) and (B): log-log plots of the strong convergence measure against CPU time for the schemes Euler, Milstein, Milstein+, Balanced Implicit Method, Balanced Milstein Method, and Moment matched log-Euler; panel (B) additionally shows Pathwise Adapted Linearisation.]
Figure 7: Strong convergence measured by expression (3.15) as a function of CPU time [in msec] averaged over 32767 paths for the mean reverting CEV model (2.2) for $\kappa = 1$, $V_0 = \theta = 0.0625 = 1/16$, $T = 1$. The number generator was the Sobol method. (A): $q = 3/4$, $c_0^{\text{BIM}} = 1$, $c_1^{\text{BIM}} = \alpha|x|^{-1/4}$, $d_0^{\text{BMM}} = \kappa/2 + \tfrac{3}{8}\alpha^2/\sqrt{x}$, $d_1^{\text{BMM}} = 0$. (B): $q = 1$, $c_0^{\text{BIM}} = 1$, $c_1^{\text{BIM}} = \alpha$, $d_0^{\text{BMM}} = \tfrac{1}{2}\left(\kappa + \alpha^2\right)$, $d_1^{\text{BMM}} = 0$.
for two uncorrelated standard Wiener processes $\tilde{W}_1$ and $\tilde{W}_2$. The standard approximation for this cross term requires several additional random numbers which we consider undesirable for the same reasons we gave to exclude the adaptive Milstein scheme (3.6). There are, however, approaches [Abe04, GL94] to avoid the drawing of many extra random numbers by using the relation of this integral to the Lévy area [Lév51]
\[
A_{(1,2)}(t_0, t) = \int_{t_0}^{t}\int_{t_0}^{s} \left(d\tilde{W}_1(u)\,d\tilde{W}_2(s) - d\tilde{W}_2(u)\,d\tilde{W}_1(s)\right) . \tag{4.5}
\]
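The idea is to employ
\[
\int_{t_0}^{t}\int_{t_0}^{s} \left(d\tilde{W}_1(u)\,d\tilde{W}_2(s) + d\tilde{W}_2(u)\,d\tilde{W}_1(s)\right) = \Delta\tilde{W}_1^{(t_0,t)}\,\Delta\tilde{W}_2^{(t_0,t)} \tag{4.6}
\]
to obtain
\[
\tilde{I}_{(2,1)}(t_0, t) = \frac{1}{2}\left(\Delta\tilde{W}_1^{(t_0,t)}\,\Delta\tilde{W}_2^{(t_0,t)} - A_{(1,2)}(t_0, t)\right) . \tag{4.7}
\]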
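The relation between the two iterated integrals, the product of the increments, and the Lévy area can be illustrated on a discretised path. The sketch below uses left-point (Itô) sums; the function name is hypothetical. On a finite grid the sum of the two iterated sums differs from the product of the terminal values by the discrete cross-variation term, which is returned separately and vanishes in the continuous limit for independent Wiener processes.

```python
import math
import random


def iterated_sums(dw1, dw2):
    """Left-point sums approximating
    I12 = int int dW1(u) dW2(s) and I21 = int int dW2(u) dW1(s),
    plus the discrete cross-variation and both terminal values."""
    w1 = w2 = 0.0
    i12 = i21 = cross = 0.0
    for a, b in zip(dw1, dw2):
        i12 += w1 * b      # W1 up to the left end point times the increment of W2
        i21 += w2 * a      # W2 up to the left end point times the increment of W1
        cross += a * b     # discrete cross-variation
        w1 += a
        w2 += b
    return i12, i21, cross, w1, w2


if __name__ == "__main__":
    rng = random.Random(7)
    n = 4096
    sd = math.sqrt(1.0 / n)
    dw1 = [rng.gauss(0.0, sd) for _ in range(n)]
    dw2 = [rng.gauss(0.0, sd) for _ in range(n)]
    i12, i21, cross, w1, w2 = iterated_sums(dw1, dw2)
    levy_area = i12 - i21                       # discrete analogue of (4.5)
    print(i21, 0.5 * (w1 * w2 - levy_area))     # both sides of (4.7), up to cross / 2
```

Such a fine-grid construction costs many random numbers per step, which is precisely what the schemes discussed in this article set out to avoid.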
The joint density of the Lévy area is known semi-analytically
\[
\psi(a, b, c) = \frac{1}{2\pi^2}\int_{0}^{\infty} \frac{x}{\sinh(x)}\, \exp\left(-\frac{(b^2 + c^2)\,x}{2\tanh(x)}\right) \cos(ax)\,dx \tag{4.8}
\]
with $a = A_{(1,2)}(0, 1)$, $b = \Delta\tilde{W}_1^{(0,1)}$ and $c = \Delta\tilde{W}_2^{(0,1)}$. Hence, the simulation of the double integral (4.4) is reduced to the drawing of one additional random number (conditional on $\Delta\tilde{W}_1$ and $\Delta\tilde{W}_2$) from this distribution. Gaines and Lyons [GL94] used a modification of Marsaglia's rectangle-wedge-tail method (see [MAP76, MMB64]) to draw from (4.8) which works well for small stepsizes $\Delta t_n$. We are, however, interested in methods that also work well for moderately large step sizes, and are simple in their evaluation analytics in order to be sufficiently fast to be useful for industrial purposes.
In essence, all of the above means that we would like to construct a fast numerical integration scheme without the need for auxiliary random numbers. The formal solution (4.2) requires that we handle two stochastic integral terms. First, we need to approximate the stochastic part of the drift
\[
\int_{t_0}^{t} V_s^{2p}\,ds , \tag{4.9}
\]
and secondly, we have to simulate the diffusion term
\[
\int_{t_0}^{t} V_s^{p}\,dW_s . \tag{4.10}
\]
For both parts we make intensive use of the Itô-Taylor expansion of the process followed by the $m$-th power of $V_s$,
\[
V_s^m = V_{t_0}^m + \int_{t_0}^{s} m V_u^{m-1} b(V_u)\,dZ_u + \int_{t_0}^{s} \left[m V_u^{m-1} a(V_u) + \frac{1}{2} m(m-1) V_u^{m-2} b^2(V_u)\right] du , \tag{4.11}
\]
with positive exponent $m$, for any $s \in [t_0, t]$. The term that dominates the overall scheme's convergence is the Wiener integral over $dZ_u$.
4.1 Interpolation of the drift term (4.9)
A simple way to improve the approximation of the drift integral somewhat is
\[
\int_{t_n}^{t_{n+1}} V_s^{2p}\,ds \approx \frac{1}{2}\left(V_{t_n}^{2p} + V_{t_{n+1}}^{2p}\right)\Delta t_n , \quad \Delta t_n = (t_{n+1} - t_n) \tag{4.12}
\]
which gives us
\[
\ln S_{t_{n+1}} = \ln S_{t_n} + \mu\,\Delta t_n - \frac{1}{4}\left(V_{t_n}^{2p} + V_{t_{n+1}}^{2p}\right)\Delta t_n + V_{t_n}^{p}\,\Delta W_n . \tag{4.13}
\]
This Drift interpolation scheme comprises practically no additional numerical effort due to the fact that we already know the whole path of the volatility $V_{t_i}$. Unfortunately, a pure drift interpolation has only a minor impact on the strong approximation quality. Moreover, having a closer look at figure 8, it seems that the Drift interpolation method is inferior to the standard log-Euler scheme (4.3). Nevertheless, this approximation has some side effects of benefit for applications that are not fully strongly path dependent, whence we discuss it in more detail.
In order to analyse the Drift interpolation scheme (4.13), we start with the Itô-Taylor expansion of the integral of the $2p$-th power of stochastic volatility by setting $m = 2p$ in equation (4.11) to obtain
\begin{align}
\int_{t_n}^{t_{n+1}} V_s^{2p}\,ds \approx{}& \underbrace{V_{t_n}^{2p}\int_{t_n}^{t_{n+1}} ds}_{\text{Euler}} + \underbrace{2pV_{t_n}^{2p-1} b_n \int_{t_n}^{t_{n+1}}\int_{t_n}^{s} dZ_u\,ds}_{\text{First remainder term: } R_1} \nonumber\\
&+ \underbrace{\left(2pV_{t_n}^{2p-1} a_n + p(2p-1)V_{t_n}^{2p-2} b_n^2\right)\int_{t_n}^{t_{n+1}}\int_{t_n}^{s} du\,ds}_{\text{Second remainder term: } R_2} \tag{4.14}
\end{align}
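with $\Delta t_n := (t_{n+1} - t_n)$, $a_n := a(V_{t_n})$, and $b_n := b(V_{t_n})$. In comparison, the Itô-Taylor expansion of the drift-interpolation scheme (4.12) leads to
\begin{align}
\frac{1}{2}\Delta t_n\left(V_{t_n}^{2p} + V_{t_{n+1}}^{2p}\right) \approx \frac{1}{2}\Delta t_n \Big( & V_{t_n}^{2p} + V_{t_n}^{2p} + 2pV_{t_n}^{2p-1} b_n\,\Delta Z_n \nonumber\\
&+ \left(2pV_{t_n}^{2p-1} a_n + p(2p-1)V_{t_n}^{2p-2} b_n^2\right)\Delta t_n \Big) . \tag{4.15}
\end{align}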
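This means that the leading order terms of the local approximation error incurred by the drift interpolation scheme are
\begin{align}
f_{t_n} &:= \int_{t_n}^{t_{n+1}} V_s^{2p}\,ds - \frac{1}{2}\left(V_{t_n}^{2p} + V_{t_{n+1}}^{2p}\right)\Delta t_n \tag{4.16}\\
&= 2pV_{t_n}^{2p-1} b_n \int_{t_n}^{t_{n+1}} \left(\int_{t_n}^{s} dZ_u - \frac{1}{2}\int_{t_n}^{t_{n+1}} dZ_u\right) ds \nonumber\\
&= 2pV_{t_n}^{2p-1} b_n \int_{t_n}^{t_{n+1}} \left(\frac{1}{2}\int_{t_n}^{t_{n+1}} du - \int_{t_n}^{s} du\right) dZ_s . \tag{4.17}
\end{align}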
Thus, by interpolating the drift, the second remainder term $R_2$ of (4.14), which involves the double integral
\[
I_{(0,0)}(t_n, t_{n+1}) = \int_{t_n}^{t_{n+1}}\int_{t_n}^{s} du\,ds , \tag{4.18}
\]
is catered for. In expectation, we have the unconditional local mean-approximation error
\[
\mathrm{E}\left[f_{t_n} \,\middle|\, \mathcal{F}_0\right] = O\left(\Delta t_n^3\right) . \tag{4.19}
\]
In order to analyse the relation between local and global convergence properties, we assume that the integration interval $[0, T]$ is discretised in $N$ steps, $0 < t_1 < \ldots < t_{N-1} < t_N = T$ with stepsize $\Delta t = T/N$. Let $X_{t_i,x}(t_{i+1})$ be the numerical approximation at $t_{i+1}$ starting at time $t_i$ at point $x$ and let $Y_{t_i,x}(t_{i+1})$ be the analytical solution of the stochastic differential equation starting at $(t_i, x)$. Furthermore, we already know the local mean-approximation errors for $i = 0, \ldots, N-1$,
\[
\mathrm{E}\left[\,\left|X_{t_i,Y_i}(t_{i+1}) - Y_{t_i,Y_i}(t_{i+1})\right| \,\middle|\, \mathcal{F}_{t_i}\right] = O\left(\Delta t_n^3\right) . \tag{4.20}
\]
Next we consider the global mean-approximation error
\begin{align}
\left|\mathrm{E}[X_{0,X_0}(T)] - \mathrm{E}[Y_{0,X_0}(T)]\right| &= \left|\mathrm{E}[X_{0,X_0}(T) - Y_{0,X_0}(T)]\right| \tag{4.21}\\
&= \left|X_{0,X_0}(t_{N-1}) - Y_{0,X_0}(t_{N-1}) + O\left(\Delta t^3\right)\right| \nonumber\\
&\;\;\vdots \nonumber\\
&= \left|X_{0,X_0}(t_1) - Y_{0,X_0}(t_1) + (N-1)\cdot O\left(\Delta t^3\right)\right| \nonumber\\
&= N \cdot O\left(\Delta t^3\right) = O(\Delta t^2) . \tag{4.22}
\end{align}
This means, the use of the drift interpolation term $\tfrac{1}{2}\left(V_{t_n}^{2p} + V_{t_{n+1}}^{2p}\right)\Delta t_n$ instead of the straightforward Euler scheme term $V_{t_n}^{2p}\Delta t_n$ improves the global mean-approximation order of convergence. Alas, it is not possible to improve the global weak$^4$ order of convergence in the two-dimensional case without generating additional random numbers. Nevertheless, the interpolation of the drift leads to a higher global mean-convergence order (4.22) which may be of benefit when the simulation target is the valuation of plain-vanilla or weakly path dependent options, and this issue will be the subject of future research.
$^4$ The global weak order of convergence is defined by $\left|\mathrm{E}[g(X_{0,X_0}(T))] - \mathrm{E}[g(Y_{0,X_0}(T))]\right|$ with $g$ being a sufficiently smooth test-function. One can find the multidimensional second order weak Taylor approximation scheme in section 14.2 of [KP99].
Having analysed the approximation quality of the term governed by $I_{(0,0)}$ in (4.13), we now turn our attention to the local estimation error induced by the handling of the double Wiener integral
\[
I_{(2,0)}(t_n, t_{n+1}) = \int_{t_n}^{t_{n+1}}\int_{t_n}^{s} dZ_u\,ds \tag{4.23}
\]
which can be simulated by the aid of our knowledge of the distribution of $I_{(2,0)}(t_n, t_{n+1})$:
\[
I_{(2,0)}(t_n, t_{n+1}) \sim \frac{1}{2}\Delta Z_n\,\Delta t_n + \frac{1}{2\sqrt{3}}\,\xi\,\Delta t_n , \quad\text{with } \xi \sim N(0, \Delta t_n) . \tag{4.24}
\]
Sampling $I_{(2,0)}$ exactly would thus require the generation of an additional random number for each step. In analogy to the reasoning leading up to the approximation (A.14) which is at the basis of the Milstein+ scheme in appendix A.2, we argue that
\[
I_{(2,0)}(t_n, t_{n+1}) \simeq \frac{1}{2}\Delta Z_n\,\Delta t_n \tag{4.25}
\]
is, conditional on our knowledge of the simulated Wiener path that drives the volatility process, or, more formally, conditional on the $\sigma$-algebra $\mathcal{P}_N^2$ generated by the increments
\[
\Delta Z_0 = Z_1 - Z_0 ,\ \Delta Z_1 = Z_2 - Z_1 ,\ \ldots ,\ \Delta Z_{N-1} = Z_N - Z_{N-1} , \tag{4.26}
\]
the best approximation attainable without resorting to additional sources of (pseudo-)randomness. Applying the approximation (4.25) to the term $R_1$ in (4.14) leads us to precisely the corresponding term in the expansion (4.15) of the drift interpolation scheme (the term proportional to $\Delta Z_n$), and hence the scheme (4.13) also aids with respect to the influences of the term $I_{(2,0)}(t_n, t_{n+1})$.
The conditional expectation of the local approximation error (4.16) of the scheme (4.13) conditional on knowing the full path for $Z$ is thus of order
\[
\mathrm{E}\left[f_{t_n} \,\middle|\, \mathcal{P}_N^2\right] = O\left(\Delta Z_n^2\,\Delta t_n\right) + O\left(\Delta t_n^3\right) . \tag{4.27}
\]
The quality of this path-conditional local approximation error is not visible in error measures designed to show the strong convergence behaviour of the integration scheme. However, it is likely to be of benefit for the calculation of expectations that do not depend strongly on the fine structure of simulated paths, but on the approximation quality of the distribution of the underlying variable at the terminal time horizon of the simulation.
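Another aspect of the drift interpolation scheme (4.13) is that it reduces the local mean-square error
\begin{align}
\mathrm{E}\left[f_{t_n}^2 \,\middle|\, \mathcal{F}_{t_n}\right] &\approx \left(2pV_{t_n}^{2p-1} b_n\right)^2 \mathrm{E}\left[\left(\int_{t_n}^{t_{n+1}} \left(\Delta t_n/2 - (s - t_n)\right) dZ_s\right)^2\right] \tag{4.28}\\
&= \left(2pV_{t_n}^{2p-1} b_n\right)^2 \int_{t_n}^{t_{n+1}} \left(\Delta t_n/2 - (s - t_n)\right)^2 ds \tag{4.29}\\
&= \left(2pV_{t_n}^{2p-1} b_n\right)^2 \frac{1}{12}\,\Delta t_n^3 \tag{4.30}
\end{align}
compared with the mean-square error of the first remainder term $R_1$ in (4.14) of the Euler scheme
\begin{align}
\mathrm{E}\left[(R_1)^2 \,\middle|\, \mathcal{F}_{t_n}\right] &\approx \left(2pV_{t_n}^{2p-1} b_n\right)^2 \mathrm{E}\left[\left(\int_{t_n}^{t_{n+1}} (t_{n+1} - s)\, dZ_s\right)^2\right] \tag{4.31}\\
&= \left(2pV_{t_n}^{2p-1} b_n\right)^2 \frac{1}{3}\,\Delta t_n^3 . \tag{4.32}
\end{align}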
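As a worked check of the two constants $1/12$ and $1/3$, one can evaluate the deterministic integrals that the Itô isometry produces. The shorthand $\tau := s - t_n$ is introduced here purely for illustration.

```latex
\begin{align*}
\int_0^{\Delta t_n} \left(\tfrac{1}{2}\Delta t_n - \tau\right)^2 d\tau
  &= \left[-\tfrac{1}{3}\left(\tfrac{1}{2}\Delta t_n - \tau\right)^3\right]_0^{\Delta t_n}
   = \tfrac{1}{24}\Delta t_n^3 + \tfrac{1}{24}\Delta t_n^3
   = \tfrac{1}{12}\Delta t_n^3 , \\
\int_0^{\Delta t_n} \tau^2\, d\tau
  &= \tfrac{1}{3}\Delta t_n^3 .
\end{align*}
```

The ratio of the two is $1/4$, so under these leading-order estimates the interpolation halves the root-mean-square size of this local error contribution while leaving its order in $\Delta t_n$ unchanged.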
In summary, the interpolation of the drift given by the scheme (4.13) effectively improves the numerical integration by fully representing terms governed by $I_{(0,0)}$ in the Itô-Taylor expansion of the formal solution, and by improving the approximation for the term governed by $I_{(2,0)}$. We could not really expect to enhance the global strong convergence order induced by the drift term (4.9) without the drawing of additional random numbers. Still, with nearly no extra computational effort one can improve, at least theoretically, over the conventional Euler scheme. Specifically, we are not completely erasing the leading error term of the Euler scheme which is of order $O(\Delta Z\,\Delta t)$. However, by using approximation (4.15), conditional on any one given path in $Z$, we are able to remove the leading order bias term which is of order $O(\Delta Z\,\Delta t)$. Effectively, the drift interpolation scheme (4.13) simply reduces the absolute value of the coefficient of the lowest strong convergence order error term.
4.2 Mixed interpolation of the diffusion term (4.10)
A suitable approximation of the diffusion is a little bit more difficult than the integration of the drift. The first idea might be to use
\[
\int_{t_n}^{t_{n+1}} V_s^{p}\,dW_s \approx \frac{1}{2}\left(V_{t_n}^{p} + V_{t_{n+1}}^{p}\right)\Delta W_n , \tag{4.33}
\]
resulting in
\[
\ln S_{t_{n+1}} = \ln S_{t_n} + \mu\,\Delta t_n - \frac{1}{2}V_{t_n}^{2p}\,\Delta t_n + \frac{1}{2}\left(V_{t_n}^{p} + V_{t_{n+1}}^{p}\right)\Delta W_n , \tag{4.34}
\]
in analogy to the simple interpolation used for the drift approximation. Furthermore, combining the drift and diffusion interpolation leads to
\[
\ln S_{t_{n+1}} = \ln S_{t_n} + \mu\,\Delta t_n - \frac{1}{4}\left(V_{t_n}^{2p} + V_{t_{n+1}}^{2p}\right)\Delta t_n + \frac{1}{2}\left(V_{t_n}^{p} + V_{t_{n+1}}^{p}\right)\Delta W_n . \tag{4.35}
\]
We will denote these schemes as Diffusion interpolation (4.34) and Drift + Diffusion interpolation (4.35). Considering figure 8 we recognize that these integration schemes are remarkably effective in case of no correlation between the underlying and the stochastic volatility. In contrast, convergence is lost altogether for the diffusion interpolation scheme (4.34) when correlation is non-zero as we can see in figures 9 and 10.
[Figure 8, panels (A) and (B): log-log plots of the strong convergence measure against scheme step size for the schemes log-Euler, Drift interpolation, Diffusion interpolation, Drift + Diffusion interpolation, Drift + Diffusion interpolation + decorrelation, and IJK.]
Figure 8: Strong convergence of the financial underlying measured by expression (3.15) averaged over 32767 paths as a function of scheme step size for $T = 1$, $\mu = 0.05$, $S_0 = 100$, $\rho = 0$. The volatility dynamics were given by the (A) exponentially (2.24) and (B) the hyperbolically (2.25) transformed Ornstein-Uhlenbeck process (2.4) with $y_0 = 0$, $\sigma_0 = 1/4$, $\kappa = 1$, and $\alpha = 7/20$.
[Figure 9, panels (A) and (B): log-log plots of the strong convergence measure against scheme step size for the schemes log-Euler, Drift interpolation, Diffusion interpolation, Drift + Diffusion interpolation, Drift + Diffusion interpolation + decorrelation, IJK, and IJK no Drift interpolation.]
Figure 9: Strong convergence of the financial underlying measured by expression (3.15) averaged over 32767 paths as a function of scheme step size for $T = 1$, $\mu = 0.05$, $S_0 = 100$, $\rho = -2/5$. The volatility dynamics were given by the (A) exponentially (2.24) and (B) the hyperbolically (2.25) transformed Ornstein-Uhlenbeck process (2.4) with $y_0 = 0$, $\sigma_0 = 1/4$, $\kappa = 1$, and $\alpha = 7/20$.
[Figure 10, panels (A) and (B): log-log plots of the strong convergence measure against scheme step size for the same seven schemes as in figure 9.]
Figure 10: Strong convergence of the financial underlying measured by expression (3.15) averaged over 32767 paths as a function of scheme step size for $T = 1$, $\mu = 0.05$, $S_0 = 100$, $\rho = -4/5$. The volatility dynamics were given by the (A) exponentially (2.24) and (B) the hyperbolically (2.25) transformed Ornstein-Uhlenbeck process (2.4) with $y_0 = 0$, $\sigma_0 = 1/4$, $\kappa = 1$, and $\alpha = 7/20$.
In order to understand the loss of convergence we take a closer look at the diffusion interpolation. The first step is to decompose the correlated Wiener processes into independent components by the aid of the Cholesky decomposition
\begin{align}
dW &= \rho\,d\tilde{Z} + \rho'\,d\tilde{W} , \tag{4.36}\\
dZ &= d\tilde{Z} , \tag{4.37}
\end{align}
where $\tilde{W}$ and $\tilde{Z}$ are uncorrelated, and
\[
\rho' := \sqrt{1 - \rho^2} . \tag{4.38}
\]
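The decomposition (4.36) to (4.38) translates directly into a recipe for drawing correlated increments from two independent Gaussian draws. The following is a minimal sketch; the function name and the use of Python's standard pseudo-random generator are assumptions made for illustration.

```python
import math
import random


def correlated_increments(rho, dt, n, rng):
    """Return n pairs (dW, dZ) with correlation rho, built as in (4.36)-(4.38):
    dZ = dZ~ and dW = rho dZ~ + sqrt(1 - rho^2) dW~ for independent dZ~, dW~."""
    rho_prime = math.sqrt(1.0 - rho * rho)
    sd = math.sqrt(dt)
    pairs = []
    for _ in range(n):
        dz_tilde = rng.gauss(0.0, sd)   # drives the volatility process
        dw_tilde = rng.gauss(0.0, sd)   # independent component of the underlying
        pairs.append((rho * dz_tilde + rho_prime * dw_tilde, dz_tilde))
    return pairs
```

Keeping the independent component available separately is what later allows the trapezoidal rule to be applied to the uncorrelated part of the diffusion only.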
This gives us
\[
\int_{t_n}^{t_{n+1}} V_s^{p}\,dW_s = \rho' \int_{t_n}^{t_{n+1}} V_s^{p}\,d\tilde{W}_s + \rho \int_{t_n}^{t_{n+1}} V_s^{p}\,d\tilde{Z}_s . \tag{4.39}
\]
The reason for the loss of convergence is that the volatility process $V_s$ is driven itself by the Wiener process $\tilde{Z}_s$. Thus by using the trapezoidal rule we are not interpreting the stochastic integral in the Itô but in the Stratonovich sense. As a consequence, we are overestimating the influence of the Wiener process $\tilde{Z}_s$. We can circumvent this problem by applying the trapezoidal rule only on the uncorrelated part of the diffusion
\[
\rho' \int_{t_n}^{t_{n+1}} V_s^{p}\,d\tilde{W}_s + \rho \int_{t_n}^{t_{n+1}} V_s^{p}\,d\tilde{Z}_s \approx \frac{1}{2}\rho'\left(V_{t_n}^{p} + V_{t_{n+1}}^{p}\right)\Delta\tilde{W}_n + \rho V_{t_n}^{p}\,\Delta\tilde{Z}_n , \tag{4.40}
\]
which gives us in combination with (4.35)
\[
\ln S_{t_{n+1}} = \ln S_{t_n} + \mu\,\Delta t_n - \frac{1}{4}\left(V_{t_n}^{2p} + V_{t_{n+1}}^{2p}\right)\Delta t_n + \frac{1}{2}\left(V_{t_n}^{p} + V_{t_{n+1}}^{p}\right)\Delta W_n + \frac{1}{2}\rho\left(V_{t_n}^{p} - V_{t_{n+1}}^{p}\right)\Delta Z_n \tag{4.41}
\]
which we shall refer to as Drift + Diffusion interpolation + decorrelation scheme. This scheme not only restores convergence but also improves the approximation quality when correlation is non-zero. To verify these statements we analyse the local approximation error again
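\begin{align}
f_{t_n} :={}& \rho' \int_{t_n}^{t_{n+1}} V_s^{p}\,d\tilde{W}_s + \rho \int_{t_n}^{t_{n+1}} V_s^{p}\,d\tilde{Z}_s - \left(\frac{1}{2}\rho'\left(V_{t_n}^{p} + V_{t_{n+1}}^{p}\right)\Delta\tilde{W}_n + \rho V_{t_n}^{p}\,\Delta\tilde{Z}_n\right) \tag{4.42}\\
={}& \rho'\left[pV_{t_n}^{p-1} b_n \int_{t_n}^{t_{n+1}}\int_{t_n}^{s} d\tilde{Z}_u\,d\tilde{W}_s + \left(pV_{t_n}^{p-1} a_n + \frac{1}{2}p(p-1)V_{t_n}^{p-2} b_n^2\right)\int_{t_n}^{t_{n+1}}\int_{t_n}^{s} du\,d\tilde{W}_s\right] \nonumber\\
&+ \rho\left[pV_{t_n}^{p-1} b_n \int_{t_n}^{t_{n+1}}\int_{t_n}^{s} d\tilde{Z}_u\,d\tilde{Z}_s + \left(pV_{t_n}^{p-1} a_n + \frac{1}{2}p(p-1)V_{t_n}^{p-2} b_n^2\right)\int_{t_n}^{t_{n+1}}\int_{t_n}^{s} du\,d\tilde{Z}_s\right] \nonumber\\
&- \frac{1}{2}\rho'\left[pV_{t_n}^{p-1} b_n\,\Delta\tilde{Z}_n + \left(pV_{t_n}^{p-1} a_n + \frac{1}{2}p(p-1)V_{t_n}^{p-2} b_n^2\right)\Delta t_n\right]\Delta\tilde{W}_n \nonumber\\
={}& \rho'\Bigg[\underbrace{pV_{t_n}^{p-1} b_n \int_{t_n}^{t_{n+1}} \left((\tilde{Z}_s - \tilde{Z}_{t_n}) - \frac{1}{2}\Delta\tilde{Z}_n\right) d\tilde{W}_s}_{f_{t_n,1}} \nonumber\\
&\qquad + \underbrace{\left(pV_{t_n}^{p-1} a_n + \frac{1}{2}p(p-1)V_{t_n}^{p-2} b_n^2\right)\int_{t_n}^{t_{n+1}} \left((s - t_n) - \frac{1}{2}\Delta t_n\right) d\tilde{W}_s}_{f_{t_n,2}}\Bigg] \nonumber\\
&+ \rho\Bigg[\underbrace{pV_{t_n}^{p-1} b_n \int_{t_n}^{t_{n+1}}\int_{t_n}^{s} d\tilde{Z}_u\,d\tilde{Z}_s}_{f_{t_n,3}} + \left(pV_{t_n}^{p-1} a_n + \frac{1}{2}p(p-1)V_{t_n}^{p-2} b_n^2\right)\int_{t_n}^{t_{n+1}}\int_{t_n}^{s} du\,d\tilde{Z}_s\Bigg] . \tag{4.43}
\end{align}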
In analogy to the interpolation of the drift, the trapezoidal integration rule applied to the uncorrelated part of the Itô integral in (4.40) leads to a reduced variance for the local truncation errors $f_{t_n,1}$ and $f_{t_n,2}$. Taking the conditional expectation based on the knowledge of our Wiener paths $W$ and $Z$ we obtain
\[
\mathrm{E}\left[f_{t_n,1} \,\middle|\, \tilde{\mathcal{P}}_N^1, \tilde{\mathcal{P}}_N^2\right] = 0 , \quad\text{and}\quad \mathrm{E}\left[f_{t_n,2} \,\middle|\, \tilde{\mathcal{P}}_N^1\right] = 0 , \tag{4.44}
\]
where the $\sigma$-algebras $\tilde{\mathcal{P}}_N^1$ and $\tilde{\mathcal{P}}_N^2$ are generated by the increments of the Wiener processes $\tilde{W}$ and $\tilde{Z}$. Once again the interpolation is the best estimate based on the knowledge of the paths of $\tilde{W}$ and $\tilde{Z}$. Especially in the case of low correlation this scheme is remarkably effective. Taking a closer look at the local approximation error (4.43) of the correlated part we recognize that the leading error term is
\[
f_{t_n,3} = \rho\,pV_{t_n}^{p-1} b_n \int_{t_n}^{t_{n+1}}\int_{t_n}^{s} d\tilde{Z}_u\,d\tilde{Z}_s = \rho\,pV_{t_n}^{p-1} b_n\,\frac{1}{2}\left(\Delta\tilde{Z}_n^2 - \Delta t_n\right) . \tag{4.45}
\]
To make matters even better, we can improve the integration by including this term in our integration scheme. Luckily the double Itô integral $I_{(2,2)}$ does not require additional random numbers. The importance of the inclusion of this term grows with increasing correlation coefficient $\rho$, unlike the benefit from the (de-correlated) diffusion interpolation (4.41) which diminishes with increasing correlation. For the sake of brevity, we will call this integration scheme based on interpolation of the drift, interpolation of the diffusion term, consideration of decorrelation of the diffusion term, and inclusion of a higher order Milstein term, simply the IJK scheme in the following. Its explicit propagation equation is given by
\begin{align}
\ln S_{t_{n+1}} ={}& \ln S_{t_n} + \mu\,\Delta t_n - \frac{1}{4}\left(V_{t_n}^{2p} + V_{t_{n+1}}^{2p}\right)\Delta t_n + \rho V_{t_n}^{p}\,\Delta\tilde{Z}_n \nonumber\\
&+ \frac{1}{2}\left(V_{t_n}^{p} + V_{t_{n+1}}^{p}\right)\rho'\,\Delta\tilde{W}_n + \frac{1}{2}\rho\,pV_{t_n}^{p-1} b_n\left(\Delta\tilde{Z}_n^2 - \Delta t_n\right) \tag{4.46}
\end{align}
or, equivalently,
\begin{align}
\ln S_{t_{n+1}} ={}& \ln S_{t_n} + \mu\,\Delta t_n - \frac{1}{4}\left(V_{t_n}^{2p} + V_{t_{n+1}}^{2p}\right)\Delta t_n + \rho V_{t_n}^{p}\,\Delta Z_n \nonumber\\
&+ \frac{1}{2}\left(V_{t_n}^{p} + V_{t_{n+1}}^{p}\right)\left(\Delta W_n - \rho\,\Delta Z_n\right) + \frac{1}{2}\rho\,pV_{t_n}^{p-1} b_n\left(\Delta Z_n^2 - \Delta t_n\right) . \tag{4.47}
\end{align}
Since the Drift interpolation (4.13) scheme was not able to increase the strong approximation quality of the standard log-Euler we also tried the IJK method without using a drift interpolation which we denote as IJK no Drift interpolation.
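Written out as code, one step of the log-Euler benchmark (4.3) and one step of the IJK scheme in the form (4.47) might look as follows. This is a sketch under stated assumptions: the function names and argument order are hypothetical, the volatility values at both ends of the step are taken as already simulated, and b_n stands for $b(V_{t_n})$ of whichever volatility process (4.1) is in use.

```python
def log_euler_step(ln_s, v_n, dw, dt, mu, p):
    """One step of the log-Euler scheme (4.3)."""
    return ln_s + mu * dt - 0.5 * v_n ** (2 * p) * dt + v_n ** p * dw


def ijk_step(ln_s, v_n, v_np1, dw, dz, dt, mu, rho, p, b_n):
    """One step of the IJK scheme in the form (4.47).

    dw and dz are the correlated Wiener increments of the underlying and of the
    volatility process, v_n and v_np1 the volatility variable at both ends of the step.
    """
    drift = mu * dt - 0.25 * (v_n ** (2 * p) + v_np1 ** (2 * p)) * dt
    correlated = rho * v_n ** p * dz                              # Ito (left-point) part
    decorrelated = 0.5 * (v_n ** p + v_np1 ** p) * (dw - rho * dz)  # trapezoidal part
    milstein = 0.5 * rho * p * v_n ** (p - 1) * b_n * (dz * dz - dt)
    return ln_s + drift + correlated + decorrelated + milstein
```

No random numbers beyond dw and dz enter the step, so the function can replace a log-Euler update in an existing path loop provided the volatility at the end of the step is computed first.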
In figures 8, 9, and 10 we show all considered approximation procedures in comparison and we see that a combination of drift interpolation, diffusion interpolation allowing for (de-)correlation as given in (4.40), and the addition of the higher order term (4.45) outperforms any of the other approximation schemes. The advantage of the IJK scheme is that we get good approximation results for low and high correlations due to the fact that we cover both the dominant error terms for low correlation (4.40) and for high correlation (4.45), and that comparatively little extra computational effort is required. In addition, one can observe that in the case of high correlation, as given in figures 10 and 13, the drift-interpolation is a small but valuable enhancement for the IJK scheme particularly for large stepsizes.

Table 1: Average speed-up IJK (4.47) compared with log-Euler (4.3).
figure   ρ      Exp.   Hyp.
11       0.0    2.0    1.9
12      -0.4    2.2    2.1
13      -0.8    4.4    4.2

Until this point we have only compared integration schemes by looking at the approximation quality as a function of stepsize. In financial applications, however, a scheme is considered better if it is more accurate and faster. It is thus of paramount interest to compare the residual error as a function of calculation time. In figures 11, 12, and 13 we can see that the combination of all speed-ups (drift and diffusion interpolation, decorrelation and additional term) does not affect the computational effort significantly. We also notice that, when simulation of the stochastic volatility process itself is trivial as is the case for the exponential and the hyperbolic volatility processes discussed in section 2.2, the use of the IJK scheme provides on average a speed-up of approximately a factor 2 for low correlation and as much as a factor 4 for pronounced negative correlation (see table 1). In this context, it is noteworthy to recall that most market calibrations require a strong negative correlation to reproduce the observable implied volatility skews, which makes the use of the IJK scheme particularly attractive.
[Figure 11, panels (A) and (B): strong approximation error against CPU time for the schemes log-Euler,
Drift interpolation, Diffusion interpolation, Drift + Diffusion interpolation, Drift + Diffusion
interpolation + decorrelation, and IJK.]
Figure 11: Strong convergence of the financial underlying measured by expression (3.15) averaged over 32767 paths as a
function of CPU time [in msec] for $T = 1$, $\mu = 0.05$, $S_0 = 100$, $\rho = 0$. The volatility dynamics were given by the (A)
exponentially (2.24) and (B) the hyperbolically (2.25) transformed Ornstein-Uhlenbeck process (2.4) with $y_0 = 0$,
$\sigma_0 = 1/4$, $\kappa = 1$, and $\alpha = 7/20$.
[Figure 12, panels (A) and (B): strong approximation error against CPU time for the schemes log-Euler,
Drift interpolation, Diffusion interpolation, Drift + Diffusion interpolation, Drift + Diffusion
interpolation + decorrelation, IJK, and IJK no Drift interpolation.]
Figure 12: Strong convergence of the financial underlying measured by expression (3.15) averaged over 32767 paths as a
function of CPU time [in msec] for $T = 1$, $\mu = 0.05$, $S_0 = 100$, $\rho = -2/5$. The volatility dynamics were given by the (A)
exponentially (2.24) and (B) the hyperbolically (2.25) transformed Ornstein-Uhlenbeck process (2.4) with $y_0 = 0$,
$\sigma_0 = 1/4$, $\kappa = 1$, and $\alpha = 7/20$.
In section 5, we present the results of further numerical tests for the case when the volatility process
itself requires a numerical integration scheme. In particular, we consider the situation where the
volatility is given by the mean-reverting CEV process (2.2). For this case, we wish to find the best
combination of integration schemes for the stochastic volatility as well as for the financial underlying.
[Figure 13, panels (A) and (B): strong approximation error against CPU time for the schemes log-Euler,
Drift interpolation, Diffusion interpolation, Drift + Diffusion interpolation, Drift + Diffusion
interpolation + decorrelation, IJK, and IJK no Drift interpolation.]
Figure 13: Strong convergence of the financial underlying measured by expression (3.15) averaged over 32767 paths as a
function of CPU time [in msec] for $T = 1$, $\mu = 0.05$, $S_0 = 100$, $\rho = -4/5$. The volatility dynamics were given by the (A)
exponentially (2.24) and (B) the hyperbolically (2.25) transformed Ornstein-Uhlenbeck process (2.4) with $y_0 = 0$,
$\sigma_0 = 1/4$, $\kappa = 1$, and $\alpha = 7/20$.
5 Numerical results for mean-reverting CEV volatility processes
In this section we go one step further as we consider a two-dimensional stochastic volatility model
where the stochastic volatility process is given by the mean-reverting CEV process (2.2). We already
recognized in section 3 that the numerical results for the integration of a mean-reverting CEV process
are sensitive to the size of the diffusion exponent $q \in [1/2, 1]$. Hence we focus on the two extreme
choices, the Brennan-Schwartz (3.23) and the Cox-Ingersoll-Ross (3.37) equations. In the following we
consider four schemes for the integration of stochastic volatility or variance:
1. Euler (3.1),
2. Milstein (A.4),
3. BMM (3.7),
4. Pathwise Adapted Linearisation, (3.32) for Brennan-Schwartz and (3.44) for CIR.
We combine these with suitable integration schemes for the whole system. Specifically, we consider
1. Euler-Maruyama (4.3),
2. IJK (4.47).
The Euler scheme was already the benchmark in section 4 where we developed the IJK scheme. It is of
interest to see if the IJK scheme can preserve its advantage even if we have to integrate the stochastic
volatility process numerically.
In the following we concentrate on two different test cases. The first one is based on the Brennan-
Schwartz equation (3.23) for the modelling of the stochastic volatility. This equation is directly coupled
to the underlying with exponent $p = 1/2$:
$$
\begin{aligned}
dS_t &= \mu S_t\, dt + \sqrt{V_t}\, S_t\, dW_t , \\
dV_t &= \kappa(\theta - V_t)\, dt + \alpha V_t\, dZ_t .
\end{aligned}
\tag{5.1}
$$
The parameter configuration is chosen as follows: $S_{t_0} = 100$, $\mu = 0.05$, $V_0 = \theta = 1/16$, $\kappa = 1$, $\alpha = 0.5$,
where we present results for different levels of correlation $\rho \in \{0.0, -0.4, -0.8\}$.
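As a second benchmark we consider the Heston model
$$
\begin{aligned}
dS_t &= \mu S_t\, dt + \sqrt{V_t}\, S_t\, dW_t , \\
dV_t &= \kappa(\theta - V_t)\, dt + \alpha \sqrt{V_t}\, dZ_t ,
\end{aligned}
\tag{5.2}
$$
where the parameters are given by $S_{t_0} = 100$, $\mu = 0.05$, $V_0 = \theta = 1/16$, $\kappa = 1$, $\alpha = 0.5$. Again we show
results for decreasing correlation $\rho \in \{0.0, -0.4, -0.8\}$.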
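To make the two test systems concrete, the following is a minimal Python sketch of the baseline combination only (Euler for the variance, a standard log-Euler step for the underlying) applied to (5.1) and (5.2). It is not the IJK scheme (4.47). The function name `simulate_terminal`, the flooring of negative variance at zero inside the coefficients, and the use of Python's pseudo-random generator in place of Sobol' numbers are assumptions made purely for illustration.

```python
import math
import random


def simulate_terminal(q, rho, n_steps, T=1.0, mu=0.05, s0=100.0,
                      v0=1.0 / 16.0, theta=1.0 / 16.0, kappa=1.0,
                      alpha=0.5, rng=None):
    """Return (S_T, V_T) for one path; q = 1 gives (5.1), q = 1/2 gives (5.2).

    Variance: Euler step. Underlying: log-Euler step with exponent p = 1/2.
    """
    rng = rng or random.Random()
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    rho_c = math.sqrt(1.0 - rho * rho)  # weight of the independent driver
    log_s, v = math.log(s0), v0
    for _ in range(n_steps):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        d_z = sqrt_dt * z2                         # drives the variance
        d_w = sqrt_dt * (rho_c * z1 + rho * z2)    # corr(dW, dZ) = rho
        v_pos = max(v, 0.0)  # assumption: floor negative variance at zero
        log_s += (mu - 0.5 * v_pos) * dt + math.sqrt(v_pos) * d_w
        v += kappa * (theta - v) * dt + alpha * v_pos ** q * d_z
    return math.exp(log_s), v
```

A call such as `simulate_terminal(q=0.5, rho=-0.8, n_steps=16, rng=random.Random(1))` produces one Heston-type path with the parameters quoted above; the floor at zero is exactly the kind of ad hoc fix whose side effects the positivity-preserving schemes of section 3 are meant to avoid.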
We see in figures 14 to 16 that the decisive point for the strong approximation quality is the choice of
the IJK integration scheme, as the integration of the stochastic volatility has just a minor impact on the
numerical results. In accordance with the numerical results of the last section, we can observe that the
IJK scheme is at its most impressive when dealing with high correlation as in figure 16. In fact for high
negative correlation, the approximation efficiency of the IJK scheme in comparison to conventional
Euler & log-Euler methods appears to be even greater in the case when the volatility process itself requires
numerical integration, considering that the speed gain appears to be approximately a factor 5 in figure 16.

Table 2: Average speed-up of BMM & IJK (4.47) compared with Euler & log-Euler (4.3).
figure    ρ       (5.1)   (5.2)
14        0.0     2.1     2.6
15       -0.4     2.5     2.9
16       -0.8     4.6     4.5
Nonetheless, even if we can neglect the influence of the numerical integration of the stochastic
volatility process on the strong convergence behaviour of the underlying, the details of the integration
of the stochastic volatility process become important when pricing derivatives that are sensitive to the
dynamics of the volatility. In that case, the results of section 3 can give guidance in the selection of
the integration scheme for the mean-reverting CEV process. In any case, one should be aware of the fact
that an unstable integration of the stochastic volatility can crash the integration of the whole system in
the sense that the occurrence of spurious paths where variance crosses over to the negative domain can
spoil the convergence behaviour irrecoverably as we saw in figure 5 (B) for the Milstein+ scheme.
On that note, we have a closer look at figure 16 (B) where we see that the convergence behaviour of the
Milstein-IJK scheme seems somewhat unexpected as the approximation error does not decrease when
halving the stepsize from $\Delta t = 1$ to $\Delta t = 1/2$ even though this integration scheme is competitive to
the BMM-IJK scheme for small stepsizes. The explanation for this is surprisingly simple. In table 3
we compare the percentage of non-positive paths for the integration of the stochastic volatility where
we do not count those paths becoming non-positive in the final integration step (see footnote 5). With
this counting convention, no non-positive paths occur for $\Delta t = 1$ as we only have to take a single step.
In comparison, for $\Delta t = 1/2$ we obtain the highest number of non-positive paths for the Milstein scheme
which explains the bump in the convergence plot of Milstein-IJK in figure 16 (B). Thus, even in this very
simple case of estimating the strong convergence error of the financial underlying, an appropriate
integration scheme for the stochastic volatility process is key to guaranteeing a stable approximation.
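The counting convention behind table 3 can be sketched in a few lines of Python. The code below is an illustration under stated assumptions, not the authors' test harness: it applies the Euler scheme and the Milstein scheme (A.5) with $q = 1/2$ to the variance equation of (5.2), uses Python's pseudo-random generator rather than Sobol' numbers, and stops a path at its first non-positive value. The function name `nonpositive_fraction` is hypothetical.

```python
import math
import random


def nonpositive_fraction(scheme, n_steps, n_paths=20000, T=1.0,
                         v0=1.0 / 16.0, theta=1.0 / 16.0, kappa=1.0,
                         alpha=0.5, seed=0):
    """Fraction of variance paths that become non-positive before the
    final step; scheme is "euler" or "milstein" (CIR case, q = 1/2)."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    bad = 0
    for _ in range(n_paths):
        v = v0
        # The final step is not counted, hence n_steps - 1 iterations.
        for _ in range(n_steps - 1):
            dz = sqrt_dt * rng.gauss(0.0, 1.0)
            step = kappa * (theta - v) * dt + alpha * math.sqrt(v) * dz
            if scheme == "milstein":
                # Milstein correction for q = 1/2: (1/4) alpha^2 (dZ^2 - dt)
                step += 0.25 * alpha * alpha * (dz * dz - dt)
            v += step
            if v <= 0.0:
                bad += 1
                break  # path counted once, then abandoned
    return bad / n_paths
```

For $\Delta t = 1/2$ only the first step is counted, so the fractions can be checked by hand. With the parameters of (5.2), the Euler value after one step is non-positive exactly when the standard normal draw satisfies $z \le -1/\sqrt{2}$, a probability of about 24%; for Milstein the condition becomes $z^2 + 2\sqrt{2}\,z + 1 \le 0$, i.e. $z \in [-1-\sqrt{2}, 1-\sqrt{2}]$, a probability of about 33%. Both are in line with the corresponding row of table 3.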
6 Conclusion

Table 3: Number of non-positive stochastic volatility paths in figure 16 (B).
Δt        Euler     Milstein   BMM
2^0       0 %       0 %        0 %
2^-1      23.9 %    33.2 %     0 %
2^-2      37.5 %    15.7 %     0 %
2^-3      43.8 %    0.1 %      0 %
2^-4      46.7 %    0 %        0 %

In this article, we discussed various Monte-Carlo approximation schemes for stochastic volatility
diffusion models. Our main focus was on the strong convergence behaviour as an indicator for
the valuation of path dependent derivatives. In order to maintain the ability to apply exogenous
variance reduction techniques such as low discrepancy numbers, importance sampling, and others
[Jäc02, Gla03], with ease, we restricted our research to methods that effectively require only two
simulated uniform variates
Footnote 5: It is of minor importance if one path becomes negative or zero in the last integration step as we do not have
to use the final value as a starting point for the next integration step.
[Figure 14, panels (A) and (B): strong approximation error against CPU time for the combinations
Euler, Milstein, Balanced Milstein Method, and Pathwise Adapted Linearisation, each paired with
log-Euler and with IJK.]
Figure 14: Strong convergence measured by expression (3.15) as a function of CPU time [in msec] averaged over 32767
paths for (A) model (5.1) and (B) for model (5.2). The number generator method was Sobol's. Correlation: $\rho = 0$.
[Figure 15, panels (A) and (B): strong approximation error against CPU time for the combinations
Euler, Milstein, Balanced Milstein Method, and Pathwise Adapted Linearisation, each paired with
log-Euler and with IJK.]
Figure 15: Strong convergence measured by expression (3.15) as a function of CPU time [in msec] averaged over 32767
paths for (A) model (5.1) and (B) for model (5.2). The number generator method was Sobol's. Correlation: $\rho = -0.4$.
per step in the time discretisation of the volatility-underlying evolution pair. Given this self-imposed
constraint, we attempted to exploit all information available from the simulated primary standard Wiener
diffusion process pair, and to adapt the integration scheme of the volatility and financial underlying
process as much as possible to each simulated primary process path pair. Whilst we had to realise that
within our scope we are limited to improving simulation results not by increasing the convergence order,
but mainly by decreasing the magnitude of the leading order error term, we found that significant speed
gains can be accomplished at surprisingly little expense in numerical effort. In fact, for the stochastic
volatility models we examined, the observed acceleration in comparison to the standard Euler & log-Euler
varies from a factor two for zero correlation to as much as a factor five when correlation is significantly
negative as usually required for calibration to market-observable implied volatility profiles.
As part of our investigations, we also introduced a new variation of stochastic volatility models,
namely the hyperbolically transformed Ornstein-Uhlenbeck process model given by equation (2.25).
This model inherits the benefits of Scott's model [Sco87] such as mean reversion and the fact that
zero is not attainable, but, like Scott's model, does not provide us with (semi-)closed form analytical
[Figure 16, panels (A) and (B): strong approximation error against CPU time for the combinations
Euler, Milstein, Balanced Milstein Method, and Pathwise Adapted Linearisation, each paired with
log-Euler and with IJK.]
Figure 16: Strong convergence measured by expression (3.15) as a function of CPU time [in msec] averaged over 32767
paths for (A) model (5.1) and (B) for model (5.2). The number generator method was Sobol's. Correlation: $\rho = -0.8$.
solutions for the density or characteristic function of the distribution of the underlying, or plain-vanilla
option prices. It does, however, avoid the issues raised in [AP04] regarding the explosion of moments
etc. due to the fact that the tails of its distribution, both towards low and towards high volatility levels,
are significantly thinner than those of the Scott model. We examined the behaviour of this model as
part of our research because we believe that advanced Monte Carlo integration schemes, in combination
with modern variance reduction methods, in conjunction with ever increasing computer power, will
make the use of stochastic volatility models that are not readily amenable for convenient plain-vanilla
option pricing formulae an industrially viable possibility, even at the point where model parameters or
parameter term structures are calibrated to market observable plain vanilla option prices. This latter
conjecture will be the subject of future research.
As for currently favoured stochastic volatility models such as Heston's [Hes93], whose instantaneous
variance is driven by the Cox-Ingersoll-Ross process, we found in our numerical experiments that the
use of pathwise adapted linearisations of the driving Wiener process introduced in section 3.1, with
approximate analytical solutions along the path, can provide numerical integration improvements for
small to moderate values of $\alpha$. However, from our set of investigated methods, the clear overall favourite
for the integration of the Cox-Ingersoll-Ross process is the Balanced Milstein method since it performs
as efficiently as the CIR-specific pathwise adapted linearisation based expansions, but remains stable
for all parameter values. The Balanced Milstein method is thus our method of choice for the numerical
integration of the CIR process, both inside stochastic volatility model applications and otherwise.
Finally, it remains to be stated that the main focus of our research, namely the investigation of
integration schemes for stochastic volatility models that combine an essentially geometric Brownian
motion process with a secondary source of noise influencing the volatility term of the former, resulted
in the method we named IJK scheme given by equation (4.47). The purpose of this method is not to
show the highest possible convergence order as a function of the step size. Instead, we endeavoured to
find a simple and robust method that, without the need for adaptive refinement, or additional random
numbers per step, is as efficient as possible. In other words, we were looking for the most efficient
method that is robust and yet essentially as simple as the standard Euler-Maruyama algorithm. The IJK
scheme is, as a consequence, not superior in convergence order, but excels over other equally simple
methods by showing a superior convergence behaviour that is faster by a multiplicative factor. The IJK
method is as easy to implement as the standard (log-)Euler scheme, and can thus be used as a so-called
drop-in replacement for the (log-)Euler scheme since it requires no additional random numbers or other
convergence acceleration aids.
A Milstein schemes
A.1 The one-dimensional Milstein method
The Milstein scheme [Mil74] for the stochastic differential equation
dx = a(x)dt + b(x)dW (A.1)
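can be derived from the Itô-Taylor expansion [KP99, equation 5.5.4]
$$
\begin{aligned}
x_t = {} & x_0 + a_0 \int_0^t ds + b_0 \int_0^t dW_s + b'_0 b_0 \int_0^t\!\!\int_0^s dW_u\, dW_s \\
& + a'_0 b_0 \int_0^t\!\!\int_0^s dW_u\, ds
  + \left( a_0 b'_0 + \tfrac{1}{2} b''_0 b_0^2 \right) \int_0^t\!\!\int_0^s du\, dW_s
  + \left( b_0 {b'_0}^2 + b''_0 b_0^2 \right) \int_0^t\!\!\int_0^s\!\!\int_0^u dW_r\, dW_u\, dW_s \\
& + O(t^2)
\end{aligned}
\tag{A.2}
$$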
with $a_0 = a(x_0)$, etc. Retaining terms up to order $O(t)$, and evaluating the integral
$$
I_{(1,1)}(t) := \int_0^t\!\!\int_0^s dW_u\, dW_s = \tfrac{1}{2} \left( W_t^2 - t \right) , \tag{A.3}
$$
assuming (without loss of generality) that $W_0 = 0$, we obtain the one-dimensional Milstein scheme
$$
x_{t_{n+1}} = x_{t_n} + a(x_{t_n})\, \Delta t_n + b(x_{t_n})\, \Delta W_n
  + \tfrac{1}{2}\, b'(x_{t_n})\, b(x_{t_n}) \left( \Delta W_n^2 - \Delta t_n \right) . \tag{A.4}
$$
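Applied to equation (2.2), we obtain
$$
V^{\text{Milstein}}_{t_{n+1}} = V_{t_n} + \kappa(\theta - V_{t_n})\, \Delta t_n + \alpha V_{t_n}^{q}\, \Delta Z_n
  + \tfrac{1}{2}\, \alpha^2 q\, V_{t_n}^{2q-1} \left( \Delta Z_n^2 - \Delta t_n \right) . \tag{A.5}
$$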
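As an illustration of how (A.4) specialises to (A.5), the following Python sketch implements one generic Milstein step and one step for the mean-reverting CEV process with drift $\kappa(\theta - V)$ and diffusion $\alpha V^q$. The function names and the passing of the coefficient functions as callables are choices made here for illustration; no positivity safeguard is included, so the sketch assumes $V > 0$ on entry.

```python
def milstein_step(x, dt, dw, a, b, db):
    """One step of the one-dimensional Milstein scheme (A.4).

    a, b are the drift and diffusion functions, db is the derivative b'.
    """
    return x + a(x) * dt + b(x) * dw + 0.5 * db(x) * b(x) * (dw * dw - dt)


def milstein_cev_step(v, dt, dz, kappa, theta, alpha, q):
    """One Milstein step (A.5) for dV = kappa (theta - V) dt + alpha V^q dZ.

    Assumes v > 0; the scheme itself does not guarantee positivity.
    """
    return (v
            + kappa * (theta - v) * dt
            + alpha * v ** q * dz
            + 0.5 * alpha ** 2 * q * v ** (2.0 * q - 1.0) * (dz * dz - dt))
```

Calling `milstein_step` with `a = lambda v: kappa * (theta - v)`, `b = lambda v: alpha * v ** q` and `db = lambda v: alpha * q * v ** (q - 1.0)` gives the same value as `milstein_cev_step`, which is a convenient consistency check when implementing either form.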
A.2 The Milstein+ scheme
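All of the explicitly given integral terms on the second line of (A.2)
$$
I_{(1,0)}(t) = \int_0^t\!\!\int_0^s dW_u\, ds \tag{A.6}
$$
$$
I_{(0,1)}(t) = \int_0^t\!\!\int_0^s du\, dW_s = W_t\, t - I_{(1,0)}(t) \tag{A.7}
$$
$$
I_{(1,1,1)}(t) = \int_0^t\!\!\int_0^s\!\!\int_0^u dW_r\, dW_u\, dW_s = \tfrac{1}{6} W_t^3 - \tfrac{1}{2} W_t\, t \tag{A.8}
$$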
can be expressed in terms of the primary constituents $W_t$ and $I_{(1,0)}(t)$. The two-dimensional distribution
of the random numbers $W_t$ and $I_{(1,0)}(t)$ is given by a bivariate Gaussian law with mean $(0, 0)$ and
covariance matrix
$$
\begin{pmatrix} t & t^2/2 \\ t^2/2 & t^3/3 \end{pmatrix} . \tag{A.9}
$$
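This means,
$$
I_{(1,0)}(t) \sim \tfrac{1}{2} W_t\, t + \tfrac{1}{2\sqrt{3}}\, t^{3/2}\, y \tag{A.10}
$$
with $y \sim N(0, 1)$, and thus the distribution of $x_t$ is given by
$$
\begin{aligned}
x_t \sim {} & x_0 + a_0 t + b_0 W_t + \tfrac{1}{2} b'_0 b_0 \left( W_t^2 - t \right) \\
& + \tfrac{1}{6} \left( b_0 {b'_0}^2 + b''_0 b_0^2 \right) W_t^3
  + \left( a_0 b'_0 - \tfrac{1}{2} b_0 {b'_0}^2 \right) W_t\, t \\
& + \left( a'_0 b_0 - a_0 b'_0 - \tfrac{1}{2} b''_0 b_0^2 \right)
    \left( \tfrac{1}{2} W_t\, t + \tfrac{1}{2\sqrt{3}}\, t^{3/2}\, y \right) + O(t^2) .
\end{aligned}
\tag{A.11}
$$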
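The joint law of $W_t$ and $I_{(1,0)}(t)$ stated in (A.9) and the representation (A.10) can be checked numerically. The Python sketch below draws the pair from two independent standard normal variates and estimates the second moments, which should approach $t$, $t^2/2$ and $t^3/3$. The function names are hypothetical and the pseudo-random generator is Python's own, chosen only to keep the example self-contained.

```python
import math
import random


def sample_w_and_i10(t, rng):
    """Draw (W_t, I_(1,0)(t)) jointly using the representation (A.10)."""
    w = math.sqrt(t) * rng.gauss(0.0, 1.0)
    y = rng.gauss(0.0, 1.0)  # extra source of randomness, independent of W_t
    i10 = 0.5 * w * t + t ** 1.5 / (2.0 * math.sqrt(3.0)) * y
    return w, i10


def second_moments(t, n_samples, seed=0):
    """Sample estimates of E[W^2], E[W I], E[I^2]; both means are zero,
    so these are also the entries of the covariance matrix (A.9)."""
    rng = random.Random(seed)
    s_ww = s_wi = s_ii = 0.0
    for _ in range(n_samples):
        w, i10 = sample_w_and_i10(t, rng)
        s_ww += w * w
        s_wi += w * i10
        s_ii += i10 * i10
    return s_ww / n_samples, s_wi / n_samples, s_ii / n_samples
```

The variance of the second term in (A.10) is $t^3/12$, which together with the $t^3/4$ contributed by $\tfrac{1}{2} W_t\, t$ recovers the entry $t^3/3$ of the covariance matrix.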
Hence, in order to fully cater for all terms to order $O(t^{3/2})$, an extra source of randomness is required.
It can also be seen that, conditional on a given value for $W_t$, the terms of order $O(t^{3/2})$ have non-zero
expectation. This means that not accounting for the terms on the second and third line of (A.11), i.e.
the terms not present in the standard Milstein scheme, introduces a conditional bias given by
$$
\tfrac{1}{6} \left( b_0 {b'_0}^2 + b''_0 b_0^2 \right) W_t^3
+ \tfrac{1}{2} \left( a'_0 b_0 + a_0 b'_0 - b_0 {b'_0}^2 - \tfrac{1}{2} b''_0 b_0^2 \right) W_t\, t . \tag{A.12}
$$
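The coefficient of $W_t\, t$ in (A.12) can be verified by taking the expectation of (A.11) conditional on $W_t$. Since $y$ in (A.10) is independent of $W_t$, the term proportional to $t^{3/2} y$ drops out, and collecting the remaining $W_t\, t$ contributions from the second and third line of (A.11) gives, as a short worked step:

```latex
\begin{aligned}
\left( a_0 b'_0 - \tfrac{1}{2} b_0 {b'_0}^2 \right)
+ \tfrac{1}{2} \left( a'_0 b_0 - a_0 b'_0 - \tfrac{1}{2} b''_0 b_0^2 \right)
&= \tfrac{1}{2} a'_0 b_0 + \tfrac{1}{2} a_0 b'_0 - \tfrac{1}{2} b_0 {b'_0}^2 - \tfrac{1}{4} b''_0 b_0^2 \\
&= \tfrac{1}{2} \left( a'_0 b_0 + a_0 b'_0 - b_0 {b'_0}^2 - \tfrac{1}{2} b''_0 b_0^2 \right) .
\end{aligned}
```

The $W_t^3$ term is carried over unchanged because it does not involve $I_{(1,0)}(t)$.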
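This bias can be corrected by simply adding these terms to the Milstein scheme which gives us
$$
\begin{aligned}
x_{t_{n+1}} = {} & x_{t_n} + a(x_{t_n})\, \Delta t_n + b(x_{t_n})\, \Delta W_n
  + \tfrac{1}{2}\, b'(x_{t_n})\, b(x_{t_n}) \left( \Delta W_n^2 - \Delta t_n \right) \\
& + \tfrac{1}{6} \left( b(x_{t_n})\, b'(x_{t_n})^2 + b''(x_{t_n})\, b(x_{t_n})^2 \right) \Delta W_n^3 \\
& + \tfrac{1}{2} \left( a'(x_{t_n})\, b(x_{t_n}) + a(x_{t_n})\, b'(x_{t_n}) - b(x_{t_n})\, b'(x_{t_n})^2
  - \tfrac{1}{2}\, b''(x_{t_n})\, b(x_{t_n})^2 \right) \Delta W_n\, \Delta t_n .
\end{aligned}
\tag{A.13}
$$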
For the sake of brevity in the main text, we refer to this method as Milstein+. An alternative way to
arrive at the approximation
$$
I_{(1,0)}(t) \approx \tfrac{1}{2} W_t\, t \tag{A.14}
$$
at the core of this scheme is to look for the most likely or the expected value of $I_{(1,0)}(t)$ conditional on the
discretised path for $W_t$, i.e. in the filtration generated by the increments $\Delta W_n$. Newton [New91, New94]
introduced the idea of strong asymptotically efficient schemes following a similar line of reasoning and
resulting in the approximation (A.14). Specifically for the term $I_{(1,0)}(t)$, it so happens that the most
likely and the expected value conditional on the discretised path $\{\Delta W_n\}$ are both given by (A.14). Still,
the advantage of the Milstein+ scheme over the original Milstein scheme is not that it has a higher
convergence order but that it reduces the magnitude of the coefficient of the leading order error terms.
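Applied to equation (2.2), the Milstein+ scheme reads
$$
\begin{aligned}
V^{\text{Milstein+}}_{t_{n+1}} = {} & V_{t_n} + \kappa(\theta - V_{t_n})\, \Delta t_n + \alpha V_{t_n}^{q}\, \Delta Z_n
  + \tfrac{1}{2}\, \alpha^2 q\, V_{t_n}^{2q-1} \left( \Delta Z_n^2 - \Delta t_n \right)
  + \tfrac{1}{6}\, \alpha^3 q (2q - 1)\, V_{t_n}^{3q-2}\, \Delta Z_n^3 \\
& + \tfrac{1}{2} \left( \kappa \alpha q \theta\, V_{t_n}^{q-1} - \kappa \alpha (q + 1)\, V_{t_n}^{q}
  - \tfrac{1}{2}\, \alpha^3 q (3q - 1)\, V_{t_n}^{3q-2} \right) \Delta Z_n\, \Delta t_n .
\end{aligned}
\tag{A.15}
$$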
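The closed form (A.15) follows from (A.13) by inserting $a(V) = \kappa(\theta - V)$ and $b(V) = \alpha V^q$ together with their derivatives. The Python sketch below implements both the generic Milstein+ step and the CEV specialisation so that the two can be compared numerically. The function names and the callable-based interface are assumptions made for illustration, and $V > 0$ is assumed on entry.

```python
def milstein_plus_step(x, dt, dw, a, da, b, db, d2b):
    """One generic Milstein+ step (A.13); da = a', db = b', d2b = b''."""
    a0, a1 = a(x), da(x)
    b0, b1, b2 = b(x), db(x), d2b(x)
    cubic = (b0 * b1 * b1 + b2 * b0 * b0) / 6.0
    mixed = 0.5 * (a1 * b0 + a0 * b1 - b0 * b1 * b1 - 0.5 * b2 * b0 * b0)
    return (x + a0 * dt + b0 * dw
            + 0.5 * b1 * b0 * (dw * dw - dt)
            + cubic * dw ** 3
            + mixed * dw * dt)


def milstein_plus_cev_step(v, dt, dz, kappa, theta, alpha, q):
    """One Milstein+ step (A.15) for dV = kappa (theta - V) dt + alpha V^q dZ."""
    cubic = alpha ** 3 * q * (2.0 * q - 1.0) * v ** (3.0 * q - 2.0) / 6.0
    mixed = 0.5 * (kappa * alpha * q * theta * v ** (q - 1.0)
                   - kappa * alpha * (q + 1.0) * v ** q
                   - 0.5 * alpha ** 3 * q * (3.0 * q - 1.0) * v ** (3.0 * q - 2.0))
    return (v + kappa * (theta - v) * dt + alpha * v ** q * dz
            + 0.5 * alpha ** 2 * q * v ** (2.0 * q - 1.0) * (dz * dz - dt)
            + cubic * dz ** 3
            + mixed * dz * dt)
```

With $b' = \alpha q V^{q-1}$ and $b'' = \alpha q (q-1) V^{q-2}$, the combination $b\, b'^2 + b''\, b^2$ collapses to $\alpha^3 q (2q-1) V^{3q-2}$, which is where the cubic coefficient in (A.15) comes from.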
A.3 The Milstein scheme for stochastic volatility systems
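For the N-dimensional system of stochastic differential equations
$$
dx = a(x)\, dt + B(x)\, d\tilde{W} , \tag{A.16}
$$
with $a \in C^1(\mathbb{R}^N, \mathbb{R}^N)$ and $B \in C^2(\mathbb{R}^N, \mathbb{R}^{N \times N})$, $i = 1, \dots, M$ and uncorrelated Brownian motions
$\tilde{W} \in \mathbb{R}^N$, the i-th component of the multidimensional Milstein scheme is
$$
x_i(t + \Delta t) = x_i(t) + a_i\, \Delta t + \sum_{j=1}^{M} b_{ij}\, \Delta\tilde{W}_j
  + \underbrace{\sum_{j,k,l=1}^{M} b_{jk} \left( \partial_{x_j} b_{il} \right) \tilde{I}_{(k,l)}}_{\text{Milstein term}} , \tag{A.17}
$$
wherein all of the coefficient functions $a_i(\cdot)$ and $b_{ij}(\cdot)$, etc., are to be evaluated with $x(t)$. We define the
double Itô integral $\tilde{I}_{(k,l)}$ as
$$
\tilde{I}_{(k,l)} = \int_{s=t}^{t+\Delta t} \int_{u=t}^{s} d\tilde{W}_k(u)\, d\tilde{W}_l(s) . \tag{A.18}
$$
For $k = l$, it simplifies to
$$
\tilde{I}_{(k,k)} = \tfrac{1}{2} \left( \Delta\tilde{W}_k^2 - \Delta t \right) . \tag{A.19}
$$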
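For the stochastic volatility system (2.1) and (2.2), we use a Cholesky decomposition of the correlated
Brownian motions
$$
dW = \rho'\, d\tilde{W}_1 + \rho\, d\tilde{W}_2 \tag{A.20}
$$
$$
dZ = d\tilde{W}_2 \tag{A.21}
$$
with $\rho' := \sqrt{1 - \rho^2}$ and set $x_1 := \ln S$ and $x_2 := V$ to obtain from (2.1) and (2.2) the coupled stochastic
differential equations
$$
\begin{pmatrix} dx_1 \\ dx_2 \end{pmatrix}
= \begin{pmatrix} \mu - \tfrac{1}{2} x_2^{2p} \\ \kappa(\theta - x_2) \end{pmatrix} dt
+ \begin{pmatrix} \rho' x_2^{p} & \rho\, x_2^{p} \\ 0 & \alpha\, x_2^{q} \end{pmatrix}
  \begin{pmatrix} d\tilde{W}_1 \\ d\tilde{W}_2 \end{pmatrix} . \tag{A.22}
$$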
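Since the volatility process $x_2$ is not influenced directly by the dynamics for $x_1$, the Milstein scheme
for $x_2$ is given by the standard one-dimensional formula (A.5). For $x_1$, the fact that $b_{21} = 0$ and
$\partial_{x_1} B(x) = 0$ simplifies the calculation of the Milstein term:
$$
\sum_{j,k,l=1}^{M} b_{jk} \left( \partial_{x_j} b_{1l} \right) \tilde{I}_{(k,l)}
= b_{22} \left( \partial_{x_2} b_{11} \right) \tilde{I}_{(2,1)} + b_{22} \left( \partial_{x_2} b_{12} \right) \tilde{I}_{(2,2)}
= \alpha p\, x_2^{p+q-1} \left( \rho'\, \tilde{I}_{(2,1)} + \rho\, \tilde{I}_{(2,2)} \right) . \tag{A.23}
$$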
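For orientation, inserting the first row of (A.22) and the Milstein term (A.23) into the general formula (A.17) gives the following step for $x_1 = \ln S$. This is a sketch assembled from the formulas of this appendix rather than a scheme stated by the authors; it writes $\tilde{W}_1, \tilde{W}_2$ for the uncorrelated Brownian motions, $\tilde{I}_{(k,l)}$ for the double Itô integrals (A.18), and $\rho' = \sqrt{1 - \rho^2}$, with all coefficients evaluated at time $t$:

```latex
x_1(t + \Delta t) = x_1(t) + \left( \mu - \tfrac{1}{2} x_2^{2p} \right) \Delta t
  + x_2^{p} \left( \rho'\, \Delta\tilde{W}_1 + \rho\, \Delta\tilde{W}_2 \right)
  + \alpha p\, x_2^{p+q-1} \left( \rho'\, \tilde{I}_{(2,1)} + \rho\, \tilde{I}_{(2,2)} \right) .
```

By (A.19), $\tilde{I}_{(2,2)} = \tfrac{1}{2} ( \Delta\tilde{W}_2^2 - \Delta t )$ is available directly from the simulated increment, whereas the mixed integral $\tilde{I}_{(2,1)}$ is not a function of the increments $\Delta\tilde{W}_1$ and $\Delta\tilde{W}_2$ alone and would have to be simulated or approximated separately.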
References
[AA00] L. Andersen and J. Andreasen. Volatility Skews and Extensions of the Libor Market Model.
Applied Mathematical Finance, 7(1):1-32, March 2000.
[Abe04] K.E.S. Abe. Strong Taylor Schemes for Stochastic Volatility. Working paper, 2004.
www.maths.ox.ac.uk/~schmitz/project2.htm.
[ABR01] L. Andersen and R. Brotherton-Ratcliffe. Extended Libor Market Models with Stochastic
Volatility. Working paper, Gen Re Securities, 2001.
[AP04] L. Andersen and V. Piterbarg. Moment Explosions in Stochastic Volatility Models. Technical
report, Bank of America, 2004. ssrn.com/abstract=559481.
[Bec80] S. Beckers. The constant elasticity of variance model and its implications for option pricing.
Journal of Finance, XXXV(3):661-673, June 1980.
[BK04] M. Broadie and Ö. Kaya. Exact Simulation of Stochastic Volatility and other Affine Jump
Diffusion Processes. Working paper, Columbia University, New York, 2004.
www.orie.cornell.edu/~aberndt/FEseminar/papers04/exact sim 200409.pdf.
[BS73] F. Black and M. Scholes. The Pricing of Options and Corporate Liabilities. Journal of
Political Economy, pages 637-654, 1973.
[BS80] M.J. Brennan and E.S. Schwartz. Analyzing convertible bonds. Journal of Financial and
Quantitative Analysis, 15:907-929, 1980.
[CIR85] J. C. Cox, J. E. Ingersoll, and S. A. Ross. A theory of the term structure of interest rates.
Econometrica, 53:385-408, 1985.
[CKLS92] C.K. Chan, G.A. Karolyi, F.A. Longstaff, and A.B. Sanders. An empirical comparison of
alternate models of the short-term interest rate. Journal of Finance, pages 1209-1227, 1992.
www.cob.ohio-state.edu/~sanders/ckls.pdf.
[Cox75] J. C. Cox. Notes on option pricing I: Constant elasticity of variance diffusions. Working
paper, Stanford University, 1975.
[CR76] J. C. Cox and S. A. Ross. The valuation of options for alternative stochastic processes.
Journal of Financial Economics, 3:145-166, March 1976.
[Doo42] J. L. Doob. The Brownian Movement and Stochastic Equations. The Annals of Mathematics,
43:351-369, April 1942.
[Dos77] H. Doss. Liens entre équations différentielles stochastiques ordinaires. Annales de l'Institut
Henri Poincaré. Probabilités et Statistiques, 13:99-125, 1977.
[Eul68] L. Euler. Institutiones Calculi Integralis. 1768.
[GL94] J.G. Gaines and T.J. Lyons. Random Generation of Stochastic Area Integrals. SIAM Journal
on Applied Mathematics, 54(4):1132-1146, 1994.
[Gla03] P. Glasserman. Monte Carlo Methods in Financial Engineering. Springer, 2003.
[GY93] H. Geman and M. Yor. Bessel Processes, Asian Options, and Perpetuities. Mathematical
Finance, 3:349-375, 1993.
[Hes93] S. L. Heston. A closed-form solution for options with stochastic volatility with applications
to bond and currency options. The Review of Financial Studies, 6:327-343, 1993.
[HW88] J. Hull and A. White. An Analysis of the Bias in Option Pricing Caused by a Stochastic
Volatility. Advances in Futures and Options Research, 3:27-61, 1988.
[HW93] M. Hogan and K. Weintraub. The Lognormal Interest Rate Model and Eurodollar Futures.
Discussion paper, Citibank, New York, 1993.
[Jäc02] P. Jäckel. Monte Carlo methods in finance. John Wiley and Sons, February 2002.
[Kah04] C. Kahl. Positive numerical integration of stochastic differential equations. Master's thesis,
Bergische Universität Wuppertal, 2004. www.math.uni-wuppertal.de/~kahl/publications/DT.pdf.
[KJ05] C. Kahl and P. Jäckel. Not-so-complex logarithms in the Heston model. Wilmott,
September 2005.
[KP99] P. E. Kloeden and E. Platen. Numerical Solution of Stochastic Differential Equations.
Springer, 1992, 1995, 1999.
[KS91] I. Karatzas and S. E. Shreve. Brownian motion and Stochastic Calculus. Springer, 1991.
[KS05] C. Kahl and H. Schurz. Balanced Milstein Methods for Ordinary SDEs. Technical report,
Department of Mathematics, Southern Illinois University, 2005.
[KT81] S. Karlin and M. Taylor. A Second Course in Stochastic Processes. Academic Press, 1981.
[Lév51] P. Lévy. Wiener's random function, and other Laplacian random functions. In Proceedings
of the Second Berkeley Symposium on Mathematical Statistics and Probability, 1950, pages
171-187, Berkeley and Los Angeles, 1951. University of California Press.
[MAP76] G. Marsaglia, K. Anantharayanan, and N.J. Paul. Improvements on fast methods for
generating normal random variables. Information Processing Letters, 5:27-30, 1976.
[Mar55] G. Maruyama. Continuous Markov processes and stochastic equations. Rendiconti del
Circolo Matematico di Palermo, 4:48-90, 1955.
[Mil74] G. N. Milshtein. Approximate integration of stochastic differential equations. Theory of
Probability and Applications, 19:557-562, 1974.
[MMB64] G. Marsaglia, M.D. MacLaren, and T.A. Bray. A fast procedure for generating normal
random variables. Communications of the ACM, 5:27-30, 1964.
[MN98] M. Matsumoto and T. Nishimura. Mersenne Twister: A 623-dimensionally equidistributed
uniform pseudorandom number generator. ACM Trans. on Modeling and Computer
Simulation, 8(1):3-30, January 1998.
[MPS98] G. N. Milshtein, E. Platen, and H. Schurz. Balanced implicit methods for stiff stochastic
systems. SIAM, 38(3):1010-1019, 1998.
[New91] N. J. Newton. Asymptotically efficient Runge-Kutta methods for a class of Ito and
Stratonovich equations. SIAM Journal of Applied Mathematics, 51:542-567, 1991.
[New94] N. J. Newton. Variance reduction for simulated diffusions. SIAM Journal of Applied
Mathematics, 54:1780-1805, 1994.
[PT85] E. Pardoux and M. Talay. Discretization and simulation of stochastic differential equations.
Acta Applicandae Mathematica, 3:23-47, 1985.
[Sch96] H. Schurz. Numerical Regularization for SDEs: Construction of nonnegative solutions.
Dynamical Systems and Applications, 5:323-352, 1996.
[Sco87] L. Scott. Option Pricing When the Variance Changes Randomly: Theory, Estimation and
An Application. Journal of Financial and Quantitative Analysis, 22:419-438, December
1987.
[SS91] E. M. Stein and J. C. Stein. Stock Price Distribution with Stochastic Volatility: An Analytic
Approach. Review of Financial Studies, 4:727-752, 1991.
[SS94] K. Sandmann and D. Sondermann. On the stability of log-normal interest rate models and
the pricing of Eurodollar futures. Discussion paper, Dept. of Statistics, Faculty of
Economics, SFB 303, Universität Bonn, June 1994.
ftp://ftp.wipol.uni-bonn.de/pub/RePEc/bon/bonsfb/bonsfb263.pdf.
[SS97a] K. Sandmann and D. Sondermann. A Note on the Stability of Lognormal Interest Rate
Models and the Pricing of Eurodollar Futures. Mathematical Finance, 7(2):119, April 1997.
[SS97b] K. Sandmann and D. Sondermann. Log-Normal Interest Rate Models: Stability and
Methodology. Discussion paper, Dept. of Statistics, Faculty of Economics, SFB 303,
Universität Bonn, January 1997.
ftp://ftp.wipol.uni-bonn.de/pub/RePEc/bon/bonsfb/bonsfb398.pdf.
[SZ99] R. Schöbel and J. Zhu. Stochastic Volatility With an Ornstein Uhlenbeck Process: An
Extension. European Finance Review, 3:23-46, 1999. ssrn.com/abstract=100831.
[UO30] G.E. Uhlenbeck and L.S. Ornstein. On the theory of Brownian motion. Physics Review,
36:823-841, 1930.
[Wig87] J. Wiggins. Option values under stochastic volatility: Theory and empirical estimates.
Journal of Financial Economics, 19:351-372, 1987.
[WZ65] E. Wong and M. Zakai. On the convergence of ordinary integrals to stochastic integrals.
Ann. Math. Stat., 36:1560-1564, 1965.