Solving Finite Difference Schemes
[email protected]
Evis Kellezi
Department of Econometrics and FAME, University of Geneva, 1211 Geneva 4,
Switzerland
[email protected]
Giorgio Pauletto
Department of Econometrics, University of Geneva, 1211 Geneva 4, Switzerland
Abstract
We investigate computational and implementation issues for the valuation of options on three underlying assets, focusing on the use of finite difference methods. We demonstrate that implicit methods, which have good convergence and stability properties, can now be implemented efficiently due to the recent development of techniques that allow the efficient solution of large and sparse linear systems. In the trivariate option valuation problem, we use nonstationary iterative methods (also called Krylov methods) for the solution of the large and sparse linear systems arising while using implicit methods. Krylov methods are investigated both in serial and in parallel implementations. Computational results show that the parallel implementation is particularly efficient if a fine spatial grid is needed.
Introduction
$$\frac{\partial C}{\partial t} + \frac{1}{2}\sum_{i,j=1}^{3} \rho_{ij}\,\sigma_i \sigma_j\, S_i S_j \frac{\partial^2 C}{\partial S_i\,\partial S_j} + r \sum_{i=1}^{3} S_i \frac{\partial C}{\partial S_i} = rC . \qquad (1)$$
The terminal condition is specific for each type of payoff. For example, in the case of a European call on the maximum of three assets with strike price E, the terminal condition is the option payoff at maturity T:

$$C(S_1, S_2, S_3, T) = \bigl(\max\{S_1(T), S_2(T), S_3(T)\} - E\bigr)^{+} . \qquad (2)$$
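The terminal condition (2) is straightforward to evaluate on a grid of asset prices. A minimal sketch in Python (the paper's own code is MATLAB; the function name and sample values below are ours):

```python
import numpy as np

def call_on_max_payoff(S1, S2, S3, E):
    """Terminal condition (2): European call on the maximum of three assets."""
    return np.maximum(np.maximum(np.maximum(S1, S2), S3) - E, 0.0)

# Works elementwise on whole grids of asset prices as well as on scalars.
in_the_money = call_on_max_payoff(12.0, 9.0, 11.0, 10.0)    # max is 12, payoff 2
out_of_money = call_on_max_payoff(5.0, 6.0, 7.0, 10.0)      # max is 7, payoff 0
```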
As shown by Johnson (1987), the pricing problem for this particular payoff function has an analytical solution. For more general settings, the PDE (1) has to be solved numerically. One of the most popular methods is the finite difference method. It consists in replacing the partial derivatives by numerical approximations and then solving the resulting discretised problem.
Before discretising it, the PDE should be simplified. Introducing the changes of variables x = log(S_1/E), y = log(S_2/E), z = log(S_3/E), \tau = T - t and u = C/E, the PDE (1) becomes

$$\frac{\partial u}{\partial \tau} = \Bigl(r - \frac{\sigma_x^2}{2}\Bigr)\frac{\partial u}{\partial x} + \Bigl(r - \frac{\sigma_y^2}{2}\Bigr)\frac{\partial u}{\partial y} + \Bigl(r - \frac{\sigma_z^2}{2}\Bigr)\frac{\partial u}{\partial z} + \frac{1}{2}\sigma_x^2 \frac{\partial^2 u}{\partial x^2} + \frac{1}{2}\sigma_y^2 \frac{\partial^2 u}{\partial y^2} + \frac{1}{2}\sigma_z^2 \frac{\partial^2 u}{\partial z^2} + \rho_{xy}\sigma_x\sigma_y \frac{\partial^2 u}{\partial x\,\partial y} + \rho_{xz}\sigma_x\sigma_z \frac{\partial^2 u}{\partial x\,\partial z} + \rho_{yz}\sigma_y\sigma_z \frac{\partial^2 u}{\partial y\,\partial z} - ru \qquad (3)$$

with initial condition

$$u(x, y, z, 0) = \bigl(\max\{e^x - 1,\; e^y - 1,\; e^z - 1\}\bigr)^{+} .$$
where

$$V = \begin{bmatrix} r - \frac{1}{2}\sigma_x^2 \\ r - \frac{1}{2}\sigma_y^2 \\ r - \frac{1}{2}\sigma_z^2 \end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix} \frac{1}{2}\sigma_x^2 & \rho_{xy}\sigma_x\sigma_y & \rho_{xz}\sigma_x\sigma_z \\ \rho_{yx}\sigma_y\sigma_x & \frac{1}{2}\sigma_y^2 & \rho_{yz}\sigma_y\sigma_z \\ \rho_{zx}\sigma_z\sigma_x & \rho_{zy}\sigma_z\sigma_y & \frac{1}{2}\sigma_z^2 \end{bmatrix}$$

are called respectively the velocity tensor and the diffusion tensor.
$$\begin{aligned}
x_i &= L_x + i\,\Delta x & i &= 0, 1, \ldots, N_x \\
y_j &= L_y + j\,\Delta y & j &= 0, 1, \ldots, N_y \\
z_k &= L_z + k\,\Delta z & k &= 0, 1, \ldots, N_z \\
\tau_m &= (m - 1)\,\Delta\tau & m &= 1, \ldots, M
\end{aligned}$$

where L_x, L_y and L_z are the lower bounds of the spatial grid, \Delta x, \Delta y and \Delta z define the spatial grid steps, \Delta\tau the time step and N_x, N_y, N_z and M are the number of spatial and time steps.
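The grid above can be built in a few lines. A Python illustration with hypothetical values for the bounds and steps (the paper fixes these elsewhere):

```python
import numpy as np

# Hypothetical parameter values; the paper chooses Lx, Ly, Lz and the steps elsewhere.
Nx, Ny, Nz, M = 4, 4, 4, 10
Lx = Ly = Lz = -2.0
dx = dy = dz = 0.5
dtau = 0.1

x = Lx + dx * np.arange(Nx + 1)           # x_i, i = 0..Nx
y = Ly + dy * np.arange(Ny + 1)           # y_j, j = 0..Ny
z = Lz + dz * np.arange(Nz + 1)           # z_k, k = 0..Nz
tau = dtau * (np.arange(1, M + 1) - 1)    # tau_m = (m - 1)*dtau, m = 1..M

# All (Nx+1)(Ny+1)(Nz+1) spatial nodes of the rectangular parallelepiped
X, Y, Z = np.meshgrid(x, y, z, indexing="ij")
```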
For a given time \tau_m, the spatial grid points (x_i, y_j, z_k, \tau_m) are contained in a rectangular parallelepiped with edges parallel to the axes x, y and z. A particular grid point is then unambiguously defined by the triplet (i, j, k). Therefore the grid can be partitioned as

$$\bar{\Omega} = \Omega \cup \partial\Omega ,$$

where \Omega is the set of (N_x - 1)(N_y - 1)(N_z - 1) interior points obtained for i = 1, \ldots, N_x - 1, j = 1, \ldots, N_y - 1 and k = 1, \ldots, N_z - 1 and \partial\Omega is the set of grid points on the boundary.
In Appendix A we explain in detail how we obtain the discretised version of the PDE (3) in matrix notation:

$$u_{\Omega}^{m+1} = \theta P \bar{u}^{m+1} + (1 - \theta) P \bar{u}^{m} \qquad (5)$$

where u_\Omega^m, m = 1, \ldots, M is the vector of interior grid points at time m, \bar{u} is the vector of all grid points and 0 \le \theta \le 1. Figure 1 illustrates the structure of matrix P corresponding to a grid with five points in each direction. The columns of matrix P which correspond to grid points u_{ijk} \in \partial\Omega are indicated with a circle and those which correspond to grid points u_{ijk} \in \Omega are indicated with a bullet.
[Figure 1: structure of the matrix P for a grid with five points in each direction; blocks indexed by k = 0, ..., 4 and, within each block, by j = 0, ..., 4.]
Separating interior and boundary points, the scheme (5) can be rearranged as

$$(I - \theta A)\, u_{\Omega}^{m+1} = \bigl(I + (1 - \theta) A\bigr)\, u_{\Omega}^{m} + B\bigl(\theta\, u_{\partial\Omega}^{m+1} + (1 - \theta)\, u_{\partial\Omega}^{m}\bigr) \qquad (6)$$

For \theta = 0, all elements on the right-hand side are known at time step \tau_m. This equation can be solved explicitly for each time step \tau_m. Its numerical stability and convergence depend on the spectral radius of the matrix I + A, which has to be inferior to unity.
When solving the system it is however more efficient to consider the form

$$u_{\Omega}^{m+1} = u_{\Omega}^{m} + P \bar{u}^{m} .$$
Complexity

Statement 4 needs 2(d - 1) operations to compute the index s and 2d^2 operations to compute the set of indices \partial s. The addition and the scalar product in Statement 5 need 1 + 2(1 + 2d^2) operations. We then establish that the number of elementary operations for Algorithm 1 is given by

$$M (6d^2 + 2d + 1)\, |\Omega| \qquad (7)$$

and as we need two arrays u^{m+1} and u^m the space complexity of the algorithm is 2|\Omega|.

There is a possibility of reducing the operation count given in (7) by first computing all sets \partial s and storing them in an array of dimension (1 + 2d^2)|\Omega|. This reduces the factor in expression (7) from 6d^2 + 2d + 1 to 4d^2 + 3.
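The saving obtained by precomputing the neighbour sets \partial s can be illustrated on a one-dimensional toy problem; the indices, stencil coefficients and sizes below are ours, not the paper's:

```python
import numpy as np

# Toy 1-D version (d = 1) of the idea: compute, once, for every interior
# point s the indices of its stencil neighbours, then reuse the stored
# index array at each of the M time steps.
N = 10                       # grid points 0..N
interior = np.arange(1, N)   # interior indices s
nbr = np.stack([interior - 1, interior, interior + 1], axis=1)  # stored sets

c = np.array([0.25, 0.5, 0.25])    # stencil coefficients (hypothetical values)
u = np.linspace(0.0, 1.0, N + 1)   # initial values on the whole grid

M = 5
for m in range(M):
    u_new = u.copy()                # boundary values carried over unchanged
    u_new[interior] = u[nbr] @ c    # one gather + one scalar product per point
    u = u_new
```

With coefficients summing to one, this symmetric stencil reproduces linear initial data exactly, which makes the loop easy to sanity-check.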
Implicit Euler method (\theta = 1)

If \theta is set to one, equation (6) becomes

$$(I - A)\, u_{\Omega}^{m+1} = u_{\Omega}^{m} + B\, u_{\partial\Omega}^{m+1}$$

and u_{\Omega}^{m+1} is the solution of a linear system. Algorithm 2 implements the implicit method.

The implicit method is first order accurate and is in this respect not superior to the explicit method. Its advantage is however that it is stable and convergent for all choices of space and time steps.
Algorithm 2:
  Initialize u^0
  Compute A and B
  for m = 0 : M - 1 do
    Compute boundary values u_{\partial\Omega}^{m+1}
    Solve (I - A) u^{m+1} = u^m + B u_{\partial\Omega}^{m+1}
  end for
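One step of Algorithm 2 can be sketched in Python with SciPy's sparse LU factorization, which is computed once and reused at every time step. The trivariate operators A and B are replaced here by a toy one-dimensional heat stencil with zero boundary values; all sizes are illustrative:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy 1-D stand-in for the interior operator A of the paper.
N = 50
dt, dx = 1e-3, 1.0 / 50
lam = dt / dx**2
main = -2.0 * lam * np.ones(N - 1)
off = lam * np.ones(N - 2)
A = sp.diags([off, main, off], [-1, 0, 1], format="csc")

I = sp.identity(N - 1, format="csc")
lu = spla.splu((I - A).tocsc())   # factor (I - A) once, reuse every step

u = np.sin(np.pi * np.linspace(dx, 1 - dx, N - 1))  # interior initial values
for m in range(20):
    b = u.copy()        # zero boundary values: the B*u_boundary term vanishes
    u = lu.solve(b)     # one implicit Euler step
```

Factoring once and only back-substituting per step is what makes the implicit loop cheap when a direct solver is feasible at all.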
The Crank-Nicolson method (\theta = 1/2) is second order accurate and its complexity is almost identical to the complexity of the implicit method. There are a few more operations involved for the evaluation of the right-hand side vector of the linear system.

The CFL condition for the explicit method implies a prohibitively large number of time steps.⁴ For implicit methods the CFL condition is satisfied regardless of the number of time steps. However, the implicit method requires the solution of a linear system of size |\Omega| at every time step. As the system becomes large, it is crucial to choose an efficient solution method.
The structure of the matrix of the linear system is shown in Figure 2. This matrix can either be considered as a banded matrix with bandwidth (N_x - 1)N_y + 1 or as a block tridiagonal matrix with blocks of size (N_x - 1) x (N_x - 1). For the solution of the linear system, one may think of applying an LU decomposition which exploits the particular structure of the matrix. It appears however that direct methods are unfeasible for systems of the size considered in our application.⁵

Iterative methods offer a possibility to overcome the difficulties of direct methods. We experimented with the nonstationary iterative methods, also called Krylov subspace methods.⁶
⁴ The CFL condition is named after Courant, Friedrichs and Lewy (see Courant et al. (1928) or Courant et al. (1967)). This is only a necessary condition for the convergence of finite difference methods: if it is satisfied, the method might converge to the exact solution.
⁵ Gilli and Pauletto (1997) discuss this problem in the case of the solution of large economic models.
⁶ A presentation of these techniques can be found in Barrett et al. (1994), Freund et al. (1992), Axelsson (1994), Kelley (1995) and Saad (1996). For applications in economics see Gilli and Pauletto (1998), Pauletto and Gilli (2000) and Mrkaic (2001).
Algorithm 3 (BiCGSTAB):
  Initialize x^(0)
  Compute r^(0) = b - A x^(0), \rho_0 = 1, \rho_1 = r^(0)' r^(0), \alpha = 1, \omega = 1, p = 0, v = 0
  for k = 1, 2, ... until convergence do
    \beta = (\rho_k / \rho_{k-1}) (\alpha / \omega)
    p = r + \beta (p - \omega v)
    v = A p
    \alpha = \rho_k / (r^(0)' v)
    s = r - \alpha v
    t = A s
    \omega = (t' s) / (t' t)
    x^(k) = x^(k-1) + \alpha p + \omega s
    r = s - \omega t
    \rho_{k+1} = -\omega\, r^(0)' t
  end for
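Algorithm 3 transcribes almost line by line into Python/NumPy. The stopping test, the early-exit safeguard on a small s, and the test problem below are ours:

```python
import numpy as np

def bicgstab(A, b, x0, tol=1e-10, maxit=200):
    """Unpreconditioned BiCGSTAB, following Algorithm 3 / van der Vorst (1992)."""
    x = x0.astype(float).copy()
    r = b - A @ x                    # r^(0)
    r0 = r.copy()                    # shadow residual, kept fixed
    rho, alpha, omega = r0 @ r0, 1.0, 1.0
    rho_prev = 1.0
    p = np.zeros_like(b, dtype=float)
    v = np.zeros_like(b, dtype=float)
    bnorm = np.linalg.norm(b)
    for k in range(maxit):
        beta = (rho / rho_prev) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = A @ p
        alpha = rho / (r0 @ v)
        s = r - alpha * v
        if np.linalg.norm(s) <= tol * bnorm:   # safeguard: converged at half step
            return x + alpha * p
        t = A @ s
        omega = (t @ s) / (t @ t)
        x = x + alpha * p + omega * s
        r = s - omega * t
        rho_prev, rho = rho, -omega * (r0 @ t)  # rho_{k+1} = -omega r0't
        if np.linalg.norm(r) <= tol * bnorm:
            break
    return x

# Small diagonally dominant nonsymmetric test system (values are ours).
A = np.array([[4.0, 1.0, 0.0], [1.0, 5.0, 1.0], [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
x = bicgstab(A, b, np.zeros(3))
```

The recurrence \rho_{k+1} = -\omega r^(0)' t exploits the exact orthogonality r^(0)' s = 0 and saves one inner product per iteration.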
Complexity

If BiCGSTAB is used with no preconditioning, which is the case in our application, the algorithm computes in Statement 2 a matrix-vector product and a norm, and, for each iteration, 4 inner products, 6 vector updates (also called saxpy operations) and 2 matrix-vector products. The number of elementary operations corresponding to these computations is 2(2d^2 + 1)|\Omega| for the matrix-vector product, 2|\Omega| for one saxpy as well as for one inner product, and 4|\Omega| for the computation of the norm.

The overall complexity of the implicit method for d = 3 and M time steps is then

$$M |\Omega| (42 + 96k)$$

where k is the number of BiCGSTAB iterations per time step.
[Figure 3: memory requirements in Gbytes of the sparse LU factorization and of BiCGSTAB as a function of the number of grid points per direction (0 to 200).]
It clearly appears that the sparse direct method reaches its limits for a
number of grid points below 30, whereas for BiCGSTAB we can easily reach
grid sizes beyond 200, in particular with parallel machines where memory is less
of an issue.
Computational results

S_1(0) = S_2(0) = S_3(0) = 10. The analytical solution of the call corresponding to this set of parameters is 2.2672.¹⁰
In the following we present computational results obtained both with a serial
and a parallel implementation.
Serial implementation

We implemented the explicit method described in Algorithm 1, the implicit method described in Algorithm 2 and the BiCGSTAB method described in Algorithm 3 in a MATLAB 5.x programming environment.¹¹ Computations have been executed on a 500 MHz Pentium III PC with 512 Mbytes of memory. With 71 space grid points in each of the three directions, the implicit method needs 516 (71 - 2)^3 ≈ 170 Mbytes of memory for data storage. This is the largest problem instance we can solve with MATLAB on our machine.
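The 170 Mbyte figure can be checked directly from the stated count of 516 bytes per interior point:

```python
# Back-of-the-envelope check of the memory figure quoted above:
# 516 bytes per interior point times (71 - 2)^3 interior points.
N = 71
bytes_total = 516 * (N - 2) ** 3
print(round(bytes_total / 1e6))   # about 170 Mbytes, matching the text
```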
Parallel implementation

As commented in the previous paragraph, the space complexity of the implicit method limits the size of the problem that can be solved in a serial environment. This is a situation where Krylov methods become attractive as they are also well suited for parallel implementation, allowing us to tackle larger grid sizes. For our application we employed the software PETSc 2.1.0 (Portable and Extensible Toolkit for Scientific Computing) developed and maintained at the Mathematics and Computer Science Division of the Argonne National Laboratory, see Balay et al. (2001). PETSc is a set of routines designed to solve large-scale computational problems in a parallel environment. The software is written in C and can be linked to C, C++ or Fortran programs.

We used a computing platform constituted by a cluster of 20 biprocessor LINUX PCs. The individual machines are Pentium III running at 500 MHz and are equipped with 384 Mbytes of memory each. These 20 PCs are part of a larger cluster of 64 PCs, where the other 44 are Pentium II running at 400 MHz. We chose to use the first twenty in order to preserve a homogeneous computing environment.
On this cluster we solved with the implicit method problems with a grid size of N = 200 in each of the three directions, corresponding to linear systems with 7 880 599 equations. With 37 processors, we achieved a computing speed of about 0.7 Gflop/s.
¹⁰ The analytical formula for a call on the maximum of three assets is derived in Johnson (1987). The evaluation of this formula requires the numerical integration of the trivariate normal density function. We used the algorithms proposed by Genz (2001, 1993) and MULNOR of Schervish (1984).
¹¹ The MATLAB files containing the code can be downloaded from the URL www.unige.ch/ses/metri/gilli/.
Scalability analysis

To evaluate the performance of the parallel algorithm on our parallel machine, we performed an empirical scalability analysis. Let T_1 be the time of a serial execution for a fixed size of our problem. As a measure for the problem size we take the size of the linear system to be solved, which is equal to the number of interior grid points |\Omega| = (N - 1)^3 when we set N_x = N_y = N_z = N. Let T_p be the time taken by the parallel algorithm to solve the same problem on p identical processors. Then the ratio S_p = T_1/T_p is called the speedup and it indicates the relative benefit of solving a problem in parallel over a sequential implementation.

As processors spend time in communicating, they cannot dedicate all the time to computations of the algorithm. Efficiency is a measure of the fraction of time during which processors are usefully employed; it is defined as the ratio of speedup to the number of processors, E_p = S_p/p. Figure 4 shows the speedup and the efficiency as functions of the number of processors for different problem sizes ranging from N = 100 to N = 200, which correspond to systems of approximately one to eight million equations.

The speedup curves decrease after an optimum number of processors. However this optimum lies beyond the number of processors available for our application. We observe that larger instances of the problem yield higher speedup and efficiency for the same number of processors.
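The definitions of S_p and E_p are immediate to compute from measured timings; the timings below are hypothetical, not the paper's measurements:

```python
import numpy as np

def speedup_efficiency(T1, Tp, p):
    """S_p = T1 / T_p and E_p = S_p / p for timings Tp measured on p processors."""
    Sp = T1 / np.asarray(Tp, dtype=float)
    Ep = Sp / np.asarray(p, dtype=float)
    return Sp, Ep

# Hypothetical wall-clock times in seconds for one fixed problem size.
p = np.array([1, 2, 4, 8])
Tp = np.array([100.0, 55.0, 30.0, 18.0])
Sp, Ep = speedup_efficiency(100.0, Tp, p)
```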
Figure 4: Speedup and efficiency versus the number of processors for solving the implicit scheme in parallel.
Conclusion

Based on the computational results obtained in two different computing environments, we conclude that implicit finite difference methods can be efficiently used for the valuation of options on three underlying assets. The availability of nonstationary iterative methods and the fact that they are well suited for parallelization makes the solution of very large linear systems possible. In our experiments, the size of the system that we solve in parallel is approximately eight million. We think that a matrix-free implementation of the algorithm would enhance the performance and, with faster processors and more memory, we can go even further, allowing for finer space grids.

In the serial case, the maximum grid size that we can solve on a standard PC in a MATLAB environment is about 70 in each direction.

We have used the Krylov methods without preconditioning. It would be worthwhile studying whether preconditioning would improve the performance of the algorithm. The influence of the diffusion and velocity tensors on the convergence speed of the Krylov methods can also be the object of further research.
Appendix A

Replacing all partial derivatives in equation (3) by finite difference approximations, we can write the following approximation to PDE (3), which holds for all grid points u_{ijk}^m:

$$\frac{u_{ijk}^{m+1} - u_{ijk}^{m}}{\Delta\tau} = V_x \bar{\delta}_x + V_y \bar{\delta}_y + V_z \bar{\delta}_z + D_{xx} \bar{\delta}_{xx} + D_{yy} \bar{\delta}_{yy} + D_{zz} \bar{\delta}_{zz} + D_{xy} \bar{\delta}_{xy} + D_{xz} \bar{\delta}_{xz} + D_{yz} \bar{\delta}_{yz} - r\bigl(\theta\, u_{ijk}^{m+1} + (1 - \theta)\, u_{ijk}^{m}\bigr) \qquad (8)$$
The 9 approximations \bar{\delta}_q, q \in Q = \{x, y, z, xx, yy, zz, xy, xz, yz\} for the partial derivatives with respect to the space variables x, y and z are defined as convex linear combinations of central finite difference approximations at time steps m + 1 and m,

$$\bar{\delta}_q = \theta_q\, \delta_q^{m+1} + (1 - \theta_q)\, \delta_q^{m} . \qquad (9)$$

The temporal weighting factors \theta_q, q \in Q take their values in [0, 1] and determine the time at which the partial derivatives with respect to the state variables are evaluated. The central finite difference approximations \delta_q^m, q \in Q are:
$$\delta_x^m = \frac{u_{i+1,j,k}^m - u_{i-1,j,k}^m}{2\Delta x} \qquad \delta_{xx}^m = \frac{u_{i+1,j,k}^m - 2u_{ijk}^m + u_{i-1,j,k}^m}{\Delta x^2}$$

$$\delta_y^m = \frac{u_{i,j+1,k}^m - u_{i,j-1,k}^m}{2\Delta y} \qquad \delta_{yy}^m = \frac{u_{i,j+1,k}^m - 2u_{ijk}^m + u_{i,j-1,k}^m}{\Delta y^2}$$

$$\delta_z^m = \frac{u_{i,j,k+1}^m - u_{i,j,k-1}^m}{2\Delta z} \qquad \delta_{zz}^m = \frac{u_{i,j,k+1}^m - 2u_{ijk}^m + u_{i,j,k-1}^m}{\Delta z^2}$$

$$\delta_{xy}^m = \frac{u_{i+1,j+1,k}^m - u_{i-1,j+1,k}^m - u_{i+1,j-1,k}^m + u_{i-1,j-1,k}^m}{4\Delta x\,\Delta y}$$

$$\delta_{xz}^m = \frac{u_{i+1,j,k+1}^m - u_{i-1,j,k+1}^m - u_{i+1,j,k-1}^m + u_{i-1,j,k-1}^m}{4\Delta x\,\Delta z}$$

$$\delta_{yz}^m = \frac{u_{i,j+1,k+1}^m - u_{i,j-1,k+1}^m - u_{i,j+1,k-1}^m + u_{i,j-1,k-1}^m}{4\Delta y\,\Delta z}$$
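The central difference formulas above can be checked numerically on a function with known derivatives; the grid size and the test function are ours:

```python
import numpy as np

# Check the stencils on u(x, y, z) = x^2 + x*y + y*z, for which the central
# differences are exact: u_x = 2x + y, u_xx = 2, u_xy = 1, u_yz = 1.
dx = dy = dz = 0.1
grid = np.arange(5) * dx
X, Y, Z = np.meshgrid(grid, grid, grid, indexing="ij")
u = X**2 + X * Y + Y * Z

i, j, k = 2, 2, 2   # an interior point: (x, y, z) = (0.2, 0.2, 0.2)
d_x  = (u[i+1, j, k] - u[i-1, j, k]) / (2 * dx)
d_xx = (u[i+1, j, k] - 2 * u[i, j, k] + u[i-1, j, k]) / dx**2
d_xy = (u[i+1, j+1, k] - u[i-1, j+1, k] - u[i+1, j-1, k]
        + u[i-1, j-1, k]) / (4 * dx * dy)
d_yz = (u[i, j+1, k+1] - u[i, j-1, k+1] - u[i, j+1, k-1]
        + u[i, j-1, k-1]) / (4 * dy * dz)
```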
The time derivative in (8) is approximated by the forward difference

$$\delta_\tau^m = \frac{u_{ijk}^{m+1} - u_{ijk}^{m}}{\Delta\tau} . \qquad (10)$$

Substituting the approximations (9) and (10) into (8), the scheme at grid point (i, j, k) can be written

$$u_{ijk}^{m+1} = \theta\, c\, \bar{u}_{\partial ijk}^{m+1} + (1 - \theta)\, c\, \bar{u}_{\partial ijk}^{m} \qquad (11)$$

where \bar{u}_{\partial ijk}^m denotes a one-dimensional array. The set of indices \partial ijk corresponds to the points in the grid which are neighbours of u_{ijk}^m and which occur in the finite difference approximations for the partial derivatives. The product c\,\bar{u}_{\partial ijk}^m expands as
$$\begin{aligned}
c\,\bar{u}_{\partial ijk}^{m} = {} & c_1 u_{i,j-1,k-1}^m + c_2 u_{i-1,j,k-1}^m + d_6 u_{i,j,k-1}^m - c_2 u_{i+1,j,k-1}^m \\
& {} - c_1 u_{i,j+1,k-1}^m + c_3 u_{i-1,j-1,k}^m + d_4 u_{i,j-1,k}^m - c_3 u_{i+1,j-1,k}^m \\
& {} + d_2 u_{i-1,j,k}^m + d_1 u_{i,j,k}^m + d_3 u_{i+1,j,k}^m \\
& {} - c_3 u_{i-1,j+1,k}^m + d_5 u_{i,j+1,k}^m + c_3 u_{i+1,j+1,k}^m - c_1 u_{i,j-1,k+1}^m \\
& {} - c_2 u_{i-1,j,k+1}^m + d_7 u_{i,j,k+1}^m + c_2 u_{i+1,j,k+1}^m + c_1 u_{i,j+1,k+1}^m
\end{aligned} \qquad (12)$$
where

$$\begin{aligned}
c_1 &= \frac{D_{yz}}{4\Delta y\,\Delta z} & c_2 &= \frac{D_{xz}}{4\Delta x\,\Delta z} & c_3 &= \frac{D_{xy}}{4\Delta x\,\Delta y} \\
c_4 &= \frac{V_x}{2\Delta x} & c_5 &= \frac{V_y}{2\Delta y} & c_6 &= \frac{V_z}{2\Delta z} \\
c_7 &= \frac{D_{xx}}{\Delta x^2} & c_8 &= \frac{D_{yy}}{\Delta y^2} & c_9 &= \frac{D_{zz}}{\Delta z^2}
\end{aligned}$$

$$\begin{aligned}
d_2 &= c_7 - c_4 & d_3 &= c_7 + c_4 \\
d_4 &= c_8 - c_5 & d_5 &= c_8 + c_5 \\
d_6 &= c_9 - c_6 & d_7 &= c_9 + c_6
\end{aligned}$$

and where V_i are the elements of the velocity tensor and D_{ij} those of the diffusion tensor in PDE (4).
Figure 6 reproduces the fragment of the grid containing point u_{ijk} (bullet) and the set of neighbours \partial ijk (circles).
[Figure 6 appears here; the labelled nodes include (i,j+1,k+1), (i+1,j+1,k), (i-1,j-1,k), (i+1,j,k-1) and (i,j-1,k-1).]
Figure 6: Grid fragment with node u_{ijk} (bullet) and neighbours u_{\partial ijk} (circles).
In order to write the equations (12) for all the grid points u_{ijk} \in \Omega in matrix form, we index the space grid points u_{ijk} with a one-dimensional array s. The index s_{ijk} corresponding to a grid point u_{ijk} is defined as

$$s_{ijk} = i + j(N_x + 1) + k(N_x + 1)(N_y + 1), \qquad i = 0, 1, \ldots, N_x, \quad j = 0, 1, \ldots, N_y, \quad k = 0, 1, \ldots, N_z . \qquad (13)$$

We now construct a matrix P where the rows contain the coefficients of vector c. From Figure 6 and from expression (13) we see that the vector o of relative positions of the |\partial ijk| = 18 neighbours of a grid point u_{ijk} can be computed with Algorithm 4.
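The index map (13) and the offset vector o can be sketched as follows (grid sizes illustrative); note that (13) is exactly column-major ordering of the three-dimensional grid:

```python
import numpy as np

Nx, Ny, Nz = 4, 5, 6   # illustrative grid sizes

def s_index(i, j, k):
    """One-dimensional index (13) of grid point u_ijk."""
    return i + j * (Nx + 1) + k * (Nx + 1) * (Ny + 1)

# Relative positions o of the 18 stencil neighbours of an interior point
# (the circles of Figure 6); this is the table that Algorithm 4 builds.
neighbour_shifts = [(0,-1,-1), (-1,0,-1), (0,0,-1), (1,0,-1), (0,1,-1),
                    (-1,-1,0), (0,-1,0), (1,-1,0), (-1,0,0), (1,0,0),
                    (-1,1,0), (0,1,0), (1,1,0),
                    (0,-1,1), (-1,0,1), (0,0,1), (1,0,1), (0,1,1)]
o = np.array([s_index(di, dj, dk) for di, dj, dk in neighbour_shifts])
```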
The row indices of matrix P corresponding to the points u_{ijk} \in \Omega are defined by

$$r_{ijk} = i + (j - 1)(N_x - 1) + (k - 1)(N_x - 1)(N_y - 1), \qquad i = 1, \ldots, N_x - 1, \quad j = 1, \ldots, N_y - 1, \quad k = 1, \ldots, N_z - 1 \qquad (14)$$

and a row r_{ijk} of matrix P, corresponding to the equation defining the grid point u_{ijk}, has the form

$$P(r_{ijk}, s_{ijk} + o) = c \qquad (15)$$

where

$$c = \begin{bmatrix} c_1 & c_2 & d_6 & -c_2 & -c_1 & c_3 & d_4 & -c_3 & d_2 & d_1 & d_3 & -c_3 & d_5 & c_3 & -c_1 & -c_2 & d_7 & c_2 & c_1 \end{bmatrix}$$

is defined in (12). Equation (11) in matrix form becomes

$$u_{\Omega}^{m+1} = \theta P \bar{u}^{m+1} + (1 - \theta) P \bar{u}^{m} .$$
References

Axelsson, O. (1994). Iterative Solution Methods. Oxford University Press. Oxford, UK.

Balay, S., W. D. Gropp, L. Curfman McInnes and B. F. Smith (2001). PETSc Users Manual. Technical Report ANL 95/11 - Revision 2.1.0. Argonne National Laboratory. http://www.mcs.anl.gov/petsc/petsc.html.

Barrett, R., M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine and H. van der Vorst (1994). Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM. Philadelphia, PA.

Baxter, M. and A. Rennie (1996). Financial Calculus: An Introduction to Derivative Pricing. Cambridge University Press.

Gilli, M. and G. Pauletto (1998). Krylov methods for solving models with forward-looking variables. Journal of Economic Dynamics and Control 22(6), 1275-1289.

Izvorski, I. (1998). A nonuniform grid method for solving PDEs. Journal of Economic Dynamics and Control 22(6), 1445-1452.

James, J. and N. Webber (2000). Interest Rate Modelling. Wiley. Chichester, UK.

Johnson, H. (1987). Options on the maximum or the minimum of several assets. Journal of Financial and Quantitative Analysis 22, 277-283.

Kelley, C. T. (1995). Iterative Methods for Linear and Nonlinear Equations. SIAM. Philadelphia, PA.

Kurpiel, A. and T. Roncalli (1999). Hopscotch methods for two-state financial models. Journal of Computational Finance 3(2), 53-89.

Marchuk, G. I. (1990). Splitting and alternating direction methods. In: Handbook of Numerical Analysis (P. G. Ciarlet and J.-L. Lions, Eds). Vol. I. pp. 197-464. North-Holland. Amsterdam.

Mrkaic, M. (2001). Policy iteration accelerated with Krylov methods. Journal of Economic Dynamics and Control. (Forthcoming).

Pauletto, G. and M. Gilli (2000). Parallel Krylov Methods for Econometric Model Simulation. Computational Economics 16(1-2), 173-186.

Saad, Y. (1996). Iterative Methods for Sparse Linear Systems. PWS Publishing Company. MA.

Schervish, M. (1984). Multivariate Normal Probabilities with Error Bound. Applied Statistics 33, 81-87.

Tavella, D. and C. Randall (2000). Pricing Financial Instruments: The Finite Difference Method. Wiley Series in Financial Engineering. Wiley. New York.

van der Vorst, H. A. (1992). Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems. SIAM Journal on Scientific and Statistical Computing 13(2), 631-644.

Zvan, R., P. A. Forsyth and K. R. Vetzal (1998). A general finite element approach for PDE option pricing models. Mimeo.