An Introduction To Monte Carlo Simulations: Mattias Jonsson
Notes by
Mattias Jonsson
Contents
1. Introduction
2. Computing areas, volumes and integrals
2.1. Examples
2.1.1. Example A: Area of a rectangle
2.1.2. Example B: Area of a disc
2.1.3. Example C: Area of a superellipse
2.1.4. Example D: Volume of an n-dimensional ball
2.1.5. Example E: Computing a triple integral
2.2. Methods
2.3. First method: exact computation
2.4. Second method: the lattice method
2.4.1. Example A
2.4.2. Examples B and C
2.4.3. Example D
2.4.4. Example E
2.5. Third method: the Monte Carlo method
2.5.1. Examples B and C
2.5.2. Example D
2.5.3. Example E
3. A crash course in probability theory
3.1. Outcomes, events and probabilities
3.2. Random variables
3.3. Discrete random variables
3.4. Continuous random variables
3.5. Expectation and variance
3.6. Some important distributions
3.6.1. Binomial random variables
3.6.2. Poisson random variables
3.6.3. Normal random variables
3.7. Random vectors
3.8. Independence
3.9. Conditional expectation and variance
4. Computing expected values
4.1. General strategy
4.2. Sampling
4.2.1. Sampling discrete random variables
4.2.2. Sampling continuous random variables
4.3. The law of large numbers
4.4. Presenting the results
4.5. Examples
4.5.1. Flipping a coin
4.5.2. Rolling dice
4.5.3. Normal random variables
1. Introduction
These notes are meant to serve as an introduction to the Monte Carlo
method applied to some problems in Financial and Actuarial Mathematics.
Roughly speaking, Monte Carlo simulation is a method for numerically
computing integrals and expected values. It is a probabilistic method at
heart, based on simulating random outcomes or scenarios. Such a method
may bring up the analogy with gambling at the casino. At any rate the
method is named after the casino in Monte Carlo, Monaco.
2. Computing areas, volumes and integrals
2.1. Examples. In order to introduce the Monte Carlo method, we start
by giving a few examples of problems where the method applies. First we
list the problems, then we discuss methods of solving them.
2.1.1. Example A: Area of a rectangle. We start with a seemingly trivial problem: compute the area of the rectangle
\[
\Omega = [0.2, 0.8] \times [0.1, 0.6].
\]
2.2. Methods. How can we compute the quantities in the examples above?
We shall discuss three different methods.
• Exact computations.
• The lattice method.
• The Monte Carlo method.
"
Area(Rk ) =
pk
1
#{k | pk }
N
Exercise: Check that A16 = 0.25 but that AN = 0.3 whenever J is divisible
by 10.
2.4.2. Examples B and C. The lattice method can be used to compute the
area of quite general domains Ω in the plane. We proceed as follows:
• First find a rectangle R that you know contains the region Ω.
• Pick a (large) integer J ≥ 1 and set N = J².
• Divide the rectangle R into N equally large subrectangles Rk, 1 ≤ k ≤ N.
• Let pk be the center of Rk. The centers form a lattice, explaining
the name of the method!
• Approximate the exact area A = A(Ω) of Ω by
\[
A_N = \sum_{p_k \in \Omega} \mathrm{Area}(R_k) = \frac{\mathrm{Area}(R)}{N}\,\#\{k \mid p_k \in \Omega\}.
\]
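The steps above can be sketched in code. Here is a Python version (the notes otherwise use matlab) applied to Example B, the unit disc, enclosed in the rectangle R = [−1, 1] × [−1, 1]; the function and variable names are our own choices for illustration.

```python
import math

def lattice_area(inside, rect, J):
    """Lattice-method approximation of the area of a region Omega.

    inside: predicate deciding whether a point lies in Omega
    rect:   ((x0, x1), (y0, y1)), a rectangle R containing Omega
    J:      subdivisions per side, so N = J**2 subrectangles R_k
    """
    (x0, x1), (y0, y1) = rect
    N = J * J
    hits = 0
    for i in range(J):
        for j in range(J):
            # p_k is the center of the subrectangle R_k
            px = x0 + (i + 0.5) * (x1 - x0) / J
            py = y0 + (j + 0.5) * (y1 - y0) / J
            if inside(px, py):
                hits += 1
    # A_N = Area(R)/N * #{k : p_k in Omega}
    return (x1 - x0) * (y1 - y0) * hits / N

# Example B: area of the unit disc; the exact answer is pi.
approx = lattice_area(lambda x, y: x * x + y * y <= 1.0,
                      ((-1.0, 1.0), (-1.0, 1.0)), J=200)
print(approx)  # should be close to pi
```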
The drawback of this method is that for large dimensions n, and in fact
even for n = 3, the number N = Jⁿ of subcubes has to be very large in
order to achieve a reasonably accurate answer. In fact, one can prove that
for smoothly bounded domains (such as a ball), the rate of convergence is
\[
O\Big(\frac{1}{J}\Big) = O\Big(\frac{1}{N^{1/n}}\Big).
\]
Notice that N^{1/n} decreases as n increases. This phenomenon is sometimes
called the curse of dimensionality.
2.4.4. Example E. Computing an integral instead of an area is not much
harder. In Example E we proceed as follows:
• Let R be the unit cube [−1, 1]³ and Ω the unit ball.
• Pick a (large) integer J ≥ 1 and set N = J³.
• Divide the cube R into N equally large subcubes Rk, 1 ≤ k ≤ N.
• Let pk be the center of Rk.
• Approximate the integral I by
\[
I_N = \frac{\mathrm{Vol}(R)}{N}\sum_{p_k \in \Omega} f(p_k).
\]
2.5. Third method: the Monte Carlo method. The idea of the Monte
Carlo method is quite simple. Instead of using all points on a lattice, we
pick them totally randomly. We illustrate the method on Examples B–E
above.
2.5.1. Examples B and C. Let us use the Monte Carlo method to compute
the area of a domain Ω in the plane.
• First find a rectangle R = Rx × Ry that you know contains the region Ω.
• Pick a (large) integer N (not necessarily of the form N = J²).
• Pick 2N independent samples x1, . . . , xN and y1, . . . , yN from the
uniform distributions on Rx and Ry, respectively. Set pk = (xk, yk).
Then p1, . . . , pN are independent samples of points from the uniform
distribution on R.
• Approximate the exact area A = A(Ω) of Ω exactly as before, that
is, by AN, where
\[
A_N = \frac{\mathrm{Area}(R)}{N}\,\#\{k \mid p_k \in \Omega\}.
\]
Similarly, in Example E we approximate the integral I by
\[
I_N = \frac{\mathrm{Vol}(R)}{N}\sum_{p_k \in \Omega} f(p_k),
\]
where the points pk are again sampled from the uniform distribution on R.
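As an illustration, here is a Python sketch (the notes use matlab) of this Monte Carlo approximation of a triple integral; the integrand x1² + x2² + x3² over the unit ball is our own choice, for which the exact value 4π/5 ≈ 2.513 is easy to verify.

```python
import random

def mc_integral(f, inside, box, N, seed=1):
    """Monte Carlo approximation of the integral of f over a region Omega.

    f:      integrand, taking a point as a list of coordinates
    inside: predicate deciding whether a point lies in Omega
    box:    list of (lo, hi) intervals whose product R contains Omega
    N:      number of uniformly sampled points
    """
    rng = random.Random(seed)
    vol_R = 1.0
    for lo, hi in box:
        vol_R *= hi - lo
    total = 0.0
    for _ in range(N):
        p = [rng.uniform(lo, hi) for lo, hi in box]
        if inside(p):            # only the points p_k in Omega contribute
            total += f(p)
    return vol_R * total / N     # I_N = Vol(R)/N * sum over p_k in Omega

# Integrate x1^2 + x2^2 + x3^2 over the unit ball; exact value 4*pi/5.
in_ball = lambda p: sum(c * c for c in p) <= 1.0
I_N = mc_integral(lambda p: sum(c * c for c in p), in_ball,
                  [(-1.0, 1.0)] * 3, N=200000)
print(I_N)
```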
We can easily obtain the cdf (cumulative distribution function) from the
pmf:
\[
F_X(x) = \sum_{n \le x} f_X(n).
\]
3.4. Continuous random variables. Together with the discrete ones, the
continuous random variables are the most important ones. While a discrete
random variable can only take (at most) countably many values, the probability
of a continuous random variable taking a specific value is always zero. They
can be defined as random variables X admitting a probability density function
(pdf) fX. The pdf is related to the cumulative distribution function by
\[
F_X(x) = \int_{-\infty}^{x} f_X(u)\,du.
\]
"
E[X] =
nfX (n).
n=0
"
g(n)fX (n).
n=0
3.6.1. Binomial random variables. A binomial random variable with parameters
(m, p) can be thought of as the number of heads after tossing m
independent biased coins, where the probability of heads is p for each single
throw. If X is binomial, then we write
X ∼ Bin(m, p).
Clearly X then is discrete, takes values 0, 1, . . . , m, and the pmf of X is given by
\[
f_X(k) = \binom{m}{k} p^k (1-p)^{m-k}.
\]
Binomial random variables are often used to model the number of events
during a fixed time period.
3.6.2. Poisson random variables. Another distribution commonly used to
model the number of events is the Poisson distribution. We write
X ∼ Poisson(λ)
for a Poisson random variable with parameter λ > 0. Then X takes values
0, 1, 2, . . . and has pmf
\[
f_X(k) = e^{-\lambda}\,\frac{\lambda^k}{k!}.
\]
\[
f_X(x) = \frac{1}{(2\pi)^{n/2}\sqrt{\det R}}\,
\exp\Big(-\frac{1}{2}(x-m)^{T} R^{-1} (x-m)\Big).
\]
For events A and B with P(B) > 0 we define
\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)}.
\]
This is called the probability of A conditioned on B, or the conditional
probability of A given B. If A and B are independent, then P(A|B) = P(A).
The expectation of a random variable X with respect to this measure is
given by
\[
E[X \mid B] = \frac{E[X\,1_B]}{P(B)}.
\]
\[
\sum_{k=1}^{n-1} p_k < Y < \sum_{k=1}^{n} p_k,
\]
where Y has a uniform distribution on the interval (0, 1). The idea is of
course that Y can be easily sampled in matlab. We leave it to the reader to
figure out the exact matlab implementation of the algorithm above.
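In Python (so as not to spoil the matlab exercise), the algorithm can be sketched as follows; the three-point distribution at the end is a made-up example.

```python
import random

def sample_discrete(values, probs, rng):
    """Return values[n], where n is the smallest index such that
    p_1 + ... + p_n >= Y for a uniform Y on (0, 1); equivalently,
    sum_{k < n} p_k < Y <= sum_{k <= n} p_k."""
    y = rng.random()
    cum = 0.0
    for v, p in zip(values, probs):
        cum += p
        if y <= cum:
            return v
    return values[-1]  # guard against floating-point round-off

# Toy distribution: P(X=1)=0.2, P(X=2)=0.5, P(X=3)=0.3.
rng = random.Random(42)
samples = [sample_discrete([1, 2, 3], [0.2, 0.5, 0.3], rng)
           for _ in range(100000)]
print(samples.count(2) / len(samples))  # should be close to 0.5
```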
that is,
Y = FX(X),
which implies that X = FX⁻¹(Y). In general, however, it can be difficult
to apply this procedure in a computationally efficient way, unless there is
an explicit formula for the inverse cdf FX⁻¹.
A special (and important) case that matlab knows how to handle well is
that of a standard normal distribution. The command
x=randn(1,N)
generates a vector of N independent samples from the standard normal
distribution.
Sometimes one can do simple transformations to get new distributions.
For instance,
x=mu+sigma*randn(1,N)
generates samples from the normal distribution N(μ, σ²) with mean μ
and variance σ², and
x=exp(mu+sigma*randn(1,N))
generates samples from a log-normal distribution.
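As a further illustration of the inverse-cdf method, consider the exponential distribution with parameter λ, for which FX(x) = 1 − e^{−λx} and hence FX⁻¹(y) = −ln(1 − y)/λ. A Python sketch (the notes use matlab; the parameter λ = 2 is our own choice):

```python
import math
import random

def sample_exponential(lam, N, seed=0):
    """Inverse-cdf sampling of Exp(lam): F(x) = 1 - exp(-lam*x),
    so X = F^{-1}(Y) = -log(1 - Y)/lam for Y uniform on (0, 1)."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) / lam for _ in range(N)]

xs = sample_exponential(lam=2.0, N=100000)
print(sum(xs) / len(xs))  # should be close to the exact mean 1/lam = 0.5
```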
4.3. The law of large numbers. The reason why Monte Carlo works is
the Law of Large Numbers.
Theorem (Kolmogorov's Strong Law of Large Numbers). If X1, X2, . . . are
i.i.d. random variables with (well-defined) expected value E[Xn] = μ, then
\[
\frac{1}{N}\sum_{n=1}^{N} X_n \to \mu \quad \text{a.s. as } N \to \infty.
\]
This version of the law of large numbers does not tell us how fast the
convergence is. In general, we have a rule of thumb: in Monte Carlo
simulations, the error is O(1/√N) when using N samples.
Note: this is quite slow: to get 1 more decimal, one needs 100 times more
simulations!
The reason for this rule of thumb is the following. For i.i.d. samples we have
\[
E\Big[\frac{1}{N}\sum_{n=1}^{N} X_n\Big] = \mu,
\]
and the formula for the variance of a sum of independent random variables
implies that the random variable (1/N) Σ_{n=1}^{N} Xn has standard
deviation σ/√N.
While the strong law of large numbers is rather difficult to prove, the
two formulas above are elementary. A more precise (and non-elementary!)
version of the rule of thumb is given by
Theorem (Central Limit Theorem). If X1, X2, . . . are i.i.d. random variables
with mean μ and variance σ², i.e. E[Xn] = μ and Var[Xn] = σ², then
\[
\frac{\sum_{n=1}^{N} X_n - N\mu}{\sigma\sqrt{N}} \to N(0, 1)
\quad \text{as } N \to \infty.
\]
In other words, after subtracting the mean and dividing by the standard
deviation, the sum converges to a standard normal/Gaussian.
In particular we get
\[
\frac{1}{N}\sum_{n=1}^{N} X_n \approx \mu
+ \underbrace{N\Big(0, \frac{\sigma^2}{N}\Big)}_{\text{error}},
\]
so the error has standard deviation σ/√N.
4.4. Presenting the results. The Monte Carlo method differs from most
other methods by its probabilistic nature. In particular, the approximation
we get will vary from simulation to simulation. For this reason it is important
to present the results of a Monte Carlo simulation in an instructive way.
It is a good habit to always include the following elements in a presentation
of the results of a Monte Carlo simulation:
• N: the number of paths
• x̄N: the Monte Carlo estimate
• εN: the standard error
• A convergence diagram
Let us explain the items above. We have a random variable X with
(unknown) expected value E[X] = μ and variance Var[X] = σ², and we
are interested in computing μ. To this end, we use independent samples
x1, . . . , xN from X and form the average
\[
\bar{x}_N = \frac{1}{N}(x_1 + x_2 + \cdots + x_N).
\]
This is the Monte Carlo estimate of μ referred to above.
The standard error εN is supposed to measure the error in the Monte Carlo
simulation. Of course, we cannot know the error exactly, or else there would
be no point in doing the Monte Carlo simulation in the first place. What we
would like to know is the approximate size of the error. Now, if we think of
x̄N as a random variable rather than a number, then by the Central Limit
Theorem, x̄N is approximately distributed as N(μ, σ²/N). Unfortunately,
we do not know σ! However, we can use the following estimate for σ:
\[
\hat{\sigma}_N = \sqrt{\frac{1}{N-1}\Big(\sum_{i=1}^{N} x_i^2 - N\bar{x}_N^2\Big)},
\]
and then set
\[
\varepsilon_N = \frac{\hat{\sigma}_N}{\sqrt{N}}.
\]
Finally, a convergence diagram is a plot of x̄n against n for 1 ≤ n ≤ N. It
is a good visual tool for determining whether a given Monte Carlo simulation
is close to having converged. See the examples below for more details on
how to do this.
4.5. Examples. Let us illustrate the Monte Carlo method for approximating expected values in a few simple cases. More interesting examples will be
given later on.
4.5.1. Flipping a coin. Let X be a Bernoulli random variable, i.e. X = 0
or X = 1 with probabilities fX(0) = 1 − p and fX(1) = p. Then X can be
thought of as the number of heads after a single toss of an unfair (if p ≠ 0.5)
coin. We wish to use Monte Carlo simulations to approximate the expected
value of X and compare it with the exact value E[X] = p.
We saw above how to sample X using matlab. The following matlab code
generates N samples, computes a Monte Carlo estimate of E[X], computes
the standard error, and produces a convergence diagram.
In the code, we have set p = 0.4 and N = 100000 but this can of course be
modified.
clear;
N=100000;p=0.4;
x=(rand(1,N)<p);
hx=cumsum(x)./[1:N];
eps=sqrt((sum(x.^2)-N*hx(N)^2)/(N-1))/sqrt(N);
fprintf('No paths: %i\n',N);
fprintf('MC estimate: %f\n',hx(N));
fprintf('standard error: %f\n',eps);
Nmin=ceil(N/100);
figure(1);plot([Nmin:N],hx(Nmin:N));
title('MC convergence diagram: expectation of Bin(1,0.4)');
xlabel('No paths');
ylabel('estimate');
4.5.2. Rolling dice. Next we want to use Monte Carlo simulations to compute
the probability of rolling a full house in one roll in the game of
Yahtzee. This means rolling five (fair, independent, six-sided) dice and getting
a pattern of type jjjkk, where 1 ≤ j, k ≤ 6 and j ≠ k.
Recall that the probability of an event A is the same as the expected value
of the random variable 1A taking value 1 if A occurs and zero otherwise.
We can use the following matlab code to compute a Monte Carlo estimate
of the probability of a full house.
clear;
N=100000;
x=sort(ceil(6*rand(5,N)));
y=((x(1,:)==x(2,:))&(x(2,:)==x(3,:))&(x(4,:)==x(5,:))&(x(3,:)~=x(4,:)))|...
((x(3,:)==x(4,:))&(x(4,:)==x(5,:))&(x(1,:)==x(2,:))&(x(2,:)~=x(3,:)));
hy=cumsum(y)./[1:N];
eps=sqrt((sum(y.^2)-N*hy(N)^2)/(N-1))/sqrt(N);
fprintf('No paths: %i\n',N);
fprintf('MC estimate: %f\n',hy(N));
fprintf('standard error: %f\n',eps);
Nmin=ceil(N/100);
figure(1);plot([Nmin:N],hy(Nmin:N));
title('MC convergence diagram: probability of full house');
xlabel('No paths');
ylabel('estimate');
4.5.3. Normal random variables. Let Xi, i = 1, 2, 3, be independent normal
random variables with mean μi and variance σi², respectively. Let us compute
the expected value of the difference between the maximum and the minimum
of the Xi. In other words, we want to compute E[Y], where
Y = max{X1, X2, X3} − min{X1, X2, X3}.
For this we use the normal random number generator discussed above. A
possible partial matlab code is as follows:
clear;
N=100000;
mu=[0.1 0.3 -0.1];
sigma=[0.1 0.2 0.5];
x=randn(3,N);
z=repmat(mu',1,N)+repmat(sigma',1,N).*x;
y=max(z)-min(z);
hy=cumsum(y)./[1:N];
eps=sqrt((sum(y.^2)-N*hy(N)^2)/(N-1))/sqrt(N);
fprintf('No paths: %i\n',N);
fprintf('MC estimate: %f\n',hy(N));
fprintf('standard error: %f\n',eps);
(We have not included the code for plotting the convergence diagram.)
4.5.4. Computing integrals. Our first examples of Monte Carlo simulations
concerned the computation of multiple integrals and, as special cases, areas
and volumes of regions in the plane or space. Let us show how these computations fit into the framework of computing expected values of random
variables.
Let us stick to dimension three for definiteness. Suppose we want to
compute the triple integral
\[
I = \iiint f(x_1, x_2, x_3)\,dx_1\,dx_2\,dx_3.
\]
If, for instance, the region of integration is the unit cube [0, 1]³ and
X1, X2, X3 are independent U(0, 1) random variables, then
I = E[f(X1, X2, X3)].
Variance reduction techniques are designed so that the variance of the
Monte Carlo estimator is small, explaining the name variance reduction
methods for these techniques.
In these notes we shall cover two variance reduction methods: antithetic
variables and moment matching. There are many more variance reduction
techniques that we have to leave out due to limited space. Some of the more
important ones that we have left out are stratified sampling, importance
sampling, control variates and low-discrepancy sequences (or quasi-Monte
Carlo methods).
4.6.1. Antithetic variables. The method of antithetic variables is based on
the symmetry of the distributions U(0, 1) and N(0, 1):
• if Y ∼ U(0, 1), then 1 − Y ∼ U(0, 1);
• if Y ∼ N(0, 1), then −Y ∼ N(0, 1).
This can be exploited as follows. Suppose we need to generate 2N samples
of a random variable X, and that this is done through 2N samples of a
uniform random variable Y ∼ U(0, 1), say X = f(Y) for some function f.
Normally we would use 2N independent samples
y1, . . . , y2N
of Y and convert these to samples xn = f(yn) of X.
With antithetic variables, we would instead use only N independent samples
y1, . . . , yN
and generate the 2N samples of X as
f(y1), f(1 − y1), f(y2), f(1 − y2), . . . , f(yN), f(1 − yN).
If the underlying random variable Y was normal rather than uniform,
say Y ∼ N(0, 1), then we would generate the 2N samples of X as
f(y1), f(−y1), f(y2), f(−y2), . . . , f(yN), f(−yN).
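A Python sketch of antithetic variables for a uniform underlying variable; the function f(y) = e^y is our own toy example, for which E[f(Y)] = e − 1 can be computed exactly.

```python
import math
import random

def mc_plain(f, N, rng):
    # Plain Monte Carlo: 2N independent uniform samples
    return sum(f(rng.random()) for _ in range(2 * N)) / (2 * N)

def mc_antithetic(f, N, rng):
    # N independent uniforms, each paired with its antithetic partner 1 - y
    total = 0.0
    for _ in range(N):
        y = rng.random()
        total += f(y) + f(1.0 - y)
    return total / (2 * N)

rng = random.Random(7)
est = mc_antithetic(math.exp, 50000, rng)
print(est)  # should be close to the exact value e - 1
```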
4.6.2. Moment matching. Suppose again that samples of X are generated as
xn = f(yn) from N independent samples y1, . . . , yN of Y ∼ N(0, 1). The
sample mean and sample variance,
\[
\bar{y} = \frac{1}{N}\sum_{n=1}^{N} y_n
\quad \text{and} \quad
\hat{\sigma}^2 = \frac{1}{N-1}\sum_{n=1}^{N} (y_n - \bar{y})^2,
\]
will in general not match the exact moments of the standard normal
distribution, for which E[Y] = 0 and E[Y²] = 1.
The idea is now to correct the numbers y1, . . . , yN to match the first two
moments:
\[
z_n = \frac{1}{\hat{\sigma}}(y_n - \bar{y}), \quad n = 1, \dots, N.
\]
Now use z1, . . . , zN as the generated samples of N(0, 1), i.e. set xn = f(zn).
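A Python sketch of the moment-matching correction (the notes use matlab); after the correction, the sample mean and sample variance match the N(0, 1) moments up to floating-point round-off.

```python
import random
import statistics

def moment_match(ys):
    """Shift and scale the samples so that the sample mean is exactly 0
    and the sample variance (with the 1/(N-1) normalization) exactly 1:
    z_n = (y_n - ybar)/sigma_hat."""
    ybar = statistics.fmean(ys)
    sigma_hat = statistics.stdev(ys)   # uses the 1/(N-1) normalization
    return [(y - ybar) / sigma_hat for y in ys]

rng = random.Random(3)
ys = [rng.gauss(0.0, 1.0) for _ in range(10000)]
zs = moment_match(ys)
print(statistics.fmean(zs), statistics.stdev(zs))  # 0 and 1, up to round-off
```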
5. Examples from Financial and Actuarial Mathematics
To end these notes, we give an outline of how to apply the Monte Carlo
technique to three different problems in Financial and Actuarial Mathematics.
5.1. Option prices in the Black-Scholes model. Let us consider a (simplified) model for the price of a call option on a stock. We have a so called
one-period model with two time points: today (t = 0) and one time unit
(e.g. one year) from now (t = 1). The stock price S0 today is known. The
stock price at t = 1 is a random variable. We assume that it is a log-normal
random variable. This means that
S1 = S0 eY ,
where Y ∼ N(r − σ²/2, σ²) is a normal random variable with mean r − σ²/2
and variance σ². Here r is the interest rate and σ is the volatility of the stock.
We are interested in computing the price of a call option expiring at time
t = 1 and with strike price K. Such an option gives the holder (i.e. the
owner) the right, but not the obligation, to buy one stock at time t = 1 for
the predetermined price K. At time t = 1, the value of this option will be
\[
C_1 = \max\{0, S_1 - K\} =
\begin{cases}
S_1 - K & \text{if } S_1 > K, \\
0 & \text{otherwise.}
\end{cases}
\]
One can then argue that the market price of the option at time t = 0 (today)
should be
\[
C_0 = e^{-r} E[C_1] = e^{-r} E[\max\{0, S_1 - K\}].
\]
• Report N, the final MC estimate c̄N, the standard error εN, and plot
a convergence diagram.
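The pricing recipe can be sketched in Python as follows (the notes use matlab); the parameter values S0 = 100, K = 100, r = 0.05 and σ = 0.2 are hypothetical.

```python
import math
import random

def mc_call_price(S0, K, r, sigma, N, seed=11):
    """One-period Monte Carlo call price C0 = exp(-r) E[max(0, S1 - K)],
    where S1 = S0*exp(Y) and Y ~ N(r - sigma^2/2, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(N):
        y = rng.gauss(r - 0.5 * sigma * sigma, sigma)
        total += max(0.0, S0 * math.exp(y) - K)
    return math.exp(-r) * total / N

# Hypothetical parameters: S0 = 100, K = 100, r = 5%, sigma = 20%.
c0 = mc_call_price(100.0, 100.0, 0.05, 0.2, N=200000)
print(c0)  # should be close to the exact Black-Scholes value, about 10.45 here
```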
The reader should note that there is an exact formula, the celebrated
Black-Scholes formula for the option price C0 . In practice, one would therefore not use Monte Carlo simulations to compute the option prices. However,
it is well known that the Black-Scholes model does not accurately reflect all
the features of typical stock prices. In more advanced (and realistic) models,
it may not be possible to produce an exact formula for the option price and
then numerical methods are crucial. Monte Carlo simulation is one such
method that is easy to implement.
5.2. The collective loss model. Next we study an example from Actuarial
Mathematics. Pretend that you are an insurance company insuring, say,
cars. There are a lot of policy holders (car owners) and each time a policy
holder has an accident, loses his/her car to theft or fire, etc., he/she makes a
claim and you have to pay some money. In compensation for this, you receive
money in the form of insurance premia. How large a premium is it reasonable
to collect?
A general way to look at this problem is to say that the total loss S, i.e.
the total amount of money that you (the insurer) must pay out during, say, a
year, is a random variable. Definitely, you would like to set the premium so
that the total premium collected is at least the expected value E[S], or else
you be in the red on average. In practice, premia are higher than that. Let
us for simplicity say that the total premium collected is twice the expected
value E[S]. Let us ask what the probability is that the collected premia
exceeds the total (or aggregate) loss.
To attack this problem, we must know the distribution of the aggregate
loss S (during a fixed time period, say a year). A common model for S is
the so called collective risk model, in which
S = X1 + X2 + · · · + XN,
where N is the total number of accidents (the frequency) and Xj is what you
have to pay to the insured in accident j. Here both N and Xj are random
variables. We assume that:
• the random variables X1, X2, . . . are i.i.d., so they have the same
distribution as a random variable X called the severity;
• N and X1, X2, . . . are all independent.
One can then show that
E[S] = E[N ]E[X],
and we are interested in computing the probability
P {S > 2E[S]}.
Assume that the severity X has a Pareto distribution, with pdf
\[
f_X(x) = \frac{\alpha\beta^{\alpha}}{(x+\beta)^{\alpha+1}}, \quad x > 0,
\]
for some constants β > 0 and α > 1. Then we have
\[
y = F_X(x) = 1 - \frac{\beta^{\alpha}}{(x+\beta)^{\alpha}}
\quad \Longleftrightarrow \quad
x = \frac{\beta}{(1-y)^{1/\alpha}} - \beta.
\]
To obtain a Monte Carlo estimate for the probability P{S > 2E[S]} we
proceed as follows:
• Pick a large number M (we do not call it N this time. . . ).
• Generate M independent samples n1, n2, . . . , nM from the Poisson
distribution.
• Generate n1 + · · · + nM independent samples
xmj, 1 ≤ m ≤ M, 1 ≤ j ≤ nm,
from the severity distribution, and set
\[
s_m = \sum_{j=1}^{n_m} x_{mj}, \quad 1 \le m \le M.
\]
• Set zm = 1 if sm > 2E[S] and zm = 0 otherwise, and let z̄M be the
average of z1, . . . , zM.
• Report M, the final Monte Carlo estimate z̄M, the standard error
εM and draw a convergence diagram.
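A Python sketch of the whole procedure; the Poisson parameter λ = 10 and Pareto parameters α = 3, β = 2 are made up for illustration, and the Poisson sampler uses Knuth's classical method rather than a library routine.

```python
import math
import random

def pareto_sample(alpha, beta, rng):
    # Inverse cdf: x = beta/(1 - y)**(1/alpha) - beta, y uniform on (0, 1)
    return beta / (1.0 - rng.random()) ** (1.0 / alpha) - beta

def poisson_sample(lam, rng):
    # Knuth's classical method: multiply uniforms until the product
    # drops below exp(-lam); the number of extra factors is Poisson(lam)
    L = math.exp(-lam)
    k, p = 0, rng.random()
    while p > L:
        k += 1
        p *= rng.random()
    return k

def estimate_loss_probability(lam, alpha, beta, M, seed=5):
    """Estimate P{S > 2 E[S]} in the collective risk model,
    S = X_1 + ... + X_N with N ~ Poisson(lam) and Pareto severities."""
    rng = random.Random(seed)
    ES = lam * beta / (alpha - 1.0)  # E[S] = E[N] E[X], E[X] = beta/(alpha-1)
    hits = 0
    for _ in range(M):
        n = poisson_sample(lam, rng)
        s = sum(pareto_sample(alpha, beta, rng) for _ in range(n))
        if s > 2.0 * ES:
            hits += 1
    return hits / M

prob = estimate_loss_probability(lam=10.0, alpha=3.0, beta=2.0, M=50000)
print(prob)
```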
5.3. The binomial tree model. Finally we consider another model for
the evolution of stock prices: the binomial tree model. In this model, time
starts at t = 0 and ends at some time t = T in the future. In between, there
are M time steps, so time is discrete:
t = 0, Δt, 2Δt, . . . , MΔt = T.
We consider a market with a single stock. The price of the stock at time
mΔt is denoted by Sm. It is a random variable. Given its value Sm at time
mΔt, its value Sm+1 is equal to
\[
S_{m+1} =
\begin{cases}
S_m u & \text{with probability } p_u, \\
S_m d & \text{with probability } p_d.
\end{cases}
\]
5.3.1. Calibration. We wish to use the binomial tree to model the behavior
of a real stock. How should we pick the parameters u, d, pu and pd? Basically
we need four conditions to nail down these four unknowns. One condition is
(1) pu + pd = 1.
Two more conditions come from matching the conditional mean and variance
of the stock price over one time step:
\[
E[S_{m+1} \mid S_m] = S_m(1 + r\Delta t), \qquad
\mathrm{Var}[S_{m+1} \mid S_m] = S_m^2 \sigma^2 \Delta t,
\]
where r is the interest rate and σ is the volatility of the stock (assumed
constant). This leads to the equations
(2) pu·u + pd·d = 1 + rΔt,
(3) pu(u − (1 + rΔt))² + pd(1 + rΔt − d)² = σ²Δt.
A fourth condition, commonly imposed, is
ud = 1.
Consider now a knock-out call option with strike price K and barrier B,
whose payoff at maturity is
\[
C_M =
\begin{cases}
0 & \text{if } S_M \le K, \\
0 & \text{if } S_m \ge B \text{ for some } m,\ 0 < m \le M, \\
S_M - K & \text{otherwise.}
\end{cases}
\]
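A Python sketch of a Monte Carlo estimate for the expected payoff of such a barrier (knock-out) call in the binomial tree; the parameter values below are hypothetical (with ud = 1, matching the calibration), and discounting is left out.

```python
import random

def mc_barrier_call(S0, K, B, u, d, pu, M, N, seed=9):
    """Monte Carlo estimate of E[C_M] for a knock-out call: each step moves
    S -> S*u with probability pu and S -> S*d otherwise, and hitting the
    barrier B at any step kills the payoff."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(N):
        s = S0
        knocked_out = False
        for _ in range(M):
            s *= u if rng.random() < pu else d
            if s >= B:
                knocked_out = True
                break
        if not knocked_out and s > K:
            total += s - K
    return total / N

# Hypothetical parameters satisfying ud = 1.
est = mc_barrier_call(S0=100.0, K=100.0, B=130.0,
                      u=1.05, d=1.0 / 1.05, pu=0.55, M=10, N=100000)
print(est)
```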