
Department of Decision Sciences

STOCHASTIC MODELLING

DSC4821

Study guide

University of South Africa


Contents

1 Introduction, syllabus and assessment criteria 1

2 Probability 7
2.1 LESSON 1: Basic concepts of probability theory 7
2.2 LESSON 2: Study solved exercises in Hsu (2014), chapter 1 10
2.3 LESSON 3: Test your understanding with supplementary problems 10

3 Random variables 11
3.1 LESSON 4: Definition and basic properties of random variables 11
3.2 LESSON 5: Study solved exercises in Hsu (2014), chapter 2 15
3.3 LESSON 6: Test your understanding with the supplementary problems 15
3.4 LESSON 7: Exercises on chapters 1 and 2 16

4 Multiple random variables 25
4.1 LESSON 8: Definitions and properties of random vectors 25
4.2 LESSON 9: Study solved exercises in Hsu (2014), chapter 3 27
4.3 LESSON 10: Test your understanding with the supplementary problems 27

5 Functions of random variables and limit theorems 28
5.1 LESSON 11: Functions of random variables 28
5.2 LESSON 12: Study solved exercises in Hsu (2014), chapter 4 29
5.3 LESSON 13: Test your understanding with the supplementary problems 30
5.4 LESSON 14: Exercises on chapters 3 and 4 31

6 Estimation theory 39
6.1 LESSON 15: Estimation theory 39

7 Stochastic processes 40
7.1 LESSON 16: Generating random numbers using R 40
7.2 LESSON 17: Basic properties and examples 42
7.2.1 Introduction 42
7.2.2 Some important classes of stochastic processes 43
7.2.3 Markov processes 45
7.3 LESSON 18: Markov chains and applications to queueing theory 47
7.4 LESSON 19: Martingales 47
7.5 LESSON 20: Brownian motion 48
7.5.1 Random walk 48
7.5.2 Brownian motion or Wiener process 49
7.5.3 Some properties of Brownian motion 53
7.5.4 First passage time and maximum of Brownian motion 54
7.5.5 Simulation of Brownian motion in R 55
7.5.6 Martingale property of Brownian motion 56
7.5.7 Geometric Brownian motion 57
7.6 LESSON 21: Exercises 59
7.7 LESSON 22: More exercises 59
7.8 LESSON 23: More exercises 69

8 A brief introduction to stochastic calculus 75
8.1 LESSON 24: Stochastic integral 75
8.1.1 Definition 75
8.1.2 Some properties of stochastic integration 79
8.1.3 Itô's formula 83
8.2 LESSON 25: A brief note on stochastic differential equations 86
8.3 LESSON 26: Exercises 88
8.4 LESSON 27: More exercises 93
Chapter 1

Introduction, syllabus and assessment criteria

A model is a representation of a system, and modelling refers to the process of producing a model. The main purpose of building a model is to gain a better understanding of the behaviour of the system under study, in order to answer questions pertinent to the system, to predict the effect of changes to the system resulting from changes in the input, and so on.

Deterministic models are specified by a set of equations that describe exactly


how the system behaves and how it evolves over time. There is no element of
randomness in such models. For stochastic models, the evolution of the system is
random (fully or partially). Such models contain random variables that help to
model the uncertainty in the system. If a deterministic model is run repeatedly
with the same initial conditions, it always yields the same result. This is not the
case for a stochastic model. If a stochastic model is run several times, it will not
give identical results. Most stochastic models are given in the form of a stochastic
process, which is simply a family of random variables {X(t) : t ∈ T }, where T is
an index set. The concepts “random variable” and “stochastic process” are central
in stochastic modelling.

In these notes, we mainly work through the following book:

Hsu, HP. 2014. Probability, random variables and random processes. 3rd edition.
New York: McGraw-Hill Education.

The purpose of these notes is to guide you through the prescribed book by Hsu in
order to understand this module.

Purpose of the module: Students who have completed this module will be able
to identify and apply relevant concepts and techniques or methods of probability
theory and stochastic processes to construct, design, analyse and solve mathe-
matical models representing real-life problems involving uncertainty that arise in
the fields of operations research, financial modelling, data science and decision
sciences.

Syllabus

• Axiomatic probability theory

• Random variables

• Expectation, conditional expectation and moments

• Conditional distribution

• Multivariate distributions

• Functions of random variables

• Moment-generating functions and characteristic functions

• Laws of large numbers and the central limit theorem

• Basic properties of stochastic processes

• Markov chains, birth-and-death processes

• Poisson processes

• Brownian motion

• Martingales: definitions, conditional expectations and filtrations

• Stopping times and the optional stopping theorem for martingales

• Estimation theory

• Applications to queueing theory, reliability theory and renewal theory.

Further reading

• Basics of stochastic calculus: Applications to financial modelling

Computer requirement: We use simulation to obtain sample paths of stochas-


tic processes. For this purpose, basic programming skills in R will be necessary.
It is important to download and install R on your computer.

Specific outcomes and assessment criteria

Outcome 1: Apply the fundamental principles of probability theory and techniques from algebra and calculus to calculate probabilities, characterise random variables, and model and solve practical problems involving random variables.

Assessment criteria

– calculating basic probabilities

– computing probabilities and moments by conditioning

– applying Bayes' theorem to compute probabilities

– explaining the concepts: expectation and conditional expectation, and describing their properties

– identifying and manipulating the probability mass functions or density functions of classical discrete or continuous random variables

– modelling and solving practical problems where these distributions are involved
Outcome 2: Calculate probabilities relevant to multivariate distributions and distributions of functions of random variables, and model and solve practical problems involving these distributions.

Assessment criteria

– calculating the joint distribution functions of jointly distributed random variables

– deriving conditional means and conditional variances

– deriving probability-generating functions, moment-generating functions and characteristic functions of random variables

– applying the concept of moment-generating functions to derive moments of random variables

– modelling and solving practical problems involving multivariate distributions and functions of random variables

Outcome 3: Apply the law of large numbers and the central limit theorem to solve problems related to sums of random variables.

Assessment criteria

– explaining the laws of large numbers and the central limit theorem

– deriving the distribution of certain sums of random variables using limiting theorems

– applying these basic limit theorems of probability theory and key inequalities in probability theory to solve practical problems

Outcome 4: Formulate basic stochastic process models and analyse such models qualitatively and quantitatively.

Assessment criteria

– classifying a stochastic process with respect to the concepts stationarity, wide-sense stationarity, ergodicity and independent increments

– defining a Brownian motion and identifying its basic properties

– calculating probabilities related to joint distributions of stochastic processes

– proving that a given process is either a Poisson process, a Brownian motion, a Markov process or a martingale

– describing the basic properties of any of these stochastic processes

Outcome 5: Apply Markov chains, counting processes and Poisson processes to construct models for practical problems, and apply theoretical properties of these processes to solve these problems.

Assessment criteria

– calculating transition probabilities and higher-order transition probabilities of Markov chains

– modelling a real-life problem as a Markov chain and identifying its states and transition probabilities

– calculating and interpreting the limiting distributions of a Markov chain

– characterising the arrival process and the inter-arrival times of a Poisson process

– modelling a real-life problem as a renewal process or a Poisson process

– applying the theory of queueing models to model real-life problems, analysing the corresponding models and formulating appropriate solutions

Outcome 6: Apply classical methods for evaluating the performance of estimators and the properties of desirable estimators to solve practical problems in estimation theory.

Assessment criteria

– defining unbiased, efficient and consistent estimators

– computing the maximum likelihood estimator, Bayes' estimator, mean square estimator and linear mean square estimator

– applying these estimators to solve practical problems in estimation theory

Chapter 2

Probability

(This chapter in these notes corresponds to chapter 1 in the book by Hsu (2014).)

2.1 LESSON 1: Basic concepts of probability theory

Study chapter 1 in Hsu (2014). Pay careful attention to the key concepts and
properties. Here the basic concepts of probability are clearly presented. You
should be able to define the following concepts:

• Sample space: This is the set of all possible outcomes of a random experiment. Understand the given examples.

• Event: It is a subset of the sample space. Example: Toss a coin twice. The
sample space is
S = {HH, HT, T H, T T }.

An example of an event is: “The first coin yields H.” This is the event
E = {HH, HT }. Because events are subsets, it is important to remember the
basic concepts and operations of set theory, such as “union”, “intersection”,
“complement”, “null set” and “disjoint sets”. Figure 1-1 is helpful.

Note:

∩_{i=1}^n Ai = A1 ∩ A2 ∩ . . . ∩ An

and

∪_{i=1}^n Ai = A1 ∪ A2 ∪ . . . ∪ An .

• De Morgan's laws: The complement of A ∪ B is equal to Aᶜ ∩ Bᶜ, and the complement of A ∩ B is equal to Aᶜ ∪ Bᶜ (where Aᶜ denotes the complement of A). This is easily extended to unions and intersections of any number of subsets.

• Event space or σ-algebra: This is a fundamental concept in probability


theory. Here S is the sample space (the set of all possible outcomes) and F
is a class or collection of some subsets of S. The class F is called a σ-algebra
or an event space if the following three properties are satisfied:

(i) The set S is itself an element of F and the empty set ∅ is also an
element of F . That is

S ∈ F and ∅ ∈ F .

(ii) If a subset A of S is in F , then its complement Aᶜ in S is also in F . That is:

A ∈ F ⇒ Aᶜ ∈ F .

(iii) The class F is closed under countable union (and also countable intersection). That is, if A and B are subsets of S that are in F , then their union A ∪ B and their intersection A ∩ B are also in F . Moreover, this is also true for any sequence A1 , A2 , . . . , An , . . . of subsets of S that are in F :

∪_{i=1}^∞ Ai = A1 ∪ A2 ∪ . . . ∪ An ∪ . . . ∈ F

and

∩_{i=1}^∞ Ai = A1 ∩ A2 ∩ . . . ∩ An ∩ . . . ∈ F .

The smallest example of a σ-algebra on a sample space S is

F = {S, ∅}.

Note that in Example 1.6 in Hsu (2014), S = {H, T }, the classes

{S, ∅} and {S, ∅, {H}, {T }}

are event spaces but


{S, ∅, {H}}

is not an event space. Why?

• Probability measure: Assume that S is a sample space and F is an


event space (or σ-algebra) on S. Then a probability measure on (S, F ) is a
function P : F → [0, 1] satisfying the following conditions:

Axiom 1: P(∅) = 0 and P(S) = 1.

Axiom 2: For any A ∈ F , P(Aᶜ) = 1 − P(A).

Axiom 3: For any sequence A1 , A2 , . . . , An , . . . of subsets of S that are in F and that are mutually exclusive (that is, Ai ∩ Aj = ∅ for all i ≠ j), we have that

P(A1 ∪ A2 ∪ . . . ∪ An ∪ . . .) = P(A1 ) + P(A2 ) + . . . + P(An ) + . . .

or

P (∪_{i=1}^∞ Ai ) = Σ_{i=1}^∞ P(Ai ).

• Probability space: A probability space is a triple (S, F , P) where S is a


sample space, F is an event space (or a σ-algebra) and P is a probability
measure on (S, F ).

• Conditional probability: Understand the definition of conditional proba-


bility and Bayes’ rule.

• Independent events: Two events A and B are independent if

P(A ∩ B) = P(A)P(B).

This implies that P(A|B) = P(A) whenever P(B) > 0.
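To make these concepts concrete, here is a minimal R sketch (an illustration added to these notes, not taken from Hsu (2014)) that simulates the two-coin-toss experiment above, estimates the probability of the event E = {HH, HT} and checks empirically that the events "the first coin shows H" and "the second coin shows H" are independent.

set.seed(1)                                         # for reproducibility
n <- 100000                                         # number of simulated experiments
first  <- sample(c("H", "T"), n, replace = TRUE)    # outcome of the first toss
second <- sample(c("H", "T"), n, replace = TRUE)    # outcome of the second toss

mean(first == "H")                                  # estimate of P(E); the exact value is 1/2

pA  <- mean(first == "H")                           # P(first coin is H)
pB  <- mean(second == "H")                          # P(second coin is H)
pAB <- mean(first == "H" & second == "H")           # P(both coins are H)
c(pAB = pAB, pA_times_pB = pA * pB)                 # approximately equal, as independence requires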

2.2 LESSON 2: Study solved exercises in Hsu (2014), chapter 1

Study solved exercises in Hsu (2014), chapter 1. The solved exercises have been
designed to help you understand the theoretical concepts and their properties.
Study all the problems and their solutions. You may skip problems 1.71 to 1.73.

2.3 LESSON 3: Test your understanding with supplementary problems

Solve the supplementary problems in Hsu (2014), chapter 1. Use the hints given at
the end of the chapter and compare your solutions to those provided in the book.

Chapter 3

Random variables

The concept of a random variable is central in probability theory and its applications.

3.1 LESSON 4: Definition and basic properties of random variables

Study chapter 2 in Hsu (2014). Understand the definitions and important proper-
ties. From now on we will work in an underlying probability space (S, F , P) even
if we do not always specify it.

• The Borel σ-algebra: Recall that a σ-algebra on a set S is a family F of


subsets of S satisfying the following conditions:

1. S ∈ F .

2. For any A ∈ F , its complement Aᶜ in S is also an element of F ; that is, A ∈ F implies Aᶜ ∈ F .

3. For any sequence A1 , A2 , . . . , An , . . . of subsets of S that are in F , the union A1 ∪ A2 ∪ . . . ∪ An ∪ . . . is also in F . That is, if An ∈ F for all n = 1, 2, . . ., then ∪_{n=1}^∞ An ∈ F .

On the set R consider all the open intervals (a, b) with a < b. The smallest
σ-algebra containing all these open intervals is called the Borel σ-algebra
on R. We denote it by B. The σ-algebra B is very large. Examples of sets
that are in B are:
R, ∅, (a, b) for all a < b; (−∞, a]∪[b, ∞); [a, b]; (a, b];{a}; [a, ∞) and (−∞, a).

Clearly then, if F is another σ-algebra on R containing all the open intervals


of R, then B is a subset of F . A subset A in R is called a Borel set if
A ∈ B. For example, the interval [−1, 2] is a Borel set, the sets [3, +∞), N,
{1, 2, 3, 5, 7, 10} are also Borel sets.

• Definition of a random variable: A random variable X on a probability


space (S, F , P) is a function from S into the set of real numbers R; that is,
X : S → R, ω 7→ X(ω) satisfying the following property: for every subset A
of R such that A ∈ B (the Borel σ-algebra of R) the inverse image

X −1 (A) = {ω ∈ S : X(ω) ∈ A}

is in the σ-algebra F ; that is, A ∈ B implies X −1 (A) ∈ F .


If we have a random variable X : (S, F , P) → (R, B) then the events of the
form
X is less or equal than a

(for a real number a) are in the σ-algebra F . That is

{X ≤ a} ∈ F .

And then we can consider its probability P({X ≤ a}). The same applies to
all other Borel sets (e.g. [a, b] and (a, +∞)).
Note that the event

{X ∈ [a, b]} means {ω ∈ S : X(ω) ∈ [a, b]}

and it is an element of F , while [a, b] is an element of B. Also, {X = a} is


equal to {ω ∈ S : X(ω) = a}.

Remark: In most cases when the sample space S is finite and has say n
elements, the σ-algebra considered on S is simply its power set P(S) (i.e.
the class consisting of all subsets of S). In this case any function X : S → R
is always a random variable; however when S is the whole real line R or an
interval such as [a, b] or [a, ∞) one must carefully check if indeed the function
X satisfies the condition of the definition before considering X as a random
variable.

Remark: Also for a finite sample space S it is natural to consider the


underlying probability measure that assigns the value 1/n to each element
of S; that is, if S = {ω1 , ω2 , . . . , ωn }, then

P({ω1 }) = P({ω2 }) = . . . = P({ωn }) = 1/n.

(Here we say that all elementary events {ωi } are equally likely.)

• Understand Example 2.2 in Hsu (2014): Here a fair coin is tossed three times. (By "fair coin" we mean that the two sides are equally likely to appear.) The sample space is

S = {HHH, HHT, HT H, HT T, T HH, T HT, T T H, T T T }.

Assume all the elementary events are equally likely. Here S is finite and then
we take F to be the whole power set of S (every subset of S is in F ). Let
X : S → R be the mapping where X(ω) is the number of heads of ω for all
ω ∈ S. Then X is a random variable. We can compute the probability that
X = 2. That is, P{X = 2}. The question can be reformulated as follows: In
an experiment of tossing a fair coin three times, what is the probability of
obtaining two heads?
Note that

{X = 2} = {ω ∈ S : X(ω) = 2} = {HHT, HT H, T HH}

and then
P{X = 2} = 3/8.

You can now find P{X < 2}.

• Distribution functions: Given a random variable X : S → R, the distri-


bution function of X (also called the cumulative distribution function of X
[cdf]) is the function:

FX : R → [0, 1] defined by FX (x) = P{X ≤ x} for all x ∈ R.

Some properties are given in Hsu (2014). Note that

P{X > a} = 1 − FX (a) and P(a < X ≤ b) = FX (b) − FX (a)

and

lim_{x→+∞} FX (x) = 1.

• Understand the concepts: discrete random variable, continuous random


variable, probability mass function, probability density function, mean and
variance. These are clearly explained in Hsu (2014).

• Special distributions. It is crucial to understand the special distribu-


tions as described in Hsu. After studying this section, you should be able
to give the probability mass function or probability density function, the
mean and variance, and the possible applications of the following distribu-
tions: Bernoulli distribution, binomial distribution, geometric distribution,
negative binomial distribution, Poisson distribution, discrete uniform distri-
bution, continuous uniform distribution, exponential distribution, Gamma
distribution and normal distribution.

3.2 LESSON 5: Study solved exercises in Hsu (2014), chapter 2

Study the solved exercises in Hsu (2014), chapter 2. These solved exercises have
been designed to help you understand the theoretical concepts and their properties.
Study all the problems and their solutions.

3.3 LESSON 6: Test your understanding with the supplementary problems

Solve the supplementary problems of chapter 2 in Hsu (2014). Use the hints given
at the end of the chapter and compare your solutions to those provided in the
book.

3.4 LESSON 7: Exercises on chapters 1 and 2

After studying chapters 1 and 2 in Hsu (2014) (i.e. after completing lessons 1– 6),
do the following exercises:

Question 1 Five people are sitting at a table in a restaurant. Two of them order
coffee and the other three order tea. The waiter forgot who ordered what
and hands the drinks in a random order to the five persons. Specify an
appropriate sample space and determine the probability that each person
gets the correct drink.

Question 2 In a high school class, 35% of the students take Spanish as a foreign
language, 15% take French as a foreign language, and 40% take at least one
of these languages. What is the probability that a randomly chosen student
takes French given that the student takes Spanish?

Question 3 An oil explorer performs a seismic test to determine whether oil is


likely to be found in a certain area. The probability that the test indicates the
presence of oil is 90% if oil is indeed present in the test area. The probability
of a false positive is 15% if no oil is present in the test area. Before the test
is done, the explorer believes that the probability of presence of oil in the
test area is 40%. Use Bayes’ rule to revise the value of the probability of oil
being present in the test area given that the test gives a positive signal.

Question 4 Suppose that in a certain US state, 55% of registered voters are Re-
publicans, 35% are Democrats and 10% are Independents. When these vot-
ers were surveyed about increased military spending, 25% of the Republicans
opposed it, 70% of the Democrats opposed it and 55% of the Independents
opposed it. (a) What is the probability that a randomly selected voter in
this state opposes increased military spending? (b) A registered voter from
that state writes a letter to the local paper arguing in favour of increased
military spending. What is the probability that this voter is a Democrat?

Question 5 100 patients with gastroesophageal reflux disease are treated with a new drug to relieve pain from heartburn. Following treatment, the time Ti until patient i next experiences pain from heartburn is recorded.

It is assumed that the times are independent and identically distributed, with the probability density function given by

fTi (t) = λe^(−λt) for t > 0, and fTi (t) = 0 otherwise.

(a) The doctor treating the patients thinks that there is a 50% chance that a random patient will stay pain free for at least 21 days. Show that the value of λ on which the doctor's judgment is based is 0.0330.

(b) Using

E(Ti ) = 1/λ and Var(Ti ) = 1/λ²

find the expectation and variance of the average

T̄ = (T1 + T2 + . . . + T100)/100.

Question 6 It is known that if Y is a random variable with the Poisson distribution of parameter λ and if λ is sufficiently large, then the random variable

(Y − λ)/√λ

can be approximated by the standard normal distribution (of mean 0 and variance 1). Then for all a < b in R,

P (a < Y < b) = P ( (a − λ)/√λ < (Y − λ)/√λ < (b − λ)/√λ )
             ≈ P ( (a − λ)/√λ < Z < (b − λ)/√λ )

where Z is a random variable following the standard normal distribution.

Now use this approximation to answer the following question.

An insurance company issues 1 250 vision care insurance policies. The num-
ber of claims filed by a policyholder under a vision care insurance policy

during one year is a Poisson random variable with mean 2. Assume the
numbers of claims filed by different policyholders are mutually independent.
Calculate the approximate probability that there is a total of between 2 450
and 2 600 claims during a one-year period.

Hint: Use the fact that the total number of claims follows a Poisson distri-
bution of mean λ = 2 500.
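The following short R sketch (an added illustration with different numbers, not the solution to Question 6) compares an exact Poisson probability with its normal approximation for λ = 100, showing how accurate the approximation is for a large mean.

# Illustration only: normal approximation to a Poisson probability
lambda <- 100
a <- 90; b <- 110

exact  <- ppois(b - 1, lambda) - ppois(a, lambda)        # P(a < Y < b) for integer-valued Y
approx <- pnorm((b - lambda) / sqrt(lambda)) -
          pnorm((a - lambda) / sqrt(lambda))             # normal approximation above
c(exact = exact, approximate = approx)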

Question 7 A mail order company receives telephone orders at a constant rate of


three orders per hour. Assume that the number of telephone orders follows
a Poisson distribution with parameter λ = 3.

(a) What is the probability that the number of telephone orders is more
than one in one hour?

(b) Let S be the total number of telephone orders in a 75-hour period.


Assuming that the numbers of telephone orders in each hour are inde-
pendent, what is the expectation and variance of S?

(c) Use Chebyshev’s inequality (see relation (2.116) in Hsu (2014, p. 86))
to give a lower bound of P (200 ≤ S ≤ 250). Use

P (200 ≤ S ≤ 250) = 1 − P (|S − 225| ≥ 25).

Question 8 A factory uses a diagnosis test to determine whether a part is defec-


tive or not. This test has a 0.90 probability of giving a correct result when
applied to a defective part and a 0.05 probability of giving an incorrect result
when applied to a non-defective part. It is believed that one out of every
1 000 parts will be defective.

(a) Calculate the posterior probability that a part is defective if the test
says it is defective.

(b) Calculate the posterior probability that a part is non-defective if the


test says it is non-defective.

(c) Calculate the posterior probability that a part is misdiagnosed.

Question 9 Let Ω = {1, 2, 3, 4} and consider the set

A = {∅, Ω, {1}, {2, 3}}.

(a) Show that A is not an event space (or a σ-algebra).

(b) Find a subset E of Ω such that

{∅, Ω, {1}, {2, 3}, E}

is an event space (or a σ-algebra).

Question 10 Let S = {1, 2, 3, 4, 5, 6, 7, 8} be a sample space of an experiment,


and assume that each element of S is equally likely to occur. Define the
events:
A1 = {1, 3, 5, 7}, A2 = {1, 2, 3, 4} and A3 = {6, 7, 8}.

(a) Give the probabilities of each of the events A1 , A2 and A3 .

(b) Are the following true or false? Give reasons for your answers.

(i) A1 and A2 are independent.

(ii) A1 and A3 are independent.

(iii) A2 and A3 are independent.

(c) Define a random variable X by saying that if the observed outcome of


the experiment is s, then the value of X is (s − 4)2 .

(i) Tabulate the probability mass function of the random variable X.

(ii) Give the mean and variance of the random variable X.

Question 11 A radioactive source is monitored for an hour and the number of


detected emissions from it, denoted n, is counted. The source is known to be
one of two substances A or B. If the source is A, then the theory says that
the number of emissions should have a Poisson distribution with parameter
3.1. If the source is B, then the theory says that the number of emissions
should have a Poisson distribution with parameter 4.9. It is known that the

probability that the source is substance A is p.

(a) Give the conditional probability that n emissions are detected given
that the source is substance A.

(b) Give the (unconditional) probability that n emissions are detected.

(c) Show that the conditional probability that the source is substance A, given that n emissions are detected, denoted p(A|n), is given by

p(A|n) = 1 / ( 1 + ((1 − p)/p) e^(−1.8) (4.9/3.1)^n ).

(d) Determine also p(B|n). Under which condition on p is

p(B|n) ≥ p(A|n)

for all possible observations n?

Question 12 Suppose that X and Y are random variables where both X 2 and
Y 2 have finite expectations; that is,

E(X 2 ) < ∞ and E(Y 2 ) < ∞.

Consider the random variable

g(t) = (X + tY )2 , where t is a real number.

(a) Expand E(g(t)) and, using the obvious fact that E(g(t)) ≥ 0 for all t ∈ R, show that

|E(XY )| ≤ √(E(X²)) √(E(Y²)).

(This is the celebrated Cauchy–Schwarz inequality, also known as the Cauchy–Bunyakovsky–Schwarz inequality.)

(b) Deduce from (a) that if E(X²) is finite, so is E(X), and in fact

|E(X)|2 ≤ E(X 2 ).

(c) Deduce from (b) that if X is a random variable having a finite mean µ,
E(X 2 ) is finite if and only if Var(X) is finite; that is,

E(X 2 ) < ∞ ⇐⇒ Var(X) < ∞.

Moreover, deduce from (b) that

(E(|X − µ|))2 ≤ Var(X).

Question 13 A coin, having probability p of landing heads, is flipped until a head appears for the nth time. Let N denote the number of flips required. Calculate E(N ) and Var(N ). In particular, calculate these two quantities for n = 5 and p = 1/2.

Question 14 Each customer in a shop, independently of other customers, spends


an amount which is given by a random variable with mean R240 and standard
deviation R80. Let S be the amount spent by the first 100 customers to visit
the shop.

(a) Give the mean and standard deviation of S.

(b) Use Chebyshev’s inequality to give a lower bound for

P (23 000 ≤ S ≤ 25 000).

(c) Assume that the amount spent by each customer follows the normal
distribution. Compute the same probability

P (23 000 ≤ S ≤ 25 000).

Question 15 An auto insurance company has 10 000 policyholders. Each policy-

holder is classified as

(i) young or old

(ii) male or female

(iii) married or single

Of these policyholders, 3 000 are young, 4 600 are male, and 7 000 are mar-
ried. The policyholders can also be classified as 1 320 young males, 3 010
married males, and 1 400 young married persons. Finally, 600 of the poli-
cyholders are young married males. Calculate the number of the company’s
policyholders who are young, female and single and calculate the probability
that a randomly selected policyholder is young, female and single. Hint:
Draw Venn diagrams.

Question 16 In modelling the number of claims filed by an individual under an


automobile policy during a three-year period, an actuary makes the simpli-
fying assumption that for all integers n ≥ 0,

p(n + 1) = 0.2p(n)

where p(n) represents the probability that the policyholder files exactly n
claims during the period. Under this assumption, calculate the probability
that a policyholder files more than one claim during the period.

Hint: Show that for all integers n ≥ 0,

p(n) = (0.2)^n p(0)

and use the obvious fact that

Σ_{n=0}^∞ p(n) = 1.

Question 17 An auto insurance company insures drivers of all ages. An actuary


compiled the following statistics on the company’s insured drivers:

Age of Driver   Probability of Accident   Portion of Company's Insured Drivers
18–20           0.06                      0.08
21–30           0.03                      0.15
31–65           0.02                      0.49
66–99           0.04                      0.28
A randomly selected driver that the company insures has an accident. Cal-
culate the probability that the driver was aged 18 to 20.

Question 18 Upon arrival at a hospital’s emergency room, patients are cate-


gorised according to their condition as critical, serious, or stable. In the past
year:

(i) 10% of the emergency room patients were critical.

(ii) 30% of the emergency room patients were serious.

(iii) The rest of the emergency room patients were stable.

(iv) 40% of the critical patients died.

(v) 10% of the serious patients died.

(vi) 1% of the stable patients died.

Given that a patient survived, calculate the probability that the patient was
categorised as serious upon arrival.

Question 19 The number of injury claims per month is modelled by a random


variable N with

P (N = n) = 1 / ((n + 1)(n + 2)), n = 0, 1, 2, . . .

(Note that

Σ_{n=0}^∞ 1 / ((n + 1)(n + 2)) = 1

and so the probability is well defined.)


Calculate the probability of at least one claim during a particular month,

given that there have been at most four claims during that month.

Question 20 The lifetime of a printer costing R2 000 is exponentially distributed


with mean 2 years. The manufacturer agrees to pay a full refund to a buyer if
the printer fails during the first year following its purchase, a one-half refund
if it fails during the second year, and no refund for failure after the second
year.

Calculate the expected total amount of refunds from the sale of 100 printers.

Hint: The expected cost for a sale of one printer is:

2 000P (X ≤ 1) + 1 000P (1 ≤ X ≤ 2) + 0P (X ≥ 2)

where X is the lifetime of the printer.

Chapter 4

Multiple random variables

This chapter in these notes corresponds to chapter 3 in Hsu (2014).

4.1 LESSON 8: Definitions and properties of random vectors

Understand the definitions and properties of multiple random variables or random


vectors.

We consider our underlying probability space (S, F , P) and the Borel σ-algebra B on R. We also consider the Borel σ-algebra on R2 , and on Rn in general. The Borel σ-algebra on R2 is the smallest σ-algebra on R2 containing the subsets of the form

(a, b) × (c, d) = {(x, y) : a < x < b and c < y < d}

for all a < b and c < d in R. This implies in particular that if A and B are
Borel subsets of R, then A × B is a Borel subset of R2 . Other examples of Borel
subsets of R2 are: the unit disc D = {(x, y) ∈ R2 : x2 + y 2 ≤ 1} and the circle
{(x, y) ∈ R2 : x2 + y 2 = 1}.

• A bivariate random variable (X, Y ) is a function

(X, Y ) : S → R2 , ω 7→ (X(ω), Y (ω)) ∈ R2

such that for each Borel set A of R2 , the subset of S given by

{ω ∈ S : (X(ω), Y (ω)) ∈ A}

is in the σ-algebra F . In this definition, X and Y can each be considered as a random variable.

• Joint distribution function: If (X, Y ) is a bivariate random variable, the


joint cumulative distribution function of the two random variables X and Y
is the function

FXY : R2 → R, (x, y) 7→ FXY (x, y) = P(X ≤ x and Y ≤ y).

The probability P(X ≤ x and Y ≤ y) is generally denoted P(X ≤ x, Y ≤ y). Then

FXY (x, y) = P(A ∩ B), where A = {X ≤ x} and B = {Y ≤ y}.

• Independent random variables: The random variables X and Y are


independent if

FXY (x, y) = FX (x)FY (y) for all x, y ∈ R.

This means that for all x, y ∈ R,

P(X ≤ x, Y ≤ y) = P(X ≤ x) × P(Y ≤ y).

• Understand the following concepts: Joint probability mass function,


joint probability density function, marginal density functions, conditional
probability mass function, conditional density function, covariance, correla-
tion coefficient, conditional mean and conditional variance.

• N-variate random variable: The notion of a bivariate random variable is easily extended to an n-variate random variable. Here we have n random variables (X1 , X2 , . . . , Xn ). The joint cdf is

FX1 ,X2 ,...,Xn (x1 , x2 , . . . , xn ) = P(X1 ≤ x1 , X2 ≤ x2 , . . . , Xn ≤ xn ).

• Special distribution: One of the most important bivariate distributions


is the bivariate normal distribution. The same applies for the n-variate
distributions. Understand these distributions.
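As an optional illustration (not part of Hsu (2014)), the following R sketch samples from a bivariate normal distribution and compares the sample means, covariance matrix and correlation with the chosen parameters. It assumes the MASS package, whose mvrnorm function is one common way to generate such samples.

# Minimal sketch: sampling from a bivariate normal distribution
library(MASS)                                  # provides mvrnorm()

mu    <- c(0, 1)                               # mean vector of (X, Y)
Sigma <- matrix(c(4, 1.5,
                  1.5, 1), nrow = 2)           # covariance matrix

set.seed(42)
xy <- mvrnorm(n = 50000, mu = mu, Sigma = Sigma)

colMeans(xy)        # should be close to (0, 1)
cov(xy)             # should be close to Sigma
cor(xy)[1, 2]       # sample correlation, close to 1.5 / (2 * 1) = 0.75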

4.2 LESSON 9: Study solved exercises in Hsu (2014), chapter 3

Study solved exercises in Hsu (2014), chapter 3. These solved exercises have been
designed to help you understand the theoretical concepts and their properties.
Study all the problems and their solutions. You may skip Exercises 3.21, 3.24,
3.46 and 3.54.

4.3 LESSON 10: Test your understanding with the supplementary problems

Solve the supplementary problems in Hsu (2014), chapter 3. Use the hints given at
the end of the chapter and compare your solutions to those provided in the book.

Chapter 5

Functions of random variables and limit theorems

This chapter in these notes corresponds to chapter 4 in Hsu (2014).

5.1 LESSON 11: Functions of random variables

• Function of a random variable: Given a random variable X : S →


R, and a function g : R → R, the function g(X) : S → R defined by
g(X)(ω) = g(X(ω)) is also a random variable provided that the function
g satisfies the following technical property: For any Borel subset A of R,
g −1 (A) = {x ∈ R : g(x) ∈ A} is also a Borel subset of R.

• Function of two or more random variables: The construction above


extends easily to bivariate and n-variate random variables. If (X, Y ) is a bivariate random variable and g : R × R → R is a function of two variables, then g(X, Y ) is again a random variable (under some technical restrictions on g, as before).

• Two functions of two random variables: Here we have a bivariate ran-


dom variable (X, Y ) and two functions g : R × R → R and h : R × R → R.

Set
Z = g(X, Y ) and W = h(X, Y ).

Then (Z, W ) is also a bivariate random variable.

• Understand how to determine the pdf and cdf of g(X) and g(X, Y ), and
the joint pdf and cdf of (Z, W ). This extends to n-variate random variables.

• Understand how to compute E(g(X)) from the pdf of X.

• Jensen’s inequality is very important: If g : R → R is a convex function


and X is a random variable, then

E[g(X)] ≥ g(E[X]).

• The Cauchy–Schwarz inequality is also very important: If (X, Y ) is a bivariate random variable such that both E(X²) and E(Y²) are finite, then

E(|XY |) ≤ √(E(X²)) √(E(Y²)).

• Understand the concepts: probability–generating function, moment–generating


function and characteristic function.

• Understand the weak law of large numbers, the strong law of large numbers
and the central limit theorem.
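The following short R sketch (an added illustration, not from Hsu (2014)) shows the central limit theorem at work: standardised sample means of exponential random variables behave approximately like a standard normal random variable.

# Minimal illustration of the central limit theorem
set.seed(7)
n    <- 50                                      # sample size for each mean
reps <- 10000                                   # number of simulated sample means
rate <- 2                                       # exponential rate; mean = sd = 1/2

xbar <- replicate(reps, mean(rexp(n, rate)))    # simulated sample means
z    <- (xbar - 1 / rate) / ((1 / rate) / sqrt(n))  # standardise each sample mean

c(mean(z), sd(z))          # should be close to 0 and 1
mean(z <= 1.96)            # should be close to pnorm(1.96) = 0.975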

5.2 LESSON 12: Study solved exercises in Hsu (2014), chapter 4

Study the solved exercises in Hsu (2014), chapter 4. These solved exercises have
been designed to help you understand the theoretical concepts and their properties.
Study all the problems and their solutions. You may skip Exercises 4.35, 4.37 and
4.43.

5.3 LESSON 13: Test your understanding with the supplementary problems

Solve the supplementary problems in Hsu (2014), chapter 4. Use the hints given at
the end of the chapter and compare your solutions to those provided in the book.

5.4 LESSON 14: Exercises on chapters 3 and 4

After studying chapter 3 and chapter 4 in Hsu (2014) (i.e. after completing lessons
8–14), do the following exercises:

Question 1 An insurance company sells automobile liability and collision insur-


ance. Let X denote the percentage of liability policies that will be renewed
at the end of their terms and Y the percentage of collision policies that will
be renewed at the end of their terms. X and Y have the joint cumulative
distribution function
F (x, y) = xy(x + y)/2 000 000 for 0 ≤ x ≤ 100, 0 ≤ y ≤ 100, and F (x, y) = 0 otherwise.

Calculate the variance of X.

Question 2 A hurricane policy covers both water damage, X, and wind damage,
Y , where X and Y have the joint density function

f (x, y) = 0.13e−0.5x−0.2y − 0.06e−x−0.2y − 0.06e−0.5x−0.4y + 0.12e−x−0.4y


for x > 0, y > 0 and f (x, y) = 0 otherwise.

Calculate the standard deviation of X.

Question 3 At the start of a week, a coal mine has a high-capacity storage bin
that is half full. During the week, 20 loads of coal are added to the stor-
age bin. Each load of coal has a volume that is normally distributed with
mean 1.50 cubic metres and standard deviation 0.25 cubic metres. During
the same week, coal is removed from the storage bin and loaded into 4 rail-
road cars. The amount of coal loaded into each railroad car is normally
distributed with mean 7.25 cubic metres and standard deviation 0.50 cubic
metres. The amounts added to the storage bin or removed from the storage
bin are mutually independent.
Calculate the probability that the storage bin contains more coal at the end
of the week than it had at the beginning of the week.

Hint: Let K be the amount of coal in the storage bin at the beginning of the week, X the total amount of coal added to the storage bin and Y the amount of coal removed from the storage bin. You are asked to compute P {K + X − Y > K}, that is, P {X − Y > 0}. Use the fact that the sum of independent normally distributed random variables is also a normally distributed random variable (with appropriate mean and variance). More precisely, if X ∼ N (µX , σX²) and Y ∼ N (µY , σY²), and X and Y are independent, then

X + Y ∼ N (µX + µY , σX² + σY²).

The same applies for X − Y .

Question 4 In a group of 15 health insurance policyholders diagnosed with can-


cer, each policyholder has probability 0.90 of receiving radiation and prob-
ability 0.40 of receiving chemotherapy. Radiation and chemotherapy treat-
ments are independent events for each policyholder, and the treatments of
different policyholders are mutually independent. The policyholders in this
group all have the same health insurance that pays 2 for radiation treatment
and 3 for chemotherapy treatment.
Calculate the variance of the total amount the insurance company pays for
the radiation and chemotherapy treatments for these 15 policyholders.

Question 5 The annual profits that company A and company B earn follow a
bivariate normal distribution.

Company A’s annual profit has mean 2 000 and standard deviation 1 000.

Company B’s annual profit has mean 3 000 and standard deviation 500.

The correlation coefficient between these annual profits is 0.80.

Calculate the probability that company B’s annual profit is less than 3 900,

given that company A’s annual profit is 2 300.

Question 6 The number of minor surgeries, X, and the number of major surg-
eries, Y , for a policyholder this decade have the joint cumulative distribution
function
F (x, y) = (1 − (0.5)^(x+1))(1 − (0.2)^(y+1))

for non-negative integers x and y.


Calculate the probability that the policyholder experiences exactly three
minor surgeries and exactly three major surgeries this decade.

Question 7 Every day, the 30 employees at an auto plant each has probability
0.03 of having one accident and zero probability of having more than one
accident. Given there was an accident, the probability of it being major is
0.01. All other accidents are minor. The numbers and severities of employee
accidents are mutually independent. Let X and Y represent the numbers
of major accidents and minor accidents, respectively, occurring in the plant
today for a single employee.
Denote by p(x, y) the joint mass distribution of the random variables X and
Y.

(a) Compute p(0, 0), p(1, 0), p(0, 1) and p(1, 1).

(b) Deduce the joint moment–generating function, MX,Y (s, t) of the ran-
dom variables X and Y .

(c) Let V and W represent the total numbers of major accidents and minor
accidents, respectively, occurring in the plant today for all the 30 em-
ployees. Compute the joint moment–generating function, MV,W (s, t).
Substantiate your answer.

Question 8 Let X be the annual number of hurricanes hitting City A and let Y
be the annual number of hurricanes hitting City B. Assume X and Y are in-
dependent Poisson variables with respective means 1.70 and 2.30. Calculate
V ar(X − Y |X + Y = 3).

Question 9 Two random variables U and V have a joint probability mass function as tabulated below:

pU,V (u, v)   u = 0   u = 1   u = 2
v = 0         0.1     0.05    0.3
v = 1         0.05    0.25    0.25
(a) Find the marginal probability mass functions of U and V .

(b) If it is known that U = 2, what is the probability that V = 1?

(c) Calculate the means of U and V .

(d) Calculate the covariance between U and V .

(e) Are U and V independent? Justify your answer.


Question 10 Let Z = (X, Y ) be a random vector with a bivariate normal distribution with mean vector µ and covariance matrix Σ given by

µ = (4, −3)   and   Σ = ( 16  −5
                          −5   4 ).

Let U = X + Y and V = 2X − Y − 3.
Find the mean vector and the covariance matrix of the random vector W = (U, V ).
Question 11 Let R be the region

R = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1/2}

and let X and Y be random variables with joint probability density function

fX,Y (x, y) = k(x + 2y) for (x, y) ∈ R, and fX,Y (x, y) = 0 otherwise.

(a) Determine the value of k.

(b) Find the marginal probability density functions of X and Y .

(c) Let 0 ≤ x ≤ 1. Find the conditional probability density function of Y
given that X = x.

(d) Calculate E[Y |X = x].

Question 12 The continuous-type random variables X and Y have the joint density function

fX,Y (x, y) = x + y if 0 < x < 1 and 0 < y < 1, and fX,Y (x, y) = 0 otherwise.

(a) Compute the expectation of XY .

(b) Obtain the correlation coefficient between X and Y .

Question 13 Let X be a random variable whose probability density function is

fX (x) = e^(−2x) + e^(−x)/2 if x > 0, and fX (x) = 0 otherwise.

(a) Write down the moment–generating function for X.

(b) Use this moment–generating function to compute the first and second
moments of X.

Question 14 Suppose that an analyst determines that the revenue (in rands) that
a small restaurant makes in a week is a random variable, X, with moment–
generating function
MX (t) = 1/(1 − 25 000t)^4.
Find the standard deviation of the revenue that the restaurant makes in a
week.

Question 15 A company insures homes in three cities, J, K and L. The losses


occurring in these cities are independent. The moment-generating functions
for the loss distributions of the cities are

MJ (t) = (1 − 2t)−3 , MK (t) = (1 − 2t)−2.5 and ML (t) = (1 − 2t)−4.5 .

Let X represent the combined losses from the three cities. Calculate the
moment E(X 3 ).

Question 16 Given that a random variable X has moment-generating function

MX (t) = (1/6)e^(−2t) + (1/3)e^(−t) + (1/4)e^t + (1/4)e^(2t),

find
P {|X| ≤ 1}.

(Hint: If necessary, use the same argument as in Problem 4.60 (b) in Hsu (2014, p. 191).)

Question 17 This year, the number of accidents, X, at an amusement park has


probability–generating function

PX (t) = e−0.2(1−t)

where defined. The amusement park owner’s insurance company reimburses


up to a maximum of one accident this year. Let Y be the number of unre-
imbursed accidents.
Determine the probability generating function, PY (t), of the number of un-
reimbursed accidents at this amusement park this year, where defined.

Question 18 Assume that X is a Poisson distributed random variable with mean λ. Let

Yλ = (X − λ)/√λ.

Use the central limit theorem to show that the distribution of the random variable Yλ converges to the standard normal distribution as λ → ∞.

Question 19 (a) Give the characteristic function for the uniform distribution on the interval [−1, 1].
Here you need to compute the integral

ϕ(t) = ∫_{−∞}^{+∞} e^(jxt) f (x) dx

where f (x) is the pdf of the uniform distribution on [−1, 1].

(b) The characteristic function of a certain random variable X is given by

ϕX (t) = (1/5)(2 cos(t) + 3 cos(2t) + j sin(2t)).

Compute the mean and variance of X through the characteristic func-


tion.

Note that in section 4.8, Characteristic functions (Hsu 2014, p. 156), the complex number i such that i² = −1 is denoted as j.

A complex number is a number of the form a + bj, where a and b are real numbers and j is such that j² = −1. Recall that there is no real number x such that x² = −1; thus j is not a real number. Example: 2 + 3j is a complex number. The set of complex numbers is denoted C. Clearly, R ⊂ C. Generally the complex number j is denoted as i, and then a + bj is also denoted as a + bi.

Note also that for real numbers a, b, c, d,

(a + bj) + (c + dj) = (a + c) + (b + d)j and

(a + bj) · (c + dj) = (ac − bd) + (ad + bc)j.

Question 20 A market research analyst for a cell phone company conducts a


study of their customers who exceed the time allowance included in their
basic cell phone contract. The analyst finds that for those people who exceed
the time included in their basic contract, the excess time used follows an
exponential distribution with a mean of 25 minutes.
Consider a random sample of 120 customers who exceed the time allowance
included in their basic cell phone contract. Denote by X the excess time
used by one individual cell phone customer who exceeds her contracted time
allowance. Then X has the exponential distribution of mean µ = 25 minutes.
Denote by X̄ the mean excess time used by a sample of n = 120 customers who exceed their contracted time allowance; that is,

X̄ = (X1 + X2 + . . . + Xn)/n

where Xk is the excess time used by customer k.

(a) Determine the mean and standard deviation of the mean excess time X̄.

(b) Using the central limit theorem, find an estimate of the probability that the mean excess time used by the 120 customers in the sample is between 24 and 26 minutes; that is, find P {24 < X̄ < 26}.

(c) Suppose that one customer who exceeds the time limit for his cell phone
contract is randomly selected. Find the probability that this individual
customer’s excess time is between 24 and 26 minutes.

(d) Explain why the probabilities in (b) and (c) are different.

(e) Find the 95th percentile for the sample mean excess time for samples of 120 customers who exceed their basic contract time allowances; that is, determine k such that P {X̄ < k} = 0.95.

Chapter 6

Estimation theory

This chapter in these notes corresponds to chapter 7 in Hsu (2014).

6.1 LESSON 15: Estimation theory

• Understand the concepts parameter estimation, unbiased estimator, efficient


estimator and consistent estimator.

• Understand the concepts maximum-likelihood estimator, Bayes’ estimator


and linear mean square estimator, and be able to compute them.

• Understand the solved problems and their solutions.

• Solve the supplementary problems and compare your solutions to those pro-
vided in the book.
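As an optional illustration of these ideas (a sketch added here, not taken from Hsu (2014)), the R code below computes the maximum-likelihood estimator of the rate of an exponential distribution, both from its closed form (the reciprocal of the sample mean) and by numerically maximising the log-likelihood.

# Minimal sketch: maximum likelihood estimation for exponentially distributed data
set.seed(2024)
x <- rexp(500, rate = 0.4)                 # simulated data with true rate 0.4

# Closed-form MLE: the rate that maximises the likelihood is 1 / sample mean
mle_closed <- 1 / mean(x)

# Numerical MLE: minimise the negative log-likelihood over an interval of rates
negloglik <- function(rate) -sum(dexp(x, rate, log = TRUE))
mle_numeric <- optimize(negloglik, interval = c(0.001, 10))$minimum

c(closed_form = mle_closed, numerical = mle_numeric)   # both close to 0.4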

Chapter 7

Stochastic processes

In this chapter it is important to simulate stochastic processes in order to explain


the underlying concepts. For this purpose we will use the software R.

7.1 LESSON 16: Generating random numbers using R

The following codes can be used to generate or sample a list of n random numbers
from the given distributions:

• Uniform distribution on [0, 1]


runif(n, min = 0, max = 1)

• Normal distribution rnorm(n, mean = 0, sd = 1)

• Log normal distribution whose logarithm has mean equal to meanlog and
standard deviation equal to sdlog
rlnorm(n, meanlog, sdlog)

• Truncated normal distribution on the interval (a, b) (Here you need the
package “truncnorm”.)
rtruncnorm(n, a, b, mean, sd)

• Poisson distribution of parameter lambda
rpois(n, lambda)

• Exponential distribution with parameter rate


rexp(n, rate)

• Bernoulli distribution of parameter prob


rbinom(n, 1, prob)

• Binomial distribution of parameters size and probability prob


rbinom(n, size, prob)

• Gamma distribution with parameters shape, scale


rgamma(n, shape, scale)
Example: rgamma(5, shape = 3, scale = 2)

• Beta distribution with parameters shape1, shape2

rbeta(n, shape1, shape2)

• Student t distribution with parameters df, ncp


rt(n, df, ncp)

• F distribution with parameters df1, df2, ncp


rf(n, df1, df2, ncp)

• Discrete uniform distribution with parameters min, max (here you need the
package “fitur”)
rdunif(n, min, max)

• Self-defined distribution (here you need the package “distr”)

– Discrete distribution of values x1, x2,...,xm and probabilities p1,


p2, ..., pm First define the distribution by
dis <- DiscreteDistribution(supp = c(x1,...,xm), prob = c(p1,...,pm))
Now you can generate n random numbers from the distribution dis us-
ing
r(dis)(n)

– Continuous distribution. First define a continuous distribution (for example)

dis <- AbscontDistribution(d = function(x){exp(-(x^2)/8)/sqrt(8*pi)})

This defines the distribution dis with pdf

f (x) = e^(−x²/8)/√(8π).

Now generate n random numbers from the distribution dis as

r(dis)(n)
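As a short usage example (added here for illustration), the following R code generates samples with a few of the functions listed above and checks that the sample averages are close to the theoretical means.

set.seed(1)

u <- runif(10000, min = 0, max = 1)        # uniform on [0, 1]
z <- rnorm(10000, mean = 0, sd = 1)        # standard normal
p <- rpois(10000, lambda = 3)              # Poisson with mean 3
e <- rexp(10000, rate = 0.5)               # exponential with mean 1/0.5 = 2

round(c(mean(u), mean(z), mean(p), mean(e)), 3)   # close to 0.5, 0, 3, 2
hist(z, breaks = 50, main = "Samples from rnorm(n, 0, 1)")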

7.2 LESSON 17: Basic properties and examples

This section of these notes corresponds to chapter 5 in Hsu (2014).

7.2.1 Introduction

The theory of stochastic processes is a developed branch of probability theory.


One of the first processes investigated was that of the Brownian motion; this
process has been extensively studied, and from its investigation fundamental con-
tributions to probability theory have originated. The first important results in
the study of Brownian motion date back to the beginning of the 20th century.
At the same time telephone engineers were confronted with a type of stochastic
process, today called a birth-and-death process, which turned out to be of fun-
damental importance for designing telephone exchanges. The theory of stochastic
processes originating from needs in physics and technology is at present a rather
well–developed theory; a large number of basic processes have been classified and
the most important properties of these processes are known. The concept “ran-
dom variable” is fundamental in modern probability theory and its applications.
Let t denote a parameter assuming values in a set T , and let X(t) represent a
random variable for every t ∈ T . This yields a family {X(t) : t ∈ T } of random
variables. Such a family will be called a stochastic process. The elements of T will
be interpreted as time points, and T will be the set [0, ∞) or a subset of [0, +∞),

such as intervals [0, a], {0, 1, 2, . . . , n} or the whole set N. If T is an interval or
the full set [0, ∞), we say that the stochastic process {X(t) : t ∈ T } is a process
with a continuous time parameter, and if T is N or a subset of N, we say that the
process has a discrete time parameter.

Stochastic models can be contrasted with deterministic models. A deterministic


model is specified by a set of equations that describe exactly how the system will
evolve over time. There is no element of chance here. However, for stochastic
models, the system evolves randomly and if the process is run several times, it will
not give identical results. Different runs of a stochastic process are often called
realisations of the process.

Definition 7.1 A stochastic process or a random process is a family of random


variables defined on a fixed probability space.

We write a stochastic process as {X(t) : t ∈ T } or (X(t))t∈T or (Xt )t∈T . Here T


can be any set (it is called the index set). For each t ∈ T , the quantity X(t) is a
random variable. Most of the index set T will be taken to be [0, ∞) or an interval
of [0, ∞), such as [0, a] for some positive number a, and X(t) will be interpreted
as the value of the stochastic process at time t.

The theory of stochastic processes plays an important role in the investigation of


random phenomena depending on time.

Stock prices are generally modelled by stochastic processes. For a given stock,
X(t) or Xt represents the price at time t.

7.2.2 Some important classes of stochastic processes.

In Hsu (2014), the following examples of stochastic processes are introduced:


Wiener process or Brownian motion, Markov process, Poisson processes, mar-
tingales and Gaussian or normal processes. It is important to understand these
concepts very clearly.

[Figure 7.1: Plot of the stock price for GOOGLE since 1 January 2000]

[Figure 7.2: Plot of the stock price for S&P500 since 1 January 2000]

• Given a stochastic process X = {X(t) : t ∈ T }, the first-order distribution of X is the cumulative distribution function of the random variable X(t) for all t ∈ T . The joint distribution of X(t1 ) and X(t2 ) is called the second-order distribution of the process X. In general, the nth-order distribution of the process X is the joint cdf of (X(t1 ), X(t2 ), . . . , X(tn )) for all t1 , t2 , . . . , tn in T .

• Understand the concepts mean, correlation and covariance functions of stochas-


tic processes.

• Understand the concepts stationary process, wide-sense stationary process,


independent processes, processes with independent increments, processes
with stationary increments, Gaussian processes (also called normal processes)
and ergodic processes.

7.2.3 Markov processes

• Markov property: A stochastic process {X(t) : t ∈ T } is said to be a Markov


process if it satisfies the following property called the Markov property or
memoryless property: For any numbers t1 < t2 < . . . < tn−1 < tn < tn+1 in
the index set T and for any real numbers x1 , x2 , . . . , xn and a, we have that

P{X(tn+1 ) ≤ a|X(t1 ) = x1 , X(t2 ) = x2 , . . . , X(tn−1 ) = xn−1 , X(tn ) = xn }


= P{X(tn+1 ) ≤ a|X(tn ) = xn }.

This means that the conditional probability of the future event {X(tn+1 ) ≤ a}, given the past events X(t1 ) = x1 , X(t2 ) = x2 , . . . , X(tn−1 ) = xn−1 and the present event X(tn ) = xn , is simply equal to the conditional probability of the future event {X(tn+1 ) ≤ a} given the present X(tn ) = xn . The past is irrelevant.

• A discrete-state Markov process (i.e. a Markov process that can take only a finite or countably infinite number of values) is called a Markov chain. For
a discrete-time and discrete-state Markov chain {Xn : n ∈ N}, we have that
for every n and all states i0 , i1 , . . . , in−1 , i, j,

P{Xn+1 = j|X0 = i0 , X1 = i1 , . . . , Xn−1 = in−1 , Xn = i} = P{Xn+1 = j|Xn = i}.

If for all i, j, the probability P{Xn+1 = j|Xn = i} is independent of n, then


the Markov chain is said to possess stationary transition probabilities and
we say that {Xn : n ∈ N} is a homogeneous Markov chain. Otherwise the
process is a non-homogeneous Markov chain.

In what follows only homogeneous Markov chains are considered.

• Understand the concept transition probability matrix. Assume that X = {Xn : n ∈ N} is a homogeneous Markov chain with a discrete finite or infinite state space E = {0, 1, 2, 3, . . .}. The transition probability matrix of X is the matrix P = (pij ) defined by

pij = P{Xn+1 = j|Xn = i}, for all i, j ∈ E.

Note that

pij ≥ 0 for all i, j, and Σ_{j=0}^∞ pij = 1 for all i.

• The n-step transition probabilities are defined by

P{Xn = j|X0 = i}

and clearly

P{Xn = j|X0 = i} = pij^(n)

where pij^(n) is the entry of order (i, j) of the matrix P^n = P · P · · · P (n factors). (See the details in Hsu (2014).)

• The probability distribution of Xn . Assume that X = {Xn : n ∈ N} is
a homogeneous Markov chain with a discrete finite or infinite state space
E = {0, 1, 2, 3, . . .}. Let pi (n) = P{Xn = i} and

p(n) = (p0 (n), p1 (n), p2 (n), . . .)

with
p0 (n) + p1 (n) + p2 (n) + . . . = 1.

Then pi (0) = P{X0 = i} are called the initial probabilities and

p(0) = (p0 (0), p1 (0), p2 (0), . . .)

is called the initial-state probability vector. Then

p(n) = p(0)P^n

where P is the transition probability matrix.
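As an optional illustration (not from Hsu (2014)), the relation p(n) = p(0)P^n can be
checked numerically in R. The three-state transition matrix P and the initial vector
p0 below are made-up values chosen only for this sketch.

# Hypothetical 3-state transition matrix (each row sums to 1)
P <- matrix(c(0.5, 0.3, 0.2,
              0.1, 0.6, 0.3,
              0.2, 0.2, 0.6), nrow = 3, byrow = TRUE)
p0 <- c(1, 0, 0)              # start in state 1 with probability 1

# n-step transition matrix P^n by repeated multiplication
matpow <- function(P, n){
  out <- diag(nrow(P))
  for(i in seq_len(n)) out <- out %*% P
  out
}

n <- 10
p_n <- p0 %*% matpow(P, n)    # distribution of X_n, i.e. p(n) = p(0) P^n
p_n
rowSums(matpow(P, n))         # each row of P^n still sums to 1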

• Understand the concepts accessible state, recurrent state, transient state,
periodic state, absorbing state, fundamental matrix, stationary distribution
and limiting distributions.

• Understand the concepts renewal process, counting process and Poisson pro-
cess.

7.3 LESSON 18: Markov chains and applications to queueing theory

Turn to chapter 9, Queueing theory, in Hsu (2014, p. 349). This chapter is


straightforward and you should be able to understand all the concepts quite easily.

7.4 LESSON 19: Martingales

The notion of martingale is one of the most important concepts in stochastic


modelling.

• Study the section corresponding to martingales in Hsu (2014, p. 219)

• Understand the concepts conditional expectation and filtrations.

• The properties of conditional expectation are important.

• Understand the definition of a martingale.

• Understand the definition of a stopping time and the optional stopping the-
orem.

7.5 LESSON 20: Brownian motion

7.5.1 Random walk

A stochastic process {Xn : n ∈ N} = {X0 , X1 , X2 , . . . , Xn , . . .} is called a (sym-
metric) random walk if:

• X0 = 0.

• Xn+1 − Xn has the coin-toss distribution, that is,

P{Xn+1 − Xn = 1} = P{Xn+1 − Xn = −1} = 1/2.

• Xn+1 − Xn is independent of X0 , X1 , . . . , Xn for each n ∈ N.

Let {ξn : n = 1, 2, 3, . . .} be a sequence of i.i.d random variables such that

P{ξn = 1} = P{ξn = −1} = 1/2.

Then the sequence {Xn : n = 0, 1, 2, . . .} defined by

X0 = 0 and Xn+1 = Xn + ξn+1

is a random walk.

Simulation of random walk

n: number of steps
X0 = 0
For k = 1 to n do
    Generate xi in {-1, +1} with equal probability
    Xk = X(k-1) + xi
Keep a record of all the Xk's.

In R this can be achieved by the following simple code:

Figure 7.3: Simulation of random walk, n = 100 steps

n <- 100                     # number of steps
X0 <- 0                      # starting value
L <- c(X0)                   # stores the whole path X_0, X_1, ..., X_n
X <- X0
for(k in 1:n){
  xi <- sample(c(-1, 1), 1, prob = c(1/2, 1/2))   # one coin-toss step
  X <- X + xi
  L <- append(L, X)
}

In a random walk, the random variable Xn (for n = 1, 2, 3, . . .) is such that

P{Xn = k} = C(n, (n + k)/2) (1/2)^n

where C(n, (n + k)/2) denotes the binomial coefficient "n choose (n + k)/2", with the
convention that C(n, (n + k)/2) = 0 if (n + k)/2 is not an integer.
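As an optional sanity check of this formula, P{Xn = k} can be estimated by simulating
many random walks in R and compared with choose(n, (n+k)/2)/2^n; the values n = 10,
k = 2 and m = 100 000 below are arbitrary choices for the sketch.

set.seed(1)
n <- 10; k <- 2; m <- 100000
steps <- matrix(sample(c(-1, 1), m * n, replace = TRUE), nrow = m)
Xn <- rowSums(steps)                 # value of each simulated walk after n steps
mean(Xn == k)                        # simulated estimate of P{X_n = k}
choose(n, (n + k)/2) / 2^n           # exact value from the formula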

7.5.2 Brownian motion or Wiener process

The definition of a Brownian motion or a Wiener process is given in Hsu (2014, p.


218).

A Brownian motion is a stochastic process X = {Xt : t ≥ 0} defined on a proba-
bility space (Ω, F , P) satisfying the following properties:

1. X0 = 0.

2. The function t → Xt is continuous.

3. For each t the random variable Xt has the normal distribution with mean 0
and variance t.

4. For 0 < t1 < t2 < . . . < tn and real numbers a1 , a2 , . . . , an the sum

a1 Xt1 + a2 Xt2 + . . . + an Xtn

has also the normal distribution. This means that (Xt )t≥0 is a Gaussian
process.

5. For 0 < t1 < t2 < . . . < tn , the random variables

Xt1 , Xt2 − Xt1 , . . . , Xtn − Xtn−1

are independent and normally distributed with mean 0 and variance


t1 , t2 − t1 , . . . , tn − tn−1 respectively.

Then for each t > 0, the pdf of X(t) is given by

f(x) = (1/√(2πt)) e^{−x²/(2t)} , −∞ < x < +∞.

In particular,

P{X(t) ≤ a} = ∫_{−∞}^a (1/√(2πt)) e^{−x²/(2t)} dx
and in general if A1 , A2 , . . . , An are Borel subsets of the reals, then

P{Xt1 ∈ A1 , Xt2 ∈ A2 , . . . , Xtn ∈ An }

is given by

∫_{A1} ∫_{A2} · · · ∫_{An} (2π)^{−n/2} [ t1 (t2 − t1 ) · · · (tn − tn−1 ) ]^{−1/2}
    exp{ −(1/2) [ x1²/t1 + (x2 − x1 )²/(t2 − t1 ) + . . . + (xn − xn−1 )²/(tn − tn−1 ) ] } dx1 dx2 . . . dxn .

Intuitively the Brownian motion on a bounded interval [0, T ] can be seen as a limit
of a random walk rescaled as follows:

• Consider a sequence of iid random variables (ξk )k∈N such that

P{ξk = 1} = P{ξk = −1} = 1/2.

• Fix a number n ∈ N.

• Set
S0 = 0 and Sk+1 = Sk + ξk+1 .

• Subdivide the interval [0, T ] into n subintervals of equal length.

• Let t0 = 0, t1 , t2 , . . . , tn−1 , tn = T be the endpoints.

• Set

X(0) = 0 and X(tk ) = Sk √(T/n) , k = 1, 2, 3, . . . , n.

(The factor √(T/n) rescales the walk so that X(tk ) has variance tk .)

• Join these points to find the values at intermediate time points t ∈ [0, T ];
that is,

X(t) = S⌊nt/T⌋ √(T/n)

where ⌊nt/T⌋ is the largest integer m such that m ≤ nt/T < m + 1.

The resulting function converges (weakly) to Brownian motion on the interval


[0, T ] when the number of steps n → ∞.

R Code

Figure 7.4: Random walk approximation of Brownian motion on [0, 3]

T <- 3                       # length of the time interval [0, T]
n <- 1000                    # number of steps
S0 <- 0
RW <- c(S0)                  # the random walk S_0, S_1, ..., S_n
S <- S0
for(k in 1:n){
  xi <- sample(c(-1, 1), 1, prob = c(1/2, 1/2))
  S <- S + xi
  RW <- append(RW, S)
}
BM <- RW * sqrt(T/n)         # rescale so that the variance at time t_k equals t_k
B <- function(t){
  u <- floor(n * t/T)
  return(BM[u + 1])
}

t <- seq(0,3,T/n)
plot(t, B(t),
main="Random walk approximation of Brownian motion on [0, 3]",
ylab="B(t)",
type="l",
col="blue")

7.5.3 Some properties of Brownian motion

Let Fs , for s > 0, be the sub σ–algebra of F generated by the family (Xu : 0 ≤
u ≤ s) of random variables. That is, Fs describes the past (together with the
present) of the Brownian motion up to time s. It contains all the information
pertaining to the Brownian motion from the origin until time s. Similarly, the
sub–σ-algebra spanned by (Xu : u > s) describes the future of Brownian motion
from time s.

Markov property of Brownian motion: For any Borel subset A of R and


s, t ≥ 0, a ∈ R,

P {Xs+t ∈ A|Fs and Xs = a} = P {Xs+t ∈ A|Xs = a}.

Assume that a Brownian motion traveller started her journey at the origin (point
0) and currently at time s she is situated at position a on the real line. That is,
she has already spent s units of time (say four hours). Does she get tired? Bring
in a fresh Brownian motion traveller and ask him to start his journey at position a
where the first traveller is currently situated. The probability that after t hours the
first traveller will enter a given region is the same probability that after t hours the
second traveller will enter the same region (whether near or far from the current
position a). In other words, the fresh traveller will neither be faster nor slower
than the first traveller. That is, the first four hours spent by the first traveller did
not alter her movement. She is still as fresh as the new traveller at any point in
time. The past is irrelevant, only the present position matters.

The probability of the future given the present and the past is equal to the prob-
ability of the future given the present. Here the past is irrelevant.

That is the Markov property of Brownian motion. It is easy to see that the random
walk also has the Markov property.

Example Find

P{X(3) < 5|X(2) = 1, X(1.5) = −1, X(1) = 2, X(0.5) = −2}.

It is the same as
P{X(3) < 5|X(2) = 1}.

Now note that

P{X(3) < 5|X(2) = 1} = P{X(3)−X(2) < 5−X(2)|X(2) = 1} = P{X(3)−X(2) < 4}

because the increments are independent (the random variables X(3) − X(2) and
X(2) are independent).

Assume that a Brownian traveller (starting at 0) reaches level 5 for the first
time at time t = 3. What is the probability that she will be beyond level 8 at time
T = 4? This is simply
P{X(4) − X(3) > 3}.

Explain why this is true.
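(A sketch of the reasoning, for guidance: being beyond level 8 at time 4 means X(4) > 8.
Since X(3) = 5, this is the event X(4) − X(3) > 3, and by the independent-increments
property the increment X(4) − X(3) is independent of everything that happened up to
time 3, so conditioning on the first-passage information does not change its distribution.)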

Note: Any process that satisfies the Markov property is called a Markov process.

7.5.4 First passage time and maximum of Brownian motion

The first time a Brownian path reaches level a is called the passage time (or more
precisely, the first passage time) at level a. We denote it τa . It is also a random variable.

The maximum level MT attained by a Brownian motion starting at level 0 on the


interval [0, T ] is also a random variable. It is well known that MT has the same
distribution as |X(T )| (the absolute value of the value of the Brownian motion at
the endpoint T of the interval). Since X(T ) has the normal distribution of mean
0 and variance T , the pdf of MT is given by

fMT (x) = (2/√(2πT)) e^{−x²/(2T)} , 0 ≤ x < ∞.

Figure 7.5: Simulation of Brownian motion on [0, 3]

(That is the half-normal distribution.) (Note that MT is always nonnegative, because
the path starts at 0, so the lowest possible value of the maximum is 0.)

7.5.5 Simulation of Brownian motion in R

To simulate and plot a standard Brownian motion on the interval [0, T ] with n
steps, use the following code:

T <- 3
n<- 1000
dt <- T/n
t <- seq(0, T, dt) # time
## first, simulate a set of random deviates
x <- rnorm(n,0,1)*sqrt(dt)
## now compute their cumulative sum
x <- c(0, cumsum(x))

plot(t, x,
main="Simulation of Brownian motion on [0, 3]",
ylab="B(t)",
type="l")

Exercise: Simulate 10 000 standard Brownian motion paths on the interval [0, 1]
(use n = 1 000 steps), find the maximum of each path and give the corresponding
histogram. Estimate the probability that the maximum is less than or equal to 1.
Compare your answer to the exact value. Consider the following code:

m<- 10000

T <- 1
n<- 1000
dt <- T/n
t <- seq(0, T, dt)
ListMax <- c()
for(i in 1:m){
x <- rnorm(n,0,1) *sqrt(dt)
x <- c(0, cumsum(x))
mx <- max(x)
ListMax <- append(ListMax, mx)
}

hist(ListMax)
mean(ListMax <= 1)
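For comparison (a worked value, using the fact quoted above that MT has the same
distribution as |X(T )|): P{M1 ≤ 1} = P{|X(1)| ≤ 1} = 2Φ(1) − 1 ≈ 0.6827, which in R is

2*pnorm(1) - 1

The simulated proportion mean(ListMax <= 1) should be close to this value.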

7.5.6 Martingale property of Brownian motion

One of the most important properties of Brownian motion is that it is a martingale;


that is, if {X(t) : t ≥ 0} is a Brownian motion, then for all numbers 0 < s ≤ t,

E (X(t)|Fs ) = X(s).

Here, as we already know, Fs contains all the information related to the process
up to time s.
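(A one-line sketch of why this holds: write X(t) = X(s) + (X(t) − X(s)); the increment
X(t) − X(s) is independent of Fs and has mean 0, so E(X(t)|Fs ) = X(s) + E(X(t) − X(s)) = X(s).)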

7.5.7 Geometric Brownian motion

Given real numbers µ and σ with σ > 0, if {W (t) : t ≥ 0} is the Brownian motion,
then the process {S(t) : t ≥ 0} defined by

S(t) = S0 e^{(µ − σ²/2)t + σW(t)}

is called the geometric Brownian motion. The parameter µ is the drift, σ is the
volatility and S0 is the initial value S(0). The process is the solution to the
following stochastic differential equation:

(
dS(t) = µS(t)dt + σS(t)dW (t)
S(0) = S0

The following is an intuitive interpretation of this process on a finite interval [0, T ]:

• Fix a positive integer n.

• Subdivide the interval [0, T ] into n subintervals of equal length ∆ = T /n.

• The process starts at the point S(0) = S0 (initial point).

• At every point ti , write

S(ti+1 ) = S(ti ) + µS(ti )∆ + σS(ti )(W (ti+1 ) − W (ti )),

that is,

S(ti+1 ) = S(ti ) + µS(ti )∆ + σS(ti )ξi √∆

where (ξi ) is a sequence of iid Gaussian random variables with mean 0 and
variance 1 (so that ξi √∆ has the same distribution as the increment W (ti+1 ) − W (ti )).

• Taking n → ∞, this process converges to the geometric Brownian motion.

In R we can use the following code:

n <- 1000;
T <- 3;

Figure 7.6: Simulation of a geometric Brownian motion on [0, 3]

mu <- 0.14       # drift
sigma <- 0.20    # volatility
dt <- T/n
S0 <- 100        # just an example of an initial value
S <- S0
GBM <- c(S0)
for(k in 1:n){
  xi <- rnorm(1, 0, 1)                              # one standard normal shock
  S <- S * (1 + mu*dt + sigma * xi * sqrt(dt))      # Euler step for the GBM
  GBM <- append(GBM, S)
}

t <- seq(0, T, dt)


plot(t, GBM,
main="Simulation of geometric Brownian motion on [0, 3]",
ylab="S(t)",
type="l")

7.6 LESSON 21: Exercises

Study the solved problems as well as the supplementary problems in Hsu (2014),


chapter 5. You may skip the following problems: 5.4, 5.12, 5.14, 5.18, 5.19, 5.20,
5.26, 5.39, 5.40, 5.41, 5.42, 5.43, 5.44, 5.49, 5.50, 5.51, 5.61, 5.62, 5.74, 5.78, 5.79,
5.80, 5.81 and 5.86. Note that there is a typing error in problem 5.70: Here Xi = 1
with probability p and Xi = −1 with probability q = 1 − p.

7.7 LESSON 22: More exercises

Solve the following problems:

Question 1 Consider the Markov chain whose transition probability matrix is:

        P = [  0    0    1    0    0    0
               0    0    0    0    0    1
               0    0    0    0    1    0
              1/3  1/3  1/3   0    0    0
               1    0    0    0    0    0
               0   1/2   0    0    0   1/2 ]

(a) Classify the states {0, 1, 2, 3, 4, 5} into classes.

(b) Identify the recurrent and transient classes of (a).

Question 2 In a Markov chain model for the progression of a disease, Xn denotes


the level of severity in year n, for n = 0, 1, 2, 3, . . .. The state space is
{1, 2, 3, 4} with the following interpretations: in state 1 the symptoms are
under control; state 2 represents moderate symptoms; state 3 represents
severe symptoms; and state 4 represents a permanent disability.

The transition matrix is:

        P = [ 1/4  1/2   0   1/4
               0   1/2  1/4  1/4
               0    0   1/2  1/2
               0    0    0    1  ]

(a) Classify the four states as transient or recurrent and give reasons for
your answers. What does this tell you about the long-run fate of some-
one with this disease?

(b) Calculate the 2-step transition matrix.

(c) Determine (i) the probability that a patient whose symptoms are mod-
erate will be permanently disabled two years later; and (ii) the proba-
bility that a patient whose symptoms are under control will have severe
symptoms one year later.

(d) Calculate the probability that a patient whose symptoms are moderate
will have severe symptoms four years later.

A new treatment becomes available but only to permanently disabled


patients, all of whom receive the treatment. This has a 75% success
rate, in which case a patient returns to the “symptoms under control”
state and is subject to the same transition probabilities as before. A
patient whose treatment is unsuccessful remains in state 4 receiving a
further round of treatment the following year.

(e) Write out the transition matrix for this new Markov chain and classify
the states as transient or recurrent.

(f) Calculate the stationary distribution of the new chain.

(g) The annual cost of health care for each patient is 0 in state 1, $1 000 in
state 2, $2 000 in state 3 and $8 000 in state 4. Calculate the expected
annual cost per patient when the system is in steady state.

Question 3 Let {Xn , n ≥ 1} be a sequence of independent and identically dis-
tributed random variables such that

P{Xn = 1} = 1/3 and P{Xn = −1} = 2/3.

Define
Sn = X1 + X2 + . . . + Xn

and set Fn = σ(X1 , X2 , . . . , Xn ). Show that the process (Mn ) defined by
Mn = Sn + n/3 is a martingale, but the process (Sn ) is not a martingale.

Question 4 Suppose that a fisherman catches fish according to a Poisson process


with rate of two per hour. We know that yesterday he began fishing at 9:00.

(a) What is the probability that he caught exactly two fish by 10:00 yes-
terday?

(b) What is the probability that he caught exactly two fish by 10:00 yes-
terday and five fish by 11:00 yesterday?

(c) What is the expected time that he caught his fifth fish?

(d) Suppose that we know that by 13:00 yesterday he caught exactly three
fish. What is the probability that by 14:00 yesterday he caught a total
of exactly ten fish? And what is the probability that he caught his first
fish after 10:00 yesterday?

(e) Suppose that we know that he did not catch any fish until after 10:00
yesterday. Given that information, what is the expected time that he
caught his first fish?

Question 5 At an underground station, trains arrive according to a Poisson pro-


cess of rate 20 per hour.

(a) Suppose that I arrive at the platform and intend to take the first west-
bound train that arrives. What is the expected time I have to wait until
the first train arrives?

(b) After ten minutes, no train has arrived and I am still on the platform.
What is the expected further time I have to wait until the first train

arrives?

Question 6 In this question, assume that {B(t), t ≥ 0} is the standard Brownian


motion.

(a) Find P{B(1) + B(2) > 2}.

(b) For 0 < s < t, write a formula for the conditional pdf of B(t) given
B(s) = x.

(c) For 0 < s < t, write a formula for the conditional pdf of B(s) given
B(t) = x.

(d) Using the conditions defining the Brownian motion, give a detailed proof
of the Brownian scaling property: {X(t), t ≥ 0} defined by X(t) =
cB(t/c²) is also a Brownian motion, for c > 0 fixed.

(e) Which of the following defines a Brownian motion?



2B(t/4); tB(1); B(2t) − B(t); B(t + 1) − B(1).

(f) For the geometric Brownian motion {X(t) = eB(t) , t ≥ 0}, calculate
E[X(t)] and V ar[X(t)].

Question 7 In this question, assume that {B(t), t ≥ 0} is the standard Brownian


motion.

(a) Using the fact that limt→0 tB(1/t) = 0, show that the process {X(t) :
t ∈ [0, 1]} defined by
X(t) = tB(1/t)

is also a Brownian motion.

(b) The Brownian bridge on [0,1] is defined as

X(t) = B(t) − tB(1), t ∈ [0, 1].

Show that this process is Gaussian, and find its mean and covariance
functions.

(c) With the Brownian bridge defined in question (b), show that the process
{W (t) : t ∈ [0, 1]} defined by

W(t) = (1 + t) X( t/(1 + t) ), 0 ≤ t ≤ 1

is a Brownian motion on the interval [0, 1].

Question 8 Let {B(t) : t ≥ 0} be a standard Brownian motion with B(0) = 0.

(a) Given t > 0, state the distribution of B(t), including any parameters.

(b) Given real numbers 0 ≤ s < t, determine the joint probability density
function
fB(s),B(t) (x1 , x2 ).

Question 9 Let (X1 , X2 , . . . , Xn , . . .) be a sequence of independent random vari-


ables with zero mean and variance V ar[Xn ] = σn2 . Let Sn = X1 +X2 +. . .+Xn
and Tn2 = σ12 + σ22 + . . . + σn2 . Show that

{Sn² − Tn² : n = 1, 2, . . .}


is a martingale.

Question 10 A radioactive source emits particles according to a Poisson process


with rate two particles per minute.

(a) What is the probability that the first particle appears after three min-
utes?

(b) What is the probability that the first particle appears after three min-
utes but before five minutes?

(c) What is the probability that exactly one particle is emitted in the in-
terval from three to five minutes?

(d) What is the probability that exactly one particle is emitted in the inter-
val from zero to four minutes, and that exactly one particle is emitted
in the interval from three to five minutes?

Question 11 Let H be a real number such that 0 < H < 1. A Gaussian process
{X(t) : t ≥ 0} is called a fractional Brownian motion with Hurst parameter
H if

(i) X(0) = 0.

(ii) X(t) is normally distributed with mean 0 and variance t2H .

(iii) {X(t) : t ≥ 0} has stationary increments.

(a) Prove that for a fractional Brownian motion {X(t) : t ≥ 0}, the auto-
covariance function is given by

KX (t, s) = E[X(t)X(s)] = (1/2) ( t^{2H} + s^{2H} − |t − s|^{2H} ).

(b) Prove that for H = 1/2, a fractional Brownian motion also has indepen-
dent increments and hence it is a Brownian motion, but for H ≠ 1/2,
a fractional Brownian motion does not have independent increments.

Question 12 A certain supply company describes its receivable accounts (debts)


as follows:

– State P: A debt is in state P if it has been paid.

– State C: A debt is in state C if it is current (less than a month old).

– State I: A debt is in state I if it is one month old.

– State B: A debt is in state B if it is at least two months old. Debts in


state B are listed as bad debts.

An analyst considers historical data and model the dynamics of the company
accounts as a Markov chain with the following one-month transition matrix
(with respect to the states
P, B, C, I in that order):

        T = [ 1.00  0.00  0.00  0.00
              0.00  1.00  0.00  0.00
              0.70  0.00  0.00  0.30
              0.35  0.65  0.00  0.00 ]

Once a debt is paid (i.e., once the item enters state P), the probability of
moving to state B, C, or I is obviously zero. If a debt is in state C at
the beginning of the month, then at the end of the month, there is a 0.70
probability that it will enter state P and a 0.30 probability that it will enter
state I. If a debt is in state I at the beginning of the month, then at the end
of the month, there is a 0.35 probability that it will enter state P and a 0.65
probability that it will become a bad debt. Finally, after an account has
been listed as a bad account (i.e. state B), it is transferred to the company
overdue accounts section for collection.

(a) Classify the four states as transient, recurrent or absorbing states, and
give reasons for your answers.

(b) Calculate the two-step transition matrix.

(c) Calculate the fundamental matrix Φ of the Markov chain.

(d) Under normal circumstances, it is assumed that the store averages


R2 300 000 in outstanding debts during an average month; R1 500 000
of this amount is current and R800 000 is one month old.
Determine how much of this amount will eventually be paid or end up as
bad debts in a typical month. Hint: To answer this question, consider

(1 500 000   800 000) Φ R

where

R = [ 0.70  0.00
      0.35  0.65 ]

and where Φ is the fundamental matrix of the Markov chain (see relation (5.50)
in Hsu (2014, p. 215)). The answer is of the form (a, b) where a is the
amount that will be paid while the amount b will be declared as bad debt.

Question 13 (Background)

As a result of intense competition and an economic recession, Davidson’s


Department Store in Atlanta was forced to pay particularly close attention
to its cash flow. Because of the poor economy, a number of Davidson’s
customers were not paying their bills upon receipt, delaying payment for
several months, and frequently not paying at all. In general, the Davidson’s
accounts receivable policy was to allow a customer to be two months late
on his or her bill before turning it over to a collection agency. However, it
was not quite as simple as that. Davidson’s had approximately 10 000 open
accounts at any time. The age of the account was determined by the oldest
dollar owed. This means that a customer could have a balance for items
bought in two different months, with the overall account being listed as old
as the earliest month of purchase. For example, suppose a customer has a
balance of $100 at the end of January, $80 of which was for items bought in
January and $20 for items bought in November. This meant that the account
was two months old at the end of January, because the oldest amount on
account was from November. If the customer subsequently paid $20 on the
bill in February, this cancels the November purchase. Then if the customer
made $100 worth of purchases in February, the account was $180, and it
would be one month old (since the oldest purchases were from January).

(Question) Carla Reata, Davidson’s comptroller, analysed the accounts re-


ceivable data for the store for an extended period. She summarised these
data and developed some probabilities for the payment (or non-payment) of
bills. She determined that for current bills (in their first month of billing),
there was a 0.86 probability that the bills would be paid in the month and
a 0.14 probability that they would be carried over to the next month and
be one month late. If a bill was already one month late, there was a 0.22
probability that the oldest portion of the bill would be paid so that it would

remain one month old, a 0.46 probability that the entire bill would be carried
over so that it was two months old, and a 0.32 probability that the bill would
be paid in the month. For bills two months old, there was a probability of
0.54 that the oldest portion would be paid so that the bill would remain
one month old, a 0.16 probability that the next-oldest portion of the bill
would be paid so that it would remain two months old, a 0.18 probability
that the bill would be paid in the month, and a 0.12 probability that the bill
would be listed as a bad debt and turned over to a collection agency. If a
bill has been paid or listed as a bad debt, it would no longer move to any
other billing status. Under normal circumstances (i.e. not a holiday season),
the store averaged $1 350 000 in outstanding bills during an average month;
$750 000 of this amount would be current, $400 000 would be one month old,
and $200 000 would be two months old. The vice president of finance for the
store wanted Carla to determine how much of this amount would eventually
be paid or end up as bad debts in a typical month. She also wanted Carla
to tell her if an average cash reserve of $60 000 per month would be enough
to cover the expected bad debts that would occur each month.

Perform this analysis for Carla.

Hint: First determine the transition matrix and follow the same argument
as in the previous question to answer this question.

Question 14 In modelling insured automobile drivers’ ratings by the insurer, you


might want to consider states such as preferred, standard and substandard.
Models describe the probabilities of moving back and forth among these
states. Consider a driver–ratings model in which drivers move among the
classifications preferred, standard and substandard at the end of each year.
Each year 60% of preferreds are reclassified as preferred, 30% as standard,
and 10% as substandard; 50% of standards are reclassified as standard, 30%
as preferred, and 20% as substandard; and 60% of substandards are reclas-
sified as substandard, 40% as standard and 0% as preferred.

(a) Show that the probability that a driver classified as standard at the
beginning of the first year would be classified as standard at the start

Table 7.1: One-year transition probabilities matrix
Ratings at year-end
Initial ratings AAA AA A BBB BB B CCC Default

AAA 0.9366 0.0583 0.0040 0.0009 0.0002 0 0 0


AA 0.0066 0.9172 0.0694 0.0049 0.0006 0.0009 0.0002 0.0002
A 0.0007 0.0225 0.9176 0.0518 0.0049 0.0020 0.0001 0.0004
BBB 0.0003 0.0026 0.0483 0.8924 0.0444 0.0081 0.0016 0.0023
BB 0.0003 0.0006 0.0044 0.0666 0.8323 0.0746 0.0105 0.0107
B 0 0.0010 0.0032 0.0046 0.0572 0.8362 0.0384 0.0594
CCC 0.0015 0 0.0029 0.0088 0.0191 0.1028 0.6123 0.2526
Default 0 0 0 0 0 0 0 1.0000

of the fourth year is 0.409.

(b) Show that the probability that a driver classified as standard at the be-
ginning of the first year would be classified as standard at the beginning
of each of the first four years is 0.125.

Question 15 Over time, bonds are liable to move from one rating category to
another. This is sometimes referred to as credit ratings migration. Rating
agencies use historical data to produce a rating transition matrix. This
matrix shows the probability of a bond moving from one rating to another
during a certain period of time. Usually the period of time is one year.
Table 7.1. gives a rating transition matrix produced from historical data by
Standard and Poor’s (S&P) in 2001.

In reality, this transition matrix is updated every year. However, if we assume


that there will be no significant change in the transition matrix in the future,
then we can use the transition matrix to predict what will happen over several
years in the future. In particular, we can regard the transition matrix as a
specification of a Markov chain model. Answer the following questions:

(a) Classify the states of ratings (transient or recurrent). What ratings are
absorbing states?

(b) Determine the classes of communicating states.

(c) What is the probability that a currently AAA–rated bond will be in default
after four years?

(d) Now consider what will happen in the long run, assuming that the
transition matrix above operates every year. In the long run, what
fraction of bonds are in state AAA? What fraction of bonds are in
default?

7.8 LESSON 23: More exercises

Question 1 The performance of a system with one processor is compared with that of
a system with two processors. Let the interarrival times of jobs be exponen-
tially distributed with parameter λ = 1 job per second. Let’s first consider
the system with one processor. The service time of the jobs is exponentially
distributed with a mean of 0.5 seconds.

(1) If λ increases by 10%, how much will the response time (i.e.
the total time for a job in the system) increase?

(2) Calculate the average waiting time, the average number of customers
in the server and the utilisation of the server. What is the probability
of the server being empty?

Let’s now compare this system with a system of two cheaper processors, each
with a mean service time of one second.

(1) If λ increases by 10%, how much will the response time in-
crease?

(2) Calculate the average waiting time, the average number of customers
in the server and the utilisation of the server. What is the probability
of both of the servers being busy?

Question 2 A take-away food counter has one server. Customers arrive randomly
at a rate of λ per hour. If there is a queue, some customers go elsewhere,
so that the probability of a potential customer staying for service is 1/(n + 1)
when there are n customers in the shop already waiting or being served.
Service times are independent, but the server is new to the job and tends to
make mistakes under pressure, so the rate of service drops to µ/(n + 1) per hour
when there are n customers present, where µ > λ. Model this system as a
birth-and-death process.

(a) Show that pn (n = 0, 1, 2, . . .), the equilibrium probability that there
are n customers in the system, is given by:

pn = (n + 1)ρ^n p0 , with ρ = λ/µ.

(b) Show that the server will be busy for a proportion ρ(2 − ρ) of the time.
Hint: You may need to use the identity

∑_{n=0}^∞ (n + 1)x^n = 1/(1 − x)² for |x| < 1.

Question 3 A small supermarket has two self-service scanners where customers


can pay for their purchases. Customers arrive at these randomly at an aver-
age rate of 20 per hour and have independent exponentially distributed ser-
vice times, taking on average five minutes each to complete their purchases.
Currently there is a single queue for both scanners, but a new manager is
considering two further options:

(i) relocating one of the scanners to the opposite end of the shop so that
there will be separate queues for each machine, with customers assumed
equally likely to enter either queue

(ii) selling one scanner, leaving only one to serve all the customers

For the three possible options (including the existing setup) calculate, where
possible, the proportion of customers who will be served immediately on
arrival (i.e. without joining the queue) and advise the manager appropriately.

Question 4 Suppose that X is a discrete random variable with the following

probability mass function (where 0 < θ < 1 is a parameter):

X 0 1 2 3
P(X) 2θ/5 3θ/5 2(1 − θ)/5 3(1 − θ)/5
The following ten independent observations were sampled from such a dis-
tribution:
(2; 0; 3; 2; 2; 2; 1; 0; 1; 1).

What is the maximum likelihood estimate of θ?

Question 5 A telephone switch has ten output lines and a large number of in-
coming lines. Upon arrival a call on the input line is assigned an output
line if such line is available – otherwise the call is blocked and lost. The
output line remains assigned to the call for its entire duration, which is of
exponentially distributed length. Assume that 180 calls per hour arrive in
Poisson fashion, whereas the mean call duration is 110 seconds.

(1) Determine the blocking probability, that is the probability that a ran-
dom call will be blocked.

(2) On average, how many calls are rejected per hour? Substantiate your
answer.

(3) What is the proportion of time that all ten output lines are available?

Question 6 A company has a single production machine as a key work centre


on its factory floor. Jobs arrive at this work centre according to a Poisson
process at a rate of two per day. The processing time to perform each job
has an exponential distribution with a mean of 1/4 day. Because the jobs
are bulky, those not being worked on are currently being stored in a room
some distance from the machine. However, to save time in fetching the jobs,
the production manager is proposing to add enough in-process storage space
next to the production machine to accommodate three jobs in addition to
the one being processed. (Excess jobs will continue to be stored temporarily
in the distant room.) Under this proposal, what proportion of the time will
this storage next to the production machine be adequate to accommodate all waiting
jobs?

Question 7 A small bank has two tellers, one for deposits and one for with-
drawals. The service time for each teller is exponentially distributed with
a mean of one minute. Customers arrive at the bank according to a Pois-
son process, with mean rate 40 per hour. It is assumed that depositors and
withdrawers constitute separate Poisson processes, each with mean rate 20
per hour, and that no customer is both a depositor and a withdrawer. The
bank is thinking of changing the current arrangement to allow each teller to
handle both deposits and withdrawals. The bank would expect that each
teller’s mean service time would increase to 1.2 minutes, but it hopes that
the new arrangement would prevent long lines in front of one teller while
the other teller is idle, a situation that occurs from time to time under the
current setup. Analyse the two arrangements with respect to the average
idle time of a teller and the expected number of customers in the bank at
any given time.

Question 8 A service station has one petrol pump. Cars arrive at the station
according to a Poisson process at a mean rate of 15 per hour. However, if
the pump is already being used, these potential customers may balk (drive
on to another service station). In particular, if there are already n cars in the
service station, the probability that an arriving potential customer will balk
is n/3 for n = 1, 2, 3. The time required to service a car has an exponential
distribution with a mean of four minutes.

(a) Assume that this system is modelled by a birth-and-death process. Give


the birth and death parameters.

(b) Develop the steady-state equilibrium equations.

(c) Solve these equations to find the steady-state probabilities p0 , p1 , p2 , . . .

(d) Find the expected time spent in the system for those cars that stay.

Question 9 Consider a sample X1 ; X2 ; . . . ; Xn of independent and identically dis-


tributed random variables, where for each i, Xi has a geometric distribution

with probability mass function

f (x, θ) = Prob{Xi = x} = θ(1 − θ)x−1 , ∀x ∈ {1; 2; 3; . . .}

where the success probability θ satisfies 0 < θ < 1 and is unknown.

(a) Give the likelihood function of the sample {x1 ; x2 ; . . . ; xn }.

(b) Determine the maximum likelihood estimator of the success probability


θ.

Question 10 A bank employs four tellers to serve its customers. Customers arrive
according to a Poisson process at a mean rate of two per minute. However,
business is growing and management projects that the mean arrival rate will
be three per minute a year from now. The transaction time between a teller
and a customer has an exponential distribution with a mean of one minute.

Management has established the following guidelines for a satisfactory level


of service to customers. The average number of customers waiting in line to
begin service should not exceed one. At least 95% of the time, the number
of customers waiting in line should not exceed five. For at least 95% of
customers, the time spent in line waiting to begin service should not exceed
five minutes.

(a) Use the M/M/s model to determine how well these guidelines are cur-
rently being satisfied.

(b) Evaluate how well the guidelines will be satisfied a year from now if no
change is made in the number of tellers.

Hint: Use the fact that in an M/M/s system, if Wq denotes the time that
a customer spends waiting in the queue, then

P{Wq > t} = ( 1 − ∑_{n=0}^{s−1} Pn ) e^{−sµ(1−ρ)t} .

Question 11 The time T (in seconds) for a chemical reaction to take place at a

certain pressure has the distribution

fT (t) = 2βt e^{−βt²}  if t > 0, and fT (t) = 0 otherwise,

where the constant β > 0 is unknown. Independent observations t1 , t2 , . . . , tn


of the time T are made. Find the maximum likelihood estimator of the pa-
rameter β.

Chapter 8

A brief introduction to stochastic calculus

This chapter is not part of the syllabus, but it contains useful concepts and tech-
niques with applications in financial modelling.

8.1 LESSON 24: Stochastic integral

Stochastic calculus is the cornerstone of financial mathematics. Here we are mainly


interested in studying how we can define the integral of a stochastic process with
respect to the Brownian motion {W (t) : t ≥ 0}. That is, if {H(t) : t ≥ 0} is
another stochastic process, how can we define
∫_0^t H(s) dW(s) ?

8.1.1 Definition

The definition is not difficult to understand. We start with very simple cases.

(a) For 0 ≤ a ≤ b, by definition

    ∫_a^b dW(s) = W(b) − W(a)
    ∫_a^b 3 dW(s) = 3 ∫_a^b dW(s) = 3(W(b) − W(a))
    ∫_a^b 0 dW(s) = 0
    ∫_0^t dW(s) = W(t) − W(0) = W(t), for t ≥ 0.

(Clearly these stochastic integrals are random variables, not real numbers.)
Example 1: Compute the mean and variance of

    ∫_0^10 1_{[1,14]}(s) dW(s).
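(A sketch of the solution to Example 1, for guidance: on [0, 10] the indicator 1_{[1,14]}(s)
equals 1_{[1,10]}(s), so the integral is W(10) − W(1), which is normal with mean 0 and
variance 10 − 1 = 9.)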

(b) If G is a fixed random variable (defined on the same probability space as the
process {W (t) : t ≥ 0}) and G does not depend on time t in all the interval
[a, b] and G is measurable with respect to Fa (this means intuitively that G
is completely determined by the information available at time t = a), then
∫_a^b G dW(s) = G ∫_a^b dW(s) = G(W(b) − W(a)).

(Here we treat G as we treat constants in common integrals. Actually G is


constant as a function of s but not as a function of ω.)
Example 2: Find the mean and variance of

    ∫_2^7 W(2) dW(t).

Example 3: Compute the stochastic integral

    ∫_0^10 G 1_{[1,14]}(s) dW(s)

where G is a fixed random variable as before.
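(A sketch for Example 2, for guidance: here G = W(2) is F2-measurable, so
∫_2^7 W(2) dW(t) = W(2)(W(7) − W(2)). The two factors are independent, so the mean is
E[W(2)] E[W(7) − W(2)] = 0 and the variance is E[W(2)²] E[(W(7) − W(2))²] = 2 × 5 = 10.)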

(c) From (a) and (b) we can easily compute the stochastic integral of the form

    ∫_0^t ( G1 1_{[a1, b1]}(s) + G2 1_{[a2, b2]}(s) + . . . + Gn 1_{[an, bn]}(s) ) dW(s)

where ai , bi are all positive real numbers such that

    a1 ≤ b1 ≤ a2 ≤ b2 ≤ . . . ≤ an ≤ bn

and G1 , G2 , . . . , Gn are fixed random variables (independent of the time in
the interval of integration) that are defined on the same probability space as
the Wiener process {W (t) : t ≥ 0} and each Gi is measurable with respect
to Fai .

In fact,

    ∫_0^t ( G1 1_{[a1, b1]}(s) + G2 1_{[a2, b2]}(s) + . . . + Gn 1_{[an, bn]}(s) ) dW(s)
    = ∫_0^t G1 1_{[a1, b1]}(s) dW(s) + ∫_0^t G2 1_{[a2, b2]}(s) dW(s) + . . . + ∫_0^t Gn 1_{[an, bn]}(s) dW(s).

That is,

    ∫_0^t ( ∑_{i=1}^n Gi 1_{[ai, bi]}(s) ) dW(s) = ∑_{i=1}^n ( ∫_0^t Gi 1_{[ai, bi]}(s) dW(s) ).

Note: A stochastic process of the form

    ∑_{i=1}^n Gi 1_{[ai, bi]}(s)

is called a simple process.

Example 4: Show that if G1 and G2 are fixed random variables as discussed
before, then

    ∫_0^10 ( G1 1_{[0,3]} + G2 1_{[4,6]} )(s) dW(s) = G1 (W(3) − W(0)) + G2 (W(6) − W(4)).

(d) What can we do to define the stochastic integral

    ∫_0^t H(s) dW(s)

if the integrand process {Hs : s ≥ 0} is of a general form that is not described
above? For example, how can we define the stochastic integral

    ∫_0^t (W(s))² dW(s)?

Answer: As we do for common integrals, the idea is to approximate the
integrand process {Hs : s ≥ 0} by a sequence of simple processes Hs^n of the
form

    Hs^n (ω) = ∑_{i=1}^n Gi (ω) 1_{[ai, bi]}(s)

such that (in some sense)

    lim_{n→∞} Hs^n = Hs .

If this is possible, then we define the stochastic integral ∫_0^t Hs dW(s) by

    ∫_0^t Hs dW(s) = lim_{n→∞} ∫_0^t Hs^n dW(s)

where the limit is taken in a sense to be clarified.

Conditions under which this is possible are the following:

(i) The process {Hs : s ≥ 0} is adapted to the filtration {Fs : s ≥ 0}. This
means that for every s ≥ 0, the random variable Hs is measurable with
respect to the σ-algebra Fs (defined by the available information of the
dynamics of the Wiener process up to time s).

(ii) The expectation of the random variable ∫_0^∞ Hs² ds is finite, that is

    E( ∫_0^∞ Hs² ds ) < ∞.

If these two conditions are verified, then there exists a sequence (Hs^n ) of
simple processes that converges to the process (Hs ) in the sense that

    lim_{n→∞} E( ∫_0^∞ (Hs^n − Hs )² ds ) = 0.

(e) Note: We have briefly discussed how the stochastic integral with respect
to Brownian motion is defined. Note that in the general setting we will not
use the definition to calculate stochastic integrals. We will discuss the Itô
formula and see how it can be used to calculate some special stochastic in-
tegrals explicitly. However, note that it is generally very difficult to calculate
stochastic integrals.

Before discussing Itô’s formula, we need to study some beautiful properties of the
stochastic integral.

8.1.2 Some properties of stochastic integration

(a) First of all, understand the definition of the quadratic variation of a


continuous martingale. If {Mt : t ≥ 0} is a continuous martingale, such
that E(Mt2 ) < ∞ for all t, then the quadratic variation of the martingale
{Mt : t ≥ 0} is the stochastic process {At : t ≥ 0} such that the following
conditions hold:

(i) The process {At : t ≥ 0} is increasing; that is, As (ω) ≤ At (ω) for s ≤ t
and all ω ∈ Ω.

(ii) The process {At : t ≥ 0} is adapted to the filtration {Ft : t ≥ 0}.

(iii) The process {At : t ≥ 0} has continuous paths and A0 = 0.

(iv) The process {Mt2 − At : t ≥ 0} is also a martingale.

The quadratic variation of a martingale (Mt ) is denoted ⟨M⟩t . The
simplest example is given by the Brownian motion: If {Wt : t ≥ 0} is a
Brownian motion, then we know that {Wt : t ≥ 0} is a martingale. It is easy
to show that the process {Wt² − t : t ≥ 0} is also a martingale. Clearly the
process {At : t ≥ 0}, defined by At = t (which is actually a deterministic
function of t), satisfies all the required properties to be the quadratic variation
of (Wt ). Therefore, the quadratic variation of the Wiener process
is At = t, that is

    ⟨W⟩t = t.

An easy exercise: Prove that the quadratic variation of the martingale
{σWt : t ≥ 0}, where σ is a constant, is At = σ²t, that is

    ⟨σW⟩t = σ²t.

(You only need to prove that {(σWt )2 − σ 2 t : t ≥ 0} is a martingale, which


is obvious.)

(b) If the stochastic integral

    Nt = ∫_0^t Hs dW(s)

exists, then the stochastic process N = {Nt : t ≥ 0} is such that:

(i) N is a martingale with respect to the filtration (Ft ) (recall that a
stochastic integral is a random variable) and then in particular Nt is
measurable with respect to Ft .

(ii) The quadratic variation of N is given by

    ⟨N⟩t = ∫_0^t Hs² ds.

(iii) The mean of Nt is zero and its variance is given by

    Var(Nt ) = E( ∫_0^t Hs² ds )

(i.e. the variance of Nt is just the expected value of its quadratic vari-
ation). Note that

    E[Nt ] = 0 and E[Nt²] = E( ∫_0^t Hs² ds ).

(c) The definition of a stochastic integral with respect to Brownian motion has
been extended to martingales. If {Mt : t ≥ 0} is a martingale (with respect
to a fixed filtration {Ft : t ≥ 0}) with continuous paths and {Ht : t ≥ 0}
is an adapted process, then we can repeat everything we have done for the
Wiener process and define the stochastic integral

    ∫_0^t Hs dM(s).

The condition under which the integral exists is

    E( ∫_0^∞ Hs² d⟨M⟩s ) < ∞.

Observe that here we have replaced the ds in the case of Brownian motion
by d⟨M⟩s , where as you know ⟨M⟩ is the quadratic variation of M .

For example, we have that

    ∫_0^t dM(s) = M(t) − M(0)
    ∫_a^b dM(s) = M(b) − M(a)
    ∫_a^b G dM(s) = G(M(b) − M(a))

if G does not depend on s and is measurable with respect to the σ-algebra
generated by the random variables {Mt : 0 ≤ t ≤ a}.

(d) A function f : [0, ∞) → R is said to have bounded variation if it can be


expressed as a difference of two increasing functions, that is, if there exist
two functions g and h such that f = g − h and both g and h are increasing
(which means that if s < t, then g(s) ≤ g(t) and h(s) ≤ h(t)).

For example, an increasing function is a function of bounded variation; also, a
decreasing function is a function of bounded variation. The function f (t) = t
has bounded variation. Any continuously differentiable function f (i.e. f is
differentiable and its derivative f 0 is continuous) has bounded variation.

(e) A semi-martingale is a stochastic process {Xt : t ≥ 0} such that there
exists a martingale (Mt ) and a process (At ) of bounded variation such that
Xt = Mt + At . For example, if {Wt : t ≥ 0} denotes (as usual) the Wiener
process, then σWt + µt (where σ and µ are real numbers) is a semi-martingale
because (σWt ) is a martingale and the process At (ω) = µt (a deterministic
function) has bounded variation.

Under some conditions, we can define the stochastic integral with respect to
a semi-martingale. If Xt = Mt + At is a semi-martingale and {Ht : t ≥ 0} is
adapted, then

    ∫_0^t Hs dXs = ∫_0^t Hs dMs + ∫_0^t Hs dAs .

For example, let Xt = σWt + µt; then

    ∫_0^t Hs dXs = ∫_0^t Hs d(σWs ) + ∫_0^t Hs d(µs) = σ ∫_0^t Hs dWs + µ ∫_0^t Hs ds.

The notation (and it is just a notation)

    ⟨X⟩t = ⟨M⟩t

(in the case Xt = Mt + At ) will be used in the next section.

Example: If X is the semi-martingale defined by

X(t) = 3W (t) + 5t, for t ≥ 0,

then

    ∫_2^8 4 dX(t) = 4 ∫_2^8 dX(t)
                 = 4 ∫_2^8 d(3W(t) + 5t)
                 = 4 ( 3 ∫_2^8 dW(t) + 5 ∫_2^8 dt )
                 = 4 ( 3(W(8) − W(2)) + 5(8 − 2) )
                 = 12(W(8) − W(2)) + 120.

8.1.3 Itô’s formula

Itô’s formula is the cornerstone of stochastic calculus. It must be very well under-
stood.

One-dimensional Itô's formula: Let f : R → R be a C²-function, that is, its
first and second derivative exist and are continuous.

1. For Brownian motion (Wt ), we have that

    f(Wt ) − f(W0 ) = ∫_0^t f′(Ws ) dWs + (1/2) ∫_0^t f″(Ws ) ds.

2. In general, for any martingale (Mt ), we have

    f(Mt ) − f(M0 ) = ∫_0^t f′(Ms ) dMs + (1/2) ∫_0^t f″(Ms ) d⟨M⟩s .

3. And for any semi-martingale Xt = Mt + At (where (Mt ) is a martingale and
(At ) has bounded variation), we have the formula:

    f(Xt ) − f(X0 ) = ∫_0^t f′(Xs ) dXs + (1/2) ∫_0^t f″(Xs ) d⟨X⟩s
                   = ∫_0^t f′(Xs ) dMs + ∫_0^t f′(Xs ) dAs + (1/2) ∫_0^t f″(Xs ) d⟨M⟩s .

(Recall the notation ⟨X⟩s = ⟨M⟩s .)

Examples where Itô's formula is used

(1) Calculate explicitly the stochastic integral

    ∫_0^t Ws dWs .

The idea is to choose a function f such that f′(Ws ) = Ws . It is therefore
sufficient to choose f(x) = x²/2.

Now, by taking f(x) = x²/2, we find f′(x) = x and f″(x) = 1 (a constant
function). Itô's formula gives

    f(Wt ) − f(W0 ) = ∫_0^t f′(Ws ) dWs + (1/2) ∫_0^t f″(Ws ) ds.

Then

    Wt²/2 = ∫_0^t Ws dWs + (1/2) ∫_0^t ds = ∫_0^t Ws dWs + t/2

because W0 = 0 and f(0) = 0. Therefore,

    ∫_0^t Ws dWs = (Wt² − t)/2 .

(We now have another proof that Wt² − t is a martingale, because it is a
stochastic integral.)
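As an optional numerical illustration (not from Hsu (2014)), this identity can be checked
in R by approximating the stochastic integral with the left-point (Itô) sum
∑ W(t_i)(W(t_{i+1}) − W(t_i)) along a simulated path; the values T = 1 and n = 10 000
below are arbitrary choices for the sketch.

set.seed(42)
T <- 1; n <- 10000; dt <- T/n
dW <- rnorm(n, 0, sqrt(dt))        # Brownian increments
W  <- c(0, cumsum(dW))             # Brownian path W(0), W(t_1), ..., W(T)

ito_sum <- sum(W[1:n] * dW)        # left-point approximation of the stochastic integral

ito_sum                            # approximate value of the integral
(W[n + 1]^2 - T)/2                 # right-hand side (W_T^2 - T)/2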

An extension of Itô's formula: Consider a C² function of two variables

    f : [0, ∞) × R → R, (t, x) → f(t, x).

We have the following extension of Itô's formula:

    f(t, W(t)) − f(0, 0) = ∫_0^t [ ∂f/∂t (s, W(s)) + (1/2) ∂²f/∂x² (s, W(s)) ] ds
                         + ∫_0^t ∂f/∂x (s, W(s)) dW(s).                    (8.1)

Example 1: Show that

    ∫_0^t s dWs = tWt − ∫_0^t Ws ds.

Consider the function f(t, x) = t x. We have that

    ∂f/∂t (s, x) = x, ∂f/∂x (s, x) = s and ∂²f/∂x² (s, x) = 0.

Now we can apply formula (8.1) to find that

    f(t, W(t)) − f(0, W0 ) = ∫_0^t [ ∂f/∂t (s, W(s)) + (1/2) ∂²f/∂x² (s, W(s)) ] ds
                           + ∫_0^t ∂f/∂x (s, W(s)) dW(s)
                           = ∫_0^t W(s) ds + ∫_0^t s dW(s).

We find that

    tW(t) − 0 = ∫_0^t W(s) ds + ∫_0^t s dW(s),

which is equivalent to

    ∫_0^t s dW(s) = tW(t) − ∫_0^t W(s) ds.

8.2 LESSON 25: A brief note on stochastic differential equations

A wide class of continuous-time stochastic processes X = {X(t) : t ≥ 0} can be


described as solutions to stochastic differential equations (SDEs) of the form:

dX(t) = µ(t, X(t))dt + σ(t, X(t))dB(t)


X(0) = X0 .

Here B = {B(t) : t ≥ 0} is the Brownian motion, and µ and σ are functions


defined from [0, ∞) × R into R. X0 is the given initial value of the process. The
stochastic process X = {X(t) : t ≥ 0} is the “unknown” in this equation. A
solution to this equation is any process X that satisfies this equation. Since
the Brownian motion B is involved in the equation, the solution X is generally
also random. The number X0 is called the initial value of the SDE. The
function µ is called the drift of the SDE and the function σ is called the diffusion
coefficient or volatility of the SDE. A stochastic process X = {X(t) : t ≥ 0} is a
solution of the SDE if
X(t) = X0 + ∫_0^t µ(s, X(s)) ds + ∫_0^t σ(s, X(s)) dB(s)

for all t ≥ 0.

In order for the SDE to have a solution, that is for a process X that satisfies the
SDE to exist, there are assumptions that must be imposed on the drift function µ
and the diffusion coefficient σ. We do not discuss these here.

To simulate the stochastic process X satisfying the SDE on a finite interval [0, T ],
we can use the following scheme (called the Euler–Maruyama method):

Input:
the drift µ
the diffusion coefficient σ
the initial value X0

the time horizon: interval [0, T ]
the discretisation parameter n ∈ N

• Subdivide the interval [0, T ] into n subintervals of equal length ∆ = T /n.

• The process starts at the point X(0) = X0 (initial point).

• At every point ti , write

X(ti+1 ) = X(ti ) + µ(ti , X(ti ))∆ + σ(ti , X(ti ))(W (ti+1 ) − W (ti )),

that is,

X(ti+1 ) = X(ti ) + µ(ti , X(ti ))∆ + σ(ti , X(ti ))ξi √∆

where (ξi ) is a sequence of iid Gaussian random variables with mean 0 and
variance 1.

• Return the sequence of the values X(ti ) for i = 0, 1, 2, . . . , n.

• Taking n → ∞, this process converges to the solution process X.

Example: Consider the SDE

dX(t) = κ(θ − X(t))dt + σdW (t)


X0 = 0.5

where κ, σ and θ are constants given by:

κ = 10; σ = 0.25; and θ = 1.

Clearly the drift is


µ(t, X(t)) = κ(θ − X(t))

and the diffusion coefficient is simply the constant function

σ(t, X(t)) = 0.25.

Such process X is called an Ornstein–Uhlenbeck process. The Ornstein–Uhlenbeck


processes are used for example in interest rate modelling.

To simulate the solution process, use the following R code:

n <- 1000
T <- 10
kappa <- 10
theta <- 1
mu <- function(x){
return(kappa*(theta - x))
}
sigma <- 0.25
D <- T/n;
X0 <- 0.5;
X <- X0;
path <- c(X0)
for(k in 1:n){
t <- k*D;
xi <- rnorm(1,0,1)
X <- X + mu(X)*D + sigma* sqrt(D) * xi;
path <- append(path, X)
}

t <- seq(0, T, D)
plot(t, path,
main="Simulation of the solution to SDE on [0, 10]",
ylab="X(t)",
type="l")

8.3 LESSON 26: Exercises

Question 1 Consider the pricing of a European call option in the Black–Scholes


framework by simulating the SDE

dSt = rSt dt + σSt dWt

Figure 8.1: Simulation of the solution to SDE on [0, 10]

with parameters S0 = K = 100, T = 0.5 years, r = 1% and σ = 40%. (The


process defined by this SDE is the geometric Brownian motion.) We can
price such an option using the Black-Scholes formula and obtain the exact
value of 11.469. We want to see how one can approximate this value by using
a discretisation scheme to simulate the GBM . We know that the pay-off of
a European call option at maturity time T is given by:

V(T) = max(0, ST − K),

that is,

V(T) = ST − K if ST > K, and V(T) = 0 otherwise.

(a) Then simulate a sample path of the stock price process {St : t ∈ [0, T ]}
by subdividing the interval into 1 000 subintervals of equal length.
Use the following code:
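(One possible version of such code, following the Euler scheme used for the geometric
Brownian motion in section 7.5.7, is sketched below; the parameter values are those
given in the question.)

S0 <- 100; K <- 100          # initial price and strike (from the question)
r <- 0.01; sigma <- 0.40     # risk-free rate and volatility (from the question)
T <- 0.5                     # maturity in years
n <- 1000                    # number of subintervals
dt <- T/n
S <- S0
path <- c(S0)
for(k in 1:n){
  xi <- rnorm(1, 0, 1)
  S <- S * (1 + r*dt + sigma * xi * sqrt(dt))   # Euler step under the risk-neutral measure
  path <- append(path, S)
}
plot(seq(0, T, dt), path, type = "l", xlab = "t", ylab = "S(t)")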

(b) Compute the corresponding value of the pay-off V (T ) of the option.
Simply add the line VT <- max(0, S - K) at the end of the previous code.

(c) Repeat this 2 000 times and deduce the value V0 of the option at time
t = 0. Use
V0 = e^{−rT} E(VT ).

Compare your answer to the exact option price USD 11.469.
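A possible Monte Carlo sketch for part (c), reusing the path simulation of part (a);
the function name simulate_ST below is illustrative and not from Hsu (2014).

# terminal value S(T) of one GBM path, using the same Euler scheme as in (a)
simulate_ST <- function(S0, r, sigma, T, n){
  dt <- T/n
  S <- S0
  for(k in 1:n){
    S <- S * (1 + r*dt + sigma * rnorm(1) * sqrt(dt))
  }
  S
}

set.seed(1)
m <- 2000                                   # number of simulated paths
payoffs <- replicate(m, max(0, simulate_ST(100, 0.01, 0.40, 0.5, 1000) - 100))
V0 <- exp(-0.01 * 0.5) * mean(payoffs)      # V0 = e^{-rT} E(V_T)
V0                                          # close to 11.469, up to Monte Carlo error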

Question 2 Consider Heston’s stochastic volatility model where the evolution of


the stock price, St, under the risk-neutral probability measure satisfies


dSt = rSt dt + √vt St dWt1

dvt = κ(θ − vt )dt + σ√vt dWt2

where (Wt1 ) and (Wt2 ) are two Brownian motions (on the same probability
space) such that for each t > 0,

E(Wt1 Wt2 ) = ρt

where ρ is a number such that −1 ≤ ρ ≤ 1. (The process (vt ) is the classical


Cox–Ingersoll–Ross (CIR) model. It is used to model interest rates.) Such
a Brownian motion can be obtained by first considering two independent
Brownian motions (Bt ) and (Zt ) and thereafter take
Wt1 = Bt and Wt2 = ρBt + √(1 − ρ²) Zt .

Consider a European call option on the stock and take T = 1, S0 = 100 and
r = 0.05 for the call option parameters. Assume that the process parameters
are v0 = 0.04, κ = 1.2, θ = 0.04, ρ = −0.5 and σ = 0.3.

Then simulate a sample path of the stock price process {St : t ∈ [0, T ]} by
subdividing the interval [0, T ] into 10 000 subintervals of equal length. Use
the following code:

n <- 10000 # number of steps


T <- 1
r <- 0.05
v0 <- 0.04
kappa <- 1.2
theta <- 0.04
rho <- -0.5

sigma <- 0.3
S0 <- 100
v0 <- 0.04
D <- T/n
S <- S0
v <- v0
LL <- c(S0)
for(k in 1:n){
B <- rnorm(1,0,1)
Z <- rnorm(1, 0,1)
W1 <- B;
W2 <- rho * B + sqrt(1- rho^2)*Z
v <- v + kappa*(theta - v)*D + sigma*sqrt(max(v, 0))*sqrt(D)*W2  # '<-' fixed; max(v, 0) guards against a numerically negative variance
S <- S + r*S*D + sqrt(max(v, 0))*S*sqrt(D)*W1
LL <- append(LL, S)
}

t <- seq(0, T, D)
plot(t, LL,
main="Simulation of the solution to SDE on [0, 1]",
ylab="X(t)",
type="l")

Question 3 Consider the Black–Scholes asset price model (under the risk-neutral
probability measure)

dSt = rSt dt + σSt dWt , t ∈ [0, 5]

with S0 = 1, r = 10% and σ = 40%. Use simulation to compute the price
at time 0 of a European contingent claim that pays X = S(5)² at time T = 5.

Question 4 Given a subset D of Rd (here d is positive integer) and a function


λ : Rd → [0, ∞), a Poisson process on D with intensity function λ is a
random set Π ⊂ D such that the following two conditions hold:

(i) If A ⊂ D then the number of elements of the random set Π in A
(that number is a random variable) follows the Poisson distribution
with parameter

    Λ(A) = ∫_A λ(x) dx.

That is,
|Π ∩ A| ∼ Poisson(Λ(A)).

(ii) if A and B are subsets of D that are disjoint, then the random variables
|Π ∩ A| and |Π ∩ B| are independent.

The intensity function λ specifies how many points of the Poisson process,
on average, are located in a given region. The process will have many points
where λ is large and only a few points where λ is small. Clearly, for
any subset A of D,
E(|Π ∩ A|) = Λ(A).

If the intensity function λ is constant and d = 1 then we retrieve the Poisson


processes that we discussed earlier (see chapter 5 in Hsu (2014)).

To generate a sample from a Poisson process with constant intensity λ on a


interval D = [a, b] ⊂ R, we can simply use the following algorithm:

(1) Generate N from the Poisson distribution of parameter λ(b − a).

(2) Generate iid numbers X1 , X2 , . . . , XN from the uniform distribution


U [a, b] on the interval [a, b].

(3) Return Π = {X1 , X2 , . . . , XN }.

Simulate 1 000 paths of a Poisson process with parameter λ = 1.3 on the
interval [0, 15]. In R, a single path can be generated with the following code
(repeat it 1 000 times to obtain the paths):

a <- 0
b <- 15
lambda <- 1.3
total_intensity <- lambda*(b - a)
N <- rpois(1, total_intensity)     # number of events in [a, b]
X <- sort(runif(N, a, b))          # the event (arrival) times, in increasing order

For each iteration, N is the number of events that occur in the interval [a, b]
and the sequence X contains the arrival times (times at which the events
occur). This algorithm is based on the following well known result: For a
Poisson process with constant intensity λ, under the condition that N events
happen in the interval [a, b], the set of the N event times is distributed as a
set of N independent U [a, b] random variables.

8.4 LESSON 27: More exercises

(Exercise 1) The Black–Scholes model of stock price is given by

S(t) = S0 e^{σW(t) + (µ − σ²/2)t} , t ≥ 0

where S(0) is the stock price at time t = 0. Here {W (t) : t ≥ 0} is the


standard Brownian motion, σ is the volatility of the stock and µ is the
mean return of the stock. Show that S(t) is the solution of the stochastic
differential equation
(
dS(t) = σS(t)dW (t) + µS(t)dt
S(0) = S0 .

(Exercise 2) 1. Show that the solution to the SDE (stochastic differential equa-
tion)

dXt = (1/3) Xt^{1/3} dt + Xt^{2/3} dWt , X0 = x > 0,

is given by

Xt = ( x^{1/3} + (1/3)Wt )³ .

Hint: Consider the function

f(y) = ( x^{1/3} + (1/3)y )³

and apply Itô's formula. (Here, x is a constant fixed in the problem
and y is the only variable.)
2. Show that Xt = Wt /(1 + t) solves the SDE

dXt = − Xt /(1 + t) dt + dWt /(1 + t) , X0 = 0.

Hint: Here Xt depends on t and Wt . Then write

Xt = f(t, Wt )

where

f(t, x) = x/(1 + t).

The SDE is equivalent to:

Xt = − ∫_0^t Xs /(1 + s) ds + ∫_0^t dWs /(1 + s) .

Now we use Itô's formula (8.1) to verify that this equality holds for

Xt = Wt /(1 + t).

3. Solve the SDE

dXt = −Xt dt + e^{−t} dWt , X0 = x.

Hint: Put the equation in the form

e^t dXt + e^t Xt dt = dWt

and use

d(e^t Xt ) = e^t dXt + Xt e^t dt.

4. In general, apply Itô's formula to show that the solution to the SDE

dXt = (aXt + bt )dt + σt dWt , X0 = x0

(where a is a constant and bt and σt are deterministic functions) is given by

Xt = e^{at} x0 + ∫_0^t e^{a(t−s)} bs ds + ∫_0^t e^{a(t−s)} σs dWs .

(Exercise 3) Find stochastic differential equations that are solved by the following
stochastic processes (α and β are constants):

1. Vt = W³(t). (Hint: Simply use Itô's formula for one variable.)

2. Vt = αt + βW²(t). (Hint: Express Vt = f(t, Wt ) for a suitable f and
apply the relevant Itô's formula.)

3. Vt = exp(αt + βW(t)).

4. Vt = tW(t).

5. Vt = exp(αt² + βW²(t)).

(Exercise 4) If dS(t) = µS(t)dt + σS(t)dW (t), where µ and σ are constants, find
a stochastic differential equation satisfied by:

1. Vt = αt + βS(t)

2. Vt = ln S(t)

3. Vt = S(t)e−αt

Hint: We have that

dS(t) = µS(t)dt + σS(t)dW (t).

We recall that the solution of this SDE is (as already discussed)

St = S0 e^{σW(t) + (µ − σ²/2) t}.

Then for exercise 4.1, we find

V(t) = αt + βS0 e^{σW(t) + (µ − σ²/2) t}.

Then V(t) is a function of t and Wt. Therefore, we can just write V(t) = f(t, Wt) where

f(t, x) = αt + βS0 e^{σx + (µ − σ²/2) t}

and apply Itô's formula.

(Exercise 5) Consider the SDE

dXt = r Xt dt + σ dWt,
X0 = x

where r is a constant.

(a) Show that its solution is

Xt = x e^{rt} + σ e^{rt} ∫_0^t e^{−rs} dWs.

(b) Find the mean and the variance of Xt.

Solutions to exercises

(Exercise 2) (Exercise 2.1) The problem is to verify that Xt is the solution of the given SDE.

1. Clearly

X0 = (x^{1/3} + (1/3) W0)^3 = x (since W0 = 0),

so the initial condition is satisfied.

2. We use Itô's formula again. Note that the given SDE is equivalent to

Xt = x + (1/3) ∫_0^t Xs^{1/3} ds + ∫_0^t Xs^{2/3} dWs.

We want to verify that

(x^{1/3} + (1/3) Wt)^3 = x + (1/3) ∫_0^t (x^{1/3} + (1/3) Ws) ds + ∫_0^t (x^{1/3} + (1/3) Ws)^2 dWs.

Let

f(y) = (x^{1/3} + (1/3) y)^3.

(Here x is a fixed constant and y is the only variable.) Then

f(Wt) = (x^{1/3} + (1/3) Wt)^3 = Xt.

We have that

f′(y) = 3 (x^{1/3} + (1/3) y)^2 × (1/3) = (x^{1/3} + (1/3) y)^2

and then

f′(Ws) = (x^{1/3} + (1/3) Ws)^2 = Xs^{2/3}.

Also,

f′′(y) = 2 (x^{1/3} + (1/3) y) × (1/3) = (2/3) (x^{1/3} + (1/3) y)

and hence

f′′(Ws) = (2/3) (x^{1/3} + (1/3) Ws) = (2/3) Xs^{1/3}.

Then, by Itô's formula,

Xt − x = ∫_0^t Xs^{2/3} dWs + (1/2) ∫_0^t (2/3) Xs^{1/3} ds.

Therefore,

Xt = x + ∫_0^t Xs^{2/3} dWs + (1/3) ∫_0^t Xs^{1/3} ds

and we are done.
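
As an aside, the integral equation above can also be checked numerically for a single Brownian path. The following R sketch (entirely our own illustration; the value of x, the grid and the variable names are arbitrary choices) approximates the ordinary integral by a Riemann sum and the Itô integral by a left-endpoint sum:

set.seed(2)
x <- 1
T_end <- 1; n <- 100000; dt <- T_end / n

dW <- rnorm(n, sd = sqrt(dt))
W  <- c(0, cumsum(dW))
Y  <- x^(1/3) + W / 3      # so that X = Y^3, X^(1/3) = Y and X^(2/3) = Y^2
X  <- Y^3                  # candidate solution on the grid

# right-hand side: x + (1/3) int_0^T X_s^(1/3) ds + int_0^T X_s^(2/3) dW_s
drift_part <- (1/3) * sum(Y[1:n]) * dt   # ordinary Riemann sum
ito_part   <- sum(Y[1:n]^2 * dW)         # left-endpoint (Ito) sum

c(lhs = X[n + 1], rhs = x + drift_part + ito_part)   # the two values should be close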

(Exercise 2.2) Here Xt depends on t and Wt. Then we write

Xt = f(t, Wt)

where

f(t, x) = x/(1 + t).

(Do you understand why?) The SDE is equivalent to

Xt = −∫_0^t (1/(1+s)) Xs ds + ∫_0^t (1/(1+s)) dWs.

Now we use Itô's formula to verify that this equality holds for

Xt = (1/(1+t)) Wt.

Use

f(t, W(t)) − f(0, W0) = ∫_0^t [ ∂f/∂t (s, W(s)) + (1/2) ∂²f/∂x² (s, W(s)) ] ds + ∫_0^t ∂f/∂x (s, W(s)) dW(s).

We have that

∂f/∂x = 1/(1 + t)  ⇒  ∂f/∂x (s, Ws) = 1/(1 + s),

∂f/∂t = −x/(1 + t)²  ⇒  ∂f/∂t (s, Ws) = −Ws/(1 + s)²

and

∂²f/∂x² = 0.

Therefore,

f(t, Wt) − f(0, W0) = ∫_0^t (1/(1+s)) dWs − ∫_0^t (Ws/(1+s)²) ds
                    = ∫_0^t (1/(1+s)) dWs − ∫_0^t (1/(1+s)) Xs ds

because Xs = Ws/(1 + s). Since f(0, W0) = 0 and f(t, Wt) = Xt, it follows that

Xt = ∫_0^t (1/(1+s)) dWs − ∫_0^t (1/(1+s)) Xs ds.

(Exercise 2.3) Multiply the equation

dXt + Xt dt = e^{−t} dWt

by e^t to find

e^t dXt + e^t Xt dt = dWt.

This is equivalent to

e^t dXt + Xt d(e^t) = dWt

since d(e^t) = e^t dt. Then, because

d(e^t Xt) = Xt d(e^t) + e^t dXt,

the equation becomes

d(e^t Xt) = dWt.

Let

Yt = e^t Xt.

Then the equation is equivalent to

dYt = dWt.

The solution is

Yt = Y0 + Wt = e^0 × X0 + Wt = x + Wt.

Replacing Yt by its value, we find

e^t Xt = x + Wt

or equivalently

Xt = e^{−t} (x + Wt).
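
The closed-form solution can again be checked against a numerical scheme. The R sketch below (our own illustration; the initial value x, the grid and the variable names are arbitrary) runs an Euler–Maruyama discretisation of the SDE and compares it with e^{−t}(x + Wt) computed from the same Brownian increments:

set.seed(3)
x <- 2
T_end <- 1; n <- 10000; dt <- T_end / n

dW <- rnorm(n, sd = sqrt(dt))
W  <- c(0, cumsum(dW))
t  <- (0:n) * dt

X_exact <- exp(-t) * (x + W)       # claimed closed-form solution

# Euler-Maruyama scheme for dX = -X dt + exp(-t) dW, X_0 = x
X_euler <- numeric(n + 1)
X_euler[1] <- x
for (i in 1:n) {
  X_euler[i + 1] <- X_euler[i] - X_euler[i] * dt + exp(-t[i]) * dW[i]
}

max(abs(X_exact - X_euler))        # small for a fine grid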

(Exercise 3) (Just use Itô’s formula.)

(Exercise 3.1) Write Vt = f(Wt) = Wt^3 where f(x) = x^3. Then f′(x) = 3x^2 and f′′(x) = 6x. Hence, from the formula

f(Wt) = ∫_0^t 3Ws^2 dWs + 3 ∫_0^t Ws ds,

we find

Vt = 3 ∫_0^t Vs^{2/3} dWs + 3 ∫_0^t Vs^{1/3} ds,

which is equivalent to

dVt = 3Vt^{2/3} dWt + 3Vt^{1/3} dt,
V(0) = 0.

Note that Vt = Wt^3 ⇒ Wt = Vt^{1/3}, and Vt denotes the same thing as V(t).

(Exercise 3.2) We express Vt as

Vt = f(t, Wt)

where

f(t, x) = αt + βx^2.

Then

∂f/∂x = 2βx  ⇒  ∂f/∂x (s, Ws) = 2βWs,
∂f/∂t = α  ⇒  ∂f/∂t (s, Ws) = α,
∂²f/∂x² = 2β  ⇒  ∂²f/∂x² (s, Ws) = 2β.

Therefore,

Vt − V0 = ∫_0^t [ ∂f/∂t (s, W(s)) + (1/2) ∂²f/∂x² (s, W(s)) ] ds + ∫_0^t ∂f/∂x (s, W(s)) dW(s)
        = ∫_0^t 2βWs dWs + ∫_0^t α ds + ∫_0^t β ds
        = ∫_0^t 2βWs dWs + (α + β) t.

The corresponding SDE is

dVt = 2βWt dWt + (α + β) dt,
V0 = 0.

(Exercise 4) We have that

dS(t) = µS(t) dt + σS(t) dW(t).

We recall that the solution of this SDE is

St = S0 e^{σW(t) + (µ − σ²/2) t}.

Then

V(t) = αt + βS0 e^{σW(t) + (µ − σ²/2) t}.

Then V(t) is a function of t and Wt. Therefore, we can just write V(t) = f(t, Wt) where

f(t, x) = αt + βS0 e^{σx + (µ − σ²/2) t}

and apply Itô's formula.

(Exercise 5) (Exercise 5.1) Recall that if r is a constant, and f(t) and g(t) are functions depending on t, then

d(e^{−rt}) = −r e^{−rt} dt  and  d(f(t)g(t)) = d(f(t)) g(t) + f(t) d(g(t)).   (8.2)

We have that

dXt = r Xt dt + σ dWt.

By multiplying this equation by e^{−rt}, we find

e^{−rt} dXt = r e^{−rt} Xt dt + σ e^{−rt} dWt.

Now note that

r e^{−rt} dt = −d(e^{−rt}).

Then we have that

e^{−rt} dXt = −d(e^{−rt}) Xt + σ e^{−rt} dWt

or equivalently

e^{−rt} dXt + d(e^{−rt}) Xt = σ e^{−rt} dWt.

But clearly, using relation (8.2), we have that

e^{−rt} dXt + d(e^{−rt}) Xt = d(e^{−rt} Xt).

Therefore,

d(e^{−rt} Xt) = σ e^{−rt} dWt.

Let

Yt = e^{−rt} Xt.

Then

dYt = σ e^{−rt} dWt,

which is equivalent to

Y(t) = Y0 + ∫_0^t σ e^{−rs} dWs.

Since Y0 = e^{−r×0} X0 = x, we find that

Y(t) = x + ∫_0^t σ e^{−rs} dWs.

Replacing Y(t) by its value e^{−rt} Xt, we find that

e^{−rt} Xt = x + σ ∫_0^t e^{−rs} dWs,

that is,

Xt = x e^{rt} + σ e^{rt} ∫_0^t e^{−rs} dWs.

(Exercise 5.2) The mean and variance of Xt can easily be found from the relation

Xt = x e^{rt} + σ e^{rt} ∫_0^t e^{−rs} dWs.

We have that
E[Xt] = E[x e^{rt}] + E[ σ e^{rt} ∫_0^t e^{−rs} dWs ].

The mean of a constant is equal to that constant. Then

E[x e^{rt}] = x e^{rt}.
 

(This quantity is constant when x, r, and t are fixed.) Hence


 Z t 
rt rt rt rs
E [Xt ] = xe + σe E σe ē dWs .
0

Now remember that the mean of a stochastic integral ∫_0^t g(s) dW(s) is zero. Then we find

E[Xt] = x e^{rt} + σ e^{rt} E[ ∫_0^t e^{−rs} dWs ] = x e^{rt} + σ e^{rt} × 0 = x e^{rt}.

We also have that

Var(Xt) = E[Xt^2] − [E[Xt]]^2

and

Xt^2 = x^2 e^{2rt} + 2xσ e^{2rt} ∫_0^t e^{−rs} dWs + σ^2 e^{2rt} ( ∫_0^t e^{−rs} dWs )^2.

Then

E[Xt^2] = E[x^2 e^{2rt}] + E[ 2xσ e^{2rt} ∫_0^t e^{−rs} dWs ] + E[ σ^2 e^{2rt} ( ∫_0^t e^{−rs} dWs )^2 ]
        = x^2 e^{2rt} + 2xσ e^{2rt} × 0 + σ^2 e^{2rt} E[ ( ∫_0^t e^{−rs} dWs )^2 ].

Now remember that if

Nt = ∫_0^t g(s) dW(s),

then

E[Nt^2] = E[ ∫_0^t (g(s))^2 ds ].

Then, in our case,

E[ ( ∫_0^t e^{−rs} dWs )^2 ] = E[ ∫_0^t e^{−2rs} ds ]
                             = ∫_0^t e^{−2rs} ds   (since this integral is deterministic)
                             = −(1/(2r)) (e^{−2rt} − 1)
                             = (1 − e^{−2rt})/(2r).

Therefore,

Var(Xt) = E[Xt^2] − [E[Xt]]^2 = σ^2 e^{2rt} × (1 − e^{−2rt})/(2r) = σ^2 (e^{2rt} − 1)/(2r).
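
These two moments can be confirmed with a quick Monte Carlo experiment in R. The sketch below (our own illustration; the parameter values, the left-endpoint approximation of the stochastic integral and the variable names are arbitrary choices) simulates X_T from the explicit representation and compares the sample mean and variance with x e^{rT} and σ²(e^{2rT} − 1)/(2r):

set.seed(4)
x <- 1; r <- 0.5; sigma <- 0.3
T_end <- 1; n <- 1000; dt <- T_end / n
s <- (0:(n - 1)) * dt                     # left endpoints of the subintervals
M <- 20000                                # number of Monte Carlo replications

# X_T = x e^(rT) + sigma e^(rT) * int_0^T e^(-rs) dW_s, integral as a left-point sum
simulate_XT <- function() {
  dW <- rnorm(n, sd = sqrt(dt))
  x * exp(r * T_end) + sigma * exp(r * T_end) * sum(exp(-r * s) * dW)
}

XT <- replicate(M, simulate_XT())

c(mean_mc = mean(XT), mean_theory = x * exp(r * T_end))
c(var_mc  = var(XT),  var_theory  = sigma^2 * (exp(2 * r * T_end) - 1) / (2 * r))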

