
Chapter 1

Elements of Stochastic Processes

Definition 1.1 A family of random variables Xt, where t is a parameter running over an index set T, is called a stochastic process. Write {Xt, t ∈ T}.
A realization or sample function of the stochastic process {Xt, t ∈ T} is an assignment, to each t ∈ T, of a possible value of Xt — a function of t.

1 Classification of stochastic processes


The key elements in describing a stochastic process are: (1) state space, (2) index parameter
T , and (3) dependence relations.
(1) State space
This is the set of all possible values taken by Xt ’s. In the case that S = {0, 1, · · · }, we refer
to the process as integer-valued or as a discrete state process. If S = (−∞, ∞), we call Xt a
real-valued stochastic process. If S = Rk , then Xt is a k-vector process.

(2) Index parameter T
• If T = {0, 1, 2, · · · }, then we say that Xt is a discrete time process. In this case, we shall
write Xn instead of Xt .
• If T = [0, ∞), then Xt is called a continuous time process.

(3) Dependence relations


Let {Xt ,t ∈ T } be a real-valued stochastic process with discrete or continuous parameter set.
Example A Independent increments
If r.v.’s Xt2 − Xt1 , · · · , Xtn − Xtn−1 are independent for all values of t1, · · · ,tn such that t1 <
t2 < · · · < tn, then Xt is a process with independent increments.

Example B Martingales
We say that {Xt ,t ∈ T } is a martingale if for any t1 < t2 < · · · < tn < tn+1, we have

E(Xtn+1 |Xt1 = a1, · · · , Xtn = an) = an,

for all values a1, · · · , an. It may be considered as a model for fair games, in the sense that Xt
signifies the amount of money that a player has at time t. The martingale property states that
the average amount a player will have at time tn+1, given that he has amount an at time tn, is
equal to an regardless of what his past fortune has been. For example, Xn = Z1 + · · · + Zn,
n = 1, 2, · · · , is a discrete time martingale if the Zi are independent and have mean zero.
(Exercise: Prove it.)
Example C Markov process
A Markov process is a process with the property that given the values of Xt , the values of Xs,
s > t, don’t depend on the values of Xu, u < t. Specifically, a process is said to be Markovian
if

P(a < Xt ≤ b|Xt1 = x1, · · · , Xtn = xn) = P(a < Xt ≤ b|Xtn = xn), (1.1)

where t1 < · · · < tn < t.

Let A be an interval of the real line. The function
P(x, s;t, A) = P(Xt ∈ A|Xs = x), t > s,
is called the transition probability function. We may write (1.1) as
P(a < Xt ≤ b|Xt1 = x1, · · · , Xtn = xn) = P(xn,tn;t, A),
where A = {ξ : a < ξ ≤ b}.

A Markov process with finite or denumerable state space is called a Markov chain.

A Markov process for which all realizations or sample functions {Xt ,t ∈ [0, ∞)} are continu-
ous functions is called a diffusion process.

Poisson process is a continuous time Markov chain, and Brownian motion is a diffusion
process.

Example D Stationary process
A stochastic process {Xt ,t ∈ T }1 is said to be strictly stationary if the joint distribution func-
tions of the families of r.v.’s

(Xt1+h, Xt2+h, · · · , Xtn+h) and (Xt1 , Xt2 , · · · , Xtn )

are the same for all h > 0 and arbitrary selections t1,t2, · · · ,tn from T . It means that the
process is in probabilistic equilibrium and that the particular times at which we examine the
process are of no relevance.
A stochastic process {Xt ,t ∈ T } is said to be wide sense stationary or covariance stationary
if it possesses finite second moments and if cov(Xt , Xt+h) depends only on h for all t ∈ T . A
stationary process with finite second moments is covariance stationary. There are covariance
stationary processes that are not stationary.

1 Here T could be one of the sets (−∞, ∞), [0, ∞), the set of all integers, or the set of all positive integers.

A Markov process is said to have stationary transition probabilities if P(x, s;t, A) is a func-
tion only of t − s. Note that a Markov process with stationary transition probability is not
necessarily a stationary process.
Neither the Poisson process nor the BM process is stationary. But for Poisson/BM process
{Xt ,t ≥ 0},
Zt = Xt+h − Xt , t > 0,

is a stationary process for any fixed h ≥ 0.

2 Some preliminaries
Proposition 2.1 Suppose P(X ≥ 0) = 1. Then EX = ∫_0^∞ P(X > x) dx. Moreover, if ∑_{k=0}^∞ P(X = k) = 1, then EX = ∑_{k=0}^∞ P(X > k) = ∑_{k=1}^∞ P(X ≥ k).

Proof. We consider

EX = E[ ∫_0^X 1 dt ] = E[ ∫_0^∞ I(X > t) dt ] = ∫_0^∞ E[I(X > t)] dt = ∫_0^∞ P(X > t) dt.

If X is integer-valued, then P(X > s) = P(X > k) for k ≤ s < k + 1. Hence

EX = ∑_{k=0}^∞ ∫_k^{k+1} P(X > s) ds = ∑_{k=0}^∞ P(X > k).

The last equality is obvious.
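As a quick numerical check of the tail-sum formula (the distribution below is an illustrative choice, not from the text), one can compare EX computed directly with ∑_k P(X > k) for a geometric random variable on {0, 1, 2, . . . }:

```python
# Tail-sum check of Proposition 2.1 with X ~ Geometric on {0, 1, 2, ...},
# P(X = k) = (1 - r) r^k (an illustrative choice), for which EX = r/(1 - r).

r = 0.5
K = 200                                      # truncation; the tail is negligible

pmf = [(1 - r) * r**k for k in range(K)]
tail = [sum(pmf[k + 1:]) for k in range(K)]  # tail[k] = P(X > k)

direct = sum(k * pmf[k] for k in range(K))   # EX = sum_k k P(X = k)
tail_sum = sum(tail)                         # EX = sum_k P(X > k)

assert abs(direct - r / (1 - r)) < 1e-9
assert abs(tail_sum - direct) < 1e-9
```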

Corollary 2.1 Suppose P(X ≥ 0) = 1 and ϕ : [0, ∞) → [0, ∞) is nondecreasing and differentiable with ϕ(0) = 0. Then

E[ϕ(X)] = ∫_0^∞ ϕ′(t) P(X > t) dt.

Proof. Let Y = ϕ(X) and apply Proposition 2.1.

Chapter 2

Markov Chains
1 Definitions
A discrete time Markov chain {Xn} is a Markov stochastic process whose state space is a countable or finite set, and for which T = {0, 1, 2, · · · }.

We usually label the state space of the process by the non-negative integers (0, 1, 2, · · · ) and
speak of Xn being in state i if Xn = i.

The probability of Xn+1 being in state j given that Xn is in state i is denoted by P^{n,n+1}_{ij}, i.e.,

P^{n,n+1}_{ij} = P{Xn+1 = j | Xn = i}.   (1.1)

P^{n,n+1}_{ij} is called a one-step transition probability.

In general the transition probabilities are functions not only of the initial and final states, but also of the time of transition. When the one-step transition probabilities are independent of the time variable n, we say that the Markov process has stationary transition probabilities and write P^{n,n+1}_{ij} = Pij.
We can arrange the probabilities as a matrix,

        | P00 P01 P02 P03 ... |
        | P10 P11 P12 P13 ... |
    P = | P20 P21 P22 P23 ... |
        | ...             ... |
        | Pi0 Pi1 Pi2 Pi3 ... |
        | ...             ... |

We call P = (Pij) the Markov matrix or transition probability matrix of the process. The (i + 1)st row¹ of P is the probability distribution of the values of Xn+1 given Xn = i. If the number of states is finite, then P is a finite square matrix whose order (the number of rows) is equal to the number of states.
¹ It is because the states are 0, 1, . . . .
Clearly, the Pij's satisfy

Pij ≥ 0,  i, j = 0, 1, 2, . . . ,
∑_{j=0}^∞ Pij = 1,  i = 0, 1, 2, . . . .

The second condition expresses the fact that some transition occurs at each trial. (One says that a transition has occurred even if the state remains unchanged.)

Proposition 1.1 A discrete time Markov chain {Xn} with stationary transition probabilities
is completely determined by (1.1) and the initial value (or distribution) of X0.

Proof Let P{X0 = i} = pi. Since any probability involving X j1 , . . . , X jk for j1 < · · · < jk may
be obtained by summing terms of the form (1.2) below, it suffices to show how to compute
the quantities
P{X0 = i0, X1 = i1, . . . , Xn = in}. (1.2)

We have

P{X0 = i0, X1 = i1, . . . , Xn = in}
  = P{Xn = in | X0 = i0, X1 = i1, . . . , Xn−1 = in−1} P{X0 = i0, X1 = i1, . . . , Xn−1 = in−1}
  = P{Xn = in | Xn−1 = in−1} P{X0 = i0, X1 = i1, . . . , Xn−1 = in−1}
  = P^{n−1,n}_{i_{n−1}, i_n} P{X0 = i0, X1 = i1, . . . , Xn−1 = in−1}
  = P_{i_{n−1} i_n} P{X0 = i0, X1 = i1, . . . , Xn−1 = in−1} = · · · = P_{i_{n−1} i_n} P_{i_{n−2} i_{n−1}} · · · P_{i_0 i_1} p_{i_0}.
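The product formula can be checked by brute-force enumeration on a small chain; the 3-state transition matrix and initial distribution below are hypothetical, chosen only for illustration:

```python
import itertools

import numpy as np

# Brute-force check of Proposition 1.1 on a hypothetical 3-state chain:
# P{X0=i0, ..., Xn=in} = p_{i0} P_{i0 i1} ... P_{i_{n-1} i_n}.

P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
p0 = np.array([0.2, 0.3, 0.5])        # assumed initial distribution of X0

def path_prob(path):
    """Probability of a full trajectory via the product formula."""
    prob = p0[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a, b]
    return prob

n = 4
total = 0.0
marg = np.zeros(3)                    # distribution of X_n by summing paths
for path in itertools.product(range(3), repeat=n + 1):
    pr = path_prob(path)
    total += pr
    marg[path[-1]] += pr

assert abs(total - 1.0) < 1e-12                      # path probabilities sum to 1
assert np.allclose(marg, p0 @ np.linalg.matrix_power(P, n))
```

The second assertion confirms that the marginal of Xn recovered from the product formula agrees with p0 P^n, i.e., the formula really does determine all finite-dimensional probabilities.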

2 Examples of Markov Chains


(A) Spatially homogeneous Markov Chains
Let ξ denote a discrete-valued random variable whose possible values are the nonnegative integers, with P{ξ = i} = ai ≥ 0 and ∑_{i=0}^∞ ai = 1. Let ξ1, . . . , ξn, . . . be independent copies of ξ.

(i) Consider the process Xn, n = 0, 1, . . . , defined by Xn = ξn (X0 = ξ0 prescribed).²

P^{n,n+1}_{ij} = P{Xn+1 = j | Xn = i} = P{ξn+1 = j | ξn = i} = P{ξn+1 = j} = aj.

Its Markov matrix has the form

        | a0 a1 a2 ... |
    P = | a0 a1 a2 ... |
        | ...      ... |

(ii) Partial sums: ηn = ξ1 + · · · + ξn, n = 1, 2, . . . , and by definition η0 = 0. Consider

P^{n,n+1}_{ij} = P(ηn+1 = j | ηn = i) = P(ξ1 + · · · + ξn+1 = j | ξ1 + · · · + ξn = i)
             = P(ξn+1 = j − i | ξ1 + · · · + ξn = i)
             = { 0,        j < i,
               { a_{j−i},  j ≥ i.

² This example shows that a sequence of IID r.v.'s forms a Markov chain.
Therefore, the transition probability matrix P is given as

        | a0 a1 a2 ... |
    P = |  0 a0 a1 ... |
        |  0  0 a0 ... |
        | ...      ... |

(B) One-dimensional Random Walks


A one-dimensional random walk is a Markov chain whose state space is a finite or infinite subset {a, a + 1, . . . , b} of the integers, in which the particle, if it is in state i, can in a single transition either stay in i or move to one of the adjacent states i − 1, i + 1. If the state space is

taken as the nonnegative integers, then the transition matrix of a random walk has the form

        | r0 p0  0  0 ... |
    P = | q1 r1 p1  0 ... |
        |  0 q2 r2 p2 ... |
        | ...         ... |

where pi ≥ 0, qi ≥ 0, ri ≥ 0 and pi + qi + ri = 1, i = 1, 2, . . . , with p0 ≥ 0, r0 ≥ 0, p0 + r0 = 1. If Xn = i, then for i ≥ 1,

P(Xn+1 = i + 1 | Xn = i) = pi,   P(Xn+1 = i − 1 | Xn = i) = qi,   P(Xn+1 = i | Xn = i) = ri,

with obvious modifications for i = 0.


The term “random walk” describes the path of a “drunkard” moving randomly either one
step forward or backward, or staying put. The extension to two-dimensional and three-
dimensional random walks is straightforward.

(C) A discrete queueing Markov chain
Customers arrive for service and take their place in a waiting line. During each period of time a single customer is served, provided that at least one customer is present. If no customer awaits service then during this period no service is performed. (We can imagine, for example, a taxi stand at which a cab arrives at fixed time intervals to give service. If no one is present the cab immediately departs.) During a service period new customers may arrive. We suppose that the actual number of arrivals in the nth service period is a r.v. ξn whose distribution is independent of the period and is given by P(k customers arrive in a service period) = P(ξn = k) = ak ≥ 0, k = 0, 1, . . . , with ∑_{k=0}^∞ ak = 1. We also assume the r.v.'s ξn are independent. The
state of the system at the start of each period is defined to be the number of customers waiting
in line for service. If the present state is i then after a lapse of one period the state is
    j = { i − 1 + ξ   if i ≥ 1,
        { ξ           if i = 0,
where ξ is the number of new customers having arrived in this period while a single customer
was served. We can write Xn+1 = (Xn − 1)+ + ξn, where y+ = max(y, 0).

P^{n,n+1}_{ij} = P(Xn+1 = j | Xn = i) = P((Xn − 1)^+ + ξn = j | Xn = i)
             = P((i − 1)^+ + ξn = j | Xn = i) = P(ξn = j − (i − 1)^+).

        | a0 a1 a2 ... |
    P = | a0 a1 a2 ... |
        |  0 a0 a1 ... |
        | ...      ... |

(D) A discrete Inventory Model


Consider a situation in which a commodity is stocked in order to satisfy a continuing demand.
We assume that the replenishing of stock takes place at successive times t1,t2 . . . , and we

assume that the "total" demand for the commodity over the period (t_{n−1}, t_n) is a random variable ξn whose distribution function does not depend on the time period,

    P(ξn = k) = ak,  k = 0, 1, 2, . . . ,

where ak ≥ 0 and ∑∞k=0 ak = 1. The stock level is examined at the start of each period. An
inventory policy is prescribed by specifying two nonnegative critical values s and S > s. The
implementation of the inventory policy is as follows: If the available stock quantity is not
greater than s then immediate procurement is done so as to bring the quantity of stock on
hand to the level S. If, however, the available stock is in excess of s then no replenishment of
stock is undertaken. Let Xn denote the stock on hand just prior to restocking at tn. The states of
the process {Xn} consist of the possible values of the stock size S, S − 1, . . . , 1, 0, −1, −2, . . . ,
where a negative value is interpreted as an unfulfilled demand for stock, which will be satisfied immediately upon restocking. So the stock levels at two consecutive periods are connected by the relation

    Xn+1 = { Xn − ξ_{n+1}   if s < Xn ≤ S,
           { S − ξ_{n+1}    if Xn ≤ s.

If we assume the ξn’s to be mutually independent, then the stock values X0, X1, . . . constitute
a Markov chain. Its transition probabilities are

P(Xn+1 = j|Xn = i) = P(i − ξn+1 = j|Xn = i) = P(ξn+1 = i − j), s < i ≤ S,


P(Xn+1 = j|Xn = i) = P(S − ξn+1 = j|Xn = i) = P(ξn+1 = S − j), i ≤ s,
P(Xn+1 = j|Xn = i) = 0, i > S.

(E) Success runs


Consider a Markov chain on the nonnegative integers with transition probability matrix of

the form

        | p0 q0  0  0 ... |
    P = | p1  0 q1  0 ... |       (2.3)
        | p2  0  0 q2 ... |
        | ...         ... |

where qi > 0, pi > 0 and qi + pi = 1, i = 0, 1, 2, . . .


A special case of this transition matrix arises when one is dealing with success runs result-
ing from repeated trials each of which admits two possible outcomes, success (S) or failure
(F). More explicitly, consider a sequence of trials with two possible outcomes (S) or (F).
Moreover, suppose that in each trial, the probability of (S) is α and the probability of (F)
is β = 1 − α. We say a success run of length r happened at trial n if the outcomes in the
preceding r + 1 trials, including the present trial as the last, were respectively, F, S, S, ...,
S. Let us now label the present state of the process by the length of the success run cur-
rently under way. In particular, if the last trial resulted in a failure then the state is zero.

Similarly, when the preceding r + 1 trials in order had the outcomes F, S, S, . . . , S, the state variable would carry the label r. The process is clearly Markovian (since the individual trials are independent of each other) and its transition matrix has the form (2.3), where pn = β and qn = α, n = 0, 1, · · · .

(F) Branching processes


Suppose an organism at the end of its lifetime produces a random number ξ of offspring with
probability distribution

P(ξ = k) = ak ≥ 0,  k = 0, 1, . . . ,   (2.4)

where ∑_{k=0}^∞ ak = 1. We assume that all offspring act independently of each other and at the
end of their lifetime individually have progeny in accordance with the probability distribution
(2.4). The process {Xn}, where Xn is the population size at the nth generation, is a Markov

22
chain. The transition matrix is obviously given by

Pi j = P(Xn+1 = j|Xn = i) = P(ξ1 + · · · + ξi = j),

where the ξi’s are independent observations of a r.v. with probability law (2.4).

(G) Markov chains in Genetics


We assume that we are dealing with a fixed population size of 2N genes composed of type-a
and type-A individuals. The make-up of the next generation is determined by 2N independent
binomial trials as follows: If the parent population consists of j a-genes and 2N−j A-genes
then each trial results in a or A with probabilities

p j = j/(2N), q j = 1 − p j

respectively. Repeated selections are done with replacement. By this procedure we generate a
Markov chain {Xn} where Xn is the number of a-genes in the nth generation among a constant

population size of 2N elements. The state space contains the 2N + 1 values {0, 1, 2, . . . , 2N}. The transition probability matrix is computed according to the binomial distribution as

P(Xn+1 = k | Xn = j) = Pjk = C(2N, k) p_j^k q_j^{2N−k},   j, k = 0, 1, · · · , 2N,

where C(2N, k) denotes the binomial coefficient.
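A minimal simulation sketch of this gene-frequency chain (the value of N is an illustrative choice): each step draws the next a-gene count from a Binomial(2N, j/(2N)) distribution, and the boundary states 0 and 2N are absorbing.

```python
import numpy as np

# Simulation sketch of the gene-frequency chain (N is an illustrative choice):
# given j a-genes, the next generation's count is Binomial(2N, j / (2N)).

rng = np.random.default_rng(0)
N = 50                                   # population of 2N = 100 genes

def step(j):
    """One generation: 2N binomial trials with success probability j/(2N)."""
    return rng.binomial(2 * N, j / (2 * N))

# States 0 and 2N are absorbing: p_0 = 0 and p_{2N} = 1.
assert step(0) == 0
assert step(2 * N) == 2 * N

# One sample path started from j = N; the chain stays in {0, ..., 2N} and,
# with probability one, is eventually fixed at 0 or 2N.
j, path = N, [N]
for _ in range(10_000):
    j = step(j)
    path.append(j)
    if j in (0, 2 * N):
        break
assert all(0 <= s <= 2 * N for s in path)
```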

3 Transition Probability Matrices of a Markov Chain


We always assume the processes have stationary transition probabilities unless otherwise specified.

Theorem 3.1 If the one-step transition probability matrix of a Markov chain is P = (Pij), then

P^n_{ij} = ∑_{k=0}^∞ P^r_{ik} P^s_{kj},   (3.5)

for all states i, j, and any fixed pair of nonnegative integers r and s satisfying r + s = n.

Here, we define

P^0_{ij} = { 1,  i = j,
           { 0,  i ≠ j,

and P^n_{ij} = P(X_{m+n} = j | X_m = i), independent of m. Eqn (3.5) is called the Chapman-Kolmogorov equations.

Note that P^n_{ij} does NOT mean Pij raised to the power n! We use P^n_{ij} (an abbreviation of P^{(n)}_{ij}) to denote the probability of going from state i to state j in exactly n transitions.

Proof. We prove the case n = 2. The event of going from state i to state j in two transitions
can be realized in the mutually exclusive ways of going to some intermediate state k (k =
0, 1, 2, . . . ) in the first transition and then going from state k to state j in the second transition.
Because of the Markovian assumption the probability of the second transition is Pkj, and that of the first transition is Pik. If we use the law of total probability, (3.5) follows. The argument
in the general case is identical.

Remark: It is easy to check that the n-step transition probability matrix is given by

P^{(n)} = P × P × · · · × P   (n factors).
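This remark, together with the Chapman-Kolmogorov equations (3.5), can be verified numerically on a small example; the 3-state matrix below is hypothetical:

```python
import numpy as np

# Numerical check of the Chapman-Kolmogorov equations (3.5) on a hypothetical
# 3-state chain: P^n = P^r P^s for r + s = n, so the n-step matrix is the
# n-th matrix power of P.

P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

P5 = np.linalg.matrix_power(P, 5)

# Decompose n = 5 as r + s = 2 + 3:
assert np.allclose(P5, np.linalg.matrix_power(P, 2) @ np.linalg.matrix_power(P, 3))
# Each P^n is again a stochastic matrix: rows sum to one.
assert np.allclose(P5.sum(axis=1), 1.0)
```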

4 Classification of States of a Markov Chain


State j is said to be accessible from state i (or i reaches j), denoted by i → j, if P^n_{ij} > 0 for some integer n ≥ 0. Two states i and j, each accessible from the other, are said to communicate, and we write i ↔ j. The concept of communication is an equivalence relation. That is,

i. Reflexivity: i ↔ i for all i.

ii. Symmetry: If i ↔ j, then j ↔ i.

iii. Transitivity: If i ↔ j, j ↔ k, then i ↔ k.

The proof of transitivity proceeds as follows: i ↔ j and j ↔ k imply that there exist integers n and m such that P^n_{ij} > 0 and P^m_{jk} > 0. Consequently, by (3.5) and the nonnegativity of each P^s_{rt}, we conclude that

P^{n+m}_{ik} = ∑_{r=0}^∞ P^n_{ir} P^m_{rk} ≥ P^n_{ij} P^m_{jk} > 0.

A similar argument shows the existence of an integer v such that P^v_{ki} > 0, as desired.

We can partition the states into equivalence classes.

We say that the Markov chain is irreducible if the equivalence relation induces only one class,
that is, all states communicate with each other.

Example 1

        | 1/2  1/2   0    0    0  |
        | 1/4  3/4   0    0    0  |
    P = |  0    0    0    1    0  |
        |  0    0   1/2   0   1/2 |
        |  0    0    0    1    0  |

which has the block form

    P = | P1  0  |
        |  0  P2 |

with P1 the upper-left 2 × 2 block and P2 the lower-right 3 × 3 block.

This Markov chain is divided into two classes {0, 1} and {2, 3, 4}, and hence is reducible.

The period of state i, written d(i), is defined to be the greatest common divisor (g.c.d.) of all integers n ≥ 1 for which P^n_{ii} > 0.
Convention: If P^n_{ii} = 0 for all n ≥ 1, we define d(i) = 0.

Example 2 Consider the n × n matrix below,

        | 0 1 0 0 ... 0 |
        | 0 0 1 0 ... 0 |
    P = | ... ... ... ...|
        | 0 0 0 0 ... 1 |
        | 1 0 0 0 ... 0 |

Each state has period n.

Specifically, for n = 3,

        | 0 1 0 |                         | 0 0 1 |                          | 1 0 0 |
    P = | 0 0 1 | ⇒ P^1_{ii} = 0;   P^2 = | 1 0 0 | ⇒ P^2_{ii} = 0;    P^3 = | 0 1 0 | ⇒ P^3_{ii} = 1.
        | 1 0 0 |                         | 0 1 0 |                          | 0 0 1 |
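The period can also be computed directly from the definition by scanning the diagonal entries of the matrix powers; the helper below (a sketch, with a finite horizon standing in for "all n") recovers d(i) = 3 for the cyclic example:

```python
from functools import reduce
from math import gcd

import numpy as np

# Period d(i) computed from the definition: the g.c.d. of all n >= 1 with
# P^n_{ii} > 0 (a finite horizon stands in for "all n" in this sketch).

def period(P, i, horizon=50):
    ns = []
    Q = np.eye(len(P))
    for n in range(1, horizon + 1):
        Q = Q @ P                        # Q = P^n after this step
        if Q[i, i] > 1e-12:
            ns.append(n)
    return reduce(gcd, ns) if ns else 0  # convention: d(i) = 0 if no returns

# The cyclic matrix from Example 2 with n = 3: every state has period 3.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
assert [period(P, i) for i in range(3)] == [3, 3, 3]
```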

Theorem 4.1 (i) If i ↔ j, then d(i) = d(j).

(ii) If state i has period d(i), then there exists an integer N depending on i such that for all integers n ≥ N,
    P^{n d(i)}_{ii} > 0.

(iii) If P^m_{ji} > 0, then P^{m + n d(i)}_{ji} > 0 for all n sufficiently large.

Lemma 4.1 Let n1, . . . , nk be positive integers with greatest common divisor d. Then there exists a positive integer M such that m ≥ M implies that there exist nonnegative integers {cj}_{j=1}^{k} with md = ∑_{j=1}^{k} cj nj.

Remark Lemma 4.1 (or an alternative lemma for (ii) in Theorem 4.1) is a number-theoretic result, and its proof is omitted here.³
³ For a more detailed hint for the proof of Lemma 4.1, see page 77, Problem 2, in Karlin & Taylor (1975), A First Course in Stochastic Processes. For an alternative lemma, see Section 2.8.2 in Port, Hoel & Stone.

Proof of Theorem 4.1 (ii) If d(i) = 0, then the statement is obviously true for all n ≥ 1. So we assume d(i) > 0. Note that P^{n0}_{ii} > 0 implies P^{n0 ℓ}_{ii} > 0 for any positive integer ℓ, because

P^{n0 ℓ}_{ii} = ∑_{k=0}^∞ P^{n0}_{ik} P^{n0(ℓ−1)}_{ki} ≥ P^{n0}_{ii} P^{n0(ℓ−1)}_{ii} ≥ · · · ≥ (P^{n0}_{ii})^ℓ > 0.

The definition of d(i) shows that there exist n1, · · · , nk s.t.

d(i) = g.c.d.(n1, n2, · · · , nk),   P^{nℓ}_{ii} > 0,  ℓ = 1, 2, · · · , k.

Then by Lemma 4.1, there exists an integer N depending on i such that for all integers n ≥ N,

P^{n d(i)}_{ii} = P^{∑ cj nj}_{ii} ≥ P^{c1 n1}_{ii} P^{c2 n2}_{ii} · · · P^{ck nk}_{ii} > 0.

(iii) Obvious, as P^{m + n d(i)}_{ji} ≥ P^m_{ji} P^{n d(i)}_{ii} > 0.

(i) From (ii), we know that there exists N s.t. for all n ≥ N, P^{n d(i)}_{ii} > 0.

Given i ↔ j, there exist m0, m1 s.t. P^{m0}_{ij} > 0 and P^{m1}_{ji} > 0. Then

P^{m0 + m1 + n d(i)}_{jj} = ∑_{k=0}^∞ P^{m1}_{jk} P^{m0 + n d(i)}_{kj} ≥ P^{m1}_{ji} P^{m0 + n d(i)}_{ij}
                         = P^{m1}_{ji} ∑_{k=0}^∞ P^{n d(i)}_{ik} P^{m0}_{kj} ≥ P^{m1}_{ji} P^{n d(i)}_{ii} P^{m0}_{ij} > 0.

This gives that d(j) | m0 + m1 + n d(i) and d(j) | m0 + m1 + (n + 1) d(i). So d(j) | d(i). Similarly d(i) | d(j). So d(j) = d(i).

(i) Method 2. There exist l1, l2 s.t. P^{l1 d(i)}_{ii} > 0, P^{l2 d(i)}_{ii} > 0 and g.c.d.(l1, l2) = 1. Since i ↔ j, there exist m, n such that P^n_{ij} > 0, P^m_{ji} > 0. So P^{m+n+l1 d(i)}_{jj} > 0. Similarly, P^{m+n+l2 d(i)}_{jj} > 0, P^{m+n+2 l1 d(i)}_{jj} > 0, P^{m+n+2 l2 d(i)}_{jj} > 0 ⇒ l1 d(i) = k3 d(j), l2 d(i) = k4 d(j) ⇒ d(i) = k5 d(j). [Note that if g.c.d.(a, b) = 1, then there exist α and β s.t. aα + bβ = 1.] In a similar way, d(j) = k6 d(i).

(i) Method 3. Since i ↔ j, there exist m, n s.t. P^m_{ji} > 0, P^n_{ij} > 0. If P^s_{ii} > 0, then P^{s+m+n}_{jj} ≥ P^m_{ji} P^s_{ii} P^n_{ij} > 0 ⇒ s + m + n = l d(j). From P^s_{ii} > 0 it also follows that P^{2s}_{ii} ≥ P^s_{ii} P^s_{ii} > 0, so P^{2s+m+n}_{jj} > 0 ⇒ 2s + m + n = l1 d(j). Thus s = (l1 − l) d(j) ⇒ d(j) | d(i). Similarly, d(i) | d(j).

A Markov chain in which each state has period one is called aperiodic. The vast majority
of Markov chain processes we deal with are aperiodic. Random walks usually typify the
periodic cases arising in practice. Results will be developed for the aperiodic case and the
modified conclusions for the general case will be stated usually without proof.

5 Recurrence
For any fixed i and j we define, for each integer n ≥ 1,

f^n_{ij} = P{Xn = j, Xv ≠ j, v = 1, 2, . . . , n − 1 | X0 = i},

the probability of the first passage from state i to state j at the n-th transition. We define f^0_{ij} = 0 for all i and j. Recall that P^0_{ij} = 1 if i = j, and = 0 if i ≠ j.

Clearly f^1_{ij} = Pij, and the f^n_{ij} may be calculated recursively, for i ≠ j, n ≥ 0, from

P^n_{ij} = ∑_{k=0}^n P(Xk = j, Xv ≠ j, v = 1, 2, · · · , k − 1 | X0 = i) P(Xn = j | Xk = j) = ∑_{k=0}^n f^k_{ij} P^{n−k}_{jj};

P^n_{ii} = ∑_{k=0}^n f^k_{ii} P^{n−k}_{ii},  n ≥ 1.

Definition The generating function Pij(s) of the sequence {P^n_{ij}} is

Pij(s) = ∑_{n=0}^∞ P^n_{ij} s^n,  for |s| < 1.

The generating function Fij(s) of the sequence {f^n_{ij}} is

Fij(s) = ∑_{n=0}^∞ f^n_{ij} s^n,  for |s| < 1.

Note that for |s| < 1,

Fij(s) Pjj(s) = ( ∑_{n=0}^∞ f^n_{ij} s^n ) ( ∑_{n=0}^∞ P^n_{jj} s^n )
            = ∑_{n=0}^∞ ( ∑_{r=0}^n f^r_{ij} P^{n−r}_{jj} ) s^n
            = { ∑_{n=0}^∞ P^n_{ij} s^n = Pij(s),  i ≠ j,
              { Pii(s) − 1,                       i = j.

We say that a state i is recurrent if and only if ∑_{n=1}^∞ f^n_{ii} = 1. In other words, a state i is recurrent if and only if, starting from state i, the probability of returning to state i after some finite length of time is one. A non-recurrent state is said to be transient, i.e., ∑_{n=1}^∞ f^n_{ii} < 1. (Note that ∑_{n=1}^∞ f^n_{ii} ≤ 1. Why?)
Before stating a theorem relating the recurrence/non-recurrence of a state to the behavior of its n-step transition probabilities P^n_{ii}, we need the following

Lemma 5.1 (Abel) (a) If ∑_{k=0}^∞ ak converges, then

lim_{s→1−} ∑_{k=0}^∞ ak s^k = a := ∑_{k=0}^∞ ak.

(Here, lim_{s→1−} denotes s tending to 1 from values less than 1.)

(b) If ak ≥ 0 and lim_{s→1−} ∑_{k=0}^∞ ak s^k = a ≤ ∞, then

∑_{k=0}^∞ ak = lim_{N→∞} ∑_{k=0}^N ak = a.

Theorem 5.1 (i) A state i is recurrent iff ∑_{n=0}^∞ P^n_{ii} = ∞.
(ii) If i ↔ j and i is recurrent, then j is recurrent.

Proof. (i) "⇒" Assume i is recurrent, i.e., ∑_{n=1}^∞ f^n_{ii} = 1. Then by Lemma 5.1(a), lim_{s→1−} Fii(s) = 1. Thus lim_{s→1−} Pii(s) = lim_{s→1−} (1 − Fii(s))^{−1} = ∞. Applying Lemma 5.1(b), we have ∑_{n=0}^∞ P^n_{ii} = ∞.

"⇐" We shall prove by contradiction. Suppose i is transient, i.e., ∑_{n=1}^∞ f^n_{ii} < 1. By Lemma 5.1(a), we have lim_{s→1−} Fii(s) < 1, so lim_{s→1−} Pii(s) < ∞. Now appealing to Lemma 5.1(b), we have ∑_{n=0}^∞ P^n_{ii} < ∞, which gives the desired contradiction.

(ii) Since i ↔ j, there exist m, n ≥ 1 s.t. P^n_{ij} > 0, P^m_{ji} > 0. For any positive integer v, we have P^{m+n+v}_{jj} ≥ P^m_{ji} P^v_{ii} P^n_{ij}, and on summing over v,

∑_{v=0}^∞ P^{m+n+v}_{jj} ≥ ∑_{v=0}^∞ P^m_{ji} P^v_{ii} P^n_{ij} = P^m_{ji} P^n_{ij} ∑_{v=0}^∞ P^v_{ii}.

Given the assumption that i is recurrent, by (i), ∑_{v=0}^∞ P^v_{ii} = ∞, hence ∑_{v=0}^∞ P^v_{jj} = ∞.

6 Examples of Recurrent Markov Chains


Example 1 Consider the one-dimensional random walk on the integers (i.e., positive and negative integers and zero), where at each transition the particle moves with probability p one unit to the right and with probability q one unit to the left (p + q = 1). We have, for n = 0, 1, 2, . . . ,

P^{2n+1}_{00} = 0   and   P^{2n}_{00} = C(2n, n) p^n q^n.

Recall Stirling's formula, n! ∼ n^{n+1/2} e^{−n} √(2π). Hence

P^{2n}_{00} ∼ (pq)^n 2^{2n} / √(πn) = (4pq)^n / √(πn).

Note further that 4pq ≤ 1, with equality iff p = q = 1/2. Hence ∑_{n=0}^∞ P^n_{00} = ∞ iff p = q = 1/2. By Theorem 5.1, the one-dimensional random walk is recurrent if and only if p = q = 1/2.
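The dichotomy can be seen numerically from the partial sums of P^{2n}_{00} = C(2n, n)(pq)^n (computed with a term-ratio recursion to avoid overflowing the binomial coefficient):

```python
from math import sqrt

# Partial sums of P^{2n}_{00} = C(2n, n) (pq)^n for the simple random walk:
# they diverge when p = 1/2 and converge when p != 1/2, matching the
# recurrence criterion of Theorem 5.1.

def partial_sum(p, terms):
    q = 1 - p
    total, term = 0.0, 1.0          # term = C(2n, n) (pq)^n, starting at n = 0
    for n in range(terms):
        total += term
        m = n + 1
        term *= (2 * m) * (2 * m - 1) / (m * m) * (p * q)  # ratio of terms
    return total

# p = q = 1/2: terms ~ 1/sqrt(pi n), so the partial sums grow without bound.
assert partial_sum(0.5, 10_000) > partial_sum(0.5, 1_000) + 10
# p = 0.6: sum_n C(2n, n) x^n = 1/sqrt(1 - 4x) with x = pq = 0.24 < 1/4.
assert abs(partial_sum(0.6, 5_000) - 1 / sqrt(1 - 4 * 0.24)) < 1e-9
```

The second assertion uses the generating function ∑ C(2n, n) x^n = (1 − 4x)^{−1/2}, valid for |x| < 1/4, which is exactly the region 4pq < 1 where the walk is transient.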

Example 1.2 Simple random walk on Z2.

Example 1.3 Simple random walk on Z3.

7 More on Recurrence
Define Qii = P{MC starting in state i returns infinitely often to state i}.

Theorem 7.1 State i is recurrent or transient according to whether Qii = 1 or 0, respectively.

Proof. Let Q^N_{ii} be defined as

Q^N_{ii} = P{a particle starting in state i returns to state i at least N times}.

Then Q^N_{ii} = ∑_{k=1}^∞ f^k_{ii} Q^{N−1}_{ii} = Q^{N−1}_{ii} f*_{ii}, where f*_{ii} := ∑_{k=1}^∞ f^k_{ii}. Proceeding recursively,

Q^N_{ii} = f*_{ii} Q^{N−1}_{ii} = (f*_{ii})^2 Q^{N−2}_{ii} = · · · = (f*_{ii})^{N−1} Q^1_{ii} = (f*_{ii})^N.

Since lim_{N→∞} Q^N_{ii} = Qii, we have Qii = 1 or 0 according to whether f*_{ii} = 1 or f*_{ii} < 1, respectively; equivalently, according to whether state i is recurrent or transient.

Theorem 7.2 Suppose i ↔ j. If [i] (the equivalence class of i) is recurrent, then

f*_{ij} := ∑_{n=1}^∞ f^n_{ij} = 1.

Proof. Since j → i, there exists N s.t. f^N_{ji} > 0. Let N0 be the smallest such N. So

1 − f*_{jj} = P{a particle starting in state j never returns to state j} ≥ f^{N0}_{ji} (1 − f*_{ij}),

where 1 − f*_{ij} is the probability of never going to state j from state i. If f*_{ij} < 1, then 1 − f*_{jj} > 0 ⇒ f*_{jj} < 1. This implies state j is transient, contradicting the fact that j is recurrent. So f*_{ij} = 1.

Define Qij = P{a particle starting in state i visits state j infinitely often}.

Corollary 7.1 If i ↔ j and the class is recurrent, then Qij = 1.

Proof. It is easy to see that

Qij = f*_{ij} Qjj.

Since j is a recurrent state, by Theorem 7.1, Qjj = 1. By Theorem 7.2, f*_{ij} = 1. Hence Qij = 1.

Chapter 3

The Basic Limit Theorem of Markov Chains and Applications
1 Discrete Renewal Equation
Theorem 1.1 Let {ak}, {bk}, {uk} be sequences indexed by k = 0, ±1, ±2, . . . . Suppose that ak ≥ 0, ∑ ak = 1, ∑ |k| ak < ∞, ∑ k ak > 0, ∑ |bk| < ∞, and that the greatest common divisor of the integers k for which ak > 0 is 1. If the renewal equation

un = bn + ∑_{k=−∞}^∞ a_{n−k} uk,  for n = 0, ±1, ±2, . . . ,

is satisfied by a bounded sequence {un} of real numbers, then (i) lim_{n→∞} un exists, and (ii) lim_{n→−∞} un exists. Furthermore, (iii) if lim_{n→−∞} un = 0, then

lim_{n→∞} un = ( ∑_{k=−∞}^∞ bk ) / ( ∑_{k=−∞}^∞ k ak ).

In the case that ∑_{k=−∞}^∞ k ak = ∞, the limit relations are still valid provided we interpret ( ∑_{k=−∞}^∞ bk ) / ( ∑_{k=−∞}^∞ k ak ) = 0.

The proof of this theorem in its general form as stated here is beyond the scope of this class.
We will apply this theorem only for the case where ak = uk = bk = 0 for k < 0, and bk ≥ 0.

Remark In the case where a_{−k} = b_{−k} = u_{−k} = 0 for k > 0, the renewal equation becomes

un = bn + ∑_{k=0}^n ak u_{n−k} = bn + ∑_{k=0}^n a_{n−k} uk,  for n = 0, 1, 2, · · · .

Remark (Reason for the term "renewal equation") Consider a light bulb whose lifetime, measured in discrete units, is a random variable ξ, where P(ξ = k) = ak for k = 0, 1, 2, · · · . Thus ak ≥ 0 and ∑_{k=0}^∞ ak = 1. Let each bulb be replaced by a new one when the one in use burns out. Suppose the first bulb lasts until time ξ1, the second bulb until time ξ1 + ξ2, and the n-th bulb until time ∑_{i=1}^n ξi. Here the ξi's are independent and identically distributed as ξ. Let un denote the expected number of renewals (replacements) up to time n. If the first replacement occurs at time k, then the expected number of replacements in the remaining time up to n is u_{n−k}; summing over all possible values of k, we obtain

un = ∑_{k=0}^n (1 + u_{n−k}) ak + 0 · ∑_{k=n+1}^∞ ak
   = ∑_{k=0}^n ak + ∑_{k=0}^n ak u_{n−k}      (1.1)
   = bn + ∑_{k=0}^n ak u_{n−k},

where bn := ∑_{k=0}^n ak. The reasoning behind (1.1) goes as follows. There are two cases to consider: (i) If the first bulb fails at time k (0 ≤ k ≤ n), which happens with probability ak, then 1 + u_{n−k} is the expected number of replacements up to time n. (ii) If the first bulb lasts a duration exceeding n time units, which happens with probability ∑_{k=n+1}^∞ ak, no replacement has occurred.

The theorem below describes the limiting behavior of P^n_{ij} as n → ∞ for all i and j in the case of an aperiodic recurrent Markov chain. The proof is a simple application of Theorem 1.1.

Theorem 1.2 (Basic limit theorem of Markov chains) Consider a recurrent irreducible aperiodic Markov chain. For all states i and j,

(a) lim_{n→∞} P^n_{ii} = 1 / ∑_{n=0}^∞ n f^n_{ii};

(b) lim_{n→∞} P^n_{ji} = lim_{n→∞} P^n_{ii}.

Proof. (a) Note that (see Section 5)

P^n_{ii} − ∑_{k=0}^n f^{n−k}_{ii} P^k_{ii} = { 1, n = 0;
                                            { 0, n > 0.

Take

uk = { P^k_{ii}, k ≥ 0,     ak = { f^k_{ii}, k ≥ 0,     bk = { 1, k = 0,
     { 0,        k < 0;          { 0,        k < 0;          { 0, k ≠ 0;

and then apply Theorem 1.1.

(b) We use the recursion relation

P^n_{ji} = ∑_{k=0}^n f^k_{ji} P^{n−k}_{ii},  i ≠ j, n ≥ 0.   (1.2)

Write

yn = P^n_{ji},  an = f^n_{ji},  xn = P^n_{ii}.

Eqn (1.2) is of the form yn = ∑_{k=0}^n a_{n−k} xk, where am ≥ 0, ∑_{m=0}^∞ am = 1, lim_{k→∞} xk = c. It is known that lim_{n→∞} yn = c. Apply Theorem 7.2 to arrive at the desired result.

If lim_{n→∞} P^n_{ii} = πi > 0 for one i in an aperiodic class, then we may show that πj > 0 for all j in the class of i (P^{m+n+v}_{jj} ≥ P^m_{ji} P^v_{ii} P^n_{ij}). So in this case, we call the class positive recurrent or strongly ergodic. If each πi = 0 and the class is recurrent, we call the class null recurrent or weakly ergodic.

Theorem 1.3 In a positive recurrent aperiodic class with states j = 0, 1, 2, · · · ,

lim_{n→∞} P^n_{jj} = πj = ∑_{i=0}^∞ πi Pij,   ∑_{i=0}^∞ πi = 1,

and the πi's are uniquely determined by the set of equations

πi ≥ 0,  ∑_{i=0}^∞ πi = 1,  πj = ∑_{i=0}^∞ πi Pij.   (1.3)

If {πi : i ≥ 0} satisfies (1.3), it is called a stationary probability distribution of the MC.

Proof. For every n and M, 1 = ∑_{j=0}^∞ P^n_{ij} ≥ ∑_{j=0}^M P^n_{ij}. Letting n → ∞ and using Theorem 1.2, we obtain 1 ≥ ∑_{j=0}^M πj for every M. Thus ∑_{j=0}^∞ πj ≤ 1.
Also P^{n+1}_{ij} ≥ ∑_{k=0}^M P^n_{ik} Pkj; if we let n → ∞, we obtain πj ≥ ∑_{k=0}^M πk Pkj. Since the left-hand side is independent of M, letting M → ∞ gives

πj ≥ ∑_{k=0}^∞ πk Pkj.   (1.4)

Substituting (1.4) into itself yields πj ≥ ∑_{k=0}^∞ πk P^2_{kj}. Inductively, πj ≥ ∑_{k=0}^∞ πk P^n_{kj} for any n.
Suppose strict inequality holds for some j. Adding these inequalities with respect to j, we have

∑_{j=0}^∞ πj > ∑_{j=0}^∞ ∑_{k=0}^∞ πk P^n_{kj} = ∑_{k=0}^∞ πk ∑_{j=0}^∞ P^n_{kj} = ∑_{k=0}^∞ πk,

a contradiction. Thus πj = ∑_{k=0}^∞ πk P^n_{kj} for any n. Letting n → ∞, since ∑ πk converges and P^n_{kj} is uniformly bounded, we conclude that

πj = ∑_{k=0}^∞ πk lim_{n→∞} P^n_{kj} = πj ∑_{k=0}^∞ πk   for every j.

Since j is positive recurrent, πj > 0 and thus ∑_{k=0}^∞ πk = 1.

Suppose x = {xk} satisfies the relations (1.3). Then

xk = ∑_{j=0}^∞ xj Pjk = ∑_{j=0}^∞ xj P^n_{jk},

and if we let n → ∞ as before,

xk = ∑_{j=0}^∞ xj lim_{n→∞} P^n_{jk} = πk ∑_{j=0}^∞ xj = πk.

Example Consider the class of random walks whose transition matrices are given by

        | 0  1  0  0  ... |
    P = | q1 0  p1 0  ... |
        | 0  q2 0  p2 ... |
        | ...         ... |

This Markov chain has period 2. Nevertheless we investigate the existence of a stationary probability distribution, i.e., we wish to determine the positive solutions of

xi = ∑_{j=0}^∞ xj Pji = p_{i−1} x_{i−1} + q_{i+1} x_{i+1},  i = 0, 1, · · · ,   (1.5)

under the normalization ∑_{i=0}^∞ xi = 1, where p_{−1} = 0 and p0 = 1, and thus x0 = q1 x1. Letting i = 1 in Eqn (1.5), x2 can be expressed in terms of x0; then letting i = 2 in Eqn (1.5), x3 can be expressed in terms of x0; and so on. Indeed, by induction,

xi = x0 ∏_{k=0}^{i−1} (pk / q_{k+1}),  i ≥ 1.

Since

1 = x0 + ∑_{i=1}^∞ x0 ∏_{k=0}^{i−1} (pk / q_{k+1}),

we have

x0 = 1 / ( 1 + ∑_{i=1}^∞ ∏_{k=0}^{i−1} (pk / q_{k+1}) ).

Therefore,

x0 > 0 ⇔ ∑_{i=1}^∞ ∏_{k=0}^{i−1} (pk / q_{k+1}) < ∞.
In particular, if pk = p and qk = q = 1 − p for k ≥ 1, the series

∑_{i=1}^∞ ∏_{k=0}^{i−1} (pk / q_{k+1}) = (1/p) ∑_{i=1}^∞ (p/q)^i

converges only when p < q (i.e., p < 1/2).¹
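For a concrete check (with the illustrative choice p = 0.3), the product formula for xi can be evaluated on a truncated state space and compared against the stationarity condition x = xP, together with the closed form x0 = (q − p)/(1 + q − p) obtained by summing the geometric series above:

```python
import numpy as np

# Stationary distribution of the reflecting birth-death chain with constant
# p_k = p, q_k = q = 1 - p for k >= 1 (p = 0.3 is an illustrative choice),
# built from x_i = x_0 * prod_{k<i} p_k / q_{k+1} on a truncated state space.

p, q = 0.3, 0.7
M = 200                                   # truncation level; the tail is tiny

x = np.empty(M)
x[0] = 1.0
x[1] = x[0] / q                           # product for i = 1 is p_0/q_1 = 1/q
for i in range(2, M):
    x[i] = x[i - 1] * p / q
x /= x.sum()

# Closed form from the geometric series: x_0 = (q - p) / (1 + q - p).
assert abs(x[0] - (q - p) / (1 + q - p)) < 1e-12

# Stationarity x = xP for the (truncated) transition matrix.
P = np.zeros((M, M))
P[0, 1] = 1.0
for i in range(1, M - 1):
    P[i, i - 1], P[i, i + 1] = q, p
P[M - 1, M - 2] = q                       # boundary of the truncation
assert np.allclose(x, x @ P, atol=1e-10)
```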

2 Absorption Probabilities
If T is the set of all transient states, then define

x^1_i = ∑_{j∈T} Pij ≤ 1,  i ∈ T,

¹ That is, the MC is more likely to go to the left than to the right.

54
and define
xin = ∑ Pi j xn−1
j , n ≥ 2, i ∈ T.
j∈T
In words, x_i^n denotes the probability that, starting from i, the Markov chain stays within T for
the next n transitions. Since x_i^n ≤ 1 (why?) for all n ≥ 1, we shall prove by induction that x_i^n
is non-increasing as a function of n, as follows:

x_i^2 = ∑_{j∈T} P_{ij} x_j^1 ≤ ∑_{j∈T} P_{ij} = x_i^1.

Now assuming that x_j^n ≤ x_j^{n−1} for all j ∈ T, we have

0 ≤ x_i^{n+1} = ∑_{j∈T} P_{ij} x_j^n ≤ ∑_{j∈T} P_{ij} x_j^{n−1} = x_i^n.

Therefore, x_i^n ↓ x_i (i.e., x_i^n decreases to a limit x_i), and


x_i = ∑_{j∈T} P_{ij} x_j,   i ∈ T.   (2.6)

It follows that if the only bounded solution of this set of equations is the zero vector
(0, 0, ...), then starting from any transient state absorption into a recurrent class occurs with
probability one. The reason is as follows. It is clear that xi (i ∈ T ) is the probability of never
being absorbed into a recurrent class, starting from state i. Since this sequence is a bounded
solution of (2.6), it follows that xi is zero for all i.
Let C,C1,C2, · · · denote recurrent classes. For a transient state i, let πi(C) be the probability
that the process starting at state i will be eventually absorbed in C.2
Let πin(C)= probability that the process will enter and thus be absorbed in C for the first time
at the n-th transition, given that the initial state is i ∈ T .

2 Once the process enters a recurrent class, it never leaves that class. Proof Suppose α is in a recurrent class C, β is not in C,
and P_{αβ} > 0. Then P_{βα}^n = 0 for any n > 0 (otherwise β would be in C too). So if the process starts in α, there is positive
probability, at least P_{αβ}, that the process will not return to α. This contradicts the fact that α is recurrent. Hence P_{αβ} = 0.

Then

π_i(C) = ∑_{n=1}^∞ π_i^n(C) ≤ 1,   (2.7)

π_i^1(C) = ∑_{j∈C} P_{ij},   π_i^n(C) = ∑_{j∈T} P_{ij} π_j^{n−1}(C),   n ≥ 2.   (2.8)

In (2.8) we use the fact that if from i the MC goes to a recurrent state j, then π_j^{n−1}(C) = 0,
so the sum may be restricted to j ∈ T. Rewriting (2.7) using (2.8) gives

π_i(C) = π_i^1(C) + ∑_{n=2}^∞ π_i^n(C) = π_i^1(C) + ∑_{n=2}^∞ ∑_{j∈T} P_{ij} π_j^{n−1}(C) = π_i^1(C) + ∑_{j∈T} P_{ij} ∑_{n=2}^∞ π_j^{n−1}(C),

so that

π_i(C) = π_i^1(C) + ∑_{j∈T} P_{ij} π_j(C),   i ∈ T.   (2.9)

Theorem 2.1 Let j ∈ C, an aperiodic recurrent class. Then, for i ∈ T,

lim_{n→∞} P_{ij}^n = π_i(C) lim_{n→∞} P_{jj}^n = π_i(C) π_j.
Proof. Clearly, π_i^n(C) = ∑_{k∈C} π_{ik}^n(C), where π_{ik}^n(C) is the probability of the MC starting from
i and being absorbed at the n-th transition into class C at state k. We have

π_i(C) = ∑_{v=1}^∞ ∑_{k∈C} π_{ik}^v(C) ≤ 1.

Therefore, for any ε > 0 there exist a finite set of states C_0 ⊂ C and an integer N(ε) = N
s.t.

|π_i(C) − ∑_{v=1}^n ∑_{k∈C_0} π_{ik}^v(C)| < ε,   or   |∑_{v=1}^∞ ∑_{k∈C} π_{ik}^v − ∑_{v=1}^n ∑_{k∈C_0} π_{ik}^v| < ε,   (2.10)

for n > N(ε), where we abbreviate π_{ik}^v(C) by π_{ik}^v.


For j ∈ C consider

P_{ij}^n − ∑_{v=1}^n ∑_{k∈C_0} π_{ik}^v π_j.
Decomposing the events by the time of first entering some state in C, we have

P_{ij}^n = ∑_{v=1}^n ∑_{k∈C} π_{ik}^v P_{kj}^{n−v},   i ∈ T, j ∈ C.

Combining these relations, we have

|P_{ij}^n − ∑_{v=1}^n ∑_{k∈C_0} π_{ik}^v π_j| = |∑_{v=1}^n ∑_{k∈C_0} π_{ik}^v (P_{kj}^{n−v} − π_j) + ∑_{v=1}^n ∑_{k∈C∖C_0} π_{ik}^v P_{kj}^{n−v}|
  ≤ |∑_{v=1}^N ∑_{k∈C_0} π_{ik}^v (P_{kj}^{n−v} − π_j)|
  + |∑_{v=N+1}^n ∑_{k∈C_0} π_{ik}^v (P_{kj}^{n−v} − π_j)| + ∑_{v=1}^n ∑_{k∈C∖C_0} π_{ik}^v P_{kj}^{n−v}.

But P_{kj}^{n−v} ≤ 1, |P_{kj}^{n−v} − π_j| ≤ 2, and lim_{n→∞} P_{kj}^{n−v} = π_j if C is aperiodic and k ∈ C (Theorems
1.2 & 1.3). Hence there exists N′ > N s.t. for n > N′, |P_{kj}^{n−N} − π_j| < ε (k ∈ C_0). So, for n > N′,

|P_{ij}^n − ∑_{v=1}^n ∑_{k∈C_0} π_{ik}^v π_j| ≤ ε + 2 ∑_{v=N+1}^n ∑_{k∈C_0} π_{ik}^v + ∑_{v=1}^n ∑_{k∈C∖C_0} π_{ik}^v P_{kj}^{n−v}.

However, the choice of N and C_0 assures us that the right-hand side is ≤ 4ε. Then appealing
to (2.10) and the above result, we obtain

|P_{ij}^n − π_i(C) π_j| ≤ 4ε + ε π_j   for n > N′,

and therefore
lim_{n→∞} P_{ij}^n = π_i(C) lim_{n→∞} P_{jj}^n = π_i(C) π_j.

We emphasize the fact that if i is a transient state and j is a recurrent state, then the limit of
Pinj depends on both i and j. This is in sharp contrast with the case where i and j belong to
the same recurrent class.

Example (The gambler’s ruin on n + 1 states). [Note: n does not denote time here.]

 
P =
[ 1  0  0  0  ···          ]
[ q  0  p  0  ···          ]
[ 0  q  0  p  ···          ]
[ ⋮     ⋱  ⋱  ⋱            ]
[        ···  q  0  p      ]
[        ···     0  0  1   ]

We shall calculate ui = πi(C0) and vi = πi(Cn), the probabilities that starting from i the process
ultimately enters the absorbing (and therefore recurrent) states3 0 and n, respectively. The

3 When the MC enters state 0, the gambler is ruined. When the MC enters state n, the gambler wins all (or his opponent is ruined)

system of equations (2.9) becomes

u_1 = q + p u_2;
u_i = q u_{i−1} + p u_{i+1},   2 ≤ i ≤ n − 2;   (2.11)
u_{n−1} = q u_{n−2}.

We try a solution of the form u_r = x^r. Substituting in the middle equations and cancelling
common factors leads to px² + q = x. This quadratic equation has two solutions: x = 1 and
x = q/p. So u_r = A + B(q/p)^r, r = 1, 2, · · · , n − 1, satisfies the middle equations of (2.11) for
any A and B. We now determine A and B so that the first and the last equations hold.⁴ In the
case p/q ≠ 1, the first equation leads to

A + B(q/p) = q + p(A + B q²/p²);

4 If q = p, the solution x = 1 is a double root of px² + q = x, and one then has to replace (q/p)^r by r. That is, u_r = A + Br.
or equivalently,

A = 1 − B.

The last equation leads to

A + B(q/p)^{n−1} = q(A + B(q/p)^{n−2});   or   p^n A + q^n B = 0.

Solving these equations for A and B, we get

A = q^n / (q^n − p^n),   B = −p^n / (q^n − p^n).

From this, we have

u_r = [(q/p)^n − (q/p)^r] / [(q/p)^n − 1],   if q/p ≠ 1.
For the case q = p, similarly we find that A = 1 and B = −1/n. So

u_r = (n − r)/n,   when p = q.
A similar calculation shows that vi = 1 − ui, 0 ≤ i ≤ n. This is to be expected since there are
only two recurrent classes, C0 and Cn.
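The ruin probabilities derived above are easy to check by simulation. The sketch below compares the closed form u_r with a Monte Carlo estimate; the board size n, bias p, starting state r, and the number of trials are illustrative choices, not from the text.

```python
import random

random.seed(0)

def ruin_prob_exact(r, n, p):
    """u_r = ((q/p)^n - (q/p)^r)/((q/p)^n - 1) for p != q, and (n - r)/n for p = q."""
    q = 1 - p
    if p == q:
        return (n - r) / n
    rho = q / p
    return (rho ** n - rho ** r) / (rho ** n - 1)

def ruin_prob_mc(r, n, p, trials=20000):
    """Fraction of simulated games (started at r) absorbed at 0 rather than at n."""
    ruined = 0
    for _ in range(trials):
        i = r
        while 0 < i < n:
            i += 1 if random.random() < p else -1
        ruined += (i == 0)
    return ruined / trials

n, p, r = 10, 0.45, 5
exact = ruin_prob_exact(r, n, p)
approx = ruin_prob_mc(r, n, p)
assert abs(exact - approx) < 0.02  # Monte Carlo tolerance
```

The complementary probability v_r = 1 − u_r of absorption at n comes for free, matching the remark that C_0 and C_n are the only recurrent classes.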

Consider the situation when the gambler is playing against an infinitely rich adversary. The
equations for the probability of the gambler's ruin (absorption into 0) become

u_1 = q + p u_2,   (2.12)
u_i = q u_{i−1} + p u_{i+1},   i ≥ 2.

Again we find that, for i ≥ 0,

u_i = A + B(q/p)^i, for q ≠ p;   and   u_i = A + Bi, for q = p = 1/2.

If q ≥ p, then the condition that u_i be bounded requires B = 0, and equation (2.12)
then shows that u_i ≡ 1. If q < p, we find that u_i = (q/p)^i.

Remark In fact, a simple passage to the limit n → ∞ in the finite-state gambler's ruin yields
u_1 = q/p, and then it readily follows that u_i = (q/p)^i.

3 Criteria for Recurrence
We prove two theorems which will be useful in determining whether a given Markov chain
is recurrent or transient.

Theorem 3.1 Let B be an irreducible Markov chain whose state space is labeled by the
nonnegative integers. Then a necessary and sufficient condition that B be transient (i.e.,
each state is a transient state) is that the system of equations

∑_{j=0}^∞ P_{ij} y_j = y_i,   i > 0,   (3.13)

has a bounded nonconstant solution.

Proof. Let the transition matrix for B be

P = (P_{ij}) =
[ P_{00}  P_{01}  ··· ]
[ P_{10}  P_{11}  ··· ]
[   ⋮       ⋮         ]

and associate with it the new transition matrix

P̃ = (P̃_{ij}) =
[ 1       0       0       ··· ]
[ P_{10}  P_{11}  P_{12}  ··· ]
[ P_{20}  P_{21}  P_{22}  ··· ]
[   ⋮       ⋮       ⋮         ]   (3.14)

We denote the Markov chain with transition probability matrix (3.14) by B̃. For the necessity,
we shall assume that the process is transient and then exhibit a nonconstant bounded solution
of (3.13).

Let f_{i0}^∗ = the probability of entering state 0 in some finite time given that i is the initial state
(f_{i0}^∗ = ∑_{n=1}^∞ f_{i0}^n). Since the process B is transient, f_{j0}^∗ < 1 for some j > 0. (If f_{j0}^∗ = 1 for all
j > 0, then f_{00}^∗ = ∑_{n=1}^∞ f_{00}^n = f_{00} + ∑_{n=2}^∞ ∑_{j=1}^∞ P_{0j} f_{j0}^{n−1} = P_{00} + ∑_{j=1}^∞ P_{0j} f_{j0}^∗ = ∑_{j=0}^∞ P_{0j} = 1. That
is, state 0 is recurrent, a contradiction!)

For the process B̃ clearly π̃_0(C_0) = 1, π̃_j(C_0) = f_{j0}^∗ for j > 0 (which is < 1 for some j > 0), and π̃_i(C_0) =
∑_{j=0}^∞ P̃_{ij} π̃_j(C_0) for all i. Hence π̃_i(C_0) = ∑_{j=0}^∞ P_{ij} π̃_j(C_0) for i > 0 (since P̃_{ij} = P_{ij} for i > 0), and thus y_j = π̃_j(C_0) (j =
0, 1, 2, . . . ) is the desired bounded nonconstant solution.
Now assume that we have a nonconstant bounded solution {y_i} of (3.13). Then

∑_{j=0}^∞ P̃_{ij} y_j = y_i   for all i ≥ 0,

and iterating this equation repeatedly, we have, for all i > 0 and all n ≥ 1,

∑_{j=0}^∞ P̃_{ij}^n y_j = y_i.

If the chain B were recurrent, then

lim_{n→∞} P̃_{i0}^n = 1

(since in B, ∑_{n=1}^∞ f_{i0}^n = 1, and P̃_{i0}^m = ∑_{n=1}^m f_{i0}^n); and as n → ∞,

∑_{j>0} P̃_{ij}^n y_j ≤ M(1 − P̃_{i0}^n) → 0,

where M is a bound for {y_j}. Hence

y_i = P̃_{i0}^n y_0 + ∑_{j>0} P̃_{ij}^n y_j → y_0.

So, yi = y0 for all i, which contradicts the fact that the y j ’s are not all equal.
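A concrete illustration of Theorem 3.1, using an assumed example (not from the text): the random walk with P_{01} = 1, P_{i,i+1} = p, P_{i,i−1} = q = 1 − p for i ≥ 1. When p > 1/2, the sequence y_i = (q/p)^i is a bounded nonconstant solution of (3.13), certifying transience.

```python
# y_i = (q/p)^i solves sum_j P_ij y_j = y_i for i > 0:
#   p (q/p)^{i+1} + q (q/p)^{i-1} = (q/p)^i (q + p) = (q/p)^i,
# and it is bounded and nonconstant exactly when q < p.  p = 0.7 is illustrative.

p = 0.7
q = 1 - p
y = lambda i: (q / p) ** i

for i in range(1, 50):
    assert abs(p * y(i + 1) + q * y(i - 1) - y(i)) < 1e-12  # equation (3.13)

assert y(1) != y(0)                        # nonconstant
assert all(y(i) <= 1 for i in range(200))  # bounded, since q/p < 1
```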

Theorem 3.2 In an irreducible Markov chain a sufficient condition for recurrence is that
there exists a sequence {yi} such that

∑_{j=0}^∞ P_{ij} y_j ≤ y_i   for i > 0,   with y_i → ∞.   (3.15)

Proof. Using the same notation as in the previous theorem, we have



∑_{j=0}^∞ P̃_{ij} y_j ≤ y_i   for all i.

Since z_i = y_i + b also satisfies (3.15), we may assume y_i > 0 for all i ≥ 0. Iterating the preceding
inequality, we have

∑_{j=0}^∞ P̃_{ij}^m y_j ≤ y_i.

Given ε > 0 we choose M(ε) such that 1/y_i ≤ ε for i ≥ M(ε) (possible since y_i → ∞). Now

∑_{j=0}^{M−1} P̃_{ij}^m y_j + ∑_{j=M}^∞ P̃_{ij}^m y_j ≤ y_i,

and so

∑_{j=0}^{M−1} P̃_{ij}^m y_j + min_{r≥M}{y_r} ∑_{j=M}^∞ P̃_{ij}^m ≤ y_i.

Since

∑_{j=0}^∞ P̃_{ij}^m = 1,

we have

∑_{j=0}^{M−1} P̃_{ij}^m y_j + min_{r≥M}{y_r} (1 − ∑_{j=0}^{M−1} P̃_{ij}^m) ≤ y_i.

Suppose B is transient. Noting that in B̃, 0 is an absorbing state, we see that

lim_{n→∞} P̃_{ij}^n ≤ lim_{n→∞} P_{ij}^n = 0   for j > 0.

Thus, passing to the limit as m → ∞, we obtain for each fixed i > 0,

π̃_i(C_0) y_0 + min_{r≥M}{y_r} (1 − π̃_i(C_0)) ≤ y_i;

1 − π̃_i(C_0) ≤ [y_i − π̃_i(C_0) y_0] / min_{r≥M}{y_r} ≤ εK,

where K = y_i − π̃_i(C_0) y_0. Letting ε → 0 gives π̃_i(C_0) = 1 for each i > 0. Hence f_{00}^∗ = ∑_{n=1}^∞ f_{00}^n =
f_{00} + ∑_{n=2}^∞ ∑_{j=1}^∞ P_{0j} f_{j0}^{n−1} = P_{00} + ∑_{j=1}^∞ P_{0j} f_{j0}^∗ = ∑_{j=0}^∞ P_{0j} = 1, contradicting the assumed
transience. The Markov chain is recurrent.
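For the same walk as in the previous sketch but with p ≤ 1/2 (again an assumed example), Theorem 3.2 applies with y_i = i: the one-step mean p(i + 1) + q(i − 1) = i + (p − q) never exceeds i, and y_i → ∞, so the chain is recurrent.

```python
# Verify the hypothesis (3.15) of Theorem 3.2 for y_i = i when p <= 1/2.
# p = 0.5 is an illustrative choice.
p = 0.5
q = 1 - p
for i in range(1, 100):
    assert p * (i + 1) + q * (i - 1) <= i + 1e-12  # sum_j P_ij y_j <= y_i
```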

4 Applications & References


Discrete time Markov chains (DTMCs) find many and varied applications, too numerous to
enumerate here. Here are some resources you can explore.

Wikipedia https://en.wikipedia.org/wiki/Markov_chain#Applications
gives brief descriptions and references of the varied applications of DTMC in Physics, Chem-
istry, Biology, Speech recognition, Information theory, Queueing theory, Search engines
(e.g., Google PageRank), Statistics (e.g., Markov chain Monte Carlo), Economics and Finance,
Social sciences, Games, Music, Sports (e.g., Baseball), ...

I also prompted ChatGPT to compile a list of some applications with references:

1) Economics and Finance

a) Markov Chains: From Theory to Implementation and Experimentation by Paul A.
Gagniuc. This book provides practical insights into the application of Markov chains
in various domains, including finance.
b) The Theory of Interest by Stephen G. Kellison. This book covers financial models
and applications, including those involving stochastic processes like Markov chains.

2) Operations Research

a) Introduction to Probability Models by Sheldon M. Ross. This textbook covers a
range of applications, including queueing theory and inventory management.
b) Operations Research: An Introduction by Hamdy A. Taha. This textbook includes appli-
cations of Markov chains in operations research.

3) Communication Systems

a) Probability and Random Processes for Electrical and Computer Engineers by John
A. Gubner. This book covers network traffic modeling and error correction.
b) Digital Communications: Fundamentals and Applications by Bernard Sklar. In-
cludes discussions on the use of Markov chains in communication systems.

4) Biology and Medicine

a) Mathematical Models in Biology by Elizabeth A. Allman and John A. Rhodes. This
book explores biological applications of mathematical models, including Markov
chains.
b) Biostatistical Methods: The Assessment of Relative Risks by John M. Lachin. Pro-
vides an overview of statistical methods in medicine, including disease modeling.

5) Computer Science

a) Algorithms by Robert Sedgewick and Kevin Wayne. This textbook covers algorithms
related to Markov chains and their applications in computer science.
b) Probability and Computing: Randomized Algorithms and Probabilistic Analysis by
Michael Mitzenmacher and Eli Upfal. Discusses randomized algorithms and Markov
chains.

6) Game Theory, Decision Making and Learning

a) Markov Chains and Stochastic Stability by Sean P. Meyn and Richard L. Tweedie.
This book covers Markov chains in decision-making contexts and game theory.
b) Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G.
Barto. Covers applications of Markov decision processes in reinforcement learning.

7) Physics and Chemistry

a) Statistical Mechanics: Algorithms and Computations by Werner Krauth. Discusses
Markov chains in the context of statistical mechanics.
b) Molecular Dynamics: An Introduction by Mark Tuckerman. Provides insights into
molecular dynamics and the use of Markov chains in simulations.

8) Social Sciences

a) Social Network Analysis: Methods and Applications by Wasserman and Faust. Cov-
ers applications of Markov chains in social network analysis.

b) Behavioral Modeling and Simulation: From Individuals to Societies by Roger L.
Cooke. Discusses modeling human behavior and social systems using Markov chains.

9) Robotics and Control Systems

a) Robotics: Modelling, Planning and Control by Bruno Siciliano and Lorenzo Sciav-
icco. Covers the application of Markov chains in robotics and control systems.
b) Markov Chains: Theory and Applications by R. R. Yager and D. P. Sen. Provides a
comprehensive overview of Markov chains, including their application in robotics.

10) Environmental Science

a) Ecological Models and Data in R by Benjamin M. Bolker. This book includes dis-
cussions on using Markov chains for ecological modeling.

b) Climate Change: The Science of Global Warming and Our Energy Future by Ed-
mond A. Mathez. Discusses environmental modeling, including applications of
Markov chains.

Chapter 4

Classical Examples of
Continuous Time Markov
Chains
1 Poisson Processes and General Pure Birth Processes
In this section, we consider a family of random variables {X(t); 0 ≤ t < ∞} where the possi-
ble values of X(t) are the nonnegative integers. We shall restrict attention to the case where
{X(t)} is a Markov process with stationary transition probabilities. Thus, the transition prob-
ability function for t > 0,

Pi j (t) = P{X(t + u) = j|X(u) = i}, i, j = 0, 1, 2, . . . ,

is independent of u ≥ 0.

(A) Postulates for Poisson Process


A Poisson process is a Markov process on the nonnegative integers which has the following
properties:

i. P{X(t + h) − X(t) = 1|X(t) = x} = λ h + o(h) as h ↓ 0 (x = 0, 1, 2, . . . ).

ii. P{X(t + h) − X(t) = 0|X(t) = x} = 1 − λ h + o(h) as h ↓ 0.

iii. X(0) = 0.

The o(h) symbol means that if we divide this term by h then its value tends to zero as h
tends to zero. Notice that the right-hand side is independent of x.

(B) Examples of Poisson processes


Poisson processes arise naturally in many models of queueing phenomena. In these exam-
ples, attention is often placed upon the times at which X(t), length of the queue at time t,
jumps rather than the value of X(t). The fishing example in (a) is of course a special waiting
time example.

(C) Pure birth process


A natural generalization of the Poisson process is to allow the chance of an event occurring
at a given instant of time to depend upon the number of events which have already occurred.

Consider a sequence of positive numbers, {λi}. We define a pure birth process as a
Markov process satisfying the postulates:

i. P{X(t + h) − X(t) = 1 | X(t) = i} = λ_i h + o_{1,i}(h)   (h → 0+).

ii. P{X(t + h) − X(t) = 0 | X(t) = i} = 1 − λ_i h + o_{2,i}(h)   (h → 0+).

iii. P{X(t + h) − X(t) < 0 | X(t) = i} = 0,   i > 0.

As a matter of convenience, we often add the postulate


iv. X(0) = 0.
With this postulate X(t) does not denote the population size but rather the number of
births in the time interval [0,t].
Note that the left sides of Postulates i and ii are just Pi,i+1(h) and Pi,i(h), respectively
(owing to stationarity), so that o1,i(h) and o2,i(h) do not depend on t.

Next we will find P_n(t) = P{X(t) = n}, assuming X(0) = 0. Note that

1 = ∑_{m=0}^∞ P_m(h) = [1 − λ_0 h + o_{2,0}(h)] + [λ_0 h + o_{1,0}(h)] + ∑_{m=2}^∞ P_m(h),

so

∑_{m=2}^∞ P_m(h) = o(h).

This gives

P(h) := ∑_{m=1}^∞ P_m(h) = λ_0 h + o(h).

The Markov property shows

P_0(t + h) = P_0(t) P_0(h) = P_0(t)(1 − P(h)),

[P_0(t + h) − P_0(t)]/h = −P_0(t) P(h)/h.
Letting h → 0 gives

P_0′(t) = −λ_0 P_0(t),   i.e.,   P_0(t) = c e^{−λ_0 t}.

The constant c is determined by the initial condition P_0(0) = 1. So

P_0(t) = e^{−λ_0 t}.

Note that for h > 0, n ≥ 1,

P_n(t + h) = ∑_{k=0}^n P_k(t) P{X(t + h) = n | X(t) = k}
           = ∑_{k=0}^n P_k(t) P{X(t + h) − X(t) = n − k | X(t) = k}.
For k = 0, 1, · · · , n − 2 (i.e., n − k ≥ 2), we have

P{X(t + h) − X(t) = n − k | X(t) = k}
  ≤ P{X(t + h) − X(t) ≥ 2 | X(t) = k}
  = 1 − P{X(t + h) − X(t) = 0 | X(t) = k} − P{X(t + h) − X(t) = 1 | X(t) = k}
  = o_{1,k}(h) + o_{2,k}(h).

So

P_n(t + h) = P_n(t)(1 − λ_n h + o_{2,n}(h))
           + P_{n−1}(t)(λ_{n−1} h + o_{1,n−1}(h)) + ∑_{k=0}^{n−2} P_k(t)[o_{1,k}(h) + o_{2,k}(h)].

Or

P_n(t + h) − P_n(t) = P_n(t)(−λ_n h + o_{2,n}(h)) + P_{n−1}(t)(λ_{n−1} h + o_{1,n−1}(h)) + o_n(h).
Dividing by h and letting h ↓ 0, we obtain

P_n′(t) = −λ_n P_n(t) + λ_{n−1} P_{n−1}(t),   n ≥ 1,   (1.1)
P_0′(t) = −λ_0 P_0(t),

with boundary conditions

P_0(0) = 1,   P_n(0) = 0,   n > 0.

Some characteristics of the process
Let T_k denote the time between the k-th birth and the (k + 1)-st birth, so that

P_n(t) = P{∑_{i=0}^{n−1} T_i ≤ t < ∑_{i=0}^n T_i}.

The random variables T_k are called the “waiting times” between births, and

S_k = ∑_{i=0}^{k−1} T_i = the time at which the k-th birth occurs.
Since P_0(t) = e^{−λ_0 t}, P{T_0 ≤ z} = 1 − P{X(z) = 0} = 1 − e^{−λ_0 z}, i.e., T_0 has an exponential
distribution with parameter λ_0. We can show that (i) T_k has an exponential distribution with
parameter λ_k, and (ii) the T_k are independent.
Characteristic function of S_n, φ_n(t):

φ_n(t) = E e^{itS_n} = E e^{it ∑_{k=0}^{n−1} T_k} = ∏_{k=0}^{n−1} E e^{itT_k} = ∏_{k=0}^{n−1} λ_k/(λ_k − it).
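The sojourn-time description gives a direct way to simulate a pure birth process: draw T_k ~ Exp(λ_k) and count how many births fit in [0, t]. With constant rates λ_k = λ this reduces to a Poisson process, whose mean E X(t) = λt (computed in the next section) serves as a check. The rate function, horizon t, and sample size below are illustrative choices.

```python
import random

random.seed(1)

def pure_birth_count(rate, t):
    """Number of births in [0, t]; rate(k) is the birth rate lambda_k in state k."""
    k, s = 0, 0.0
    while True:
        s += random.expovariate(rate(k))  # sojourn T_k ~ Exp(lambda_k)
        if s > t:
            return k
        k += 1

lam, t, trials = 2.0, 3.0, 20000
mean = sum(pure_birth_count(lambda k: lam, t) for _ in range(trials)) / trials
assert abs(mean - lam * t) < 0.15  # E X(t) = lam * t = 6 for the Poisson case
```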

(D) The Yule process


The Yule process is an example of a pure birth process. Assume that each member in a
population has a probability β h + o(h) of giving birth to a new member in an interval of time
length h (β > 0). Furthermore assume that there are X(0) = N members present at time 0.

Assuming independence and no interaction among members of the population, then

P{X(t + h) − X(t) = 1 | X(t) = n} = C(n, 1)[β h + o(h)][1 − β h + o(h)]^{n−1} = nβ h + o_n(h).
That is, in this example, λn = nβ . The system of equations (1.1), in the case that N = 1,
becomes

Pn0 (t) = −β [nPn(t) − (n − 1)Pn−1(t)], n = 1, 2, . . . ,

under the initial conditions

P1(0) = 1, Pn(0) = 0, n ≥ 2.

The solution is

P_n(t) = e^{−βt}(1 − e^{−βt})^{n−1},   n ≥ 1.

The generating function may be determined easily by summing a geometric series. We have

f(s) = ∑_{n=1}^∞ P_n(t) s^n = s e^{−βt} ∑_{n=1}^∞ [(1 − e^{−βt}) s]^{n−1} = s e^{−βt} / [1 − (1 − e^{−βt}) s].
Case X(0) = N. Since we have assumed independence and no interaction among the mem-
bers, we may view this population as the sum of N independent Yule processes, each begin-
ning with a single member. Let

P_{N,n}(t) = P{X(t) = n | X(0) = N}   and   f_N(s) = ∑_{n=N}^∞ P_{N,n}(t) s^n = [f(s)]^N.

Then

[f(s)]^N = [s e^{−βt} / (1 − (1 − e^{−βt}) s)]^N = (s e^{−βt})^N ∑_{m=0}^∞ C(m + N − 1, m)(1 − e^{−βt})^m s^m
         = ∑_{n=N}^∞ C(n − 1, n − N)(e^{−βt})^N (1 − e^{−βt})^{n−N} s^n,

where we have used the binomial series (1 − x)^{−N} = ∑_{m=0}^∞ C(m + N − 1, m) x^m. The coefficient
of s^n in this expression must be

P_{N,n}(t) = C(n − 1, n − N)(e^{−βt})^N (1 − e^{−βt})^{n−N}   for n ≥ N.
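The Yule law P_{1,n}(t) = e^{−βt}(1 − e^{−βt})^{n−1} is geometric with success probability e^{−βt}. The sketch below simulates the process (a population of size n splits at total rate nβ) and checks P{X(t) = 1} against its closed form; β, t, and the sample size are illustrative choices.

```python
import math
import random

random.seed(2)

def yule_size(b, t):
    """Population at time t of a Yule process started from one member."""
    n, s = 1, 0.0
    while True:
        s += random.expovariate(n * b)  # sojourn in state n ~ Exp(n*b)
        if s > t:
            return n
        n += 1

b, t, trials = 1.0, 1.0, 20000
p1_hat = sum(yule_size(b, t) == 1 for _ in range(trials)) / trials
assert abs(p1_hat - math.exp(-b * t)) < 0.015  # P_{1,1}(t) = e^{-bt}
```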

2 More about Poisson Processes


(A) Characteristic function and Waiting times

φ_t(ω) = E{e^{iωX(t)}} = ∑_{n=0}^∞ e^{−λt}(λt)^n e^{iωn}/n! = exp[λt(e^{iω} − 1)].

(From Eq. (1.1) we have P(X(t) = n) = (λt)^n e^{−λt}/n!.) Thus

E(X(t)) = λt,   Var(X(t)) = λt.
In our discussion of the pure birth process we showed that

P(T0 ≤ x) = 1 − exp(−λ0x)

and mentioned that Tk follows an exponential distribution with parameter λk and that the
Tk ’s are independent. For the Poisson process, however, λk = λ for all k, so that the result
becomes

Theorem 2.1 The waiting times Tk are independent and identically distributed following an
exponential distribution with parameter λ .

(B) Uniform distribution


The class of distributions that are connected with the Poisson process does not stop with
the Poisson and exponential distributions. We shall show how the uniform and binomial
distributions also arise.

Consider the times {S_i} at which changes of X(t) occur, i.e.,

S_i = ∑_{k=0}^{i−1} T_k.

Theorem 2.2 For any numbers s_i satisfying 0 ≤ s_1 ≤ · · · ≤ s_n ≤ t,

P{S_i ≤ s_i, i = 1, . . . , n | X(t) = n} = (n!/t^n) ∫_0^{s_1} · · · ∫_{x_{n−2}}^{s_{n−1}} ∫_{x_{n−1}}^{s_n} dx_n dx_{n−1} . . . dx_1,

which is the distribution of the order statistics from a sample of n observations taken from
the uniform distribution on [0, t].
Proof.

P{S_i ≤ s_i, i = 1, . . . , n, X(t) = n}
 = P{T_0 ≤ s_1, T_0 + T_1 ≤ s_2, . . . , ∑_{i=0}^{n−1} T_i ≤ s_n, ∑_{i=0}^n T_i > t}
 = ∫_0^{s_1} ∫_0^{s_2−t_1} · · · ∫_0^{s_n−∑_{i=1}^{n−1} t_i} ∫_{t−∑_{i=1}^n t_i}^∞ λ^{n+1} e^{−λ∑_{i=1}^{n+1} t_i} dt_{n+1} . . . dt_1
 = λ^{n+1} ∫_0^{s_1} ∫_0^{s_2−t_1} · · · ∫_0^{s_n−∑_{i=1}^{n−1} t_i} e^{−λ∑_{i=1}^n t_i} [−(1/λ) exp(−λ t_{n+1})]_{t_{n+1}=t−∑_{i=1}^n t_i}^∞ dt_n . . . dt_1
 = λ^n e^{−λt} ∫_0^{s_1} ∫_0^{s_2−t_1} · · · ∫_0^{s_n−∑_{i=1}^{n−1} t_i} dt_n . . . dt_1
 = λ^n e^{−λt} ∫_0^{s_1} ∫_{u_1}^{s_2} · · · ∫_{u_{n−1}}^{s_n} du_n . . . du_1   (let u_j = ∑_{i=1}^j t_i, j = 1, . . . , n).

But

P{X(t) = n} = e^{−λt} (λt)^n / n!.

Hence

P{S_i ≤ s_i, i = 1, . . . , n | X(t) = n} = P{S_i ≤ s_i, i = 1, . . . , n, X(t) = n} / P{X(t) = n}
 = (n!/t^n) ∫_0^{s_1} ∫_{u_1}^{s_2} · · · ∫_{u_{n−1}}^{s_n} du_n . . . du_1,

as claimed.
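Theorem 2.2 can be checked numerically: conditional on X(t) = n, the jump times behave like uniform order statistics, so in particular E[S_1 | X(t) = n] = t/(n + 1). The sketch below estimates this conditional mean by crude rejection; λ, t, n, and the sample size are illustrative choices.

```python
import random

random.seed(3)
lam, t, n = 1.0, 1.0, 3

first_times = []
while len(first_times) < 3000:
    # One Poisson path on [0, t] built from i.i.d. exponential waiting times.
    times, s = [], 0.0
    while True:
        s += random.expovariate(lam)
        if s > t:
            break
        times.append(s)
    if len(times) == n:               # condition on exactly n arrivals
        first_times.append(times[0])  # S_1

mean_s1 = sum(first_times) / len(first_times)
# The minimum of n uniforms on [0, t] has mean t/(n + 1).
assert abs(mean_s1 - t / (n + 1)) < 0.015
```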

(C) Binomial distribution
For v < t and k < n,

P{X(v) = k | X(t) = n} = P{X(v) = k, X(t) − X(v) = n − k} / P{X(t) = n}
 = [e^{−λv}(λv)^k/k!] [e^{−λ(t−v)}(λ(t − v))^{n−k}/(n − k)!] / [e^{−λt}(λt)^n/n!]
 = C(n, k) v^k (t − v)^{n−k} / t^n.

A second example in which the binomial distribution plays a part may be given by considering
two independent Poisson processes X_1(t) and X_2(t) with parameters λ_1 and λ_2:

P(X_1(t) = k | X_1(t) + X_2(t) = n) = P(X_1(t) = k, X_2(t) = n − k) / P(X_1(t) + X_2(t) = n)
 = [exp(−λ_1 t)(λ_1 t)^k/k!] [exp(−λ_2 t)(λ_2 t)^{n−k}/(n − k)!] / {exp[−(λ_1 + λ_2)t] (λ_1 + λ_2)^n t^n/n!}
 = C(n, k) [λ_1/(λ_1 + λ_2)]^k [λ_2/(λ_1 + λ_2)]^{n−k}.
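The conditional binomial law for two independent Poisson processes is easy to confirm by simulation; the rates, the conditioning value n, the index k, and the sample size below are illustrative choices.

```python
import math
import random

random.seed(4)

def poisson_sample(lam):
    """Poisson(lam) via counting Exp(lam) waiting times in a unit interval."""
    n, s = 0, 0.0
    while True:
        s += random.expovariate(lam)
        if s > 1.0:
            return n
        n += 1

lam1, lam2, n, k = 2.0, 3.0, 5, 2
hits = total = 0
for _ in range(200000):
    x1, x2 = poisson_sample(lam1), poisson_sample(lam2)
    if x1 + x2 == n:
        total += 1
        hits += (x1 == k)

p = lam1 / (lam1 + lam2)
binom = math.comb(n, k) * p ** k * (1 - p) ** (n - k)  # C(5,2) 0.4^2 0.6^3
assert total > 0 and abs(hits / total - binom) < 0.02
```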

3 Birth and Death Processes
One of the obvious generalizations of the pure birth processes is to permit X(t) to decrease
as well as increase.
(A) Postulates
As in the case of the pure birth processes we assume that X(t) is a Markov process on the
states 0, 1, 2, . . . and that its transition probabilities P_{ij}(t) are stationary, i.e., P_{ij}(t) = P{X(t +
s) = j | X(s) = i}. In addition we assume that the P_{ij}(t) satisfy

1. Pi,i+1(h) = λih + o(h) as h ↓ 0, i ≥ 0;

2. Pi,i−1(h) = µih + o(h) as h ↓ 0, i ≥ 1;

3. Pi,i(h) = 1 − (λi + µi)h + o(h) as h ↓ 0, i ≥ 0;

4. Pi j (0) = δi j ;

5. µ0 = 0, λ0 > 0, µi, λi > 0, i = 1, 2 . . . .

The matrix

A =
[ −λ_0      λ_0            0              0          ··· ]
[  µ_1   −(λ_1 + µ_1)     λ_1             0          ··· ]
[  0        µ_2        −(λ_2 + µ_2)      λ_2         ··· ]
[  0        0              µ_3        −(λ_3 + µ_3)   ··· ]
[  ⋮        ⋮              ⋮              ⋮              ]   (3.2)

is called the infinitesimal generator of the process. The λ_i and µ_i are called the infinitesimal
birth and death rates, respectively. In Postulates 1 and 2 we are assuming that if the process
starts in state i, then in a small interval of time the probabilities of the population increasing
or decreasing by 1 are essentially proportional to the length of the interval.

Since the P_{ij}(t) are probabilities, we have P_{ij}(t) ≥ 0 and

∑_{j=0}^∞ P_{ij}(t) = 1.   (3.3)

Using the Markov property of the process we may derive the Chapman–Kolmogorov equation

P_{ij}(t + s) = ∑_{k=0}^∞ P_{ik}(t) P_{kj}(s).   (3.4)

This equation states that in order to move from state i to j in time t + s, X(t) moves to some
state k in time t and then from k to j in the remaining time s.
In order to obtain the probability that X(t) = n, we must specify the probability distribu-
tion for the initial state. We have

P(X(t) = n) = ∑_{i=0}^∞ q_i P_{in}(t),

where q_i = P(X(0) = i).
(B) Waiting times
With the aid of the preceding assumptions we may calculate the distribution of the r.v. Ti,
which is the waiting time of X(t) in state i; i.e., given that the process is in state i, what is the
distribution of Ti until it first leaves i? If we set

P(Ti ≥ t) = Gi(t),

it follows by the Markov property that, as h ↓ 0,

G_i(t + h) = P(X(s) = i, 0 ≤ s ≤ t + h | X(0) = i)
 = P(X(s) = i, 0 ≤ s ≤ t + h | X(s) = i, 0 ≤ s ≤ h) P(X(s) = i, 0 ≤ s ≤ h | X(0) = i)
 = G_i(t) G_i(h) = G_i(t)[P_{ii}(h) + o(h)]
 = G_i(t)[1 − (λ_i + µ_i)h] + o(h),

or

[G_i(t + h) − G_i(t)]/h = −(λ_i + µ_i) G_i(t) + o(1),

so that

G_i′(t) = −(λ_i + µ_i) G_i(t).   (3.5)

If we use the condition G_i(0) = 1, the solution is

G_i(t) = exp[−(λ_i + µ_i)t];

so T_i follows an exponential distribution with mean (λ_i + µ_i)^{−1}. The proof presented above
is not quite complete, since we have used the intuitive relationship G_i(h) = P_{ii}(h) + o(h)
without a formal proof.
According to Postulates (1), (2), during a time duration of length h a transition occurs
from state i to i + 1 with pr. λih + o(h) and from state i to i − 1 with pr. µih + o(h). Intuitively,

given that a transition occurs at time t, the pr. that the transition is to state i + 1 is λi/(λi + µi)
and to state i − 1 is µi/(λi + µi).
It leads to an important characterization of a birth and death process: The process sojourns
in a given state i for a random length of time whose distribution function is an exponential
distribution with parameter λi + µi. When leaving state i the process enters either i + 1 or
i − 1 with pr. λi/(λi + µi) and µi/(λi + µi) respectively. The motion is analogous to that of a
random walk except that transitions occur at random times rather than at fixed time periods.
We determine realizations of the process as follows. Suppose X(0) = i; the particle spends
a random length of time, exponentially distributed with parameter λ_i + µ_i, in state i and
subsequently moves with pr. λ_i/(λ_i + µ_i) to state i + 1 and with pr. µ_i/(λ_i + µ_i) to state
i − 1. Next, the particle sojourns a random length of time in the new state and then moves to
one of its neighboring states, and so on. More specifically, we observe a value t1 from the
exponential distribution with parameter λi + µi that fixes the initial sojourn time in state i.
Then we toss a coin with pr. of heads pi = λi/(λi + µi). If heads (tails) appear, we move

the particle to state i + 1(i − 1). In state i + 1 we observe a value t2 from the exponential
distribution with parameter λi+1 + µi+1 that fixes the sojourn time in the 2nd state visited.
If the particle at the first transition enters state i − 1, the subsequent sojourn time t20 is an
observation from the exponential distribution with parameter λi−1 + µi−1. After the 2nd wait
is completed, a Bernoulli trial is performed that chooses the next state to be visited, and the
process continues in the same way. The process obtained in this manner is called the minimal
process associated with the infinitesimal matrix A defined in (3.2).
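The minimal-process construction just described translates directly into a simulation: an exponential sojourn followed by a Bernoulli trial for the direction of each jump. The rate functions below (an M/M/1-type queue, whose long-run mean ρ/(1 − ρ) provides a check) are illustrative choices, not from the text.

```python
import random

random.seed(5)

def simulate_bd(lam, mu, x0, t_end):
    """State at time t_end of the minimal birth and death process."""
    x, t = x0, 0.0
    while True:
        rate = lam(x) + mu(x)
        if rate == 0:                  # absorbing state
            return x
        t += random.expovariate(rate)  # sojourn ~ Exp(lambda_x + mu_x)
        if t > t_end:
            return x
        # Bernoulli trial: up with pr. lambda_x/(lambda_x + mu_x), else down.
        x += 1 if random.random() < lam(x) / rate else -1

lam = lambda i: 1.0                    # birth rate lambda_i
mu = lambda i: 2.0 if i > 0 else 0.0   # deaths only from positive states
final = [simulate_bd(lam, mu, 0, 50.0) for _ in range(5000)]
mean = sum(final) / len(final)
# Long-run mean of this M/M/1-type queue: rho/(1 - rho) = 1 with rho = 1/2.
assert abs(mean - 1.0) < 0.1
```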

4 Differential Equations of Birth and Death Processes


As in the case of the pure birth and pure death processes, the transition probabilities Pi j (t)
satisfy a system of differential equations known as the backward Kolmogorov differential

equations. These are given by

P_{0j}′(t) = −λ_0 P_{0j}(t) + λ_0 P_{1j}(t),   (4.6)
P_{ij}′(t) = µ_i P_{i−1,j}(t) − (λ_i + µ_i) P_{ij}(t) + λ_i P_{i+1,j}(t),   i ≥ 1,

and the boundary condition P_{ij}(0) = δ_{ij}.


To derive these we have, from equation (3.4),

P_{ij}(t + h) = ∑_{k=0}^∞ P_{ik}(h) P_{kj}(t)
 = P_{i,i−1}(h) P_{i−1,j}(t) + P_{i,i}(h) P_{i,j}(t) + P_{i,i+1}(h) P_{i+1,j}(t)   (4.7)
 + ∑_{k: |k−i|>1} P_{i,k}(h) P_{k,j}(t).
Using Postulates 1, 2, and 3 of Section 3, we obtain

∑_{k: |k−i|>1} P_{i,k}(h) P_{k,j}(t) ≤ ∑_{k: |k−i|>1} P_{i,k}(h)
 = 1 − [P_{i,i}(h) + P_{i,i−1}(h) + P_{i,i+1}(h)]
 = 1 − [1 − (λ_i + µ_i)h + o(h) + µ_i h + o(h) + λ_i h + o(h)]
 = o(h),

so that

P_{ij}(t + h) = µ_i h P_{i−1,j}(t) + [1 − (λ_i + µ_i)h] P_{ij}(t) + λ_i h P_{i+1,j}(t) + o(h).

Transposing the term P_{ij}(t) to the left-hand side and dividing the equation by h, we obtain,
after letting h ↓ 0,

P_{ij}′(t) = µ_i P_{i−1,j}(t) − (λ_i + µ_i) P_{ij}(t) + λ_i P_{i+1,j}(t).
The backward equations are deduced by decomposing the time interval (0,t + h), where h is
positive and small, into the two periods

(0, h), (h,t + h)

and examining the transition in each period separately. In this sense the backward equations
result from a “first step analysis," the first step being over the short time interval of duration
h.

A different result arises from a “last step analysis," which proceeds by splitting the time
interval (0,t + h) into the two periods

(0,t), (t,t + h)

and adapting the preceding reasoning. From this viewpoint, under more stringent conditions

we can derive a further system of differential equations

P_{i0}′(t) = −λ_0 P_{i0}(t) + µ_1 P_{i1}(t),   (4.8)
P_{ij}′(t) = λ_{j−1} P_{i,j−1}(t) − (λ_j + µ_j) P_{ij}(t) + µ_{j+1} P_{i,j+1}(t),   j ≥ 1,

with the same initial condition P_{ij}(0) = δ_{ij}. These are known as the forward Kolmogorov
differential equations. To derive these equations we interchange t and h in equation (4.7),
and under stronger assumptions in addition to Postulates (1), (2), and (3) it can be shown that
the last term is again o(h). The remainder of the argument is the same as before.
A sufficient condition for (4.8) to hold is that P_{kj}(h)/h = o(1) for k ≠ j, j − 1, j + 1, where
the o(1) term, apart from tending to zero, is uniformly bounded with respect to k for fixed j as
h → 0. In this case it can be proved that ∑′_k P_{i,k}(t) P_{k,j}(h) = o(h), where ∑′ runs over k ≠ j, j − 1, j + 1.

Example Linear Growth with Immigration A birth and death process is called a linear
growth process if λn = λ n + a and µn = µn with λ > 0, µ > 0, and a > 0. Such processes
occur naturally in the study of biological reproduction and population growth. If the state n

describes the current population size, then the average instantaneous rate of growth is λ n + a.
Similarly, the probability of the state of the process decreasing by one after the elapse of a
small duration of time h is µnh + o(h). The factor λ n represents the natural growth of the
population owing to its current size, while the second factor a may be interpreted as the
infinitesimal rate of increase of the population due to an external source such as immigration.
The component µn, which gives the mean infinitesimal death rate of the present population,
possesses the obvious interpretation.
If we substitute the above values of λ_n and µ_n in (4.8), we obtain

P_{i0}′(t) = −a P_{i0}(t) + µ P_{i1}(t),
P_{ij}′(t) = [λ(j − 1) + a] P_{i,j−1}(t) − [(λ + µ)j + a] P_{ij}(t) + µ(j + 1) P_{i,j+1}(t),   j ≥ 1.

Now, if we multiply the j-th equation by j and sum, it follows that the expected value

E X(t) = M(t) = ∑_{j=1}^∞ j P_{ij}(t)
satisfies the differential equation

M 0(t) = a + (λ − µ)M(t),

with initial condition M(0) = i, if X(0) = i. The solution of this equation is

M(t) = at + i,   if λ = µ,

and

M(t) = [a/(λ − µ)]{e^{(λ−µ)t} − 1} + i e^{(λ−µ)t},   if λ ≠ µ.   (4.9)

The second moment, and hence the variance, may be calculated in a similar way. It is interesting
to note that M(t) → ∞ as t → ∞ if λ ≥ µ, while if λ < µ, the mean population size for large t is
approximately a/(µ − λ).
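Formula (4.9) can be checked against a direct simulation of the linear growth process, built from exponential sojourns with total rate (λn + a) + µn. The parameter values and sample size below are illustrative choices.

```python
import math
import random

random.seed(6)

def linear_growth(lam, mu, a, i, t_end):
    """Population at t_end with birth rate lam*n + a and death rate mu*n."""
    x, t = i, 0.0
    while True:
        birth, death = lam * x + a, mu * x
        t += random.expovariate(birth + death)  # total rate > 0 since a > 0
        if t > t_end:
            return x
        x += 1 if random.random() < birth / (birth + death) else -1

lam, mu, a, i, t = 0.5, 1.0, 1.0, 2, 2.0
# Formula (4.9): M(t) = a/(lam - mu) (e^{(lam-mu)t} - 1) + i e^{(lam-mu)t}.
m_exact = a / (lam - mu) * (math.exp((lam - mu) * t) - 1) + i * math.exp((lam - mu) * t)
m_hat = sum(linear_growth(lam, mu, a, i, t) for _ in range(20000)) / 20000
assert abs(m_hat - m_exact) < 0.1  # Monte Carlo tolerance
```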

Chapter 5

Brownian Motion
1 Background Material
Brownian motion process is an example of a continuous time, continuous state space Markov
process. In this course, we confine ourselves to one-dimensional process.

Let X(t) be the x-component of a particle in Brownian motion. Let x_0 be the position of
the particle at time t_0, i.e., X(t_0) = x_0. Let p(x, t | x_0) represent the conditional probability
density of X(t + t_0), given that X(t_0) = x_0. We have

p(x, t | x_0) ≥ 0,   and   ∫_{−∞}^∞ p(x, t | x_0) dx = 1.   (1.1)

Further, we stipulate that, for small t, X(t + t_0) is likely to be near X(t_0) = x_0. So we require

lim_{t→0} p(x, t | x_0) = 0,   for x ≠ x_0.

Based on physical principles, Einstein showed that p(x, t | x_0) must satisfy

∂p/∂t = D ∂²p/∂x².   (1.2)

This is called the diffusion equation, and D is the diffusion coefficient. If we choose D = 1/2,
then

p(x, t | x_0) = (1/√(2πt)) exp[−(x − x_0)²/(2t)]

is a solution of (1.2).

2 Joint Probabilities for Brownian Motion


Definition 2.1 A Brownian motion is a stochastic process {X(t);t ≥ 0} with the following
properties:
(a) Every increment X(t + s) − X(s) is normally distributed with mean 0 and variance
σ²t, σ > 0.
(b) For every pair of disjoint intervals [t_1, t_2], [t_3, t_4], with t_1 < t_2 ≤ t_3 < t_4, the increments
X(t_4) − X(t_3) and X(t_2) − X(t_1) are independent random variables with distributions given
in (a).
(c) X(0) = 0 and X(t) is continuous at t = 0.

Theorem 2.1 The conditional density of X(t) for t_1 < t < t_2, given X(t_1) = A and X(t_2) = B,
is a normal density with mean A + [(B − A)/(t_2 − t_1)](t − t_1) and variance (t_2 − t)(t − t_1)/(t_2 − t_1).
(We assume σ = 1 in (a).)

Proof. Making use of the fact that Z1 := X(t1), Z2 := X(t) − X(t1), Z3 := X(t2) − X(t) are independent normally distributed r.v.'s with means 0 and variances t1, t − t1 and t2 − t respectively, the joint density of X(t1), X(t), X(t2) is

    f(x, y, z) = [(2π)^{3/2} √(t1(t − t1)(t2 − t))]^{−1} exp{ −x²/(2t1) − (y − x)²/(2(t − t1)) − (z − y)²/(2(t2 − t)) }.

Similarly, the joint density of X(t1), X(t2) is

    f(x, z) = [2π √(t1(t2 − t1))]^{−1} exp{ −x²/(2t1) − (z − x)²/(2(t2 − t1)) }.

So the conditional density of X(t) given X(t1), X(t2) is (after some straightforward but tedious algebra)

    f(y | x, z) = f(x, y, z)/f(x, z) = (1/√(2πc)) exp{ −(y − a)²/(2c) },

where a = x + ((z − x)/(t2 − t1))(t − t1) and c = (t2 − t)(t − t1)/(t2 − t1). Letting x = A and z = B gives the result.
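The "straightforward but tedious algebra" can be checked numerically: form the ratio f(x, y, z)/f(x, z) from the three independent-increment densities and compare it on a grid with the N(a, c) density of the theorem. A minimal sketch (Python/NumPy assumed; the values of t1, t, t2, A, B are arbitrary):

```python
import numpy as np

# Numerical check of Theorem 2.1: the ratio f(x,y,z)/f(x,z) built from the
# three independent increments should equal a N(a, c) density in y.
t1, t, t2 = 1.0, 1.7, 3.0
A, B = 0.5, -1.0          # conditioned values X(t1) = A, X(t2) = B (arbitrary)

def phi(u, var):          # centered normal density with variance var
    return np.exp(-u**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

y = np.linspace(-5, 5, 2001)
f_xyz = phi(A, t1) * phi(y - A, t - t1) * phi(B - y, t2 - t)
f_xz  = phi(A, t1) * phi(B - A, t2 - t1)
cond = f_xyz / f_xz       # conditional density of X(t) given the endpoints

a = A + (B - A) * (t - t1) / (t2 - t1)          # theorem's mean
c = (t2 - t) * (t - t1) / (t2 - t1)             # theorem's variance
print(np.max(np.abs(cond - phi(y - a, c))))     # essentially 0
```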

3 Continuity of Paths and the Maximum Variables


Possible realizations of X(t) as functions of t (i.e., the sample paths) are continuous functions.
Reflection principle
Consider the collection of sample paths X(t), 0 ≤ t ≤ T, with X(0) = 0 and the property that X(T) > a (a > 0). Since X(t) is continuous and X(0) = 0, there exists a time τ at which X(t) first attains the value a.
For t > τ, we "reflect" X(t) about the line x = a to obtain

    X̃(t) = X(t)            for t < τ,
    X̃(t) = a − [X(t) − a]  for t > τ.

Note that X̃(T) < a. Because the probability law of the path for t > τ, given X(τ) = a, is symmetrical with respect to the values x > a and x < a and independent of the history prior to time τ, the reflection argument associates with every sample path with X(T) > a two sample paths X(t) and X̃(t) with the same probability of occurrence such that

    max_{0≤u≤T} X(u) ≥ a   and   max_{0≤u≤T} X̃(u) ≥ a.

Conversely, by the nature of this correspondence every sample path X(t) for which max_{0≤t≤T} X(t) ≥ a results from either of two sample functions X(t) with equal probability, one of which is such that X(T) > a, unless X(T) = a (but note that P(X(T) = a) = 0):

    P{ max_{0≤t≤T} X(t) ≥ a } = 2P{X(T) > a} = (2/√(2πT)) ∫_a^∞ exp(−x²/(2T)) dx.    (3.3)

With the help of (3.3) we may determine the distribution of the first time of reaching a > 0 subject to the condition X(0) = 0. Let Ta denote the time at which X(t) first attains the value a, where X(0) = 0. Then, clearly,

    P(Ta ≤ t) = P( max_{0≤u≤t} X(u) ≥ a | X(0) = 0 ).    (3.4)

Applying the change of variable x = y√t to (3.3), we arrive at

    P(Ta ≤ t) = √(2/π) ∫_{a/√t}^∞ exp(−x²/2) dx.    (3.5)

The density function of the random variable Ta is obtained by differentiating (3.5) with respect to t. Thus

    f_{Ta}(t | X(0) = 0) = (a/√(2π)) t^{−3/2} exp(−a²/(2t)).    (3.6)
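Formulas (3.5) and (3.6) can be cross-checked numerically: (3.5) equals erfc(a/√(2t)), and integrating the density (3.6) over (0, t] should give the same number. A sketch (Python/NumPy assumed; the values of a, t and the integration grid are illustrative):

```python
import math
import numpy as np

a, t = 1.5, 2.0

# Closed form (3.5): P(T_a <= t) = sqrt(2/pi) * int_{a/sqrt(t)}^inf e^{-x^2/2} dx.
closed = math.erfc(a / math.sqrt(2 * t))

# Numerical integral of the density (3.6) over (0, t] (midpoint rule).
u = np.linspace(1e-9, t, 200001)
mid = 0.5 * (u[1:] + u[:-1])
du = np.diff(u)
dens = a / math.sqrt(2 * math.pi) * mid**-1.5 * np.exp(-a**2 / (2 * mid))
numeric = float(np.sum(dens * du))

print(closed, numeric)   # the two values agree to several decimals
```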

Because of the symmetry and spatial homogeneity of the Brownian motion process we infer from the distribution (3.6) that for a > 0,

    P( min_{0≤u≤t} X(u) ≤ 0 | X(0) = a ) = P( max_{0≤u≤t} X(u) ≥ 0 | X(0) = −a )
        = P( max_{0≤u≤t} X(u) ≥ a | X(0) = 0 ) = P(Ta ≤ t)
        = (a/√(2π)) ∫_0^t u^{−3/2} exp(−a²/(2u)) du.    (3.7)

Another way to express the result of (3.7) is as follows: if X(t0) = a, then the probability P(a) that X(t) has at least one zero between t0 and t1 is

    P(a) = (|a|/√(2π)) ∫_0^{t1−t0} u^{−3/2} exp(−a²/(2u)) du = P(T_{|a|} ≤ t1 − t0).    (3.8)

Let us calculate the probability α that, if X(0) = 0, then X(t) vanishes at least once in the interval (t0, t1).
We condition on the possible values of X(t0). If |X(t0)| = a, then the probability that X(t) vanishes in the interval (t0, t1) is P(a). By the law of total probability,

    α = ∫_0^∞ P(a) P(|X(t0)| = a | X(0) = 0) da = √(2/(πt0)) ∫_0^∞ P(a) exp(−a²/(2t0)) da.

Substituting from (3.8) and then interchanging the order of integration yields

    α = √(2/(πt0)) ∫_0^∞ exp(−a²/(2t0)) { (a/√(2π)) ∫_0^{t1−t0} u^{−3/2} exp(−a²/(2u)) du } da
      = (1/(π√t0)) ∫_0^{t1−t0} u^{−3/2} { ∫_0^∞ a exp( −(a²/2)(1/u + 1/t0) ) da } du.

The inner integral can be evaluated exactly, and after simplifying we get

    α = (√t0/π) ∫_0^{t1−t0} du / ((t0 + u)√u).

The change of variable u = t0 v² produces

    α = (2/π) ∫_0^{√((t1−t0)/t0)} dv/(1 + v²) = (2/π) arctan √((t1 − t0)/t0),

which we may write, by virtue of some standard trigonometric relations, in the form

    √(t0/t1) = cos(πα/2),   or   α = (2/π) arccos √(t0/t1).
In summary,

Theorem 3.1 The probability that X(t) has at least one zero in the interval (t0, t1), given X(0) = 0, is

    α = (2/π) arccos √(t0/t1).
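Theorem 3.1 lends itself to a Monte Carlo check on a discrete grid; with t0 = 1 and t1 = 2 the formula gives α = (2/π) arccos √(1/2) = 1/2 exactly. A sketch (Python/NumPy assumed; the step size and path count are illustrative, and the discrete walk slightly undercounts crossings):

```python
import numpy as np

# Monte Carlo check of Theorem 3.1 with t0 = 1, t1 = 2, where
# alpha = (2/pi) * arccos(sqrt(1/2)) = 0.5 exactly.
rng = np.random.default_rng(42)
t0, t1, dt, n_paths = 1.0, 2.0, 1e-3, 2000
n_steps = int(t1 / dt)
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
paths = np.cumsum(increments, axis=1)

seg = paths[:, int(t0 / dt):]                 # path values on (t0, t1]
crossed = np.any(np.sign(seg) != np.sign(seg[:, [0]]), axis=1)
alpha_hat = crossed.mean()

alpha = (2 / np.pi) * np.arccos(np.sqrt(t0 / t1))
print(alpha, alpha_hat)   # 0.5 vs. an estimate near 0.5
```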

Applying the "reflection principle", we can solve the following problem: determine

    At(x, y) = P( X(t) > y, min_{0≤u≤t} X(u) > 0 | X(0) = x ),   x, y > 0.    (3.9)
We start with

    P(X(t) > y | X(0) = x) = At(x, y) + P( X(t) > y, min_{0≤u≤t} X(u) ≤ 0 | X(0) = x ),    (3.10)

where the reflection principle is applied to the last term. Figure 2 is the appropriate picture to guide the analysis. We deduce that

    P( X(t) > y, min_{0≤u≤t} X(u) ≤ 0 | X(0) = x )
        = P( X(t) < −y, min_{0≤u≤t} X(u) ≤ 0 | X(0) = x )    (3.11)
        = P( X(t) < −y | X(0) = x ).

Inserting (3.11) in (3.10) yields

    At(x, y) = P(X(t) > y | X(0) = x) − P(X(t) < −y | X(0) = x)
             = P(X(t) > y − x | X(0) = 0) − P(X(t) < −y − x | X(0) = 0)
             = P(X(t) > y − x | X(0) = 0) − P(X(t) > y + x | X(0) = 0)   (symmetry)
             = ∫_{y−x}^{y+x} p(u, t) du,    (3.12)

where p(u, t) = (2πt)^{−1/2} exp(−u²/(2t)) is the transition probability density function for the Brownian motion process.
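Formula (3.12) says At(x, y) = Φ((y + x)/√t) − Φ((y − x)/√t), which can be compared against a direct simulation of paths started at x that must stay positive. A sketch (Python/NumPy assumed; parameters are illustrative, and discretizing the running minimum slightly overestimates survival):

```python
import math
import numpy as np

# Monte Carlo check of (3.12) with x = 1, y = 0.5, t = 1:
# A_t(x, y) = Phi((y+x)/sqrt(t)) - Phi((y-x)/sqrt(t)).
def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2))

x, y, t = 1.0, 0.5, 1.0
exact = Phi((y + x) / math.sqrt(t)) - Phi((y - x) / math.sqrt(t))

rng = np.random.default_rng(7)
dt, n_paths = 1e-3, 3000
steps = rng.normal(0.0, np.sqrt(dt), size=(n_paths, int(t / dt)))
paths = x + np.cumsum(steps, axis=1)
est = np.mean((paths.min(axis=1) > 0) & (paths[:, -1] > y))
print(exact, est)   # both near 0.625
```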

123
4 Variations and Extensions
We claim that if X(t) is a standard Brownian motion process, then each of the following
processes is a version of standard Brownian motion:

• X1(t) := cX(t/c²) for fixed c > 0.

• X2(t) := tX(1/t) for t > 0, and X2(t) := 0 for t = 0.

• X3(t) := X(t + h) − X(h) for fixed h > 0.

We can verify the conditions in Definition 2.1. For i = 1, 2 or 3: (i) the increment Xi(t + s) − Xi(s) is normally distributed with zero mean; and (ii) the increments over disjoint time intervals are clearly independent r.v.'s. It remains to verify that the variances of these increments equal t:

    E[{X1(s + t) − X1(s)}²] = c² E[{X((t + s)/c²) − X(s/c²)}²] = c²((t + s)/c² − s/c²) = t,

    E[{X2(s + t) − X2(s)}²] = E[{(s + t)X(1/(s + t)) − sX(1/s)}²]
        = E[{ s[X(1/(s + t)) − X(1/s)] + t X(1/(s + t)) }²]
        = s²(1/s − 1/(s + t)) + t² · 1/(s + t) = t,

    E[{X3(s + t) − X3(s)}²] = E[{X(s + t + h) − X(s + h)}²] = t.

To complete the proof that these processes are versions of a standard Brownian Motion, it is
necessary to check that each Xi(t) is continuous at the origin. This is obviously true for X1(t)
and X3(t), but needs some argument for X2(t). Equivalently, in the latter case it is enough to show that

    P( lim_{t→∞} X(t)/t = 0 | X(0) = 0 ) = 1.

The proof involves some advanced analysis; we omit it here.

(A) Brownian Motion reflected at the origin


Let {X(t),t > 0} be a Brownian motion process. Brownian motion reflected at the origin
is defined as Y (t) = |X(t)|, t ≥ 0.

For any 0 ≤ t0 < t1 < · · · < tn and s > 0,

    P(Y(tn + s) ≤ z | Y(t0) = x0, . . . , Y(tn) = xn)
        = P(|X(tn + s)| ≤ z | |X(t0)| = x0, . . . , |X(tn)| = xn)
        = P(|X(tn + s)| ≤ z | |X(tn)| = xn)
        = P(−z ≤ X(tn + s) ≤ z | X(tn) = ±xn)
        = P(−z ≤ X(tn + s) ≤ z | X(tn) = xn)   (by symmetry)
        = ∫_{−z}^{z} p(y − xn, s) dy   (where p(x, t) = (2πt)^{−1/2} exp(−x²/(2t)))
        = ∫_0^z ( p(y − xn, s) + p(y + xn, s) ) dy.

Write ps(x, y) = p(y − x, s) + p(y + x, s); hence

(i) P(α ≤ Y(t) ≤ β | Y(0) = x) = ∫_α^β pt(x, y) dy,
(ii) and

    E[Y(t)] = E|X(t)| = ∫_{−∞}^{∞} |y| (2πt)^{−1/2} exp(−y²/(2t)) dy
            = 2 ∫_0^∞ y (2πt)^{−1/2} exp(−y²/(2t)) dy = √(2t/π),

and

    E[Y²(t)] = E[X²(t)] = t.

(B) Brownian Motion absorbed at the origin
Suppose the initial value of a Brownian motion process is X(0) = x, where x > 0. Let τ
be the first time the process reaches zero. Brownian motion absorbed at the origin is defined
as
    Z(t) = X(t)  for t ≤ τ,
    Z(t) = 0     for t > τ.
For any 0 < t0 < · · · < tn and y > 0,

    P(Z(tn + t) > y | Z(t0) = x0, . . . , Z(tn−1) = xn−1, Z(tn) = x)
        = P(Z(tn + t) > y | Z(t0) = x0, . . . , Z(tn) = x, min_{0≤u≤tn} X(u) > 0)
        = P(X(tn + t) > y, min_{0≤u≤t} X(tn + u) > 0 | X(t0) = x0, . . . , X(tn) = x, min_{0≤u≤tn} X(u) > 0)
        = P(X(tn + t) > y, min_{0≤u≤t} X(tn + u) > 0 | X(tn) = x)
        = At(x, y)   (see (3.9))
        = ∫_y^{y+2x} p(u − x, t) du.
Under the condition Z(0) = x > 0, Z(t) is a random variable whose distribution has a discrete

part and a continuous part. The discrete part is

    P(Z(t) = 0 | Z(0) = x) = 1 − At(x, 0) = 1 − ∫_0^{2x} p(u − x, t) du
        = 1 − ∫_{−x}^{x} p(u, t) du = 2 ∫_x^∞ p(u, t) du = 2 ∫_0^∞ p(u + x, t) du.

For 0 < a < b,

    P(a < Z(t) < b | Z(0) = x) = At(x, a) − At(x, b)
        = ∫_a^b p(u − x, t) du − ∫_{a+2x}^{b+2x} p(u − x, t) du
        = ∫_a^b [p(u − x, t) − p(u + x, t)] du.

Thus the transition probability density function for the continuous part of the absorbed Brownian motion is

    pt(x, y) = p(y − x, t) − p(y + x, t).
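A quick consistency check: the atom at zero plus the total mass of the continuous density pt(x, y) must equal 1. A sketch (Python/NumPy assumed; the values of x, t and the truncation of the y-integral are illustrative):

```python
import math
import numpy as np

# Sanity check for absorbed Brownian motion started at x: the atom at 0 plus
# the mass of the continuous part p_t(x, y) = p(y-x, t) - p(y+x, t) must be 1.
def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2))

x, t = 0.8, 1.5
atom = 2 * (1 - Phi(x / math.sqrt(t)))          # P(Z(t) = 0 | Z(0) = x)

y = np.linspace(0.0, 30.0, 300001)              # truncate the y-integral
p = lambda u: np.exp(-u**2 / (2 * t)) / math.sqrt(2 * math.pi * t)
f = p(y - x) - p(y + x)
cont = float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(y)))  # trapezoid rule

print(atom + cont)   # approximately 1.0
```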

(C) Brownian Motion with drift
Let {X̃(t),t > 0} be a Brownian motion process. Brownian motion with drift is a stochas-
tic process having the distribution of

X(t) = X̃(t) + µt,t > 0,

where µ is a constant, called the drift parameter. Alternatively, we may describe a Brownian
motion with drift in a manner that parallels Definition 2.1.

Definition 4.1 A Brownian motion with drift parameter µ is a stochastic process {X(t);t >
0} with the following properties:

(a) Every increment X(t + s) − X(s) ∼ N(µt, σ²t), where σ is a fixed constant.

(b) For every pair of disjoint time intervals [t1,t2], [t3,t4], say, t1 < t2 < t3 < t4, the increments
X(t4) − X(t3) and X(t2) − X(t1) are independent.

(c) X(0) = 0 and X(t) is continuous at t = 0.

We have

    P(X(t) ≤ x | X(t0) = x0) = P(X(t) − X(t0) ≤ x − x0)
        = ∫_{−∞}^{x−x0} (2π(t − t0)σ²)^{−1/2} exp{ −[y − µ(t − t0)]²/(2(t − t0)σ²) } dy
        = ∫_{−∞}^{[x−x0−µ(t−t0)]/σ} (2π(t − t0))^{−1/2} exp{ −y²/(2(t − t0)) } dy.

(D) Geometric Brownian Motion
Let {X(t), t ≥ 0} be a Brownian motion process with drift µ and diffusion coefficient σ². Geometric Brownian motion is defined as

    Y(t) = e^{X(t)},   t ≥ 0.

Recall that the moment generating function of a standard normal r.v. Z is E[e^{tZ}] = e^{t²/2}. It follows that

    E[Y(t) | Y(0) = y] = y E[e^{X(t)−X(0)}] = y e^{t(µ+σ²/2)};
    E[Y(t)² | Y(0) = y] = y² E[e^{2(X(t)−X(0))}] = y² e^{t(2µ+2σ²)}.
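The first-moment formula can be verified by integrating e^w against the normal density of the increment W = X(t) − X(0). A sketch (Python/NumPy assumed; the parameter values are arbitrary):

```python
import math
import numpy as np

# Check E[Y(t)|Y(0)=y] = y * exp(t*(mu + sigma^2/2)) by integrating
# e^w against the N(mu*t, sigma^2*t) density of W = X(t) - X(0).
mu, sigma, t, y0 = 0.1, 0.3, 2.0, 1.5
m, v = mu * t, sigma**2 * t

w = np.linspace(m - 12 * math.sqrt(v), m + 12 * math.sqrt(v), 200001)
dens = np.exp(-(w - m)**2 / (2 * v)) / math.sqrt(2 * math.pi * v)
f = np.exp(w) * dens
quad = y0 * float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(w)))  # trapezoid

closed = y0 * math.exp(t * (mu + sigma**2 / 2))
print(closed, quad)   # the two values agree closely
```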

Chapter 6

Renewal Processes
1 Definition of a Renewal Process and Related Concepts
A renewal process {N(t),t > 0} is a nonnegative integer-valued stochastic process that reg-
isters the successive occurrences of an event during the time interval (0,t], where the time

durations between consecutive “events" are positive, independent, and identically distributed
random variables. Let the successive occurrence times between events be {Xk }∞k=1 (represent-
ing the lifetimes of some units successively placed into service) such that Xi is the elapsed
time from the (i − 1)st event until the occurrence of the ith event. We write

F(x) = P(Xk ≤ x), k = 1, 2, 3, · · · ,

for the common probability distribution of {Xk }. A basic stipulation for renewal processes is
F(0) = 0, signifying that the Xk ’s are positive random variables.
Let S0 = 0 and Sn = ∑_{i=1}^n Xi (n ≥ 1) denote the waiting time until the occurrence of the nth event. So

    N(t) = number of indices n for which 0 < Sn ≤ t.

The principal objective of renewal theory is to derive properties of certain random variables
associated with {N(t)} and {Sn} from the inter-occurrence distribution F. For example, it

is of significance and relevance to compute the expected number of renewals for the time
duration (0,t]:
EN(t) = M(t)
is called the renewal function. To this end, several pertinent relationships and formulas are
worth recording. In principle, the probability law of Sn = X1 + X2 + · · · + Xn can be calculated
in accordance with the convolution formula

    P{Sn ≤ x} = Fn(x),

where F1(x) = F(x) is assumed known or prescribed, and then

    Fn(x) = ∫_0^∞ Fn−1(x − y) dF(y) = ∫_0^x Fn−1(x − y) dF(y),   n ≥ 2.

The fundamental link between the waiting time process {Sn} and the renewal counting

process {N(t)} is the observation that
N(t) ≥ k if and only if Sk ≤ t. (1.1)
That is, equation (1.1) asserts that the number of renewals up to time t is at least k if and only
if the kth renewal occurred on or before time t.
It follows from (1.1) that, for t > 0 and k = 1, 2, · · · ,
P{N(t) ≥ k} = P{Sk ≤ t} = Fk (t), (1.2)
and consequently,
P{N(t) = k} = P{N(t) ≥ k} − P{N(t) ≥ k + 1} = Fk (t) − Fk+1(t). (1.3)
For the renewal function M(t) = EN(t), we sum the tail probabilities to derive EN(t) = ∑_{k=1}^∞ P{N(t) ≥ k}, and then use (1.2) to obtain

    M(t) = EN(t) = ∑_{k=1}^∞ P{N(t) ≥ k} = ∑_{k=1}^∞ P{Sk ≤ t} = ∑_{k=1}^∞ Fk(t).    (1.4)
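Formula (1.4) can be checked in the Poisson case, where Sk has an Erlang distribution and the series must sum to λt. A sketch (Python assumed; the truncation at k = 99 is illustrative, the neglected tail being astronomically small):

```python
import math

# For exponential inter-occurrence times (rate lam), S_k is Erlang(k, lam), so
# F_k(t) = 1 - sum_{j<k} e^{-lam t}(lam t)^j / j!, and (1.4) should give
# M(t) = sum_k F_k(t) = lam * t.
def erlang_cdf(k, lam, t):
    s = sum((lam * t) ** j / math.factorial(j) for j in range(k))
    return 1.0 - math.exp(-lam * t) * s

lam, t = 2.0, 3.0
M = sum(erlang_cdf(k, lam, t) for k in range(1, 100))  # tail is negligible
print(M)   # approximately lam * t = 6.0
```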

A number of random variables are of interest in renewal theory. Three of these are the
excess/residual life (also called the excess random variable), the current life (also called the
age random variable), and the total life:

• Excess or residual lifetime: γt = SN(t)+1 − t,

• Current life or age: δt = t − SN(t),

• Total life: βt = γt + δt .

2 Some Examples of Renewal Processes


A Poisson processes
As mentioned earlier, the Poisson process with parameter λ is a renewal process whose
inter-occurrence times have the exponential distribution F(x) = 1 − e^{−λx}, x ≥ 0. The memoryless property of the exponential distribution serves decisively in yielding the explicit computation of a number of functionals of the Poisson renewal process.
The Renewal Function. Since N(t) has a Poisson distribution,

    P{N(t) = k} = (λt)^k e^{−λt}/k!,   k = 0, 1, · · · ,

and

    M(t) = EN(t) = λt.
Excess Life Observe that the excess life at time t exceeds x if and only if there are no
renewals in the interval (t,t + x]. This event has the same probability as that of no renewals
in the interval (0, x], since a Poisson process has stationary independent increments. In formal
terms, we have

    P{γt > x} = P{N(t + x) − N(t) = 0} = P{N(x) = 0} = e^{−λx}.    (2.5)

Thus, in a Poisson process, the excess life possesses the same exponential distribution

    P{γt ≤ x} = 1 − e^{−λx},    (2.6)

another manifestation of the memoryless property of the exponential distribution.


Current Life. The current life δt, of course, cannot exceed t, while for x < t the current life exceeds x if and only if there are no renewals in (t − x, t], which again has probability e^{−λx} = P(N(x) = 0). Thus the current life follows the truncated exponential distribution

    P{δt ≤ x} = 1 − e^{−λx}  for 0 ≤ x < t,
    P{δt ≤ x} = 1            for t ≤ x.    (2.7)

Mean Total Life. Using the evaluation EX = ∫_0^∞ P(X > x) dx for the mean of a nonnegative random variable, we have

    Eβt = Eγt + Eδt = 1/λ + ∫_0^t P{δt > x} dx = 1/λ + ∫_0^t e^{−λx} dx = 1/λ + (1/λ)(1 − e^{−λt}).

Observe that the mean total life is significantly larger than the mean life 1/λ = EXk of any particular renewal interval. A more striking expression of this phenomenon appears when t is large, so that the process has been in operation for a long duration: then the mean total life Eβt is approximately twice the mean life. These facts appear at first paradoxical.

Let us reexamine the definition of the total life βt with a view to providing an intuitive basis for the seeming discrepancy. First, an arbitrary time point t is fixed. Then βt measures the length of the renewal interval containing the point t. Such a procedure will, with higher likelihood, favor a longer renewal interval over a shorter one. This phenomenon is known as length-biased sampling and occurs in a number of sampling situations.

Joint Distribution of γt and δt. The joint distribution of γt and δt is determined in the same manner as the marginals. In fact, for any x > 0 and 0 < y < t, the event {γt > x, δt > y} occurs if and only if there are no renewals in the interval (t − y, t + x], which has probability e^{−λ(x+y)}. Thus

    P{γt > x, δt > y} = e^{−λ(x+y)}  if x > 0, 0 < y < t,
    P{γt > x, δt > y} = 0            if y ≥ t.    (2.8)

For the Poisson process, observe that γt and δt are independent, since their joint distribution factors as the product of their marginal distributions.

3 Renewal Equations and the Elementary Renewal
Theorem
A. The renewal function
Note that

    Fn(t) = ∫_0^t Fn−m(t − ξ) dFm(ξ) ≤ Fn−m(t) Fm(t),   1 ≤ m ≤ n − 1.

So

    Fnr+k(t) ≤ F(n−1)r+k(t) Fr(t) ≤ [Fr(t)]^n Fk(t).    (3.9)

Aim: M(t) = ∑_{k=1}^∞ Fk(t) < ∞ for any t. In view of (3.9), it suffices to show that for each t > 0 there exists r such that Fr(t) < 1.

Since the Xi are positive random variables with F(0+) = 0,

    ∑_{i=1}^r Xi → ∞  a.s.

as r → ∞. So we conclude that for each t > 0 there must exist r such that Fr(t) < 1.
Conclusions For any given t > 0,
• Fn(t) → 0 as n → ∞,

• M(t) < ∞, and

• M(t) is an increasing function, continuous from the right.


Let A and B be nondecreasing functions, continuous from the right with A(0) = B(0) = 0.
Define the convolution, denoted A ∗ B, by

    A ∗ B(t) = ∫_0^t B(t − y) dA(y).

Note that A ∗ B(t) = ∫_0^t ( ∫_0^{t−y} dB(z) ) dA(y) = ∫_0^t ( ∫_0^{t−z} dA(y) ) dB(z) = B ∗ A(t). So ∗ is a commutative operation.

We next show that the renewal function M(t) satisfies the equation

    M(t) = F(t) + ∫_0^t M(t − y) dF(y) = F(t) + F ∗ M(t),   t ≥ 0.

This identity will be proved by the renewal argument: condition on the time, X1, of the first renewal and count the expected number of renewals thereafter.

Note that, for t < x, E(N(t)|X1 = x) = 0; and for t ≥ x,

    E(N(t)|X1 = x) = 1 + ∑_{k=2}^∞ P(Sk ≤ t | X1 = x)
        = 1 + ∑_{k=2}^∞ P( ∑_{i=2}^k Xi ≤ t − x | X1 = x )
        = 1 + ∑_{k=2}^∞ P( ∑_{i=2}^k Xi ≤ t − x )
        = 1 + EN(t − x) = 1 + M(t − x).

Applying the law of total probability yields

    M(t) = EN(t) = ∫_0^t E(N(t)|X1 = x) dF(x)
         = ∫_0^t (1 + M(t − x)) dF(x)
         = F(t) + ∫_0^t M(t − x) dF(x).

Much of the power of renewal theory derives from the preceding method of reasoning that
views the process starting anew at the occurrence of the first event.
Renewal Equations. An integral equation of the form

    A(t) = a(t) + ∫_0^t A(t − x) dF(x),   t ≥ 0,

is called a renewal equation. Here a(t) and F(x) are known and A(t) is unknown.

Without risk of ambiguity, we will employ the notation B ∗ c(t) for the convolution of a function c(t) (assumed reasonably smooth and bounded on finite intervals) with an increasing right-continuous function B(t) satisfying B(0) = 0:

    B ∗ c(t) = ∫_0^t c(t − τ) dB(τ).

Some properties of the convolution operation, ∗.

i. max_{0≤t≤T} |(B ∗ c)(t)| ≤ max_{0≤t≤T} |c(t)| · B(T);

ii. B ∗ c1 + B ∗ c2 = B ∗ (c1 + c2);

iii. if B1 and B2 are increasing, then B1 ∗ (B2 ∗ c) = (B1 ∗ B2) ∗ c.

Theorem 3.1 Suppose a is a bounded function. There exists one and only one function A, bounded on finite intervals, that satisfies

    A(t) = a(t) + ∫_0^t A(t − y) dF(y).    (3.10)

This function is

    A(t) = a(t) + ∫_0^t a(t − x) dM(x),    (3.11)

where M(t) = ∑_{k=1}^∞ Fk(t) is the renewal function.

Proof. We verify first that the function A defined by (3.11) fulfills the boundedness property and solves (3.10). Because a is a bounded function and M is nondecreasing and finite, for every T > 0 we have

    sup_{0≤t≤T} |A(t)| ≤ sup_{0≤t≤T} |a(t)| + ∫_0^T sup_{0≤y≤T} |a(y)| dM(x)
                       = sup_{0≤t≤T} |a(t)| · (1 + M(T)) < ∞,

establishing that the function defined by (3.11) is bounded on finite intervals. To check that
A(t) of (3.11) satisfies (3.10), we have

    A(t) = a(t) + M ∗ a(t)
         = a(t) + ( ∑_{k=1}^∞ Fk ) ∗ a(t)
         = a(t) + F ∗ a(t) + ∑_{k=2}^∞ Fk ∗ a(t)
         = a(t) + F ∗ ( a(t) + ( ∑_{k=1}^∞ Fk ) ∗ a(t) )
         = a(t) + F ∗ A(t),

where we applied Fk = F ∗ Fk−1 in the second-to-last equality. It remains to verify the uniqueness of A. This is done by showing that any solution of the renewal equation (3.10), bounded on finite intervals, is represented by (3.11).

    A = a + F ∗ A = a + F ∗ (a + F ∗ A) = a + F ∗ a + F ∗ (F ∗ A)
      = a + F ∗ a + F2 ∗ A = · · · = a + ( ∑_{k=1}^{n−1} Fk ) ∗ a + Fn ∗ A.

Note that |Fn ∗ A(t)| = |∫_0^t A(t − y) dFn(y)| ≤ sup_{0≤y≤t} |A(y)| · Fn(t) → 0, while lim_{n→∞} ∑_{k=1}^{n−1} Fk ∗ a(t) = ( ∑_{k=1}^∞ Fk ) ∗ a(t) = M ∗ a(t) by the boundedness of a(t). The proof is complete.
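Theorem 3.1 also suggests a simple numerical scheme: discretize the renewal equation on a grid and solve it forward in time, since A(t_i) depends only on earlier values. A sketch (Python/NumPy assumed; F is taken exponential so that the exact answer M(t) = λt is available, and the step size is illustrative):

```python
import numpy as np

# A minimal numerical scheme for the renewal equation
# A(t) = a(t) + int_0^t A(t-y) f(y) dy (F with density f), solved forward on a
# grid; checked against the exponential case, where a = F gives A = M = lam*t.
def solve_renewal(a_vals, f_vals, h):
    n = len(a_vals)
    A = np.zeros(n)
    for i in range(n):
        # right-rule discretization of the convolution integral
        conv = np.dot(A[i-1::-1], f_vals[1:i+1]) * h if i > 0 else 0.0
        A[i] = a_vals[i] + conv
    return A

lam, h, T = 1.0, 1e-3, 2.0
t = np.arange(0.0, T + h, h)
a_vals = 1.0 - np.exp(-lam * t)          # a(t) = F(t)
f_vals = lam * np.exp(-lam * t)          # density of F
A = solve_renewal(a_vals, f_vals, h)
print(A[-1])   # approximately lam * T = 2.0
```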
Another identity:

    E[S_{N(t)+1}] = E(X1) · [1 + M(t)].    (3.12)

At first glance, this identity resembles the formula for the mean of a random sum, which asserts that E[X1 + · · · + XN] = EX1 · EN when N is an integer-valued random variable independent of X1, X2, · · · . The random sum approach cannot be applied here because the random number of summands, N(t) + 1, is not independent of the summands themselves. Indeed, in Section 2, where the Poisson process is viewed as a renewal process, we showed that the last summand X_{N(t)+1} has a mean that approaches twice the unconditional mean µ = EX1 for t large. For this reason it is not correct, in particular, to evaluate E[S_{N(t)}] as the product of EX1 and EN(t). In view of this comment, the identity expressed in equation (3.12) becomes more intriguing and remarkable.
To derive (3.12) we use a renewal argument to establish a renewal equation for A(t) = E[S_{N(t)+1}]. As usual, we condition on the time of the first renewal, X1 = x, and distinguish two contingencies: (i) when x > t, so that N(t) = 0 and S_{N(t)+1} = X1 = x; and (ii) when x ≤ t, so that E(S_{N(t)+1} | X1 = x) = x + A(t − x). Therefore,

    A(t) = ∫_0^∞ E(S_{N(t)+1} | X1 = x) dF(x) = ∫_0^t (x + A(t − x)) dF(x) + ∫_t^∞ x dF(x)
         = ∫_0^∞ x dF(x) + ∫_0^t A(t − x) dF(x) = EX1 + ∫_0^t A(t − x) dF(x).

Thus A(t) satisfies a renewal equation in which a(t) ≡ EX1 is constant. By Theorem 3.1,

    A(t) = a(t) + ∫_0^t a(t − x) dM(x) = (M(t) + 1) · EX1.

Observe that the excess life γt = S_{N(t)+1} − t has

    Eγt = E[X1][1 + M(t)] − t.

Theorem 3.2 (Elementary renewal theorem) Let {N(t), t ≥ 0} be a renewal process generated by inter-occurrence times with finite mean µ. Then

    lim_{t→∞} M(t)/t = 1/µ.

Proof. By definition, t < S_{N(t)+1}. By (3.12), we have

    t < E[S_{N(t)+1}] = µ[1 + M(t)],

and therefore

    t^{−1} M(t) > µ^{−1} − t^{−1}.

It follows that

    liminf_{t→∞} t^{−1} M(t) ≥ µ^{−1}.    (3.13)

To establish the opposite inequality, let c > 0 be arbitrary, and set

    Xi^c = Xi  if Xi ≤ c,
    Xi^c = c   if Xi > c,

and consider the renewal process {N^c(t), t ≥ 0} with lifetimes {Xi^c}. Let S_n^c denote the corresponding waiting times. Since the random variables Xi^c are uniformly bounded by c, we have t + c ≥ S^c_{N^c(t)+1}. Therefore,

    t + c ≥ E[S^c_{N^c(t)+1}] = µ^c[1 + M^c(t)],

where µ^c = EX1^c = ∫_0^c [1 − F(x)] dx and M^c(t) = EN^c(t). Obviously Xi^c ≤ Xi implies N^c(t) ≥ N(t), and so M^c(t) ≥ M(t). It follows that

    t + c ≥ µ^c[1 + M(t)].

After rearranging terms,

    t^{−1} M(t) ≤ (µ^c)^{−1} + t^{−1}(c/µ^c − 1).

Hence

    limsup_{t→∞} t^{−1} M(t) ≤ (µ^c)^{−1}.    (3.14)

Since lim_{c→∞} µ^c = lim_{c→∞} ∫_0^c [1 − F(x)] dx = ∫_0^∞ [1 − F(x)] dx = µ, while the left-hand side of (3.14) does not depend on c, we deduce

    limsup_{t→∞} t^{−1} M(t) ≤ µ^{−1}.    (3.15)

The proof of the theorem is complete on combining inequalities (3.13) and (3.15).
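The elementary renewal theorem is easy to illustrate by simulation. A sketch (Python/NumPy assumed; Uniform(0,1) lifetimes are chosen so that µ = 1/2, and the horizon and path count are illustrative):

```python
import numpy as np

# Monte Carlo illustration of the elementary renewal theorem with
# Uniform(0,1) inter-occurrence times (mu = 1/2), so M(t)/t -> 2.
rng = np.random.default_rng(3)
t, n_paths = 200.0, 1000
X = rng.uniform(0.0, 1.0, size=(n_paths, 500))   # 500 lifetimes per path
S = np.cumsum(X, axis=1)                         # waiting times S_n
N = np.sum(S <= t, axis=1)                       # N(t) per path (500 suffices)
print(N.mean() / t)   # approximately 1/mu = 2
```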

4 The Renewal Theorem
The subject of this section involves one of the most basic theorems in applied probability. The
renewal theorem can be regarded as a refinement of the asymptotic relation M(t) ∼ t/µ,t →
∞, established in Theorem 3.2.
The proof of the renewal theorem is lengthy and demanding. We omit it here and refer to Feller's book for a comprehensive treatment. However, its statement will be given with care so that the student can understand its meaning and be able to apply it without ambiguity.
For the precise statement, we need several preliminary definitions.

Definition 4.1. A point α of a distribution function F is called a point of increase if, for every positive ε,

    F(α + ε) − F(α − ε) > 0.

A distribution function is said to be arithmetic if there exists a positive number λ such that F exhibits points of increase exclusively among the points 0, ±λ, ±2λ, · · · . The largest such λ is called the span of F.
A distribution function F that has a continuous part is not arithmetic. The distribution
function of a discrete random variable having possible values 0, 1, 2, · · · is arithmetic with
span 1.

Definition 4.2. Let g be a function defined on [0, ∞). For every positive δ and n = 1, 2, · · ·, let

    m_n = min{g(t) : (n − 1)δ ≤ t ≤ nδ},
    m̄_n = max{g(t) : (n − 1)δ ≤ t ≤ nδ},
    σ(δ) = δ ∑_{n=1}^∞ m_n,   and   σ̄(δ) = δ ∑_{n=1}^∞ m̄_n.

Then g is said to be directly Riemann integrable if both series σ(δ) and σ̄(δ) converge absolutely for every positive δ, and the difference σ̄(δ) − σ(δ) goes to 0 as δ → 0.

Every monotonic function g which is absolutely integrable in the sense that

    ∫_0^∞ |g(t)| dt < ∞    (4.16)

is directly Riemann integrable, and this is the most important case for our purposes. Manifestly, all finite linear combinations of monotone functions satisfying (4.16) are also directly Riemann integrable.

Theorem 4.1 (The Basic Renewal Theorem). Let F be the distribution function of a positive random variable with mean µ. Suppose that a is directly Riemann integrable and that A is the solution of the renewal equation

    A(t) = a(t) + ∫_0^t A(t − x) dF(x).    (4.17)
i. If F is not arithmetic, then

    lim_{t→∞} A(t) = (1/µ) ∫_0^∞ a(x) dx  if µ < ∞,
    lim_{t→∞} A(t) = 0                    if µ = ∞.

ii. If F is arithmetic with span λ, then for all 0 ≤ c < λ,

    lim_{n→∞} A(c + nλ) = (λ/µ) ∑_{n=0}^∞ a(c + nλ)  if µ < ∞,
    lim_{n→∞} A(c + nλ) = 0                          if µ = ∞.

There is a second form of the theorem, equivalent to that just given, but expressed more directly in terms of the renewal function. Let h > 0 be given, and examine the special prescription

    a(y) = 1  if 0 ≤ y < h,
    a(y) = 0  if h ≤ y,

inserted in (4.17). In this example, for t > h, because of (3.11), we have

    A(t) = a(t) + ∫_0^t a(t − x) dM(x) = ∫_{t−h}^t dM(x) = M(t) − M(t − h),

and µ^{−1} ∫_0^∞ a(x) dx = h/µ. If F is not arithmetic, we may conclude on the basis of the renewal theorem that

    lim_{t→∞} (M(t + h) − M(t)) = h/µ,

with the convention that h/µ = 0 when µ = ∞. If F is arithmetic with span λ, then letting h = kλ and n > k,

    A(c + nλ) = M(c + nλ) − M(c + (n − k)λ).

So

    lim_{n→∞} [M(c + nλ) − M(c + (n − k)λ)] = (λ/µ) ∑_{n=0}^∞ a(c + nλ) = kλ/µ.

Theorem 4.2 Let F be the distribution function of a positive random variable with mean µ.
Let M(t) = ∑∞k=1 Fk (t) be the renewal function associated with F. Let h > 0 be fixed.

i. If F is not arithmetic, then

    lim_{t→∞} (M(t + h) − M(t)) = h/µ.

ii. If F is arithmetic, the same limit holds, provided h is a multiple of the span λ .

We conclude that the elementary renewal theorem is a corollary of Theorem 4.2.

Proof of Theorem 3.2 assuming Theorem 4.2. Assume first that F is not arithmetic. Take bn = M(n + 1) − M(n). Then by Theorem 4.2 we have bn → 1/µ, so the averages converge as well:

    (1/n) M(n) = (1/n) ∑_{k=0}^{n−1} bk → 1/µ.

(One can refer to p. 15, the last paragraph.) Now for any t > 0, let [t] denote the largest integer not exceeding t. Since M is nondecreasing,

    ([t]/t) · M([t])/[t] ≤ M(t)/t ≤ (([t] + 1)/t) · M([t] + 1)/([t] + 1)   ⇒   M(t)/t → 1/µ.

If F is arithmetic with span λ, we set bn = M((n + 1)λ) − M(nλ). Then bn → λ/µ, which implies n^{−1} M(λn) = n^{−1} ∑_{k=0}^{n−1} bk → λ/µ as n → ∞. Note that t = λ · (t/λ) and

    ([t/λ]/t) · M(λ[t/λ])/[t/λ] ≤ M(t)/t ≤ ((1 + [t/λ])/t) · M(λ(1 + [t/λ]))/(1 + [t/λ])   ⇒   M(t)/t → 1/µ.

5 Applications of the Renewal Theorem
(a) Limiting Distribution of the Excess Life
Let γt = S_{N(t)+1} − t be the excess life at time t and, for a fixed z > 0, set Az(t) = P(γt > z). Using the renewal argument,

    P(γt > z | X1 = x) = P(S_{N(t)+1} > z + t | X1 = x)
        = 1          if x > t + z,
        = 0          if t + z ≥ x > t  (since then N(t) = 0),
        = Az(t − x)  if t ≥ x > 0.

Then by the law of total probability,

    Az(t) = ∫_0^∞ P(γt > z | X1 = x) dF(x) = 1 − F(t + z) + ∫_0^t Az(t − x) dF(x).

Theorem 3.1 yields

    Az(t) = 1 − F(t + z) + ∫_0^t (1 − F(t + z − x)) dM(x).

We assume that µ = EX1 = ∫_0^∞ (1 − F(x)) dx < ∞. Then

    ∫_0^∞ (1 − F(t + z)) dt = ∫_z^∞ (1 − F(y)) dy < ∞,

so 1 − F(t + z) is directly Riemann integrable as a function of t with z fixed. Applying Theorem 4.1 yields

    lim_{t→∞} Az(t) = µ^{−1} ∫_z^∞ (1 − F(y)) dy    (5.18)

in case F is not arithmetic.
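For Uniform(0,1) lifetimes, (5.18) gives lim P(γt > z) = 2 ∫_z^1 (1 − y) dy = (1 − z)², which a simulation can reproduce. A sketch (Python/NumPy assumed; the horizon t = 100 stands in for t → ∞, and all sizes are illustrative):

```python
import numpy as np

# Monte Carlo check of (5.18) with Uniform(0,1) lifetimes: mu = 1/2 and
# lim_t P(gamma_t > z) = (1 - z)^2, e.g. 0.25 at z = 0.5.
rng = np.random.default_rng(11)
t, z, n_paths = 100.0, 0.5, 2000
X = rng.uniform(0.0, 1.0, size=(n_paths, 250))   # ample lifetimes per path
S = np.cumsum(X, axis=1)
idx = np.argmax(S > t, axis=1)                   # first index with S_n > t
gamma = S[np.arange(n_paths), idx] - t           # excess life S_{N(t)+1} - t
print((gamma > z).mean())   # approximately (1 - 0.5)^2 = 0.25
```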

Joint limit for δt and γt.
Note that {γt ≥ x and δt ≥ y} = {γ_{t−y} ≥ x + y}. (Proof of '⇒': by definition γ_{t−y} = S_{N(t−y)+1} − (t − y). If δt ≥ y, there is no renewal in (t − y, t], and we have N(t − y) = N(t). So γ_{t−y} = S_{N(t)+1} − (t − y) = γt + y ≥ x + y. Proof of '⇐': if there were a renewal in (t − y, t], then S_{N(t−y)+1} ≤ t, which contradicts γ_{t−y} ≥ x + y. So there is no renewal in (t − y, t]; hence δt ≥ y. Clearly, γt = S_{N(t−y)+1} − t ≥ x.)
It follows that

    lim_{t→∞} P(δt ≥ y, γt ≥ x) = lim_{t→∞} P(γ_{t−y} ≥ x + y) = µ^{−1} ∫_{x+y}^∞ (1 − F(z)) dz.

Letting x = 0 gives lim_{t→∞} P(δt ≥ y) = µ^{−1} ∫_y^∞ (1 − F(z)) dz.

Next aim: the limit distribution of βt.

Define Kx(t) = P(βt > x). Given X1 = y with y ≤ t, X_{N(t)+1} is just the total life at t − y: since X1 = y, we can restart the renewal process at y, so t becomes t − y, and the distribution of X_{N(t)+1} given X1 = y is that of X_{N(t−y)+1}, the total life at t − y. So, conditioning on the time of the first renewal event, we have

    P(βt > x | X1 = y) = P(S_{N(t)+1} − S_{N(t)} > x | X1 = y) = P(X_{N(t)+1} > x | X1 = y)
        = 1           if y > max(x, t),
        = Kx(t − y)   if y ≤ t,
        = 0           otherwise (i.e. t < y ≤ x, so that N(t) = 0 and X_{N(t)+1} = y ≤ x).

Integrating gives

    Kx(t) = 1 − F(x ∨ t) + ∫_0^t Kx(t − y) dF(y).

Application of the renewal theorem furnishes the limit law

    lim_{t→∞} P(βt > x) = lim_{t→∞} Kx(t) = (1/µ) ∫_0^∞ (1 − F(max(x, τ))) dτ = (1/µ) ∫_x^∞ ξ dF(ξ),

so that

    lim_{t→∞} P(βt ≤ x) = (1/µ) ∫_0^x ξ dF(ξ).

(b) Asymptotic Expansion of the Renewal Function
Suppose F is a nonarithmetic distribution with finite variance σ². We want to show that

    lim_{t→∞} (M(t) − µ^{−1}t) = (σ² − µ²)/(2µ²).

Define H(t) = M(t) + 1 − µ^{−1}t = µ^{−1}E[S_{N(t)+1}] − µ^{−1}t = µ^{−1}Eγt. Note that

    E(γt | X1 = x) = x − t        if x > t,
    E(γt | X1 = x) = µH(t − x)   if x ≤ t.

Invoking the law of total probability yields

    µH(t) = ∫_0^∞ E(γt | X1 = x) dF(x) = ∫_t^∞ (x − t) dF(x) + µ ∫_0^t H(t − x) dF(x).

Now

    ∫_t^∞ (x − t) dF(x) = ∫_0^∞ y dF(t + y) = ∫_0^∞ (1 − F(t + y)) dy

is a monotonic function of t, and expressing 1 − F(t + y) = ∫_{t+y}^∞ dF(z) and interchanging the orders of integration leads to

    ∫_0^∞ ( ∫_0^∞ (1 − F(t + y)) dy ) dt = ∫_0^∞ ∫_0^∞ ∫_{t+y}^∞ dF(z) dy dt
        = ∫_0^∞ ∫_t^∞ ( ∫_0^{z−t} dy ) dF(z) dt = ∫_0^∞ ∫_t^∞ (z − t) dF(z) dt
        = ∫_0^∞ ∫_0^z (z − t) dt dF(z) = ∫_0^∞ (z²/2) dF(z) = (σ² + µ²)/2 < ∞.

Thus the renewal theorem (Theorem 4.1) implies

    lim_{t→∞} µH(t) = (σ² + µ²)/(2µ),

or

    lim_{t→∞} (M(t) − µ^{−1}t) = lim_{t→∞} (H(t) − 1) = (σ² + µ²)/(2µ²) − 1 = (σ² − µ²)/(2µ²),

as was to be shown.
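The expansion can be checked numerically by solving M = F + F ∗ M on a grid. A sketch (Python/NumPy assumed; F is Uniform(0,1), so µ = 1/2, σ² = 1/12 and the predicted limit is −1/3; the step size is illustrative):

```python
import numpy as np

# Numerical check of the expansion with F = Uniform(0,1): mu = 1/2,
# sigma^2 = 1/12, so M(t) - 2t should approach (sigma^2 - mu^2)/(2 mu^2) = -1/3.
h, T = 1e-3, 10.0
t = np.arange(0.0, T + h, h)
F = np.clip(t, 0.0, 1.0)                 # F(t) for Uniform(0,1)
f = (t <= 1.0).astype(float)             # density, = 1 on [0,1]
M = np.zeros_like(t)
for i in range(1, len(t)):
    M[i] = F[i] + h * np.dot(M[i-1::-1], f[1:i+1])   # M = F + F*M on the grid
print(M[-1] - 2 * T)   # approximately -1/3
```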

Chapter 7

Review
1 Markov chain
• P_{ij}^n = ∑_{k=0}^∞ P_{ik}^r P_{kj}^s for r + s = n, and P_{ij}^0 = 1 if i = j, 0 if i ≠ j.

• If P_{ij}^n > 0 for some n ≥ 0, we write i → j.

• f_{ii}^n (for n ≥ 1) denotes the probability that the MC, starting at i, returns to i for the first time at the nth step.

• Note that f_{ii}^1 = P_{ii}, and P_{ii}^n = ∑_{k=1}^n f_{ii}^k P_{ii}^{n−k} for n ≥ 1.

• Generating functions: P_{ij}(s) = ∑_{n=0}^∞ P_{ij}^n s^n and F_{ij}(s) = ∑_{n=0}^∞ f_{ij}^n s^n.
  We have F_{ii}(s)P_{ii}(s) = P_{ii}(s) − 1, i.e. P_{ii}(s) = 1/(1 − F_{ii}(s)).

• P_{ij}^n = ∑_{k=0}^n f_{ij}^k P_{jj}^{n−k}, and P_{ij}(s) = F_{ij}(s)P_{jj}(s).

• i is recurrent ⇔ ∑_{n=1}^∞ f_{ii}^n = 1 ⇔ ∑_{n=1}^∞ P_{ii}^n = ∞.

• Aperiodic: the period of state i equals 1 for all i.

• Basic Limit Theorem of Markov Chains

  – Suppose the MC is recurrent, irreducible and aperiodic. Recall that

        P_{ii}^n − ∑_{k=0}^n f_{ii}^{n−k} P_{ii}^k = 1 if n = 0, and = 0 if n > 0.

    Then
    i. lim_{n→∞} P_{ii}^n = 1 / ∑_{n=1}^∞ n f_{ii}^n;
    ii. lim_{n→∞} P_{ji}^n = lim_{n→∞} P_{ii}^n.

  – If the MC is positive recurrent, irreducible and aperiodic, then lim_{n→∞} P_{jj}^n = π_j, where π_j = ∑_{i=0}^∞ π_i P_{ij} and ∑_i π_i = 1. The π_i are uniquely determined by the system

        π_i ≥ 0,   ∑_i π_i = 1,   π_j = ∑_{i=0}^∞ π_i P_{ij} for all j.
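The stationarity equations in the last bullet can be solved directly for a small chain. A sketch (Python/NumPy assumed; the 3 × 3 matrix P is an arbitrary positive recurrent, irreducible, aperiodic example):

```python
import numpy as np

# Stationary distribution of a small positive recurrent, irreducible,
# aperiodic chain: solve pi = pi P with sum(pi) = 1, and compare with lim P^n.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])

# Stack the balance equations (P^T - I) pi = 0 with the normalization row.
A = np.vstack([P.T - np.eye(3), np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

Pn = np.linalg.matrix_power(P, 100)      # rows of P^n converge to pi
print(pi, Pn[0])
```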

2 Poisson Process {N(t)}
• N(t) − N(s) ∼ Poisson(λ (t − s)),

• independent increment,

• P(N(t + h) − N(t) = 1|N(t) = k) = P(N(0, h) = 1) = λ h + o(h),


P(N(t + h) − N(t) = 0|N(t) = k) = P(N(0, h) = 0) = 1 − λ h + o(h),
P(N(t + h) − N(t) ≥ 2|N(t) = k) = P(N(0, h) ≥ 2) = o(h).

• General pure birth process, and birth and death process.

3 Brownian Motion
Definition. Reflection principle.

4 Renewal Process {N(t)}
• Some key terms/relations:
  Sn = ∑_{i=1}^n Xi;
  N(t) = max{n : Sn ≤ t};
  {N(t) ≥ k} = {Sk ≤ t}.

• P(N(t) = k) = P(N(t) ≥ k) − P(N(t) ≥ k + 1) = P(Sk ≤ t) − P(Sk+1 ≤ t)

• Renewal function M(t) = E[N(t)]

• E[N(t)] = ∑∞k=1 P(N(t) ≥ k) = ∑∞k=1 P(Sk ≤ t)


• Renewal argument: E(N(t)|X1 = x) = 0 if x > t, and = 1 + M(t − x) if x ≤ t.

This implies

    M(t) = ∫_0^t E(N(t)|X1 = x) dF(x) = F(t) + ∫_0^t M(t − x) dF(x).

• E(SN(t)+1) = E[X1] · E[N(t) + 1] = E[X1][M(t) + 1].

• Elementary renewal theorem: recall {Xi} i.i.d. with EXi = µ > 0. Then

    lim_{t→∞} M(t)/t = lim_{t→∞} EN(t)/t = 1/µ.

• Three associated random variables


Excess (or Residual) life–the time from t to the next event (renewal): γ(t) = SN(t)+1 − t
Age: δ (t) = t − SN(t)
Total life: β (t) = γ(t) + δ (t)

• Inspection paradox
