All Chapters
Chapter 1
Elements of Stochastic Processes
Definition 1.1 A family of random variables X_t, where t is a parameter running over an index set T, is called a stochastic process. We write {X_t, t ∈ T}.
A realization or sample function of a stochastic process {X_t, t ∈ T} is an assignment, to each t ∈ T, of a possible value of X_t; that is, it is a function of t.
(2) Index parameter T
• If T = {0, 1, 2, · · · }, then we say that Xt is a discrete time process. In this case, we shall
write Xn instead of Xt .
• If T = [0, ∞), then Xt is called a continuous time process.
Example B Martingales
We say that {X_t, t ∈ T} is a martingale if, for any t_1 < t_2 < ··· < t_n < t_{n+1}, we have
E[X_{t_{n+1}} | X_{t_1} = a_1, ..., X_{t_n} = a_n] = a_n
for all values a1, · · · , an. It may be considered as a model for fair games, in the sense that Xt
signifies the amount of money that a player has at time t. The martingale property states that
the average amount a player will have at time tn+1, given that he has amount an at time tn, is
equal to a_n regardless of what his past fortune has been. For example, X_n = Z_1 + ··· + Z_n, n = 1, 2, ..., is a discrete time martingale if the Z_i are independent and have mean zero.
(Exercise: Prove it.)
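As a quick numerical illustration of Example B (a sketch, not part of the notes; the symmetric ±1 coin-flip steps are an assumption — any independent mean-zero Z_i would do), one can simulate X_n = Z_1 + ··· + Z_n and check that the empirical mean of X_{n+1}, restricted to paths with a given value of X_n, is close to that value:

```python
import random

random.seed(0)

# X_n = Z_1 + ... + Z_n with independent mean-zero steps Z_i = +-1.
# Martingale property to check: E[X_{n+1} | X_n = a] = a.
n, paths = 10, 200_000
by_level = {}  # observed X_n value -> list of X_{n+1} values
for _ in range(paths):
    x = sum(random.choice((-1, 1)) for _ in range(n))              # X_n
    by_level.setdefault(x, []).append(x + random.choice((-1, 1)))  # X_{n+1}

for a in sorted(by_level):
    if len(by_level[a]) > 5000:  # well-populated levels only
        print(a, round(sum(by_level[a]) / len(by_level[a]), 2))
```

Each printed average should match the conditioning value a up to Monte Carlo error.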
Example C Markov process
A Markov process is a process with the property that given the values of Xt , the values of Xs,
s > t, do not depend on the values of X_u, u < t. Specifically, a process is said to be Markovian if
P(a < X_t ≤ b | X_{t_1} = x_1, ..., X_{t_n} = x_n) = P(a < X_t ≤ b | X_{t_n} = x_n),   (1.1)
whenever t_1 < t_2 < ··· < t_n < t.
Let A be an interval of the real line. The function
P(x, s;t, A) = P(Xt ∈ A|Xs = x), t > s,
is called the transition probability function. We may write (1.1) as
P(a < Xt ≤ b|Xt1 = x1, · · · , Xtn = xn) = P(xn,tn;t, A),
where A = {ξ : a < ξ ≤ b}.
A Markov process with finite or denumerable state space is called a Markov chain.
A Markov process for which all realizations or sample functions {Xt ,t ∈ [0, ∞)} are continu-
ous functions is called a diffusion process.
The Poisson process is a continuous time Markov chain, and Brownian motion is a diffusion process.
Example D Stationary process
A stochastic process {X_t, t ∈ T}^1 is said to be strictly stationary if the joint distribution functions of the families of r.v.'s
(X_{t_1+h}, X_{t_2+h}, ..., X_{t_n+h})  and  (X_{t_1}, X_{t_2}, ..., X_{t_n})
are the same for all h > 0 and arbitrary selections t_1, t_2, ..., t_n from T. It means that the
process is in probabilistic equilibrium and that the particular times at which we examine the
process are of no relevance.
A stochastic process {Xt ,t ∈ T } is said to be wide sense stationary or covariance stationary
if it possesses finite second moments and if cov(Xt , Xt+h) depends only on h for all t ∈ T . A
strictly stationary process with finite second moments is covariance stationary. There are covariance stationary processes that are not strictly stationary.
1 Here T could be one of the sets (−∞, ∞), [0, ∞), the set of all integers, or the set of all positive integers.
A Markov process is said to have stationary transition probabilities if P(x, s;t, A) is a func-
tion only of t − s. Note that a Markov process with stationary transition probability is not
necessarily a stationary process.
Neither the Poisson process nor the Brownian motion process is stationary. But for a Poisson/BM process {X_t, t ≥ 0}, the increment process
Z_t = X_{t+h} − X_t, t > 0,
is strictly stationary for each fixed h > 0.
2 Some preliminaries
Proposition 2.1 Suppose P(X ≥ 0) = 1. Then EX = ∫_0^∞ P(X > x) dx. Moreover, if ∑_{k=0}^∞ P(X = k) = 1, then EX = ∑_{k=0}^∞ P(X > k) = ∑_{k=1}^∞ P(X ≥ k).
Proof. We consider
EX = E ∫_0^X 1 dt = E ∫_0^∞ I(X > t) dt = ∫_0^∞ E[I(X > t)] dt = ∫_0^∞ P(X > x) dx.
If X is integer-valued, then P(X > s) = P(X > k) for k ≤ s < k + 1. Hence
EX = ∑_{k=0}^∞ ∫_k^{k+1} P(X > s) ds = ∑_{k=0}^∞ P(X > k).
The last equality of the proposition is then obvious.
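The discrete tail-sum identity of Proposition 2.1 can be verified exactly with rational arithmetic; the Binomial(4, 1/2) choice below is just an assumed example:

```python
from fractions import Fraction
from math import comb

# X ~ Binomial(4, 1/2), so EX = 2.
p = Fraction(1, 2)
pmf = {k: comb(4, k) * p**k * (1 - p) ** (4 - k) for k in range(5)}

EX = sum(k * pk for k, pk in pmf.items())
# sum_{k>=0} P(X > k)
tail_gt = sum(sum(pk for j, pk in pmf.items() if j > k) for k in range(5))
# sum_{k>=1} P(X >= k)
tail_ge = sum(sum(pk for j, pk in pmf.items() if j >= k) for k in range(1, 5))
print(EX, tail_gt, tail_ge)  # 2 2 2
```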
Chapter 2
Markov Chains
1 Definitions
A discrete time Markov chain {X_n} is a Markov stochastic process whose state space is a countable or finite set, and for which T = {0, 1, 2, ...}.
We usually label the state space of the process by the non-negative integers (0, 1, 2, · · · ) and
speak of Xn being in state i if Xn = i.
The probability of X_{n+1} being in state j given that X_n is in state i is denoted by P_{ij}^{n,n+1}, i.e.,
P_{ij}^{n,n+1} = P{X_{n+1} = j | X_n = i}. (1.1)
P_{ij}^{n,n+1} is called a one-step transition probability.
In general the transition probabilities are functions not only of the initial and final states but also of the time of transition. When the one-step transition probabilities are independent of the time variable n, we say that the Markov process has stationary transition probabilities, and write P_{ij}^{n,n+1} = P_{ij}.
We can arrange the probabilities as a matrix,

P = | P_00  P_01  P_02  P_03  ... |
    | P_10  P_11  P_12  P_13  ... |
    | P_20  P_21  P_22  P_23  ... |
    | ...                         |
    | P_i0  P_i1  P_i2  P_i3  ... |
    | ...                         |
We call P = (Pi j ) the Markov matrix or transition probability matrix of the process. The
(i + 1)st row1 of P is the probability distribution of the values of Xn+1 given Xn = i. If the
number of states is finite, then P is a finite square matrix whose order (the number of rows)
is equal to the number of states.
1 It is because the states are 0, 1, . . . .
Clearly, the P_{ij}'s satisfy
P_{ij} ≥ 0,  i, j = 0, 1, 2, ...,
∑_{j=0}^∞ P_{ij} = 1,  i = 0, 1, 2, ....
The second condition expresses the fact that some transition occurs at each trial. (One says that a transition has occurred even if the state remains unchanged.)
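These two conditions can be validated mechanically; a minimal sketch (the 3-state matrix below is an arbitrary assumed example):

```python
def is_stochastic(P, tol=1e-12):
    # Every entry nonnegative and every row summing to one.
    return all(all(x >= 0 for x in row) and abs(sum(row) - 1) < tol
               for row in P)

P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
print(is_stochastic(P), is_stochastic([[0.6, 0.3]]))  # True False
```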
Proposition 1.1 A discrete time Markov chain {Xn} with stationary transition probabilities
is completely determined by (1.1) and the initial value (or distribution) of X0.
Proof Let P{X0 = i} = pi. Since any probability involving X j1 , . . . , X jk for j1 < · · · < jk may
be obtained by summing terms of the form (1.2) below, it suffices to show how to compute
the quantities
P{X0 = i0, X1 = i1, . . . , Xn = in}. (1.2)
We have
P{X_0 = i_0, X_1 = i_1, ..., X_n = i_n} = P{X_n = i_n | X_0 = i_0, ..., X_{n−1} = i_{n−1}} P{X_0 = i_0, ..., X_{n−1} = i_{n−1}} = P_{i_{n−1} i_n} P{X_0 = i_0, ..., X_{n−1} = i_{n−1}} = ··· = p_{i_0} P_{i_0 i_1} P_{i_1 i_2} ··· P_{i_{n−1} i_n}.
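A sketch of Proposition 1.1 in action (the two-state matrix and the initial distribution are assumptions for illustration): any path probability is the initial probability times a product of one-step transition probabilities, P{X_0 = i_0, ..., X_n = i_n} = p_{i_0} P_{i_0 i_1} ··· P_{i_{n−1} i_n}.

```python
from itertools import product

P = [[0.9, 0.1],
     [0.4, 0.6]]     # one-step transition matrix (assumed)
p0 = [0.5, 0.5]      # initial distribution of X_0 (assumed)

def path_prob(path):
    # P{X_0 = path[0], ..., X_n = path[-1]}
    prob = p0[path[0]]
    for i, j in zip(path, path[1:]):
        prob *= P[i][j]
    return prob

total = sum(path_prob(w) for w in product((0, 1), repeat=3))
print(path_prob((0, 0, 1)), total)  # 0.5*0.9*0.1 = 0.045; total path mass 1.0
```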
(i) Consider the process X_n, n = 0, 1, ..., defined by X_n = ξ_n (X_0 = ξ_0 prescribed), where the ξ_n are independent with common distribution P{ξ_n = j} = a_j. Then
P_{ij}^{n,n+1} = P{X_{n+1} = j | X_n = i} = P{ξ_{n+1} = j | ξ_n = i} = P{ξ_{n+1} = j} = a_j.
Its Markov matrix has the form

P = | a_0  a_1  a_2  ... |
    | a_0  a_1  a_2  ... |
    | ...                |
Therefore, the transition probability matrix, P, is given as

P = | a_0  a_1  a_2  ... |
    | 0    a_0  a_1  ... |
    | 0    0    a_0  ... |
    | ...                |
taken as the nonnegative integers, then the transition matrix of a random walk has the form

P = | r_0  p_0  0    0    ... |
    | q_1  r_1  p_1  0    ... |
    | 0    q_2  r_2  p_2  ... |
    | ...                     |

where p_i ≥ 0, q_i ≥ 0, r_i ≥ 0 and p_i + q_i + r_i = 1 for i = 1, 2, ..., and p_0 ≥ 0, r_0 ≥ 0, p_0 + r_0 = 1. If X_n = i, then for i ≥ 1 the process moves to state i + 1 with probability p_i, to state i − 1 with probability q_i, and stays at i with probability r_i.
(C) A discrete queueing Markov chain
Customers arrive for service and take their place in a waiting line. During each period of time
a single customer is served, provided that at least one customer is present. If no customer
awaits service then during this period no service is performed. (We can imagine, for example,
a taxi stand at which a cab arrives at fixed time intervals to give service. If no one is present, the
cab immediately departs.) During a service period new customers may arrive. We suppose
the actual number of arrivals in the nth service period is a r.v. ξn whose distribution function is
independent of the period and is given by P(k customers arrive in a service period ) = P(ξn =
k) = ak ≥ 0, k = 0, 1, . . . and ∑∞i=0 ai = 1. We also assume the r.v.’s ξn are independent. The
state of the system at the start of each period is defined to be the number of customers waiting
in line for service. If the present state is i then after a lapse of one period the state is
j = i − 1 + ξ   if i ≥ 1,
j = ξ           if i = 0,
where ξ is the number of new customers having arrived in this period while a single customer
was served. We can write Xn+1 = (Xn − 1)+ + ξn, where y+ = max(y, 0).
P_{ij}^{n,n+1} = P(X_{n+1} = j | X_n = i) = P((X_n − 1)^+ + ξ_n = j | X_n = i)
             = P((i − 1)^+ + ξ_n = j) = P(ξ_n = j − (i − 1)^+).

Hence

P = | a_0  a_1  a_2  ... |
    | a_0  a_1  a_2  ... |
    | 0    a_0  a_1  ... |
    | ...                |
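A short simulation sketch of this queueing chain (the particular arrival pmf a_k below is an assumption): the empirical one-step frequencies out of a state i should match the row entries a_{j−(i−1)^+}.

```python
import random
from collections import Counter

random.seed(1)
a = [0.3, 0.4, 0.2, 0.1]   # arrival pmf P(xi = k) = a_k (assumed)

def step(x):
    xi = random.choices(range(len(a)), weights=a)[0]
    return max(x - 1, 0) + xi    # X_{n+1} = (X_n - 1)^+ + xi_n

trials = 100_000
counts = Counter(step(2) for _ in range(trials))
for j in sorted(counts):
    print(j, counts[j] / trials)   # close to a[j - 1], since (2 - 1)^+ = 1
```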
assume that the "total" demand for the commodity over the period (t_{n−1}, t_n) is a random variable ξ_n whose distribution function does not depend on the time period,
P(ξ_n = k) = a_k,  k = 0, 1, 2, ...,
where ak ≥ 0 and ∑∞k=0 ak = 1. The stock level is examined at the start of each period. An
inventory policy is prescribed by specifying two nonnegative critical values s and S > s. The
implementation of the inventory policy is as follows: If the available stock quantity is not
greater than s then immediate procurement is done so as to bring the quantity of stock on
hand to the level S. If, however, the available stock is in excess of s then no replenishment of
stock is undertaken. Let Xn denote the stock on hand just prior to restocking at tn. The states of
the process {Xn} consist of the possible values of the stock size S, S − 1, . . . , 1, 0, −1, −2, . . . ,
where a negative value is interpreted as an unfulfilled demand for stock, which will be sat-
isfied immediately upon restocking. So the stock levels at two consecutive periods are connected by the relation
X_{n+1} = X_n − ξ_{n+1}   if s < X_n ≤ S,
X_{n+1} = S − ξ_{n+1}     if X_n ≤ s.
If we assume the ξn’s to be mutually independent, then the stock values X0, X1, . . . constitute
a Markov chain. Its transition probabilities are
the form

P = | p_0  q_0  0    0    ... |
    | p_1  0    q_1  0    ... |        (2.3)
    | p_2  0    0    q_2  ... |
    | ...                     |
Similarly, when the preceding r + 1 trials in order had the outcomes F, S, S, ..., S, the state
variable would carry the label r. The process is clearly Markovian (since the individual trials
were independent of each other) and its transition matrix has the form (2.3) where pn = β ,
n = 0, 1, · · · .
P(ξ = k) = ak ≥ 0, k = 0, 1, . . . (2.4)
where, ∑∞k=0 ak = 1. We assume that all offspring act independently of each other and at the
end of their lifetime individually have progeny in accordance with the probability distribution
(2.4). The process {X_n}, where X_n is the population size at the nth generation, is a Markov chain. The transition matrix is given by
P_{ij} = P(ξ_1 + ξ_2 + ··· + ξ_i = j),
where the ξ_i's are independent observations of a r.v. with probability law (2.4).
p j = j/(2N), q j = 1 − p j
respectively. Repeated selections are done with replacement. By this procedure we generate a
Markov chain {Xn} where Xn is the number of a-genes in the nth generation among a constant
population size of 2N elements. The state space contains the 2N + 1 values {0, 1, 2, ..., 2N}.
The transition probability matrix is computed according to the binomial distribution as
P(X_{n+1} = k | X_n = j) = P_{jk} = (2N choose k) p_j^k q_j^{2N−k},   j, k = 0, 1, ..., 2N.
Theorem 3.1 If the one-step transition probability matrix of a Markov chain is P = (P_{ij}), then
P_{ij}^n = ∑_{k=0}^∞ P_{ik}^r P_{kj}^s,   (3.5)
for all states i, j and any fixed pair of nonnegative integers r and s satisfying r + s = n.
Here, we define
P_{ij}^0 = 1 if i = j, and 0 if i ≠ j,
and P_{ij}^n = P(X_{m+n} = j | X_m = i), independent of m. Eqn (3.5) is called the Chapman-Kolmogorov equation.
Note that P_{ij}^n does NOT mean P_{ij} raised to the power n! We use P_{ij}^n (an abbreviation of P_{ij}^{(n)}) to denote the probability of going from state i to state j in exactly n transitions.
Proof. We prove the case n = 2. The event of going from state i to state j in two transitions
can be realized in the mutually exclusive ways of going to some intermediate state k (k =
0, 1, 2, . . . ) in the first transition and then going from state k to state j in the second transition.
Because of the Markovian assumption, the probability of the second transition is P_{kj} and that of the first transition is P_{ik}. Applying the law of total probability, (3.5) follows. The argument in the general case is identical.
Remark: It is easy to check that the n-step transition probability matrix is given by
P^n = P × P × ··· × P   (n factors).
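Both Theorem 3.1 and this remark can be confirmed numerically with plain matrix multiplication (the 3-state matrix is an assumed example):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matpow(P, n):
    R = [[float(i == j) for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        R = matmul(R, P)
    return R

P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

# Chapman-Kolmogorov with n = 5, r = 2, s = 3: P^5 = P^2 P^3.
lhs = matpow(P, 5)
rhs = matmul(matpow(P, 2), matpow(P, 3))
diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
print(diff)  # zero up to floating-point error
```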
The proof of transitivity proceeds as follows: i ↔ j and j ↔ k imply that there exist integers n and m such that P_{ij}^n > 0 and P_{jk}^m > 0. Consequently, by (3.5) and the nonnegativity of each P_{rs}^t, we conclude that
P_{ik}^{n+m} = ∑_{r=0}^∞ P_{ir}^n P_{rk}^m ≥ P_{ij}^n P_{jk}^m > 0.
A similar argument shows the existence of an integer v such that P_{ki}^v > 0, as desired.
We say that the Markov chain is irreducible if the equivalence relation induces only one class,
that is, all states communicate with each other.
Example 1

P = | 1/2  1/2  0    0    0   |
    | 1/4  3/4  0    0    0   |        | P_1  0   |
    | 0    0    0    1    0   |   =    | 0    P_2 |
    | 0    0    1/2  0    1/2 |
    | 0    0    0    1    0   |

This Markov chain is divided into two classes {0, 1} and {2, 3, 4}, and hence is reducible.
The period of state i, written d(i), is defined to be the greatest common divisor (g.c.d.) of all integers n ≥ 1 for which P_{ii}^n > 0.
Convention: If P_{ii}^n = 0 for all n ≥ 1, we define d(i) = 0.
Example 2 Consider the n by n matrix below,

P = | 0  1  0  0  ...  0 |
    | 0  0  1  0  ...  0 |
    | ...                |
    | 0  0  0  0  ...  1 |
    | 1  0  0  0  ...  0 |

Specifically, for n = 3,
P = | 0 1 0; 0 0 1; 1 0 0 |, so P_{ii} = 0;
P^2 = | 0 0 1; 1 0 0; 0 1 0 |, so P_{ii}^2 = 0;
P^3 = I, so P_{ii}^3 = 1.
Hence d(i) = 3 for every state i.
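The period can also be computed directly from the definition, as the g.c.d. of all n up to some horizon with (P^n)_{ii} > 0; a sketch (the horizon cutoff is an assumption) applied to the cyclic matrix above with n = 3:

```python
from math import gcd

def period(P, i, horizon=50):
    size = len(P)
    Pn = [row[:] for row in P]      # P^1
    d = 0
    for n in range(1, horizon + 1):
        if n > 1:                   # Pn <- Pn * P, i.e. P^n
            Pn = [[sum(Pn[a][k] * P[k][b] for k in range(size))
                   for b in range(size)] for a in range(size)]
        if Pn[i][i] > 0:
            d = gcd(d, n)
    return d                        # 0 if no return within the horizon

cyclic = [[0, 1, 0],
          [0, 0, 1],
          [1, 0, 0]]
print([period(cyclic, i) for i in range(3)])  # [3, 3, 3]
```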
Theorem 4.1 (i) If i ↔ j, then d(i) = d(j).
(ii) If state i has period d(i), then there exists an integer N depending on i such that for all integers n ≥ N,
P_{ii}^{n d(i)} > 0.
(iii) If P_{ji}^m > 0, then P_{ji}^{m + n d(i)} > 0 for all n sufficiently large.
Lemma 4.1 Let n_1, ..., n_k be positive integers with greatest common divisor d. Then there exists a positive integer M such that m ≥ M implies that there exist nonnegative integers {c_j}_{j=1}^k with m d = ∑_{j=1}^k c_j n_j.
Remark Lemma 4.1 (or an alternative lemma for (ii) in Theorem 4.1) is a number-theoretic result, and its proof is omitted here.³
³ For a more detailed hint of the proof of Lemma 4.1, see page 77, Problem 2, in Karlin & Taylor (1975), A First Course in Stochastic Processes. For an alternative lemma, see Section 2.8.2 in Port, Hoel & Stone.
Proof of Theorem 4.1 (ii) If d(i) = 0, then the statement is obviously true for all n ≥ 1. So we assume d(i) > 0. First note that if P_{ii}^{n_0} > 0, then P_{ii}^{n_0 ℓ} > 0 for every positive integer ℓ, because
P_{ii}^{n_0 ℓ} = ∑_{k=0}^∞ P_{ik}^{n_0} P_{ki}^{n_0(ℓ−1)} ≥ P_{ii}^{n_0} P_{ii}^{n_0(ℓ−1)} ≥ ··· ≥ (P_{ii}^{n_0})^ℓ > 0.
Let n_1, ..., n_k be return times with P_{ii}^{n_j} > 0 and g.c.d. d(i). Then by Lemma 4.1, there exists an integer N depending on i such that for all integers n ≥ N,
P_{ii}^{n d(i)} = P_{ii}^{∑_j c_j n_j} ≥ P_{ii}^{c_1 n_1} P_{ii}^{c_2 n_2} ··· P_{ii}^{c_k n_k} > 0.
(iii) Obvious, since P_{ji}^{m + n d(i)} ≥ P_{ji}^m P_{ii}^{n d(i)} > 0 for all n ≥ N.
(i) From (ii), we know that ∃N s.t. ∀n ≥ N, P_{ii}^{n d(i)} > 0.
Given i ↔ j, ∃ m_0, m_1 s.t. P_{ij}^{m_0} > 0 and P_{ji}^{m_1} > 0. Then,
P_{jj}^{m_0 + m_1 + n d(i)} = ∑_{k=0}^∞ P_{jk}^{m_1} P_{kj}^{m_0 + n d(i)} ≥ P_{ji}^{m_1} P_{ij}^{m_0 + n d(i)}
= P_{ji}^{m_1} ∑_{k=0}^∞ P_{ik}^{n d(i)} P_{kj}^{m_0} ≥ P_{ji}^{m_1} P_{ii}^{n d(i)} P_{ij}^{m_0} > 0.
This gives that d(j) | m_0 + m_1 + n d(i) and d(j) | m_0 + m_1 + (n + 1) d(i). So d(j) | d(i). Similarly, d(i) | d(j). So d(j) = d(i).
(i) Method 2. ∃ l_1, l_2 s.t. P_{ii}^{l_1 d(i)} > 0, P_{ii}^{l_2 d(i)} > 0 and g.c.d.{l_1, l_2} = 1. Since i ↔ j, ∃ m, n such that P_{ij}^n > 0, P_{ji}^m > 0. So P_{jj}^{m+n+l_1 d(i)} > 0. Similarly, P_{jj}^{m+n+l_2 d(i)} > 0, P_{jj}^{m+n+2l_1 d(i)} > 0, and P_{jj}^{m+n+2l_2 d(i)} > 0 ⇒ l_1 d(i) = k_3 d(j), l_2 d(i) = k_4 d(j) ⇒ d(i) = k_5 d(j). [Note that if g.c.d.(a, b) = 1, then there exist α and β s.t. aα + bβ = 1.] In a similar way, d(j) = k_6 d(i).
(i) Method 3. Since i ↔ j, ∃ m, n s.t. P_{ji}^m > 0, P_{ij}^n > 0. If P_{ii}^s > 0, then P_{jj}^{s+m+n} ≥ P_{ji}^m P_{ii}^s P_{ij}^n > 0 ⇒ s + m + n = l d(j). It also follows from P_{ii}^s > 0 that P_{ii}^{2s} ≥ P_{ii}^s P_{ii}^s > 0, so P_{jj}^{2s+m+n} > 0 ⇒ 2s + m + n = l_1 d(j). Thus s = (l_1 − l) d(j) ⇒ d(j) | d(i). Similarly, d(i) | d(j).
A Markov chain in which each state has period one is called aperiodic. The vast majority
of Markov chain processes we deal with are aperiodic. Random walks usually typify the
periodic cases arising in practice. Results will be developed for the aperiodic case and the
modified conclusions for the general case will be stated usually without proof.
5 Recurrence
For any fixed pair of states i and j, we define, for each integer n ≥ 1,
f_{ij}^n = P{X_n = j, X_k ≠ j for k = 1, ..., n − 1 | X_0 = i},
which is the probability of the first passage from state i to state j occurring at the nth transition. We define f_{ij}^0 = 0 for all i and j. Recall that P_{ij}^0 = 1 if i = j, and = 0 if i ≠ j.
The generating function F_{ij}(s) of the sequence {f_{ij}^n} is
F_{ij}(s) = ∑_{n=0}^∞ f_{ij}^n s^n,  for |s| < 1.
We say that a state i is recurrent if and only if ∑_{n=1}^∞ f_{ii}^n = 1. In other words, a state i is recurrent if and only if, starting from state i, the probability of returning to state i after some finite length of time is one. A non-recurrent state is said to be transient, i.e., ∑_{n=1}^∞ f_{ii}^n < 1. (Note that ∑_{n=1}^∞ f_{ii}^n ≤ 1. Why?)
Before stating a theorem relating the recurrence/non-recurrence of a state to the behavior of its n-step transition probabilities P_{ii}^n, we need the following
Lemma 5.1 Define the generating function P_ii(s) = ∑_{n=0}^∞ P_{ii}^n s^n for |s| < 1; then P_ii(s) = (1 − F_ii(s))^{−1}. Moreover,
(a) lim_{s→1−} F_ii(s) = ∑_{n=1}^∞ f_{ii}^n;  (b) lim_{s→1−} P_ii(s) = ∑_{n=0}^∞ P_{ii}^n (finite or infinite).
Theorem 5.1 (i) A state i is recurrent iff ∑_{n=0}^∞ P_{ii}^n = ∞.
(ii) If i ↔ j and i is recurrent, then j is recurrent.
Proof. (i) "⇒" Assume i is recurrent, i.e., ∑_{n=1}^∞ f_{ii}^n = 1. Then by Lemma 5.1(a), lim_{s→1−} F_ii(s) = 1. Thus lim_{s→1−} P_ii(s) = lim_{s→1−} (1 − F_ii(s))^{−1} = ∞. Applying Lemma 5.1(b), we have ∑_{n=0}^∞ P_{ii}^n = ∞.
"⇐" We prove by contradiction. Suppose i is transient, i.e., ∑_{n=1}^∞ f_{ii}^n < 1. By Lemma 5.1(a), we have lim_{s→1−} F_ii(s) < 1, so lim_{s→1−} P_ii(s) < ∞. Now appealing to Lemma 5.1(b), we have ∑_{n=0}^∞ P_{ii}^n < ∞, which gives the desired contradiction.
(ii) Since i ↔ j, there exist m, n ≥ 1 s.t. P_{ij}^n > 0, P_{ji}^m > 0. For any positive integer v, we have P_{jj}^{m+n+v} ≥ P_{ji}^m P_{ii}^v P_{ij}^n, and on summing over v,
∑_{v=0}^∞ P_{jj}^{m+n+v} ≥ ∑_{v=0}^∞ P_{ji}^m P_{ii}^v P_{ij}^n = P_{ji}^m P_{ij}^n ∑_{v=0}^∞ P_{ii}^v.
Given the assumption that i is recurrent, by (i) ∑_{v=0}^∞ P_{ii}^v = ∞; hence ∑_{v=0}^∞ P_{jj}^v = ∞, and j is recurrent.
For the one-dimensional random walk with P(step = +1) = p and P(step = −1) = q = 1 − p, Stirling's formula gives
P_{00}^{2n} = (2n choose n) (pq)^n ∼ (pq)^n 2^{2n} / √(πn) = (4pq)^n / √(πn).
Note further that 4pq ≤ 1, with equality iff p = q = 1/2. Hence ∑_{n=0}^∞ P_{00}^n = ∞ iff p = q = 1/2. By Theorem 5.1, the one-dimensional random walk is recurrent if and only if p = q = 1/2.
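The Stirling asymptotics used above are easy to check numerically at p = q = 1/2: P_{00}^{2n} = C(2n, n)(1/4)^n exactly, and its ratio to 1/√(πn) approaches 1.

```python
from math import comb, pi, sqrt

def ratio(n):
    # P_{00}^{2n} / [1/sqrt(pi n)] at p = q = 1/2,
    # where P_{00}^{2n} = C(2n, n) (1/4)^n.
    # Big-int division first, to avoid float overflow of C(2n, n).
    return (comb(2 * n, n) / 4**n) * sqrt(pi * n)

for n in (10, 100, 1000):
    print(n, ratio(n))  # tends to 1 as n grows
```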
7 More on Recurrence
Define Qii = P{MC starting in state i returns infinitely often to state i}.
Theorem 7.1 State i is recurrent or transient according to whether Qii = 1 or 0, respectively.
Proof. Let QNii be defined as
QNii = P{a particle starting in state i returns to state i at least N times}
Then Q_{ii}^N = ∑_{k=1}^∞ f_{ii}^k Q_{ii}^{N−1} = Q_{ii}^{N−1} f_{ii}^*, where f_{ii}^* := ∑_{k=1}^∞ f_{ii}^k. Proceeding recursively,
Q_{ii}^N = f_{ii}^* Q_{ii}^{N−1} = (f_{ii}^*)^2 Q_{ii}^{N−2} = ··· = (f_{ii}^*)^{N−1} Q_{ii}^1 = (f_{ii}^*)^N.
Since lim_{N→∞} Q_{ii}^N = Q_{ii}, we have Q_{ii} = 1 or 0 according as f_{ii}^* = 1 or f_{ii}^* < 1; equivalently, according to whether state i is recurrent or transient.
We next show that if j is recurrent and j → i, then f_{ij}^* = 1.
Proof. Since j → i, ∃N s.t. f_{ji}^N > 0. Let N_0 be the smallest such N. Then
1 − f_{jj}^* = P{a particle starting in state j never returns to state j} ≥ f_{ji}^{N_0} (1 − f_{ij}^*),
where 1 − f_{ij}^* is the probability of never going to state j from state i. If f_{ij}^* < 1, then 1 − f_{jj}^* > 0, i.e., f_{jj}^* < 1. This implies state j is transient, contradicting the fact that j is recurrent. So f_{ij}^* = 1. Consequently,
Q_{ij} = f_{ij}^* Q_{jj}.
Chapter 3

Theorem 1.1 (Renewal theorem) If the renewal equation
u_n = b_n + ∑_{k=−∞}^∞ a_{n−k} u_k,   n = 0, ±1, ±2, ...,
is satisfied by a bounded sequence {u_n} of real numbers, then (i) lim_{n→∞} u_n exists, and (ii) lim_{n→−∞} u_n exists. Furthermore, (iii) if lim_{n→−∞} u_n = 0, then
lim_{n→∞} u_n = (∑_{k=−∞}^∞ b_k) / (∑_{k=−∞}^∞ k a_k).
In the case that ∑_{k=−∞}^∞ k a_k = ∞, the limit relations are still valid provided we interpret
(∑_{k=−∞}^∞ b_k) / (∑_{k=−∞}^∞ k a_k) = 0.
The proof of this theorem in its general form as stated here is beyond the scope of this class.
We will apply this theorem only for the case where ak = uk = bk = 0 for k < 0, and bk ≥ 0.
Remark In the case where a_{−k} = b_{−k} = u_{−k} = 0 for k > 0, the renewal equation becomes
u_n = b_n + ∑_{k=0}^n a_k u_{n−k} = b_n + ∑_{k=0}^n a_{n−k} u_k,   for n = 0, 1, 2, ....
Remark (Reason for the term “renewal equation”) Consider a light bulb whose lifetime,
measured in discrete units, is a random variable ξ where P(ξ = k) = ak , for k = 0, 1, 2, · · · .
Thus ak ≥ 0 and ∑∞k=0 ak = 1. Let each bulb be replaced by a new one when the one in use
burns out. Suppose the first bulb lasts until time ξ1, the second bulb until time ξ1 + ξ2, and
the n-th bulb until time ∑ni=1 ξi. Here, ξi’s are independent identically distributed as ξ . Let un
denote the expected number of renewals (replacements) up to time n. If the first replacement
occurs at time k then the expected number of replacements in the remaining time up to n is
un−k , and summing over all possible values for k, we obtain
u_n = ∑_{k=0}^n (1 + u_{n−k}) a_k + 0 · ∑_{k=n+1}^∞ a_k
    = ∑_{k=0}^n a_k + ∑_{k=0}^n a_k u_{n−k}        (1.1)
    = b_n + ∑_{k=0}^n a_k u_{n−k},
where bn := ∑nk=0 ak . The reasoning behind (1.1) goes as follows. There are two cases to
consider: (i) If the first bulb fails at time k(0 < k < n), which happens with probability ak ,
then the factor 1 + u_{n−k} is the expected number of replacements up to time n. (ii) If the first bulb lasts longer than n time units, which happens with probability ∑_{k=n+1}^∞ a_k, then no replacement has occurred by time n.
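Running the renewal recursion is a short loop; with the lifetime pmf below (an assumed example with Eξ = 1.5), u_n grows with slope 1/Eξ = 2/3, in line with the elementary renewal rate:

```python
# Renewal recursion u_n = b_n + sum_k a_k u_{n-k}, with b_n = sum_{k<=n} a_k.
# Lifetime pmf (assumed example): P(xi = 1) = P(xi = 2) = 1/2, so E xi = 1.5.
a = {1: 0.5, 2: 0.5}
N = 2000
u = [0.0] * (N + 1)          # u_0 = 0 since a_0 = 0 here
for n in range(1, N + 1):
    b_n = sum(pk for k, pk in a.items() if k <= n)
    u[n] = b_n + sum(pk * u[n - k] for k, pk in a.items() if k <= n)

print(u[1], u[2], u[N] / N)  # 0.5, 1.25, and roughly 2/3 = 1/E(xi)
```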
The theorem below describes the limiting behavior of Pinj as n → ∞ for all i and j in the case
of an aperiodic recurrent Markov chain. The proof is a simple application of Theorem 1.1.
Theorem 1.2 (Basic limit theorem of Markov chains) Consider a recurrent irreducible aperiodic Markov chain. For all states i and j,
lim_{n→∞} P_{ii}^n = 1 / (∑_{n=1}^∞ n f_{ii}^n),  and  lim_{n→∞} P_{ji}^n = lim_{n→∞} P_{ii}^n.
Proof. (a) Note that (see Section 5)
P_{ii}^n − ∑_{k=0}^n f_{ii}^{n−k} P_{ii}^k = 1 if n = 0, and = 0 if n > 0.
Take
u_k = P_{ii}^k for k ≥ 0 (0 for k < 0);  a_k = f_{ii}^k for k ≥ 0 (0 for k < 0);  b_k = 1 for k = 0 (0 for k ≠ 0);
and then apply Theorem 1.1.
(b) We use the recursion relation
P_{ji}^n = ∑_{k=0}^n f_{ji}^k P_{ii}^{n−k},   i ≠ j, n ≥ 0. (1.2)
Write
y_n = P_{ji}^n,  a_n = f_{ji}^n,  x_n = P_{ii}^n.
Eqn (1.2) is of the form yn = ∑nk=0 an−k xk , where am ≥ 0, ∑∞m=0 am = 1, limk→∞ xk = c. It is
known that limn→∞ yn = c. Apply Theorem 7.2 to arrive at the desired result.
If lim_{n→∞} P_{ii}^n = π_i > 0 for one i in an aperiodic class, then we may show that π_j > 0 for all j in the class of i (use P_{jj}^{m+n+v} ≥ P_{ji}^m P_{ii}^v P_{ij}^n). So in this case, we call the class positive recurrent or strongly ergodic. If each π_i = 0 and the class is recurrent, we call the class null recurrent or weakly ergodic.
If {πi : i ≥ 0} satisfies (1.3), it is called a stationary probability distribution of the MC.
Proof. For every n and M, 1 = ∑_{j=0}^∞ P_{ij}^n ≥ ∑_{j=0}^M P_{ij}^n. Letting n → ∞ and using Theorem 1.2, we obtain 1 ≥ ∑_{j=0}^M π_j for every M. Thus ∑_{j=0}^∞ π_j ≤ 1.
Also P_{ij}^{n+1} ≥ ∑_{k=0}^M P_{ik}^n P_{kj}; if we let n → ∞, we obtain π_j ≥ ∑_{k=0}^M π_k P_{kj}. Since the left-hand side is independent of M, letting M → ∞ gives
π_j ≥ ∑_{k=0}^∞ π_k P_{kj}. (1.4)
Multiplying (1.4) on the right by P_{ji} and summing over j yields π_i ≥ ∑_{k=0}^∞ π_k P_{ki}^2. Inductively,
π_j ≥ ∑_{k=0}^∞ π_k P_{kj}^n for any n.
Suppose strict inequality holds for some j. Adding these inequalities over j, we have
∑_{j=0}^∞ π_j > ∑_{j=0}^∞ ∑_{k=0}^∞ π_k P_{kj}^n = ∑_{k=0}^∞ π_k ∑_{j=0}^∞ P_{kj}^n = ∑_{k=0}^∞ π_k,
a contradiction. Thus π_j = ∑_{k=0}^∞ π_k P_{kj}^n for any n. Letting n → ∞, since ∑ π_k converges and P_{kj}^n is uniformly bounded, we conclude that
π_j = lim_{n→∞} ∑_{k=0}^∞ π_k P_{kj}^n = π_j ∑_{k=0}^∞ π_k  for every j.
Example Consider the class of random walks whose transition matrices are given by

P = | 0    1    0    0    ... |
    | q_1  0    p_1  0    ... |
    | 0    q_2  0    p_2  ... |
    | ...                     |

This Markov chain has period 2. Nevertheless we investigate the existence of a stationary probability distribution, i.e., we wish to determine the positive solutions of
x_i = ∑_{j=0}^∞ x_j P_{ji} = p_{i−1} x_{i−1} + q_{i+1} x_{i+1},   i = 0, 1, ..., (1.5)
under the normalization ∑_{i=0}^∞ x_i = 1, where p_{−1} = 0 and p_0 = 1, and thus x_0 = q_1 x_1. Letting i = 1 in Eqn (1.5), x_2 can be expressed in terms of x_0; then letting i = 2 in Eqn (1.5), x_3 can
be expressed in terms of x_0; and so on. Indeed, by induction,
x_i = x_0 ∏_{k=0}^{i−1} (p_k / q_{k+1}),   i ≥ 1.
Since
1 = x_0 + ∑_{i=1}^∞ x_0 ∏_{k=0}^{i−1} (p_k / q_{k+1}),
we have
x_0 = 1 / (1 + ∑_{i=1}^∞ ∏_{k=0}^{i−1} (p_k / q_{k+1})).
Therefore,
x_0 > 0 ⇔ ∑_{i=1}^∞ ∏_{k=0}^{i−1} (p_k / q_{k+1}) < ∞.
In particular, if p_k = p and q_k = q = 1 − p for k ≥ 1, the series becomes
∑_{i=1}^∞ ∏_{k=0}^{i−1} (p_k / q_{k+1}) = (1/p) ∑_{i=1}^∞ (p/q)^i,
which is finite if and only if p < q, that is, p < 1/2.¹
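For p_k = p < 1/2, the product formula can be evaluated and checked against the stationarity relation x_i = p_{i−1} x_{i−1} + q_{i+1} x_{i+1} on a truncated state space (the value p = 0.3 and the truncation level are assumptions for illustration):

```python
p, q = 0.3, 0.7
M = 200                       # truncation level (assumed large enough)

# x_i = x_0 * prod_{k=0}^{i-1} p_k/q_{k+1}, with p_0 = 1 and p_k = p, q_k = q (k >= 1).
w = [1.0]
for i in range(1, M):
    w.append(w[-1] * (1.0 if i == 1 else p) / q)
Z = sum(w)
x = [v / Z for v in w]

def p_coef(i):                # p_i with p_0 = 1 and p_{-1} = 0
    return 0.0 if i < 0 else (1.0 if i == 0 else p)

err = max(abs(x[i] - (p_coef(i - 1) * (x[i - 1] if i >= 1 else 0.0) + q * x[i + 1]))
          for i in range(M - 1))
print(x[0], err)  # x_0 is near 1/(1 + 1/(q - p)) = 2/7, and err is near 0
```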
2 Absorption Probabilities
If T is the set of all transient states, then define
x_i^1 = ∑_{j∈T} P_{ij} ≤ 1,   i ∈ T,
1 That is, the MC is more likely to go to the left than to the right.
and define
x_i^n = ∑_{j∈T} P_{ij} x_j^{n−1},   n ≥ 2, i ∈ T.
In words, x_i^n denotes the probability that, starting from i, the Markov chain stays within T for the next n transitions. Since x_i^n ≤ 1 (why?) for all n ≥ 1, we can prove by induction that x_i^n is non-increasing as a function of n; for instance,
x_i^2 = ∑_{j∈T} P_{ij} x_j^1 ≤ ∑_{j∈T} P_{ij} = x_i^1.
It follows that if the only bounded solution of this set of equations is the zero vector
(0, 0, ...), then starting from any transient state absorption into a recurrent class occurs with
probability one. The reason is as follows. It is clear that xi (i ∈ T ) is the probability of never
being absorbed into a recurrent class, starting from state i. Since this sequence is a bounded
solution of (2.6), it follows that xi is zero for all i.
Let C,C1,C2, · · · denote recurrent classes. For a transient state i, let πi(C) be the probability
that the process starting at state i will be eventually absorbed in C.2
Let πin(C)= probability that the process will enter and thus be absorbed in C for the first time
at the n-th transition, given that the initial state is i ∈ T .
² Once the process enters a recurrent class, it never leaves that class. Proof: Suppose α is in a recurrent class C, β is not in C, and P_{αβ} > 0. Then P_{βα}^n = 0 for all n > 0; otherwise β would be in C too. So if the process starts in α, there is positive probability, at least P_{αβ}, that the process never returns to α. This contradicts the fact that α is recurrent. Hence P_{αβ} = 0.
Then
π_i(C) = ∑_{n=1}^∞ π_i^n(C) ≤ 1, (2.7)
π_i^1(C) = ∑_{j∈C} P_{ij};   π_i^n(C) = ∑_{j∈T} P_{ij} π_j^{n−1}(C),  n ≥ 2. (2.8)
In (2.8) we use the fact that if from i the MC goes to a recurrent state j, then π_j^{n−1}(C) = 0 (for j ∈ C the chain has already been absorbed; for j in another recurrent class it can never reach C). Rewriting (2.7) using (2.8) gives
π_i(C) = π_i^1(C) + ∑_{n=2}^∞ π_i^n(C) = π_i^1(C) + ∑_{n=2}^∞ ∑_{j∈T} P_{ij} π_j^{n−1}(C) = π_i^1(C) + ∑_{j∈T} P_{ij} ∑_{n=2}^∞ π_j^{n−1}(C),
i.e.,
π_i(C) = π_i^1(C) + ∑_{j∈T} P_{ij} π_j(C),   i ∈ T. (2.9)
Proof. Clearly, π_i^n(C) = ∑_{k∈C} π_{ik}^n(C), where π_{ik}^n(C) is the probability of the MC starting from i being absorbed into class C at the n-th transition at state k. We have
π_i(C) = ∑_{v=1}^∞ ∑_{k∈C} π_{ik}^v(C) ≤ 1.
Therefore, for any ε > 0 there exist a finite set of states C_0 ⊂ C and an integer N(ε) = N such that
|π_i(C) − ∑_{v=1}^N ∑_{k∈C_0} π_{ik}^v(C)| < ε,  or  |∑_{v=1}^n ∑_{k∈C} π_{ik}^v − ∑_{v=1}^n ∑_{k∈C_0} π_{ik}^v| < ε, (2.10)
Decomposing the events by the time of first entering some state in C, we have
P_{ij}^n = ∑_{v=1}^n ∑_{k∈C} π_{ik}^v P_{kj}^{n−v},   i ∈ T, j ∈ C.
But P_{kj}^{n−v} ≤ 1, |P_{kj}^{n−v} − π_j| ≤ 2, and lim_{n→∞} P_{kj}^{n−v} = π_j if C is aperiodic and k ∈ C (Theorems 1.2 & 1.3). Hence there exists N′ > N s.t. for n > N′, |P_{kj}^{n−v} − π_j| < ε for v ≤ N and k ∈ C_0. So, for n > N′,
|P_{ij}^n − ∑_{v=1}^n ∑_{k∈C_0} π_{ik}^v π_j| ≤ ε + 2 ∑_{v=N+1}^n ∑_{k∈C_0} π_{ik}^v + ∑_{v=1}^n ∑_{k∈C\C_0} π_{ik}^v P_{kj}^{n−v}.
However, the choice of N and C_0 assures us that the right-hand side is ≤ 4ε. Then appealing to (2.10) and the above result, we obtain
lim_{n→∞} P_{ij}^n = π_i(C) lim_{n→∞} P_{jj}^n = π_i(C) π_j.
We emphasize the fact that if i is a transient state and j is a recurrent state, then the limit of P_{ij}^n depends on both i and j. This is in sharp contrast with the case where i and j belong to the same recurrent class.
Example (The gambler's ruin on n + 1 states). [Note: n does not denote time here.]

P = | 1  0  0  0  ...        |
    | q  0  p  0  ...        |
    | 0  q  0  p  ...        |
    | ...                    |
    | ...      q  0  p       |
    | ...         0  0  1    |

We shall calculate u_i = π_i(C_0) and v_i = π_i(C_n), the probabilities that starting from i the process ultimately enters the absorbing (and therefore recurrent) states³ 0 and n, respectively. The
³ When the MC enters state 0, the gambler is ruined. When the MC enters state n, the gambler wins all (or his opponent is ruined).
system of equations (2.9) becomes
u1 = q + pu2;
ui = qui−1 + pui+1, 2 ≤ i ≤ n − 2; (2.11)
un−1 = qun−2.
We try a solution of the form ur = xr . Substituting in the middle equations and cancelling
common factors leads to px2 + q = x. This quadratic equation has two solutions: x = 1 and
x = q/p. So, ur = A + B(q/p)r , r = 1, 2, · · · , n − 1, satisfy the middle equations of (2.11) for
any A and B. We now determine A and B so that the first and the last equations hold.⁴ In the case p/q ≠ 1, the first equation leads to
A + B(q/p) = q + p[A + B(q/p)^2];
⁴ If q = p, the solution x = 1 is a double root of px^2 + q = x, and one then has to replace (q/p)^r by r. That is, u_r = A + Br.
or equivalently,
A = 1 − B.
The last equation leads to
A + B(q/p)^{n−1} = q[A + B(q/p)^{n−2}],  or  p^n A + q^n B = 0.
Together with A = 1 − B, this gives B = p^n/(p^n − q^n) and A = −q^n/(p^n − q^n), and hence
u_r = ((q/p)^r − (q/p)^n) / (1 − (q/p)^n),  when p ≠ q.
For the case q = p, similarly we find that A = 1 and B = −1/n. So,
u_r = (n − r)/n,  when p = q.
A similar calculation shows that vi = 1 − ui, 0 ≤ i ≤ n. This is to be expected since there are
only two recurrent classes, C0 and Cn.
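The closed-form u_r = ((q/p)^r − (q/p)^n)/(1 − (q/p)^n) obtained from the boundary conditions can be cross-checked against a direct iterative solution of the linear system (2.11); the parameters n = 10, p = 0.6, and the sweep count below are assumptions:

```python
def ruin_exact(r, n, p):
    q = 1 - p
    if p == q:
        return (n - r) / n
    rho = q / p
    return (rho**r - rho**n) / (1 - rho**n)

def ruin_iterative(n, p, sweeps=5000):
    # u_0 = 1, u_n = 0, and u_i = q u_{i-1} + p u_{i+1} for 0 < i < n,
    # solved by Gauss-Seidel style sweeps.
    q = 1 - p
    u = [1.0] + [0.0] * n
    for _ in range(sweeps):
        for i in range(1, n):
            u[i] = q * u[i - 1] + p * u[i + 1]
    return u

u = ruin_iterative(10, 0.6)
print(max(abs(u[r] - ruin_exact(r, 10, 0.6)) for r in range(11)))  # near 0
```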
Consider the situation when gambler is playing against an infinitely rich adversary. The
equations for the probability of the gambler’s ruin (absorption into 0) become
u1 = q + pu2, (2.12)
ui = qui−1 + pui+1, i ≥ 2.
Again we find that, for i ≥ 0,
u_i = A + B(q/p)^i for q ≠ p;  and  u_i = A + Bi for q = p = 1/2.
If q ≥ p, then the condition that ui is bounded requires that B = 0 and the equation (2.12)
shows that ui ≡ 1. If q < p, we find that ui = (q/p)i.
Remark In fact, a simple passage to the limit from the finite state gambler’s ruin yields
u1 = q/p and then it readily follows that ui = (q/p)i.
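A seeded Monte Carlo sketch of the infinitely rich adversary case (the escape cap, trial count, and p = 0.6 are assumptions; a path that climbs to the cap is treated as never ruined, which is accurate because ruin from the cap level has probability (q/p)^cap ≈ 0):

```python
import random

random.seed(2)

def ruined(i, p, cap=100):
    # One play against an infinitely rich adversary, starting with i units.
    x = i
    while 0 < x < cap:
        x += 1 if random.random() < p else -1
    return x == 0

p, i, trials = 0.6, 3, 10_000
est = sum(ruined(i, p) for _ in range(trials)) / trials
print(est, (0.4 / 0.6) ** i)  # estimate vs (q/p)^3 = 8/27
```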
3 Criteria for Recurrence
We prove two theorems which will be useful in determining whether a given Markov chain
is recurrent or transient.
Theorem 3.1 Let B be an irreducible Markov chain whose state space is labeled by the
nonnegative integers. Then a necessary and sufficient condition that B be transient (i.e.,
each state is a transient state) is that the system of equations
∑_{j=0}^∞ P_{ij} y_j = y_i,   i > 0, (3.13)
admits a bounded nonconstant solution.
Proof. Let the transition matrix for B be

P = (P_{ij}) = | P_00  P_01  ... |
               | P_10  P_11  ... |
               | ...             |

and let P̃ be the matrix obtained from P by making state 0 absorbing:

P̃ = | 1     0     0    ... |        (3.14)
    | P_10  P_11  P_12 ... |
    | P_20  P_21  P_22 ... |
    | ...                  |

We denote the Markov chain with transition probability matrix (3.14) by B̃. For the necessity, we shall assume that the process is transient and then exhibit a nonconstant bounded solution of (3.13).
Let f_{i0}^* = probability of entering state 0 in some finite time given that i is the initial state (f_{i0}^* = ∑_{n=1}^∞ f_{i0}^n). Since the process B is transient, f_{j0}^* < 1 for some j > 0. (If f_{j0}^* = 1 for all j > 0, then
f_{00}^* = ∑_{n=1}^∞ f_{00}^n = f_{00}^1 + ∑_{n=2}^∞ ∑_{j=1}^∞ P_{0j} f_{j0}^{n−1} = P_{00} + ∑_{j=1}^∞ P_{0j} f_{j0}^* = ∑_{j=0}^∞ P_{0j} = 1.
That is, state 0 is recurrent, a contradiction!)
For the process B̃, clearly π̃_0(C_0) = 1, π̃_j(C_0) = f_{j0}^* < 1 for some j > 0, and π̃_i(C_0) = ∑_{j=0}^∞ P̃_{ij} π̃_j(C_0) for all i. Hence π̃_i(C_0) = ∑_{j=0}^∞ P_{ij} π̃_j(C_0) for i > 0, and thus y_j = π̃_j(C_0) (j = 0, 1, 2, ...) is the desired bounded nonconstant solution.
Now assume that we have a nonconstant bounded solution {y_i} of (3.13). Then
∑_{j=0}^∞ P̃_{ij} y_j = y_i  for all i ≥ 0,
and iterating this equation repeatedly, we have, for all i > 0 and all n ≥ 1,
∑_{j=0}^∞ P̃_{ij}^n y_j = y_i.
If B were recurrent, then for i > 0 we would have P̃_{i0}^n → 1 and P̃_{ij}^n → 0 (j > 0) as n → ∞. So y_i = y_0 for all i, which contradicts the fact that the y_j's are not all equal.
Theorem 3.2 In an irreducible Markov chain, a sufficient condition for recurrence is that there exists a sequence {y_i} with y_i → ∞ as i → ∞ such that
∑_{j=0}^∞ P_{ij} y_j ≤ y_i  for i > 0. (3.15)
Since z_i = y_i + b also satisfies (3.15) for any constant b, we may assume y_i > 0 for all i ≥ 0. Iterating the preceding inequality (with P replaced by P̃, the matrix with state 0 made absorbing), we have
∑_{j=0}^∞ P̃_{ij}^m y_j ≤ y_i.
Given ε > 0, we choose M = M(ε) such that 1/y_i ≤ ε for i ≥ M (possible since y_i → ∞). Now
∑_{j=0}^{M−1} P̃_{ij}^m y_j + ∑_{j=M}^∞ P̃_{ij}^m y_j ≤ y_i,
and so
∑_{j=0}^{M−1} P̃_{ij}^m y_j + min_{r≥M} y_r ∑_{j=M}^∞ P̃_{ij}^m ≤ y_i.
Since
∑_{j=0}^∞ P̃_{ij}^m = 1,
we have
∑_{j=0}^{M−1} P̃_{ij}^m y_j + min_{r≥M} y_r (1 − ∑_{j=0}^{M−1} P̃_{ij}^m) ≤ y_i.
Suppose B is transient. Noting that in B̃ state 0 is absorbing, we see that
lim_{n→∞} P̃_{ij}^n ≤ lim_{n→∞} P_{ij}^n = 0  for j > 0.
Thus, passing to the limit as m → ∞, we obtain for each fixed i > 0,
π̃_i(C_0) y_0 + min_{r≥M} y_r (1 − π̃_i(C_0)) ≤ y_i;
1 − π̃_i(C_0) ≤ (y_i − π̃_i(C_0) y_0) / min_{r≥M} y_r ≤ εK,
where K = y_i − π̃_i(C_0) y_0. Letting ε → 0, we get π̃_i(C_0) = 1 for each i > 0. Hence
f_{00}^* = ∑_{n=1}^∞ f_{00}^n = f_{00}^1 + ∑_{n=2}^∞ ∑_{j=1}^∞ P_{0j} f_{j0}^{n−1} = P_{00} + ∑_{j=1}^∞ P_{0j} f_{j0}^* = ∑_{j=0}^∞ P_{0j} = 1.
That is, state 0 is recurrent, contradicting the transience of B. The Markov chain is therefore recurrent.
Wikipedia (https://en.wikipedia.org/wiki/Markov_chain#Applications) gives brief descriptions and references for the varied applications of DTMCs in physics, chemistry, biology, speech recognition, information theory, queueing theory, search engines (e.g., Google PageRank), statistics (e.g., Markov chain Monte Carlo), economics and finance, social sciences, games, music, sports (e.g., baseball), and more.
2) Operations Research
3) Communication Systems
a) Probability and Random Processes for Electrical and Computer Engineers by John
A. Gubner. This book covers network traffic modeling and error correction.
b) Digital Communications: Fundamentals and Applications by Bernard Sklar. In-
cludes discussions on the use of Markov chains in communication systems.
a) Mathematical Models in Biology by Elizabeth A. Allman and John A. Rhodes. This
book explores biological applications of mathematical models, including Markov
chains.
b) Biostatistical Methods: The Assessment of Relative Risks by John M. Lachin. Pro-
vides an overview of statistical methods in medicine, including disease modeling.
5) Computer Science
a) Algorithms by Robert Sedgewick and Kevin Wayne. This textbook covers algorithms
related to Markov chains and their applications in computer science.
b) Probability and Computing: Randomized Algorithms and Probabilistic Analysis by Michael Mitzenmacher and Eli Upfal. Discusses randomized algorithms and Markov chains.
a) Markov Chains and Stochastic Stability by Sean P. Meyn and Richard L. Tweedie.
This book covers Markov chains in decision-making contexts and game theory.
b) Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G.
Barto. Covers applications of Markov decision processes in reinforcement learning.
8) Social Sciences
a) Social Network Analysis: Methods and Applications by Wasserman and Faust. Cov-
ers applications of Markov chains in social network analysis.
b) Behavioral Modeling and Simulation: From Individuals to Societies by Roger L.
Cooke. Discusses modeling human behavior and social systems using Markov chains.
a) Robotics: Modelling, Planning and Control by Bruno Siciliano and Lorenzo Sciav-
icco. Covers the application of Markov chains in robotics and control systems.
b) Markov Chains: Theory and Applications by R. R. Yager and D. P. Sen. Provides a
comprehensive overview of Markov chains, including their application in robotics.
a) Ecological Models and Data in R by Benjamin M. Bolker. This book includes dis-
cussions on using Markov chains for ecological modeling.
b) Climate Change: The Science of Global Warming and Our Energy Future by Ed-
mond A. Mathez. Discusses environmental modeling, including applications of
Markov chains.
Chapter 4
Classical Examples of Continuous Time Markov Chains
1 Poisson Processes and General Pure Birth Processes
In this section, we consider a family of random variables {X(t); 0 ≤ t < ∞} where the possi-
ble values of X(t) are the nonnegative integers. We shall restrict attention to the case where
{X(t)} is a Markov process with stationary transition probabilities. Thus, the transition probability function for t > 0,

Pij(t) = P{X(t + u) = j | X(u) = i},

is independent of u ≥ 0.
The Poisson process with parameter λ > 0 satisfies the postulates:

i. P{X(t + h) − X(t) = 1 | X(t) = x} = λh + o(h) as h ↓ 0;

ii. P{X(t + h) − X(t) = 0 | X(t) = x} = 1 − λh + o(h) as h ↓ 0;

iii. X(0) = 0.

The o(h) symbol means that if we divide this term by h then its value tends to zero as h tends to zero. Notice that the right-hand sides are independent of x.
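A small Monte Carlo sketch of the infinitesimal description above (the rate, interval length, and trial count are assumed illustrative values): for a Poisson process of rate λ, P{exactly one event in an interval of length h} should be λh + o(h), and we estimate it from simulated exponential inter-event times.

```python
import random

# Hedged Monte Carlo check (assumed parameters): estimate
# P{exactly one event in (0, h]} and compare with the first-order value lam*h.
random.seed(1)
lam, h, trials = 2.0, 0.01, 200_000

ones = 0
for _ in range(trials):
    t, count = 0.0, 0
    while True:
        t += random.expovariate(lam)   # exponential waiting times between events
        if t > h:
            break
        count += 1
    ones += (count == 1)

est = ones / trials
print(est, lam * h)   # estimate vs lam*h; they differ only by o(h)
```

The exact value is λh e^{−λh}, which differs from λh by o(h), as the postulate asserts.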
Consider a sequence of positive numbers, {λi}. We define a pure birth process as a Markov process {X(t)} satisfying the postulates:

1. P{X(t + h) − X(t) = 1 | X(t) = k} = λk h + o1,k(h) as h ↓ 0;

2. P{X(t + h) − X(t) = 0 | X(t) = k} = 1 − λk h + o2,k(h) as h ↓ 0;

3. P{X(t + h) − X(t) < 0 | X(t) = k} = 0;

4. X(0) = 0.

(The error terms o1,k(h) and o2,k(h) may depend on the state k.)
Next we will find Pn(t) = P{X(t) = n}, assuming X(0) = 0. Note that

1 = ∑_{m=0}^∞ Pm(h) = [1 − λ0h + o2,0(h)] + [λ0h + o1,0(h)] + ∑_{m=2}^∞ Pm(h),

so

∑_{m=2}^∞ Pm(h) = o(h).

This gives

P{X(h) ≥ 1} = ∑_{m=1}^∞ Pm(h) = λ0h + o(h).
The Markov property shows

P0(t + h) = P0(t) P{X(t + h) − X(t) = 0 | X(t) = 0} = P0(t)[1 − λ0h + o2,0(h)],

so that P0′(t) = −λ0 P0(t). Letting h → 0 and solving with P0(0) = 1 gives

P0(t) = e^{−λ0 t}.

For general n,

Pn(t + h) = ∑_{k=0}^∞ Pk(t) P{X(t + h) = n | X(t) = k} = ∑_{k=0}^∞ Pk(t) P{X(t + h) − X(t) = n − k | X(t) = k}.
For k = 0, 1, · · · , n − 2 (i.e., n − k ≥ 2), we have

P{X(t + h) − X(t) = n − k | X(t) = k}
≤ P{X(t + h) − X(t) ≥ 2 | X(t) = k}
= 1 − P{X(t + h) − X(t) = 0 | X(t) = k} − P{X(t + h) − X(t) = 1 | X(t) = k}
= o1,k(h) + o2,k(h).

So

Pn(t + h) = Pn(t)[1 − λn h + o2,n(h)] + Pn−1(t)[λn−1 h + o1,n−1(h)] + ∑_{k=0}^{n−2} Pk(t)[o1,k(h) + o2,k(h)],

or

Pn(t + h) − Pn(t) = Pn(t)[−λn h + o2,n(h)] + Pn−1(t)[λn−1 h + o1,n−1(h)] + on(h).
Dividing by h and letting h ↓ 0, we obtain

Pn′(t) = −λn Pn(t) + λn−1 Pn−1(t), n ≥ 1, (1.1)
P0′(t) = −λ0 P0(t),

with boundary conditions

P0(0) = 1, Pn(0) = 0, n > 0.

Some characteristics of the process

Let Tk denote the time between the kth birth and the (k + 1)st birth, so that

Pn(t) = P{∑_{i=0}^{n−1} Ti ≤ t < ∑_{i=0}^{n} Ti}.

The random variables Tk are called the “waiting times" between births, and

Sk = ∑_{i=0}^{k−1} Ti = the time at which the kth birth occurs.
Since P0(t) = e^{−λ0 t}, P{T0 ≤ z} = 1 − P{X(z) = 0} = 1 − e^{−λ0 z}; i.e., T0 has an exponential distribution with parameter λ0. We can show that (i) Tk has an exponential distribution with parameter λk, and (ii) the Tk are independent.

Characteristic function of Sn, φn(t):

φn(t) = E e^{itSn} = E e^{it ∑_{k=0}^{n−1} Tk} = ∏_{k=0}^{n−1} E e^{itTk} = ∏_{k=0}^{n−1} λk/(λk − it).
Example (Yule process) Suppose each member of a population independently gives birth in a small interval of length h with probability βh + o(h). Assuming independence and no interaction among members of the population, then

P{X(t + h) − X(t) = 1 | X(t) = n} = C(n, 1)[βh + o(h)][1 − βh + o(h)]^{n−1} = nβh + on(h).

That is, in this example, λn = nβ. The system of equations (1.1), in the case that X(0) = N = 1, becomes

Pn′(t) = −nβ Pn(t) + (n − 1)β Pn−1(t), n ≥ 1,
P1(0) = 1, Pn(0) = 0, n ≥ 2.

The solution is

Pn(t) = e^{−βt}(1 − e^{−βt})^{n−1}, n ≥ 1.
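The Yule solution Pn(t) = e^{−βt}(1 − e^{−βt})^{n−1} (for X(0) = 1) can be checked by simulating the process through its exponential sojourn times: in state n the waiting time to the next birth is exponential with rate nβ. The sketch below uses assumed illustrative values of β and t.

```python
import random, math

# Hedged simulation of the Yule process (X(0) = 1, birth rate n*b in state n;
# b and t are assumed illustrative values), checking the geometric law
# P_n(t) = e^{-bt} * (1 - e^{-bt})^{n-1}.
random.seed(7)
b, t, trials = 1.0, 0.7, 100_000

counts = {}
for _ in range(trials):
    n, clock = 1, 0.0
    while True:
        clock += random.expovariate(n * b)   # sojourn in state n has rate n*b
        if clock > t:
            break
        n += 1
    counts[n] = counts.get(n, 0) + 1

for n in (1, 2, 3):
    theory = math.exp(-b * t) * (1 - math.exp(-b * t)) ** (n - 1)
    print(n, counts.get(n, 0) / trials, round(theory, 4))
```

The empirical frequencies should agree with the geometric probabilities up to Monte Carlo error.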
The generating function may be determined easily by summing a geometric series. We have

f(s) = ∑_{n=1}^∞ Pn(t) s^n = s e^{−βt} ∑_{n=1}^∞ [(1 − e^{−βt})s]^{n−1} = s e^{−βt} / [1 − (1 − e^{−βt})s].
Case X(0) = N. Since we have assumed independence and no interaction among the members, we may view this population as the sum of N independent Yule processes, each beginning with a single member. Letting

PN,n(t) = P{X(t) = n | X(0) = N} and fN(s) = ∑_{n=N}^∞ PN,n(t) s^n = [f(s)]^N,

we have

[f(s)]^N = {s e^{−βt} / [1 − (1 − e^{−βt})s]}^N = (s e^{−βt})^N ∑_{m=0}^∞ C(m+N−1, m)(1 − e^{−βt})^m s^m
= ∑_{n=N}^∞ C(n−1, n−N)(e^{−βt})^N (1 − e^{−βt})^{n−N} s^n,
where we have used the binomial series (1 − x)^{−N} = ∑_{m=0}^∞ C(m+N−1, m) x^m. The coefficient of s^n in this expression must be

PN,n(t) = C(n−1, n−N)(e^{−βt})^N (1 − e^{−βt})^{n−N} for n ≥ N.

2 Poisson Processes

The Poisson process is the pure birth process with λn = λ for all n, so that X(t) has the Poisson distribution with mean λt. Its characteristic function is

φt(ω) = E{e^{iωX(t)}} = ∑_{n=0}^∞ e^{−λt}(λt)^n e^{iωn}/n! = exp[λt(e^{iω} − 1)].
In our discussion of the pure birth process we showed that
P(T0 ≤ x) = 1 − exp(−λ0x)
and mentioned that Tk follows an exponential distribution with parameter λk and that the
Tk ’s are independent. For the Poisson process, however, λk = λ for all k, so that the result
becomes
Theorem 2.1 The waiting times Tk are independent and identically distributed following an
exponential distribution with parameter λ .
Consider the times {Si} at which changes of X(t) occur, i.e.,

Si = ∑_{k=0}^{i−1} Tk.

Then

P{Si ≤ si, i = 1, . . . , n | X(t) = n} = (n!/t^n) ∫_0^{s1} · · · ∫_{x_{n−2}}^{s_{n−1}} ∫_{x_{n−1}}^{s_n} dx_n dx_{n−1} . . . dx_1,

which is the distribution of the order statistics from a sample of n observations taken from the uniform distribution on [0, t].
Proof.

P{Si ≤ si, i = 1, . . . , n, X(t) = n}
= P{T0 ≤ s1, T0 + T1 ≤ s2, . . . , ∑_{i=0}^{n−1} Ti ≤ sn, ∑_{i=0}^{n} Ti > t}
= ∫_0^{s1} ∫_0^{s2−t1} · · · ∫_0^{sn−∑_{i=1}^{n−1} ti} ∫_{t−∑_{i=1}^{n} ti}^∞ λ^{n+1} e^{−λ ∑_{i=1}^{n+1} ti} dt_{n+1} . . . dt_1
= λ^{n+1} ∫_0^{s1} ∫_0^{s2−t1} · · · ∫_0^{sn−∑_{i=1}^{n−1} ti} e^{−λ ∑_{i=1}^{n} ti} · (1/λ) exp[−λ(t − ∑_{i=1}^{n} ti)] dt_n . . . dt_1
= λ^n e^{−λt} ∫_0^{s1} ∫_0^{s2−t1} · · · ∫_0^{sn−∑_{i=1}^{n−1} ti} dt_n . . . dt_1
= λ^n e^{−λt} ∫_0^{s1} ∫_{u1}^{s2} · · · ∫_{u_{n−1}}^{s_n} du_n . . . du_1 (let u_j = ∑_{i=1}^{j} ti, j = 1, . . . , n).
But

P{X(t) = n} = e^{−λt}(λt)^n/n!.

Hence

P{Si ≤ si, i = 1, . . . , n | X(t) = n} = P{Si ≤ si, i = 1, . . . , n, X(t) = n} / P{X(t) = n}
= (n!/t^n) ∫_0^{s1} ∫_{u1}^{s2} · · · ∫_{u_{n−1}}^{s_n} du_n . . . du_1,

as claimed.
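The order-statistics property can be illustrated numerically; the sketch below (rate, horizon, and conditioning count are assumed values) compares the mean of the first arrival time, given n events in [0, t], with the mean of the minimum of n iid uniforms on [0, t], which is t/(n + 1).

```python
import random

# Hedged Monte Carlo sketch of the order-statistics property: given X(t) = n,
# Poisson arrival times behave like sorted uniforms on [0, t]. Parameters are
# assumed illustrative values.
random.seed(3)
lam, t, trials, n_target = 1.5, 2.0, 300_000, 3

first_arrivals = []
for _ in range(trials):
    arrivals, clock = [], 0.0
    while True:
        clock += random.expovariate(lam)   # exponential inter-arrival times
        if clock > t:
            break
        arrivals.append(clock)
    if len(arrivals) == n_target:          # condition on exactly n_target events
        first_arrivals.append(arrivals[0])

mc_mean = sum(first_arrivals) / len(first_arrivals)
# the minimum of 3 iid Uniform(0, t) variables has mean t/(n+1) = 0.5
print(round(mc_mean, 3), t / (n_target + 1))
```
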
(C) Binomial distribution

For v < t and k < n, conditioning on X(t) = n and using the order-statistics property,

P(X(v) = k | X(t) = n) = C(n, k)(v/t)^k (1 − v/t)^{n−k}.

A second example in which the binomial distribution plays a part may be given by considering two independent Poisson processes X1(t) and X2(t) with parameters λ1 and λ2:

P(X1(t) = k | X1(t) + X2(t) = n) = P(X1(t) = k, X2(t) = n − k) / P(X1(t) + X2(t) = n)
= [exp(−λ1t)(λ1t)^k/k!][exp(−λ2t)(λ2t)^{n−k}/(n − k)!] / {exp[−(λ1 + λ2)t](λ1 + λ2)^n t^n/n!}
= C(n, k)[λ1/(λ1 + λ2)]^k [λ2/(λ1 + λ2)]^{n−k}.
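The conditional binomial law for two independent Poisson processes can be checked directly by simulation; the rates, horizon, and conditioning value below are assumed for illustration.

```python
import random

# Hedged Monte Carlo check: given X1(t) + X2(t) = n, X1(t) should be
# Binomial(n, lam1/(lam1 + lam2)). Parameters are assumed illustrative values.
random.seed(11)
lam1, lam2, t, n, trials = 2.0, 1.0, 1.0, 4, 400_000

def poisson(mu):
    """Sample Poisson(mu) by counting unit-rate exponential arrivals in [0, mu]."""
    k, s = 0, random.expovariate(1.0)
    while s <= mu:
        k += 1
        s += random.expovariate(1.0)
    return k

hits, total = [0] * (n + 1), 0
for _ in range(trials):
    x1, x2 = poisson(lam1 * t), poisson(lam2 * t)
    if x1 + x2 == n:                 # condition on the total count
        hits[x1] += 1
        total += 1

p = lam1 / (lam1 + lam2)             # success probability 2/3
est_mean = sum(k * h for k, h in enumerate(hits)) / total
print(round(est_mean, 3), n * p)     # empirical mean vs binomial mean n*p
```
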
3 Birth and Death Processes

One of the obvious generalizations of the pure birth processes is to permit X(t) to decrease as well as increase.

(A) Postulates

As in the case of the pure birth processes we assume that X(t) is a Markov process on the states 0, 1, 2, . . . and that its transition probabilities Pij(t) are stationary, i.e., Pij(t) = P{X(t + s) = j | X(s) = i}. In addition we assume that the Pij(t) satisfy

1. Pi,i+1(h) = λi h + o(h) as h ↓ 0, i ≥ 0;

2. Pi,i−1(h) = µi h + o(h) as h ↓ 0, i ≥ 1;

3. Pi,i(h) = 1 − (λi + µi)h + o(h) as h ↓ 0, i ≥ 0;

4. Pij(0) = δij;

5. µ0 = 0, λ0 > 0, µi, λi > 0, i = 1, 2, . . . .
The matrix

      −λ0        λ0          0            0           · · ·
       µ1      −(λ1 + µ1)    λ1           0           · · ·
A =    0         µ2        −(λ2 + µ2)     λ2          · · ·        (3.2)
       0         0           µ3         −(λ3 + µ3)    · · ·
       ...       ...         ...          ...
is called the infinitesimal generator of the process. The λi and µi are called the infinitesimal birth and death rates, respectively. In Postulates 1 and 2 we are assuming that if the process starts in state i, then in a small interval of time the probabilities of the population increasing or decreasing by 1 are essentially proportional to the length of the interval.
Since Pij(t) are probabilities, we have Pij(t) ≥ 0 and

∑_{j=0}^∞ Pij(t) = 1. (3.3)

Using the Markov property of the process we may derive the Chapman-Kolmogorov equation

Pij(t + s) = ∑_{k=0}^∞ Pik(t) Pkj(s). (3.4)

This equation states that in order to move from state i to j in time t + s, X(t) moves to some state k in time t and then from k to j in the remaining time s.

In order to obtain the probability that X(t) = n, we must specify the probability distribution for the initial state. We have

P(X(t) = n) = ∑_{i=0}^∞ qi Pin(t),
where qi = P (X(0) = i).
(B) Waiting times
With the aid of the preceding assumptions we may calculate the distribution of the r.v. Ti,
which is the waiting time of X(t) in state i; i.e., given that the process is in state i, what is the
distribution of Ti until it first leaves i? If we set

P(Ti ≥ t) = Gi(t),

then

Gi(t + h) = P(X(s) = i, 0 ≤ s ≤ t + h | X(0) = i)
= P(X(s) = i, 0 ≤ s ≤ t + h | X(s) = i, 0 ≤ s ≤ h) P(X(s) = i, 0 ≤ s ≤ h | X(0) = i)
= Gi(t) Gi(h) = Gi(t)[Pii(h) + o(h)]
= Gi(t)[1 − (λi + µi)h] + o(h),
or

[Gi(t + h) − Gi(t)]/h = −(λi + µi) Gi(t) + o(1),

so that Gi′(t) = −(λi + µi) Gi(t), whence

Gi(t) = e^{−(λi + µi)t}.

So Ti follows an exponential distribution with mean (λi + µi)^{−1}. The proof presented above is not quite complete, since we have used the intuitive relationship Gi(h) = Pii(h) + o(h) without a formal proof.

According to Postulates (1), (2), during a time duration of length h a transition occurs from state i to i + 1 with pr. λi h + o(h) and from state i to i − 1 with pr. µi h + o(h). Intuitively,
given that a transition occurs at time t, the pr. that the transition is to state i + 1 is λi/(λi + µi)
and to state i − 1 is µi/(λi + µi).
It leads to an important characterization of a birth and death process: The process sojourns
in a given state i for a random length of time whose distribution function is an exponential
distribution with parameter λi + µi. When leaving state i the process enters either i + 1 or
i − 1 with pr. λi/(λi + µi) and µi/(λi + µi) respectively. The motion is analogous to that of a
random walk except that transitions occur at random times rather than at fixed time periods.
We determine realizations of the process as follows. Suppose X(0) = i; the particle spends
a random length of time, exponentially distributed with parameter λi + µi, in state i and subsequently moves with pr. λi/(λi + µi) to state i + 1 and with pr. µi/(λi + µi) to state i −
1. Next, the particle sojourns a random length of time in the new state and then moves to
one of its neighboring states, and so on. More specifically, we observe a value t1 from the
exponential distribution with parameter λi + µi that fixes the initial sojourn time in state i.
Then we toss a coin with pr. of heads pi = λi/(λi + µi). If heads (tails) appear, we move
the particle to state i + 1(i − 1). In state i + 1 we observe a value t2 from the exponential
distribution with parameter λi+1 + µi+1 that fixes the sojourn time in the 2nd state visited.
If the particle at the first transition enters state i − 1, the subsequent sojourn time t20 is an
observation from the exponential distribution with parameter λi−1 + µi−1. After the 2nd wait
is completed, a Bernoulli trial is performed that chooses the next state to be visited, and the
process continues in the same way. The process obtained in this manner is called the minimal
process associated with the infinitesimal matrix A defined in (3.2).
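The minimal-process construction just described translates directly into a simulation. The sketch below uses an assumed M/M/1-type chain with λi = 1.0 and µi = 1.5 for i ≥ 1 (µ0 = 0); these rates are illustrative, not from the text.

```python
import random

# Hedged sketch of the minimal-process construction for an assumed birth and
# death chain: exponential sojourns with parameter lambda_i + mu_i, followed
# by a Bernoulli choice between states i+1 and i-1.
random.seed(5)
LAM, MU = 1.0, 1.5

def simulate(i, horizon):
    """Run the minimal process from state i and return the state at `horizon`."""
    clock = 0.0
    while True:
        lam = LAM
        mu = 0.0 if i == 0 else MU
        clock += random.expovariate(lam + mu)    # exponential sojourn in state i
        if clock > horizon:
            return i
        if random.random() < lam / (lam + mu):   # coin toss for the jump
            i += 1
        else:
            i -= 1

samples = [simulate(0, 200.0) for _ in range(10_000)]
# With lambda < mu the stationary law is (1 - rho) * rho^i with rho = 2/3,
# so the long-run probability of state 0 is 1/3.
frac0 = samples.count(0) / len(samples)
print(round(frac0, 3))
```
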
We now derive the backward Kolmogorov differential equations. These are given by

Pij(t + h) = ∑_{k=0}^∞ Pik(h) Pkj(t)
= Pi,i−1(h) Pi−1,j(t) + Pi,i(h) Pi,j(t) + Pi,i+1(h) Pi+1,j(t) + ∑_{k: |k−i|>1} Pi,k(h) Pk,j(t). (4.7)
Using Postulates 1, 2, and 3 of Section 3, we see that the last sum is o(h), so that

Pij(t + h) = µi h Pi−1,j(t) + [1 − (λi + µi)h] Pij(t) + λi h Pi+1,j(t) + o(h).

Transposing the term Pij(t) to the left-hand side and dividing the equation by h, we obtain, after letting h ↓ 0, the backward Kolmogorov differential equations

Pij′(t) = µi Pi−1,j(t) − (λi + µi) Pij(t) + λi Pi+1,j(t), i ≥ 1,
P0j′(t) = −λ0 P0j(t) + λ0 P1j(t),

with initial condition Pij(0) = δij.
The backward equations are deduced by decomposing the time interval (0, t + h), where h is positive and small, into the two periods

(0, h), (h, t + h),

and examining the transition in each period separately. In this sense the backward equations result from a “first step analysis," the first step being over the short time interval of duration h.

A different result arises from a “last step analysis," which proceeds by splitting the time interval (0, t + h) into the two periods

(0, t), (t, t + h)

and adapting the preceding reasoning. From this viewpoint, under more stringent conditions
we can derive a further system of differential equations

Pi0′(t) = −λ0 Pi,0(t) + µ1 Pi,1(t), (4.8)
Pij′(t) = λ_{j−1} Pi,j−1(t) − (λj + µj) Pij(t) + µ_{j+1} Pi,j+1(t), j ≥ 1,

with the same initial condition Pij(0) = δij. These are known as the forward Kolmogorov differential equations. To derive these equations we interchange t and h in equation (4.7), and under stronger assumptions in addition to Postulates (1), (2), and (3) it can be shown that the last term is again o(h). The remainder of the argument is the same as before.

A sufficient condition that (4.8) hold is that [Pkj(h)]/h = o(1) for k ≠ j, j − 1, j + 1, where the o(1) term, apart from tending to zero, is uniformly bounded with respect to k for fixed j as h → 0. In this case it can be proved that ∑_{k: |k−j|>1} Pi,k(t) Pk,j(h) = o(h).
Example Linear Growth with Immigration A birth and death process is called a linear
growth process if λn = λ n + a and µn = µn with λ > 0, µ > 0, and a > 0. Such processes
occur naturally in the study of biological reproduction and population growth. If the state n
describes the current population size, then the average instantaneous rate of growth is λ n + a.
Similarly, the probability of the state of the process decreasing by one after the elapse of a
small duration of time h is µnh + o(h). The factor λ n represents the natural growth of the
population owing to its current size, while the second factor a may be interpreted as the
infinitesimal rate of increase of the population due to an external source such as immigration.
The component µn, which gives the mean infinitesimal death rate of the present population,
possesses the obvious interpretation.
If we substitute the above values of λn and µn in (4.8), we obtain

Pi0′(t) = −a Pi0(t) + µ Pi1(t),
Pij′(t) = [λ(j − 1) + a] Pi,j−1(t) − [(λ + µ)j + a] Pij(t) + µ(j + 1) Pi,j+1(t), j ≥ 1.

Now, if we multiply the jth equation by j and sum, it follows that the expected value

EX(t) = M(t) = ∑_{j=1}^∞ j Pij(t)
satisfies the differential equation

M′(t) = a + (λ − µ) M(t),

with M(0) = i. Hence

M(t) = at + i if λ = µ,

and

M(t) = [a/(λ − µ)]{e^{(λ−µ)t} − 1} + i e^{(λ−µ)t} if λ ≠ µ. (4.9)

The second moment, or variance, may be calculated in a similar way. It is interesting to note that M(t) → ∞ as t → ∞ if λ ≥ µ, while if λ < µ, the mean population size for large t is approximately a/(µ − λ).
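The mean formula (4.9) can be sanity-checked by integrating the differential equation M′(t) = a + (λ − µ)M(t) numerically; the parameter values below are assumed for illustration.

```python
import math

# Numeric check of (4.9): integrate M'(t) = a + (lam - mu)*M(t), M(0) = i,
# with a simple Euler scheme and compare with the closed form. Parameter
# values are assumed illustrative choices.
lam, mu, a, i, T, steps = 0.8, 0.5, 0.3, 5, 2.0, 200_000

m, dt = float(i), T / steps
for _ in range(steps):
    m += (a + (lam - mu) * m) * dt   # forward Euler step

closed = a / (lam - mu) * (math.exp((lam - mu) * T) - 1) + i * math.exp((lam - mu) * T)
print(round(m, 4), round(closed, 4))
```

The Euler approximation and the closed form agree to high accuracy for this step size, which is consistent with (4.9).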
Chapter 5
Brownian Motion
1 Background Material
The Brownian motion process is an example of a continuous time, continuous state space Markov process. In this course, we confine ourselves to the one-dimensional process.
Let X(t) be the x component of a particle in Brownian motion. Let x0 be the position of the particle at time t0, i.e., X(t0) = x0. Let p(x, t | x0) represent the conditional probability density of X(t + t0), given that X(t0) = x0. We have

p(x, t | x0) ≥ 0, and ∫_{−∞}^∞ p(x, t | x0) dx = 1. (1.1)
Further, we stipulate that, for small t, X(t + t0) is likely to be near X(t0) = x0. So we require

lim_{t↓0} p(x, t | x0) = 0 for x ≠ x0.

Under appropriate regularity conditions, p satisfies the partial differential equation

∂p/∂t = D ∂²p/∂x². (1.2)

This is called the diffusion equation, and D is the diffusion coefficient. If we choose D = 1/2,
then

p(x, t | x0) = (1/√(2πt)) exp[−(x − x0)²/(2t)]

is a solution of (1.2).
Definition 2.1 A standard Brownian motion is a stochastic process {X(t); t ≥ 0} such that

(a) every increment X(t + s) − X(s) is normally distributed with mean 0 and variance σ²t;

(b) the increments over disjoint time intervals are independent;

(c) X(0) = 0 and X(t) is continuous at t = 0.

Theorem 2.1 The conditional density of X(t) for t1 < t < t2 given X(t1) = A and X(t2) = B is a normal density with mean A + [(B − A)/(t2 − t1)](t − t1) and variance (t2 − t)(t − t1)/(t2 − t1). (We assume σ = 1 in (a).)
Proof. Making use of the fact that Z1 := X(t1), Z2 := X(t) − X(t1), Z3 := X(t2) − X(t) are independent normally distributed r.v.'s with means 0 and variances t1, t − t1 and t2 − t respectively, the joint density of X(t1), X(t), X(t2) is

f(x, y, z) = [(2π)^{3/2} √(t1(t − t1)(t2 − t))]^{−1} exp[−x²/(2t1) − (y − x)²/(2(t − t1)) − (z − y)²/(2(t2 − t))].

Similarly, the joint density of X(t1), X(t2) is

f(x, z) = [2π √(t1(t2 − t1))]^{−1} exp[−x²/(2t1) − (z − x)²/(2(t2 − t1))].
So the conditional density of X(t) given X(t1), X(t2) is (after some straightforward but tedious algebra)

f(y | x, z) = f(x, y, z)/f(x, z) = (1/√(2πc)) exp[−(y − a)²/(2c)],

where a = x + [(z − x)/(t2 − t1)](t − t1) and c = (t2 − t)(t − t1)/(t2 − t1). Letting x = A and z = B gives the result.
Reflection principle

Consider the collection of sample paths X(t), 0 ≤ t ≤ T, X(0) = 0, with the property that X(T) > a (a > 0). Since X(t) is continuous and X(0) = 0, there exists a time τ at which X(t) first attains the value a.

For t > τ, we “reflect” X(t) about the line x = a to obtain

X̃(t) = X(t) for t < τ, and X̃(t) = a − [X(t) − a] for t > τ.

Note that X̃(T) < a. Because the probability law of the path for t > τ, given X(τ) = a, is symmetrical with respect to the values x > a and x < a and independent of the history prior to time τ, the reflection argument displays for every sample path with X(T) > a two sample paths X(t) and X̃(t) with the same probability of occurrence, each satisfying max_{0≤t≤T} X(t) ≥ a.
Conversely, by the nature of this correspondence every sample path X(t) for which max_{0≤t≤T} X(t) ≥ a results from either of two sample functions X(t) with equal probability, one of which is such that X(T) > a, unless X(T) = a (but note that P(X(T) = a) = 0). Hence

P{max_{0≤t≤T} X(t) ≥ a} = 2P{X(T) > a} = (2/√(2πT)) ∫_a^∞ exp(−x²/2T) dx. (3.3)

With the help of (3.3) we may determine the distribution of the first time of reaching a > 0 subject to the condition X(0) = 0. Let Ta denote the time at which X(t) first attains the value a where X(0) = 0. Then, clearly,

P(Ta ≤ t) = P{max_{0≤u≤t} X(u) ≥ a} = (2/√(2πt)) ∫_a^∞ exp(−x²/2t) dx. (3.5)
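The identity (3.3) can be illustrated on a discretized Brownian path; step count and trial count below are assumed illustrative values, and the grid maximum slightly undercounts the true maximum, so only rough agreement is expected.

```python
import random, math

# Hedged Monte Carlo sketch of (3.3): compare the frequency of
# {max_{0<=t<=T} X(t) >= a} on a discretized path with 2*P{X(T) > a}.
# The discretization biases the maximum slightly downward.
random.seed(2)
T, a, steps, trials = 1.0, 1.0, 400, 15_000
dt = T / steps

hits = 0
for _ in range(trials):
    x, top = 0.0, 0.0
    for _ in range(steps):
        x += random.gauss(0.0, math.sqrt(dt))   # independent N(0, dt) increments
        top = max(top, x)
    hits += (top >= a)

theory = 2 * (1 - 0.5 * (1 + math.erf(a / math.sqrt(2 * T))))   # 2*P{X(T) > a}
print(round(hits / trials, 3), round(theory, 3))
```
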
The density function of the random variable Ta is obtained by differentiating (3.5) with respect to t. Thus

f_{Ta}(t | X(0) = 0) = [a/√(2π)] t^{−3/2} exp[−a²/(2t)]. (3.6)

Because of the symmetry and spatial homogeneity of the Brownian motion process we infer from the distribution (3.6) that for a > 0,

P(min_{0≤u≤t} X(u) ≤ 0 | X(0) = a) = P(max_{0≤u≤t} X(u) ≥ 0 | X(0) = −a)
= P(max_{0≤u≤t} X(u) ≥ a | X(0) = 0) = P(Ta ≤ t)
= [a/√(2π)] ∫_0^t u^{−3/2} exp[−a²/(2u)] du. (3.7)
Another way to express the result of (3.7) is as follows: If X(t0) = a then the probability P(a) that X(t) has at least one zero between t0 and t1 is

P(a) = [|a|/√(2π)] ∫_0^{t1−t0} u^{−3/2} exp[−a²/(2u)] du = P(T_{|a|} ≤ t1 − t0). (3.8)

Let's calculate the probability α that if X(0) = 0 then X(t) vanishes at least once in the interval (t0, t1).

In fact, we condition on the possible values of X(t0). Thus, if |X(t0)| = a then the probability that X(t) vanishes in the interval (t0, t1) is P(a). By the law of total probability,

α = ∫_0^∞ P(a) P(|X(t0)| = a | X(0) = 0) da = √(2/(πt0)) ∫_0^∞ P(a) exp[−a²/(2t0)] da.
Substituting from (3.8) and then interchanging the order of integration yields

α = √(2/(πt0)) ∫_0^∞ exp[−a²/(2t0)] · [a/√(2π)] ∫_0^{t1−t0} u^{−3/2} exp[−a²/(2u)] du da
= [1/(π√t0)] ∫_0^{t1−t0} u^{−3/2} ∫_0^∞ a exp[−(a²/2)(1/u + 1/t0)] da du.

The inner integral can be integrated exactly and after simplifying we get

α = (√t0/π) ∫_0^{t1−t0} du/[(t0 + u)√u],
which evaluates to α = (2/π) arctan √((t1 − t0)/t0), and which we may write by virtue of some standard trigonometric relations in the form

√(t0/t1) = cos(πα/2), or α = (2/π) arccos √(t0/t1).

In summary,

Theorem 3.1 The probability that X(t) has at least one zero in the interval (t0, t1), given X(0) = 0, is

α = (2/π) arccos √(t0/t1).
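Theorem 3.1 can be cross-checked numerically by mirroring the law-of-total-probability computation: condition on |X(t0)| = a, use the first-passage probability P(T_a ≤ s) = 2[1 − Φ(a/√s)], and average over the normal law of X(t0). The values of t0 and t1 below are assumed for illustration.

```python
import random, math

# Hedged numerical cross-check of Theorem 3.1 (assumed t0, t1): average the
# first-passage probability P(T_a <= t1 - t0) over a = |X(t0)| ~ |N(0, t0)|.
random.seed(4)
t0, t1, trials = 0.5, 2.0, 200_000

total = 0.0
for _ in range(trials):
    a = abs(random.gauss(0.0, math.sqrt(t0)))         # |X(t0)|
    z = a / math.sqrt(t1 - t0)
    total += 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))  # P(T_a <= t1-t0)

est = total / trials
theory = (2 / math.pi) * math.acos(math.sqrt(t0 / t1))   # equals 2/3 here
print(round(est, 4), round(theory, 4))
```
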
Applying the “reflection principle”, we can solve the following problem: Determine

At(x, y) = P(X(t) > y, min_{0≤u≤t} X(u) > 0 | X(0) = x), x, y > 0. (3.9)
We start with

At(x, y) = P(X(t) > y | X(0) = x) − P(X(t) > y, min_{0≤u≤t} X(u) ≤ 0 | X(0) = x), (3.10)

where the reflection principle is applied to the last term. Figure 2 is the appropriate picture to guide the analysis. We deduce that

P(X(t) > y, min_{0≤u≤t} X(u) ≤ 0 | X(0) = x) = P(X(t) < −y | X(0) = x). (3.11)

Inserting (3.11) in (3.10) yields

At(x, y) = ∫_{y−x}^∞ p(u, t) du − ∫_{y+x}^∞ p(u, t) du = ∫_y^{y+2x} p(u − x, t) du,

where p(u, t) = (2πt)^{−1/2} exp(−u²/2t) is the transition probability density function for the Brownian motion process.
4 Variations and Extensions

We claim that if X(t) is a standard Brownian motion process, then each of the following processes is a version of standard Brownian motion:

• X1(t) := c X(t/c²), for a fixed constant c > 0;

• X2(t) := t X(1/t) for t > 0, and X2(0) := 0;

• X3(t) := X(t + h) − X(h), for a fixed h > 0.

We can verify the conditions in Definition 2.1. For i = 1, 2 or 3, (i) the increment Xi(t + s) − Xi(s) is normally distributed with zero mean; and (ii) the increments over disjoint time intervals are clearly independent r.v.'s. It remains to verify that the variances of these processes
equal t:

E[{X1(s + t) − X1(s)}²] = c² E[{X((t + s)/c²) − X(s/c²)}²] = c²[(t + s)/c² − s/c²] = t,

E[{X2(s + t) − X2(s)}²] = E[{(s + t)X(1/(s + t)) − sX(1/s)}²]
= s² E[{X(1/s) − X(1/(t + s))}²] + t² E[{X(1/(t + s))}²]
= s²[1/s − 1/(t + s)] + t²/(t + s) = t,

E[{X3(s + t) − X3(s)}²] = E[{X(s + t + h) − X(s + h)}²] = t.

To complete the proof that these processes are versions of a standard Brownian Motion, it is necessary to check that each Xi(t) is continuous at the origin. This is obviously true for X1(t) and X3(t), but needs some arguments for X2(t). Equivalently, in the latter case it is enough to
show that

P( lim_{t→∞} X(t)/t = 0 | X(0) = 0 ) = 1.
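The variance computation for the time-inversion process X2(t) = tX(1/t) can be checked by simulation, sampling X exactly at the two required time points; the values of s and t below are assumed for illustration.

```python
import random, math

# Hedged Monte Carlo check that X2(t) = t*X(1/t) has increments of variance t.
# X is sampled exactly at u1 = 1/(s+t) < u2 = 1/s via independent increments.
random.seed(9)
s, t, trials = 0.5, 1.2, 400_000
u1, u2 = 1.0 / (s + t), 1.0 / s

acc = 0.0
for _ in range(trials):
    x_u1 = random.gauss(0.0, math.sqrt(u1))               # X(u1)
    x_u2 = x_u1 + random.gauss(0.0, math.sqrt(u2 - u1))   # X(u2)
    inc = (s + t) * x_u1 - s * x_u2                       # X2(s+t) - X2(s)
    acc += inc * inc

print(round(acc / trials, 3), t)   # empirical second moment vs t
```
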
For any 0 ≤ t0 < t1 < · · · < tn and s > 0,
(ii) and

E[Y(t)] = E|X(t)| = ∫_{−∞}^∞ |y| (1/√(2πt)) exp(−y²/2t) dy = 2 ∫_0^∞ y (1/√(2πt)) exp(−y²/2t) dy = √(2t/π),

and

E[Y²(t)] = EX²(t) = t.
(B) Brownian Motion absorbed at the origin

Suppose the initial value of a Brownian motion process is X(0) = x, where x > 0. Let τ be the first time the process reaches zero. Brownian motion absorbed at the origin is defined as

Z(t) = X(t) for t ≤ τ, and Z(t) = 0 for t > τ.
For any 0 < t0 < · · · < tn, y > 0,

Under the condition Z(0) = x > 0, Z(t) is a random variable whose distribution has a discrete part and a continuous part. The discrete part is

P(Z(t) = 0 | Z(0) = x) = 1 − At(x, 0) = 1 − ∫_0^{2x} p(u − x, t) du
= 1 − ∫_{−x}^x p(u, t) du = 2 ∫_x^∞ p(u, t) du = 2 ∫_0^∞ p(u + x, t) du.
For 0 < a < b,

P(a < Z(t) < b | Z(0) = x) = At(x, a) − At(x, b)
= ∫_a^b p(u − x, t) du − ∫_{a+2x}^{b+2x} p(u − x, t) du
= ∫_a^b [p(u − x, t) − p(u + x, t)] du.

Thus the transition probability density function for the continuous part of the absorbed Brownian motion is

pt(x, y) = p(y − x, t) − p(y + x, t).
(C) Brownian Motion with drift

Let {X̃(t), t > 0} be a Brownian motion process. Brownian motion with drift is a stochastic process having the distribution of

X(t) = X̃(t) + µt,

where µ is a constant, called the drift parameter. Alternatively, we may describe a Brownian motion with drift in a manner that parallels Definition 2.1.
Definition 4.1 A Brownian motion with drift parameter µ is a stochastic process {X(t);t >
0} with the following properties:
(a) Every increment X(t + s) − X(s) ∼ N(µt, σ 2t), where σ is a fixed constant.
(b) For every pair of disjoint time intervals [t1,t2], [t3,t4], say, t1 < t2 < t3 < t4, the increments
X(t4) − X(t3) and X(t2) − X(t1) are independent.
We have
(D) Geometric Brownian Motion
Let {X(t),t ≥ 0} be a Brownian motion process with drift µ and diffusion coefficient σ 2.
Geometric Brownian motion is defined as
Y (t) = eX(t), t ≥ 0.
Recall that the moment generating function of a standard normal Z is E e^{tZ} = e^{t²/2}. It follows that

E[Y(t) | Y(0) = y] = y E(e^{X(t)−X(0)}) = y e^{t(µ + σ²/2)};
E[Y(t)² | Y(0) = y] = y² E(e^{2(X(t)−X(0))}) = y² e^{t(2µ + 2σ²)}.
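The geometric Brownian motion moment formulas can be verified by direct simulation of X(t) − X(0) ~ N(µt, σ²t); the parameter values below are assumed for illustration.

```python
import random, math

# Hedged Monte Carlo check of the GBM moment formulas above, with assumed
# parameters mu, sigma, initial value y, and horizon t.
random.seed(6)
mu, sigma, y, t, trials = 0.1, 0.4, 2.0, 1.5, 400_000

m1 = m2 = 0.0
for _ in range(trials):
    x = random.gauss(mu * t, sigma * math.sqrt(t))   # X(t) - X(0)
    v = y * math.exp(x)                              # Y(t) given Y(0) = y
    m1 += v
    m2 += v * v

m1 /= trials
m2 /= trials
print(round(m1, 3), round(y * math.exp(t * (mu + sigma ** 2 / 2)), 3))
print(round(m2, 3), round(y ** 2 * math.exp(t * (2 * mu + 2 * sigma ** 2)), 3))
```
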
Chapter 6
Renewal Processes
1 Definition of a Renewal Process and Related Concepts
A renewal process {N(t),t > 0} is a nonnegative integer-valued stochastic process that reg-
isters the successive occurrences of an event during the time interval (0,t], where the time
durations between consecutive “events" are positive, independent, and identically distributed
random variables. Let the successive occurrence times between events be {Xk }∞k=1 (represent-
ing the lifetimes of some units successively placed into service) such that Xi is the elapsed
time from the (i − 1)st event until the occurrence of the ith event. We write

F(x) = P{Xk ≤ x}

for the common probability distribution of {Xk}. A basic stipulation for renewal processes is F(0) = 0, signifying that the Xk's are positive random variables.

Let S0 = 0, and let Sn = ∑_{i=1}^n Xi (n ≥ 1) denote the waiting time until the occurrence of the nth event. So

N(t) = the number of indices n ≥ 1 for which 0 < Sn ≤ t.
The principal objective of renewal theory is to derive properties of certain random variables
associated with {N(t)} and {Sn} from the inter-occurrence distribution F. For example, it
is of significance and relevance to compute the expected number of renewals for the time duration (0, t]:

M(t) = EN(t)

is called the renewal function. To this end, several pertinent relationships and formulas are worth recording. In principle, the probability law of Sn = X1 + X2 + · · · + Xn can be calculated in accordance with the convolution formula

P{Sn ≤ x} = Fn(x),

where Fn(x) is the n-fold convolution of F with itself.
The fundamental link between the waiting time process {Sn} and the renewal counting
process {N(t)} is the observation that
N(t) ≥ k if and only if Sk ≤ t. (1.1)
That is, equation (1.1) asserts that the number of renewals up to time t is at least k if and only
if the kth renewal occurred on or before time t.
It follows from (1.1) that, for t > 0 and k = 1, 2, · · · ,
P{N(t) ≥ k} = P{Sk ≤ t} = Fk (t), (1.2)
and consequently,
P{N(t) = k} = P{N(t) ≥ k} − P{N(t) ≥ k + 1} = Fk (t) − Fk+1(t). (1.3)
For the renewal function M(t) = EN(t), we sum the tail probabilities to derive EN(t) = ∑_{k=1}^∞ P{N(t) ≥ k}, and then use (1.2) to obtain

M(t) = EN(t) = ∑_{k=1}^∞ P{N(t) ≥ k} = ∑_{k=1}^∞ P{Sk ≤ t} = ∑_{k=1}^∞ Fk(t). (1.4)
A number of random variables are of interest in renewal theory. Three of these are the excess/residual life (also called the excess random variable), the current life (also called the age random variable), and the total life:

• Excess or residual life: γt = S_{N(t)+1} − t, the time remaining until the next renewal.

• Current life or age: δt = t − S_{N(t)}, the elapsed time since the last renewal.

• Total life: βt = γt + δt.
The memoryless property of the exponential distribution serves decisively in yielding the explicit computation of a number of functionals of the Poisson renewal process.
The Renewal Function Since N(t) has a Poisson distribution,

P{N(t) = k} = (λt)^k e^{−λt}/k!, k = 0, 1, · · · ,

and

M(t) = EN(t) = λt.
Excess Life Observe that the excess life at time t exceeds x if and only if there are no renewals in the interval (t, t + x]. This event has the same probability as that of no renewals in the interval (0, x], since a Poisson process has stationary independent increments. In formal terms, we have

P{γt > x} = P{N(t + x) − N(t) = 0} = e^{−λx}, x ≥ 0.

Thus, in a Poisson process, the excess life possesses the same exponential distribution as the inter-occurrence times. The current life, in contrast, cannot exceed t, and its distribution is

P{δt ≤ x} = 1 − e^{−λx} for 0 ≤ x < t, and P{δt ≤ x} = 1 for t ≤ x. (2.7)
Mean Total Life Using the evaluation EX = ∫_0^∞ P(X > x) dx for the mean of a nonnegative random variable, we have

E[βt] = E[γt] + E[δt] = 1/λ + (1 − e^{−λt})/λ = (2 − e^{−λt})/λ,

which for large t is nearly twice the mean inter-occurrence time 1/λ.
Let us reexamine the definition of the total life βt with a view to providing an intuitive basis for this seeming discrepancy. First, an arbitrary time point t is fixed. Then βt measures the length of the renewal interval containing the point t. Such a procedure will, with higher likelihood, favor a longer renewal interval over a shorter one. This phenomenon is known as length biased sampling and occurs in a number of sampling situations.

Joint Distribution of γt and δt The joint distribution of γt and δt is determined in the same manner as the marginals. In fact, for any x > 0 and 0 < y < t, the event {γt > x, δt > y} occurs if and only if there are no renewals in the interval (t − y, t + x], which has probability e^{−λ(x+y)}. Thus

P{γt > x, δt > y} = e^{−λ(x+y)} if x > 0, 0 < y < t, and P{γt > x, δt > y} = 0 if y ≥ t. (2.8)

For the Poisson process, observe that γt and δt are independent, since their joint distribution factors as the product of their marginal distributions.
3 Renewal Equations and the Elementary Renewal Theorem

A. The renewal function

Note that

Fn(t) = ∫_0^t F_{n−m}(t − ξ) dF_m(ξ) ≤ F_{n−m}(t) F_m(t), 1 ≤ m ≤ n − 1.

So, for any r ≥ 1,

Fn(t) ≤ [Fr(t)]^{⌊n/r⌋}. (3.9)

Aim: M(t) = ∑_{k=1}^∞ Fk(t) < ∞ for any t. From (3.9), our purpose is to show that there must exist r such that Fr(t) < 1 for each t > 0.
Since the Xi are positive random variables with F(0+) = 0,

∑_{i=1}^r Xi → ∞ a.s.

as r → ∞. So we conclude that for each t > 0 there must exist r such that Fr(t) < 1.

Conclusions For any given t > 0,

• Fn(t) → 0 as n → ∞;

• M(t) = ∑_{k=1}^∞ Fk(t) < ∞, since by (3.9) the series is dominated by a convergent geometric series.
Note that A ∗ B(t) = ∫_0^t [∫_0^{t−y} dB(z)] dA(y) = ∫_0^t ∫_0^{t−z} dA(y) dB(z) = B ∗ A(t). So ∗ is a commutative operation.
We next show that the renewal function M(t) satisfies the equation

M(t) = F(t) + ∫_0^t M(t − y) dF(y) = F(t) + F ∗ M(t), t ≥ 0.

This identity will be proved by the renewal argument: by conditioning on the time, X1, of the first renewal and counting the expected number of renewals thereafter.
Note that, for t < x, E(N(t) | X1 = x) = 0; and for t ≥ x,

E(N(t) | X1 = x) = 1 + ∑_{k=2}^∞ P(Sk ≤ t | X1 = x)
= 1 + ∑_{k=2}^∞ P(∑_{i=2}^k Xi ≤ t − x | X1 = x)
= 1 + ∑_{k=2}^∞ P(∑_{i=2}^k Xi ≤ t − x)
= 1 + EN(t − x) = 1 + M(t − x).
Applying the law of total probability yields

M(t) = EN(t) = ∫_0^t E(N(t) | X1 = x) dF(x) = ∫_0^t [1 + M(t − x)] dF(x) = F(t) + ∫_0^t M(t − x) dF(x).
Much of the power of renewal theory derives from the preceding method of reasoning that
views the process starting anew at the occurrence of the first event.
Renewal Equations An integral equation of the form
Z t
A(t) = a(t) + A(t − x)dF(x), t ≥ 0,
0
is called a renewal equation. Here a(t) and F(x) are known and A(t) is unknown.
Without risk of ambiguity, we will employ the notation B ∗ c(t) for the convolution of a function c(t) (assumed reasonably smooth and bounded on finite intervals) with an increasing right-continuous function B(t) with B(0) = 0:

B ∗ c(t) = ∫_0^t c(t − τ) dB(τ).
Theorem 3.1 Suppose a is a bounded function. There exists one and only one function A bounded on finite intervals that satisfies

A(t) = a(t) + ∫_0^t A(t − y) dF(y). (3.10)

This function is

A(t) = a(t) + ∫_0^t a(t − x) dM(x). (3.11)

Proof. We verify first that the function A defined by (3.11) fulfills the boundedness property and solves (3.10). Because a is a bounded function and M is nondecreasing and finite, for
every T > 0, we have

sup_{0≤t≤T} |A(t)| ≤ sup_{0≤t≤T} |a(t)| + ∫_0^T sup_{0≤y≤T} |a(y)| dM(x) = sup_{0≤t≤T} |a(t)| [1 + M(T)] < ∞,

establishing that the function defined by (3.11) is bounded on finite intervals. To check that
A(t) of (3.11) satisfies (3.10), we compute, with M = ∑_{k=1}^∞ Fk,

a(t) + ∫_0^t A(t − y) dF(y) = a + F ∗ (a + M ∗ a) = a + F ∗ a + ∑_{k=1}^∞ (F ∗ Fk) ∗ a
= a + F ∗ a + ∑_{k=2}^∞ Fk ∗ a = a + M ∗ a = A(t),

where we applied Fk = F ∗ F_{k−1} in the second last equality. It remains to verify the uniqueness of A. This is done by showing that any solution of the renewal equation (3.10), bounded in
finite intervals, is represented by (3.11). Iterating (3.10),

A = a + F ∗ A = a + F ∗ (a + F ∗ A) = a + F ∗ a + F ∗ (F ∗ A)
= a + F ∗ a + F2 ∗ A = · · · = a + ∑_{k=1}^{n−1} Fk ∗ a + Fn ∗ A.

Note that |Fn ∗ A(t)| = |∫_0^t A(t − y) dFn(y)| ≤ sup_{0≤y≤t} |A(y)| · Fn(t) → 0, and lim_{n→∞} ∑_{k=1}^{n−1} Fk ∗ a(t) = (∑_{k=1}^∞ Fk) ∗ a(t) = M ∗ a(t) by the boundedness of a(t). The proof is complete.
Another identity:

E[S_{N(t)+1}] = µ[M(t) + 1], where µ = EX1. (3.12)

At first glance, this identity resembles the formula for the mean of a random sum, which asserts that E[X1 + · · · + XN] = EX1 · EN when N is an integer-valued random variable that is independent of X1, X2, · · · . The random sum approach cannot be applied here because the random number of summands, N(t) + 1, is not independent of the summands themselves.
Indeed, in Section 2, when the Poisson process is viewed as a renewal process, we show that
the last summand $X_{N(t)+1}$ has a mean that approaches twice the unconditional mean $\mu = E X_1$
for t large. For this reason, it is not correct, in particular, that $E S_{N(t)}$ can be evaluated as the
product of $E X_1$ and $E N(t)$. In view of this comment, the identity expressed in equation (3.12)
becomes more intriguing and remarkable.
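The claim about the last summand is easy to observe by simulation. In this sketch (an illustration with assumed parameters, not code from the text) the inter-occurrence times are exponential with mean µ = 1, and the interval straddling a fixed time t is sampled directly; its mean is close to 2µ for large t.

```python
import numpy as np

# Mean of the inter-occurrence interval X_{N(t)+1} that covers a fixed
# time t, for exponential(1) interarrivals (mu = 1).  Its mean
# approaches 2 * mu, not mu (the "inspection paradox").
rng = np.random.default_rng(0)
t, reps, cols = 10.0, 100_000, 40       # 40 draws per path safely exceed t
X = rng.exponential(1.0, size=(reps, cols))
S = np.cumsum(X, axis=1)
idx = np.argmax(S > t, axis=1)          # index of first partial sum beyond t
covering = X[np.arange(reps), idx]      # the interval straddling t
print(covering.mean())                  # close to 2, not 1
```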
To derive (3.12) we will use a renewal argument to establish a renewal equation for $A(t) =
E S_{N(t)+1}$. As usual, we condition on the time of the first renewal $X_1 = x$, and distinguish two
contingencies: (i) when x > t, so that N(t) = 0 and $S_{N(t)+1} = X_1 = x$; and (ii) when x ≤ t,
$E(S_{N(t)+1} \,|\, X_1 = x) = x + A(t-x)$. Therefore,
\begin{align*}
A(t) &= \int_0^\infty E(S_{N(t)+1} \,|\, X_1 = x)\, dF(x) = \int_0^t (x + A(t-x))\, dF(x) + \int_t^\infty x\, dF(x) \\
&= \int_0^\infty x\, dF(x) + \int_0^t A(t-x)\, dF(x) = E X_1 + \int_0^t A(t-x)\, dF(x).
\end{align*}
Thus A(t) satisfies a renewal equation in which $a(t) \equiv E X_1$ is constant. By Theorem 3.1,
\[ A(t) = a(t) + \int_0^t a(t-x)\, dM(x) = (M(t) + 1) \cdot E X_1, \]
which is (3.12).
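Identity (3.12) can also be illustrated numerically. The following sketch (an illustration with assumed parameters) uses exponential inter-occurrence times with rate 1, for which µ = 1 and M(t) = t, so the identity predicts $E S_{N(t)+1} = 1 + t$.

```python
import numpy as np

# Monte Carlo check of E S_{N(t)+1} = mu * (1 + M(t)) for exponential(1)
# interarrivals: mu = 1 and M(t) = t, so the identity predicts 1 + t.
rng = np.random.default_rng(1)
t, reps, cols = 10.0, 100_000, 40
S = np.cumsum(rng.exponential(1.0, size=(reps, cols)), axis=1)
idx = np.argmax(S > t, axis=1)          # first index n with S_n > t
S_next = S[np.arange(reps), idx]        # this is S_{N(t)+1}
est = S_next.mean()
print(est, 1.0 + t)                     # estimate vs mu * (1 + M(t)) = 11
```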
Theorem 3.2 (Elementary renewal theorem) Let {N(t), t ≥ 0} be a renewal process generated
by inter-occurrence times with finite mean µ. Then
\[ \lim_{t\to\infty} \frac{M(t)}{t} = \frac{1}{\mu}. \]
Proof. By definition, $t < S_{N(t)+1}$. By (3.12), we have
\[ t < E S_{N(t)+1} = \mu\,[1 + M(t)], \]
and therefore
\[ t^{-1} M(t) > \mu^{-1} - t^{-1}. \]
It follows that
\[ \liminf_{t\to\infty} t^{-1} M(t) \ge \mu^{-1}. \tag{3.13} \]
For the reverse inequality, truncate the inter-occurrence times at a level c > 0: let $X_i^c = \min(X_i, c)$,
with mean $\mu_c = E X_1^c = \int_0^c [1 - F(x)]\, dx$, and let $N_c(t)$ and $M_c(t)$ be the renewal process and
renewal function generated by $\{X_i^c\}$. Since $X_i^c \le X_i$, we have $M(t) \le M_c(t)$. Because every
truncated inter-occurrence time is at most c, $S^c_{N_c(t)+1} \le t + c$, and (3.12) applied to the truncated
process gives $\mu_c [1 + M_c(t)] \le t + c$. After rearranging terms,
\[ t^{-1} M(t) \le t^{-1} M_c(t) \le \mu_c^{-1} + t^{-1}(c/\mu_c - 1). \]
Hence
\[ \limsup_{t\to\infty} t^{-1} M(t) \le \mu_c^{-1}. \tag{3.14} \]
Since $\lim_{c\to\infty} \mu_c = \lim_{c\to\infty} \int_0^c [1 - F(x)]\, dx = \int_0^\infty [1 - F(x)]\, dx = \mu$ while the left-hand side of
(3.14) does not depend on c, we deduce
\[ \limsup_{t\to\infty} t^{-1} M(t) \le \mu^{-1}. \tag{3.15} \]
The proof of this theorem is complete by combining inequalities (3.13) and (3.15).
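A quick simulation makes the theorem concrete. The sketch below is illustrative (Uniform(0, 2) inter-occurrence times are an assumption, chosen so that µ = 1): the estimated M(t)/t settles near 1/µ.

```python
import numpy as np

# Estimate M(t)/t = E N(t)/t for Uniform(0, 2) inter-occurrence times
# (mu = 1) and compare with the limit 1/mu from Theorem 3.2.
rng = np.random.default_rng(2)
t, reps, cols = 200.0, 20_000, 280      # 280 draws per path safely exceed t
S = np.cumsum(rng.uniform(0.0, 2.0, size=(reps, cols)), axis=1)
N = (S <= t).sum(axis=1)                # N(t) = number of renewals by time t
ratio = N.mean() / t                    # estimates M(t)/t
print(ratio)                            # close to 1/mu = 1
```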
4 The Renewal Theorem
The subject of this section involves one of the most basic theorems in applied probability. The
renewal theorem can be regarded as a refinement of the asymptotic relation M(t) ∼ t/µ,t →
∞, established in Theorem 3.2.
The proof of the renewal theorem is lengthy and demanding. We will omit it and refer to
Feller's book for a comprehensive treatment. However, the statement will be given with
care so that the student can understand its meaning and be able to apply it without ambiguity.
For the precise statement, we need several preliminary definitions.
A distribution function F is said to be arithmetic if there exists a positive number λ such that F
exhibits points of increase exclusively among the points 0, ±λ, ±2λ, ···. The largest such λ
is called the span of F.
A distribution function that has a continuous part is not arithmetic. The distribution
function of a discrete random variable having possible values 0, 1, 2, ··· is arithmetic with
span 1.
Definition 4.2. Let g be a function defined on [0, ∞). For every positive δ and n = 1, 2, ···,
let
\[ m_n = \min\{g(t) : (n-1)\delta \le t \le n\delta\}, \qquad \bar{m}_n = \max\{g(t) : (n-1)\delta \le t \le n\delta\}, \]
\[ \sigma(\delta) = \delta \sum_{n=1}^{\infty} m_n, \qquad \bar{\sigma}(\delta) = \delta \sum_{n=1}^{\infty} \bar{m}_n. \]
The function g is said to be directly Riemann integrable if $\sigma(\delta)$ and $\bar{\sigma}(\delta)$ are finite for every
δ > 0 and
\[ \lim_{\delta \downarrow 0} \sigma(\delta) = \lim_{\delta \downarrow 0} \bar{\sigma}(\delta). \]
Every monotonic function g which is absolutely integrable in the sense that
\[ \int_0^\infty |g(t)|\, dt < \infty \tag{4.16} \]
is directly Riemann integrable, and this is the most important case for our purposes. Manifestly,
all finite linear combinations of monotone functions satisfying (4.16) are also directly
Riemann integrable.
Theorem 4.1 (The Basic Renewal Theorem). Let F be the distribution function of a positive
random variable with mean µ. Suppose that a is directly Riemann integrable and that A is
the solution of the renewal equation
\[ A(t) = a(t) + \int_0^t A(t-x)\, dF(x). \tag{4.17} \]
i. If F is not arithmetic, then
\[ \lim_{t\to\infty} A(t) = \begin{cases} \dfrac{1}{\mu} \displaystyle\int_0^\infty a(x)\, dx, & \text{if } \mu < \infty, \\[1ex] 0, & \text{if } \mu = \infty. \end{cases} \]
ii. If F is arithmetic with span λ, then for any c ≥ 0,
\[ \lim_{n\to\infty} A(c + n\lambda) = \frac{\lambda}{\mu} \sum_{n=0}^{\infty} a(c + n\lambda), \]
with the same convention when µ = ∞.
There is a second form of the theorem, equivalent to that just given, but expressed more
directly in terms of the renewal function. Let h > 0 be given, and examine the special
prescription
\[ a(y) = \begin{cases} 1, & \text{if } 0 \le y < h, \\ 0, & \text{if } h \le y, \end{cases} \]
inserted in (4.17). In this example, for t > h, because of (3.11), we have
\begin{align*}
A(t) &= a(t) + \int_0^t a(t-x)\, dM(x) \\
&= \int_{t-h}^{t} dM(x) = M(t) - M(t-h),
\end{align*}
and $\mu^{-1} \int_0^\infty a(x)\, dx = h/\mu$. If F is not arithmetic, we may conclude on the basis of the renewal
theorem that
\[ \lim_{t\to\infty} \big( M(t+h) - M(t) \big) = \frac{h}{\mu}, \]
with the convention that h/µ = 0 when µ = ∞. If F is arithmetic with span λ, then letting
h = kλ and n > k,
\[ A(c + n\lambda) = M(c + n\lambda) - M(c + (n-k)\lambda). \]
So
\[ \lim_{n\to\infty} \big( M(c + n\lambda) - M(c + (n-k)\lambda) \big) = \frac{\lambda}{\mu} \sum_{n=0}^{\infty} a(c + n\lambda) = \frac{k\lambda}{\mu}. \]
Theorem 4.2 Let F be the distribution function of a positive random variable with mean µ.
Let $M(t) = \sum_{k=1}^{\infty} F_k(t)$ be the renewal function associated with F. Let h > 0 be fixed.
i. If F is not arithmetic, then
\[ \lim_{t\to\infty} \big( M(t+h) - M(t) \big) = \frac{h}{\mu}. \]
ii. If F is arithmetic, the same limit holds, provided h is a multiple of the span λ.
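The limit in Theorem 4.2 can be observed empirically. The sketch below is illustrative (Uniform(0, 2) inter-occurrence times are an assumption: a nonarithmetic F with µ = 1); it estimates the expected number of renewals in (t, t + h] for large t, which should be near h/µ.

```python
import numpy as np

# Estimate M(t + h) - M(t) = E[N(t + h) - N(t)] for Uniform(0, 2)
# interarrivals (nonarithmetic, mu = 1); Theorem 4.2 predicts h/mu.
rng = np.random.default_rng(3)
t, h, reps, cols = 100.0, 1.0, 30_000, 160
S = np.cumsum(rng.uniform(0.0, 2.0, size=(reps, cols)), axis=1)
inc = ((S > t) & (S <= t + h)).sum(axis=1)   # renewals in (t, t + h]
est = inc.mean()
print(est, h / 1.0)                          # estimate vs h/mu = 1
```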
Proof of Theorem 3.2 assuming Theorem 4.2. Assume F is not arithmetic. Take $b_n =
M(n+1) - M(n)$. Then by Theorem 4.2 we have $b_n \to 1/\mu$, and hence
\[ \frac{1}{n} M(n) = \frac{1}{n} \sum_{k=0}^{n-1} b_k \to \frac{1}{\mu}. \]
(One can refer to p. 15, the last paragraph.) Now for any t > 0, let [t] denote the largest integer
not exceeding t. Then
\[ \frac{[t]}{t} \cdot \frac{M([t])}{[t]} \le \frac{M(t)}{t} \le \frac{[t]+1}{t} \cdot \frac{M([t]+1)}{[t]+1} \;\Longrightarrow\; \frac{M(t)}{t} \to \frac{1}{\mu}. \]
If F is arithmetic with span λ, we set $b_n = M((n+1)\lambda) - M(n\lambda)$. Then $b_n \to \lambda/\mu$, which
implies $n^{-1} M(\lambda n) = n^{-1} \sum_{k=0}^{n-1} b_k \to \lambda/\mu$ as n → ∞. Note that $t = \lambda \cdot (t/\lambda)$, and the same
sandwich argument with $n = [t/\lambda]$ gives $M(t)/t \to 1/\mu$.
5 Applications of the Renewal Theorem
(a) Limiting Distribution of the Excess Life
Let $\gamma_t = S_{N(t)+1} - t$ be the excess life at time t and, for a fixed z > 0, set $A_z(t) = P(\gamma_t > z)$.
Using the renewal argument,
\[ P(\gamma_t > z \,|\, X_1 = x) = P(S_{N(t)+1} > z + t \,|\, X_1 = x) = P\Big( \sum_{i=1}^{N(t)+1} X_i > z + t \,\Big|\, X_1 = x \Big) = \begin{cases} 1, & \text{if } x > t + z, \\ 0, & \text{if } t + z \ge x > t \quad (N(t) = 0), \\ A_z(t-x), & \text{if } t \ge x > 0. \end{cases} \]
Theorem 3.1 yields
\[ A_z(t) = 1 - F(t+z) + \int_0^t (1 - F(t+z-x))\, dM(x). \]
We assume that $\mu = E X_1 = \int_0^\infty (1 - F(x))\, dx < \infty$. Then
\[ \int_0^\infty (1 - F(t+z))\, dt = \int_z^\infty (1 - F(y))\, dy < \infty, \]
so $1 - F(t+z)$, being monotone in t, is directly Riemann integrable as a function of t with z
fixed. Applying Theorem 4.1 yields
\[ \lim_{t\to\infty} A_z(t) = \mu^{-1} \int_z^\infty (1 - F(y))\, dy. \tag{5.18} \]
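Formula (5.18) can be checked by simulation. In this sketch (illustrative; Uniform(0, 2) inter-occurrence times with µ = 1 are an assumption) the limit is $\mu^{-1}\int_z^2 (1 - y/2)\, dy = (2-z)^2/4$, which equals 1/4 at z = 1.

```python
import numpy as np

# Empirical tail of the excess life gamma_t = S_{N(t)+1} - t for
# Uniform(0, 2) interarrivals (mu = 1), compared with the limit (5.18):
# P(gamma_t > z) -> (1/mu) * int_z^2 (1 - y/2) dy = (2 - z)**2 / 4.
rng = np.random.default_rng(4)
t, z, reps, cols = 100.0, 1.0, 50_000, 160
S = np.cumsum(rng.uniform(0.0, 2.0, size=(reps, cols)), axis=1)
idx = np.argmax(S > t, axis=1)            # first index n with S_n > t
gamma = S[np.arange(reps), idx] - t       # excess life at time t
p_hat = (gamma > z).mean()
print(p_hat, (2.0 - z) ** 2 / 4.0)        # both close to 0.25
```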
Note that $\{\gamma_t \ge x \text{ and } \delta_t \ge y\} = \{\gamma_{t-y} \ge x + y\}$. (Proof of '⇒': By definition $\gamma_{t-y} =
S_{N(t-y)+1} - (t-y)$. If $\delta_t \ge y$, there is no renewal in (t-y, t] and we have N(t-y) = N(t). So
$\gamma_{t-y} = S_{N(t)+1} - (t-y) = \gamma_t + y \ge x + y$. Proof of '⇐': If there were a renewal in (t-y, t], then
$S_{N(t-y)+1} \le t$, which contradicts $\gamma_{t-y} \ge x + y$. So there is no renewal in (t-y, t], hence
$\delta_t \ge y$. Clearly, $\gamma_t = S_{N(t-y)+1} - t \ge x$.)
It follows that
\[ \lim_{t\to\infty} P(\delta_t \ge y, \gamma_t \ge x) = \lim_{t\to\infty} P(\gamma_{t-y} \ge x + y) = \mu^{-1} \int_{x+y}^{\infty} (1 - F(z))\, dz. \]
Let $\beta_t = S_{N(t)+1} - S_{N(t)} = \delta_t + \gamma_t$ denote the total life at time t, and set $K_x(t) = P(\beta_t > x)$. If
the first renewal occurs at a time y ≤ t, the process starts afresh and the total life at t equals the
total life of the new process at t − y. So conditioning on the time of the first renewal event, we have
\[ P(\beta_t > x \,|\, X_1 = y) = P(S_{N(t)+1} - S_{N(t)} > x \,|\, X_1 = y) = P(X_{N(t)+1} > x \,|\, X_1 = y) = \begin{cases} 1, & \text{if } y > \max(x, t), \\ K_x(t-y), & \text{if } y \le t, \\ 0, & \text{otherwise (i.e., } t < y \le x, \text{ so } N(t) = 0). \end{cases} \]
(b) Asymptotic Expansion of the Renewal Function
Suppose F is a nonarithmetic distribution with a finite variance σ². We want to show
\[ \lim_{t\to\infty} \big( M(t) - \mu^{-1} t \big) = \frac{\sigma^2 - \mu^2}{2\mu^2}. \]
Set $H(t) = 1 + M(t) - t/\mu$. A direct computation with the renewal equation for M shows that H
satisfies the renewal equation $H(t) = h(t) + \int_0^t H(t-x)\, dF(x)$ with
\[ h(t) = \mu^{-1} \int_t^\infty (1 - F(x))\, dx. \]
Now
\[ \int_t^\infty (x - t)\, dF(x) = \int_0^\infty y\, dF(t+y) = \int_0^\infty (1 - F(t+y))\, dy \]
is a monotonic function of t, and expressing $1 - F(t+y) = \int_{t+y}^\infty dF(z)$ and interchanging the
orders of integration leads to
\begin{align*}
\int_0^\infty \int_0^\infty (1 - F(t+y))\, dy\, dt &= \int_0^\infty \int_0^\infty \int_{t+y}^\infty dF(z)\, dy\, dt \\
&= \int_0^\infty \int_t^\infty \int_0^{z-t} dy\, dF(z)\, dt = \int_0^\infty \int_t^\infty (z-t)\, dF(z)\, dt \\
&= \int_0^\infty \int_0^z (z-t)\, dt\, dF(z) = \int_0^\infty \frac{z^2}{2}\, dF(z) = \frac{\sigma^2 + \mu^2}{2} < \infty.
\end{align*}
Thus the renewal theorem (Theorem 4.1) implies
\[ \lim_{t\to\infty} \mu H(t) = \frac{\sigma^2 + \mu^2}{2\mu}, \]
or
\[ \lim_{t\to\infty} \big( M(t) - \mu^{-1} t \big) = \lim_{t\to\infty} (H(t) - 1) = \frac{\sigma^2 + \mu^2}{2\mu^2} - 1 = \frac{\sigma^2 - \mu^2}{2\mu^2}, \]
as was to be shown.
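As an illustration (with assumed parameters), take Erlang(2, 1) inter-occurrence times, i.e. each X_i a sum of two exponential(1) variables: µ = 2 and σ² = 2, so the limit is (σ² − µ²)/(2µ²) = −1/4. A Monte Carlo estimate of M(t) − t/µ at a moderately large t is close to this value.

```python
import numpy as np

# Estimate M(t) - t/mu for Erlang(2, 1) interarrivals (mu = 2,
# sigma^2 = 2); the asymptotic expansion predicts the limit
# (sigma^2 - mu^2) / (2 * mu^2) = -1/4.
rng = np.random.default_rng(5)
t, reps, cols = 60.0, 100_000, 55
X = rng.gamma(shape=2.0, scale=1.0, size=(reps, cols))
S = np.cumsum(X, axis=1)
N = (S <= t).sum(axis=1)          # N(t) = number of renewals by time t
est = N.mean() - t / 2.0          # estimates M(t) - t/mu
print(est)                        # close to -0.25
```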
Chapter 7
Review
1 Markov chain
• $P_{ij}^n = \sum_{k=0}^{\infty} P_{ik}^r P_{kj}^s$, $r + s = n$, and $P_{ij}^0 = \begin{cases} 1, & i = j, \\ 0, & i \ne j. \end{cases}$
• If $P_{ij}^n > 0$ for some n ≥ 0, we denote it by i → j.
• $f_{ii}^n$ (for n ≥ 1) denotes the probability that the MC, starting at i, returns to i for the first
time at the nth step.
• Generating functions: $P_{ij}(s) = \sum_{n=0}^{\infty} P_{ij}^n s^n$ and $F_{ij}(s) = \sum_{n=0}^{\infty} f_{ij}^n s^n$.
We have $F_{ii}(s) P_{ii}(s) = P_{ii}(s) - 1$, or $P_{ii}(s) = \frac{1}{1 - F_{ii}(s)}$.
– Suppose the MC is recurrent, irreducible and aperiodic. Recall
\[ P_{ii}^n - \sum_{k=0}^{n} f_{ii}^{n-k} P_{ii}^k = \begin{cases} 1, & n = 0, \\ 0, & n > 0. \end{cases} \]
Then
i. $\lim_{n\to\infty} P_{ii}^n = \dfrac{1}{\sum_{n=1}^{\infty} n f_{ii}^n}$;
ii. $\lim_{n\to\infty} P_{ji}^n = \lim_{n\to\infty} P_{ii}^n$.
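These limits are easy to observe numerically. The sketch below uses an illustrative 3-state chain (the matrix is an assumption, not from the text): raising the transition matrix to a high power makes every row converge to the stationary distribution π, and $\pi_i$ equals the reciprocal of the mean recurrence time of state i.

```python
import numpy as np

# For an irreducible aperiodic finite chain, P^n converges to a matrix
# with identical rows pi, and lim P^n_{ii} = pi_i = 1 / (mean recurrence
# time of state i).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
Pn = np.linalg.matrix_power(P, 200)

# Stationary distribution: solve pi (P - I) = 0 together with sum(pi) = 1
# as an overdetermined least-squares system.
A = np.vstack([P.T - np.eye(3), np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(Pn[0])          # every row of P^200 is (approximately) pi
print(pi)
```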
2 Poisson Process {N(t)}
• $N(t) - N(s) \sim \text{Poisson}(\lambda(t-s))$ for t > s,
• independent increments,
3 Brownian Motion
Definition. Reflection principle.
4 Renewal Process {N(t)}
• Some key terms/relations:
$S_n = \sum_{i=1}^n X_i$;
$N(t) = \max\{n : S_n \le t\}$;
$\{N(t) \ge k\} = \{S_k \le t\}$.
This implies
\[ M(t) = \int_0^t E(N(t) \,|\, X_1 = x)\, dF(x) = F(t) + \int_0^t M(t-x)\, dF(x). \]
• Elementary renewal theorem: Recall $\{X_i\}$ i.i.d. with $E X_i = \mu > 0$. Then
\[ \lim_{t\to\infty} \frac{M(t)}{t} = \lim_{t\to\infty} \frac{E N(t)}{t} = \frac{1}{\mu}. \]
• Inspection paradox