Assignments
Solution. (1) By Chebyshev’s inequality, for any $n \ge 1$ and $M > 0$, we have
$$\mathbb{P}(|X_n| > M) \le \frac{\mathbb{E}[|X_n|^r]}{M^r} \le \frac{C}{M^r}.$$
Given $\varepsilon > 0$, there exists $M > 0$ (independent of $n$) such that the right hand side is smaller than $\varepsilon$. Equivalently, for such $M$ we have
$$\mathbb{P}(|X_n| > M) < \varepsilon \quad \text{for all } n.$$
In what follows we take $\varepsilon = 1/4$. We claim that $|a_n| \le M$ for all $n$. Indeed, if this is not the case, say $a_n > M$ for some $n$, then
$$\frac{1}{2} = \mathbb{P}(X_n > a_n) \le \mathbb{P}(X_n > M) < \frac{1}{4},$$
which is a contradiction. In addition, we also have
$$\frac{3}{4} \le \mathbb{P}(|X_n| \le M) = \frac{1}{\sqrt{2\pi}\,\sigma_n}\int_{-M}^{M} e^{-\frac{(x-a_n)^2}{2\sigma_n^2}}\,dx = \frac{1}{\sqrt{2\pi}}\int_{\frac{-M-a_n}{\sigma_n}}^{\frac{M-a_n}{\sigma_n}} e^{-\frac{x^2}{2}}\,dx \le \frac{1}{\sqrt{2\pi}}\int_{-\frac{2M}{\sigma_n}}^{\frac{2M}{\sigma_n}} e^{-\frac{x^2}{2}}\,dx.$$
This implies that the sequence {σn } is bounded. For otherwise, if σn ↑ ∞ along a
subsequence, then the right hand side tends to zero along this subsequence, which
is again a contradiction.
Problem 2. Let $\{X_n : n \ge 1\}$ be a sequence of independent and identically distributed random variables, each following the exponential distribution $\exp(1)$.
(1) Let $\alpha > 0$ be an arbitrary number. Compute the probability that “$X_n \ge \alpha\log n$ for infinitely many $n$”.
(2) Show that
$$\limsup_{n\to\infty}\frac{X_n}{\log n} = 1 \quad \text{a.s.}$$
(3) Let
$$M_n \triangleq \max_{1\le i\le n} X_i - \log n.$$
Show that Mn is weakly convergent and find the weak limiting distribution of Mn .
Solution. (1) By the assumption, we have
$$\mathbb{P}(X_n \ge \alpha\log n) = e^{-\alpha\log n} = n^{-\alpha}.$$
For $\alpha > 0$, let $A_{\alpha}$ denote the event that $X_n \ge \alpha\log n$ for infinitely many $n$. Since the $X_n$'s are independent, the Borel–Cantelli lemmas show that $\mathbb{P}(A_{\alpha}) = 1$ if $\alpha \le 1$ (the series $\sum_n n^{-\alpha}$ diverges) and $\mathbb{P}(A_{\alpha}) = 0$ if $\alpha > 1$ (the series converges).
(2) Let
$$L \triangleq \limsup_{n\to\infty}\frac{X_n}{\log n}.$$
Since $\mathbb{P}(A_1) = 1$, we know that $L \ge 1$ almost surely. Moreover,
$$\{L > 1\} \subseteq \bigcup_{k=1}^{\infty}\Big\{L > 1 + \frac{1}{k}\Big\} \subseteq \bigcup_{k=1}^{\infty} A_{1+\frac{1}{k}}.$$
Since $\mathbb{P}(A_{1+\frac{1}{k}}) = 0$ for all $k$, we see that $\mathbb{P}(L > 1) = 0$. Therefore, $L = 1$ almost surely.
(3) For each $x \in \mathbb{R}$, we have
$$\mathbb{P}(M_n \le x) = \mathbb{P}\Big(\max_{1\le i\le n} X_i \le x + \log n\Big) = \big(1 - e^{-x-\log n}\big)^n = \Big(1 - \frac{e^{-x}}{n}\Big)^n,$$
provided that $x + \log n > 0$. By taking $n \to \infty$, we obtain that
$$\lim_{n\to\infty}\mathbb{P}(M_n \le x) = e^{-e^{-x}}.$$
Apparently, the function $F(x) \triangleq e^{-e^{-x}}$ defines a continuous cumulative distribution function on $\mathbb{R}$. Therefore, $M_n$ converges weakly to the distribution given by $F$.
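As a quick numerical sanity check (my own addition, not part of the original solution), one can simulate $M_n$ and compare its empirical CDF with the Gumbel CDF $F$; the sketch below assumes numpy is available.

```python
import numpy as np

# Compare the empirical CDF of M_n = max(X_1, ..., X_n) - log n, X_i ~ exp(1),
# with the Gumbel CDF F(x) = exp(-exp(-x)).
rng = np.random.default_rng(0)
trials, n = 2_000, 5_000
M = rng.exponential(1.0, size=(trials, n)).max(axis=1) - np.log(n)

for x in (-1.0, 0.0, 1.0, 2.0):
    print(f"x = {x:+.1f}:  empirical {np.mean(M <= x):.4f}  vs  F(x) = {np.exp(-np.exp(-x)):.4f}")
```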
For each $n \ge 1$ and $\lambda > 0$, we use $S_n(\lambda)$ to denote a Gamma distributed random variable with parameters $n$ and $\lambda$.
(2-i) Show that
$$\mathbb{E}[f(S_n(\lambda))] = \frac{(-1)^{n-1}\lambda^n}{(n-1)!}\,g^{(n-1)}(\lambda),$$
where $g^{(k)}$ denotes the $k$-th derivative of $g$.
(2-ii) Given $x > 0$, show that $S_n(n/x)$ converges to $x$ in probability as $n \to \infty$.
(2-iii) Use Part (ii) to prove the following inversion formula for the Laplace transform:
$$f(x) = \lim_{n\to\infty}\frac{(-1)^{n-1}}{(n-1)!}\Big(\frac{n}{x}\Big)^n g^{(n-1)}\Big(\frac{n}{x}\Big), \quad x > 0.$$
Solution. (1) We use $f_n$ to denote the probability density function of $S_n$. When $n = 1$, the formula is just the exponential distribution. Suppose that the claim is true for $S_n$, i.e. $S_n \stackrel{d}{=} \gamma(n,\lambda)$. Since $S_n$ and $X_{n+1}$ are independent, by the convolution formula, for $x > 0$ we have
$$\begin{aligned}
f_{n+1}(x) &= \int_{-\infty}^{\infty} f_n(x-y)\,f_1(y)\,dy\\
&= \int_0^x \frac{\lambda^n (x-y)^{n-1}}{(n-1)!}\,e^{-\lambda(x-y)}\cdot\lambda e^{-\lambda y}\,dy\\
&= \frac{\lambda^{n+1}}{(n-1)!}\,e^{-\lambda x}\int_0^x (x-y)^{n-1}\,dy\\
&= \frac{\lambda^{n+1}}{n!}\,x^n e^{-\lambda x}.
\end{aligned}$$
(2-i) Since $S_n(\lambda) \stackrel{d}{=} \gamma(n,\lambda)$, we have
$$\mathbb{E}[f(S_n(\lambda))] = \int_0^{\infty} f(x)\cdot\frac{\lambda^n x^{n-1}}{(n-1)!}\,e^{-\lambda x}\,dx.$$
On the other hand, differentiating the Laplace transform $g(\lambda) = \int_0^{\infty} e^{-\lambda x} f(x)\,dx$ under the integral sign $n-1$ times gives
$$g^{(n-1)}(\lambda) = \int_0^{\infty} (-x)^{n-1} e^{-\lambda x} f(x)\,dx.$$
Therefore,
$$\mathbb{E}[f(S_n(\lambda))] = \frac{(-1)^{n-1}\lambda^n}{(n-1)!}\,g^{(n-1)}(\lambda).$$
(2-ii) By Chebyshev’s inequality, we have
$$\mathbb{P}\Big(\Big|S_n\Big(\frac{n}{x}\Big) - x\Big| > \varepsilon\Big) \le \frac{1}{\varepsilon^2}\,\mathbb{E}\Big[\Big(S_n\Big(\frac{n}{x}\Big) - x\Big)^2\Big].$$
Note that $S_n(n/x)$ can be viewed as the sum of $n$ independent $\exp(n/x)$-random variables. Therefore,
$$\mathbb{E}\Big[S_n\Big(\frac{n}{x}\Big)\Big] = n\cdot\frac{x}{n} = x,$$
and
$$\mathbb{E}\Big[\Big(S_n\Big(\frac{n}{x}\Big) - x\Big)^2\Big] = \mathrm{Var}\Big[S_n\Big(\frac{n}{x}\Big)\Big] = n\cdot\frac{x^2}{n^2} = \frac{x^2}{n}.$$
Therefore,
$$\mathbb{P}\Big(\Big|S_n\Big(\frac{n}{x}\Big) - x\Big| > \varepsilon\Big) \le \frac{x^2}{n\varepsilon^2} \to 0$$
as $n \to \infty$. This shows that $S_n(n/x)$ converges to $x$ in probability.
(2-iii) In particular, $S_n(n/x)$ converges to $x$ weakly. Since $f$ is bounded and continuous, it follows that
$$\mathbb{E}\Big[f\Big(S_n\Big(\frac{n}{x}\Big)\Big)\Big] \to \mathbb{E}[f(x)] = f(x).$$
But from Part (2-i) we know that
$$\mathbb{E}\Big[f\Big(S_n\Big(\frac{n}{x}\Big)\Big)\Big] = \frac{(-1)^{n-1}}{(n-1)!}\Big(\frac{n}{x}\Big)^n g^{(n-1)}\Big(\frac{n}{x}\Big).$$
This proves the desired inversion formula.
Remark. Similar to the proof of Theorem 3.2 in the lecture notes of Topic 1, it
is not hard to see that the convergence in the inversion formula holds uniformly
over compact intervals.
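To see the inversion formula in action numerically, take for instance $f(x) = e^{-x}$, whose Laplace transform is $g(\lambda) = \frac{1}{1+\lambda}$ with $g^{(n-1)}(\lambda) = \frac{(-1)^{n-1}(n-1)!}{(1+\lambda)^n}$. The following sketch is my own illustration (the choice of $f$ is arbitrary); it evaluates the right hand side for increasing $n$:

```python
import math

# Post-Widder inversion applied to f(x) = exp(-x), for which the Laplace
# transform is g(lambda) = 1/(1 + lambda) and
# g^{(n-1)}(lambda) = (-1)^{n-1} (n-1)! / (1 + lambda)^n.
def inversion_rhs(x, n):
    lam = n / x
    g_deriv = (-1) ** (n - 1) * math.factorial(n - 1) / (1 + lam) ** n
    return (-1) ** (n - 1) / math.factorial(n - 1) * lam ** n * g_deriv

x = 2.0
for n in (5, 25, 125):
    print(f"n = {n:3d}: {inversion_rhs(x, n):.6f}   (target e^-2 = {math.exp(-x):.6f})")
```

Algebraically the right hand side collapses to $(1 + x/n)^{-n}$, so the printed values converge to $e^{-x}$, consistent with the remark on uniform convergence over compacts.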
Problem 4. Let $\{X_n : n \ge 1\}$ be a sequence of non-negative, independent and identically distributed random variables. Define $S_n \triangleq X_1 + \cdots + X_n$. For each $t > 0$, let $N(t)$ be the random variable defined by
$$N(t) \triangleq \max\{n \ge 0 : S_n \le t\}, \quad\text{where } S_0 \triangleq 0.$$
We often interpret $X_n$ as the lifetime of the $n$-th object (a light bulb/battery etc.) in a sequence. If we assume that as soon as one object dies the next one replaces it immediately, then $S_n$ denotes the total lifetime of the first $n$ objects (or the time of the $n$-th replacement), and $N(t)$ records the total number of replacements up to time $t$.
(1) Suppose that E[X1 ] < ∞ and P(X1 = 0) < 1.
(1-i) Use the definition of limit to show that
$$\mathbb{P}\big(\lim_{n\to\infty} X_n = 0\big) = 0,$$
and conclude that $S_n \to \infty$ almost surely.
Since $F(0) < 1$, by the right continuity of $F$, there exists $\eta \in (0,1)$ such that $F(\varepsilon) \le 1 - \eta$ when $\varepsilon$ is small. Therefore, for each fixed $N$ we have
$$\begin{aligned}
\mathbb{P}\big(\cap_{n=N}^{\infty}\{X_n \le \varepsilon\}\big) &= \lim_{M\to\infty}\prod_{n=N}^{M}\mathbb{P}(X_n \le \varepsilon) \quad\text{(by independence)}\\
&\le \lim_{M\to\infty}(1-\eta)^{M-N} = 0.
\end{aligned}$$
This implies that
$$\mathbb{P}\big(\lim_{n\to\infty} X_n = 0\big) = \lim_{\varepsilon\to0}\mathbb{P}\big(\cup_{N\ge1}\cap_{n\ge N}\{X_n \le \varepsilon\}\big) \le \lim_{\varepsilon\to0}\sum_{N=1}^{\infty}\mathbb{P}\big(\cap_{n\ge N}\{X_n \le \varepsilon\}\big) = 0.$$
Since
$$\sum_{n=1}^{\infty} X_n < \infty \implies X_n \to 0,$$
it follows that $\mathbb{P}\big(\sum_{n=1}^{\infty} X_n < \infty\big) = 0$, or equivalently, $S_n \to \infty$ a.s.
(1-ii) According to the strong law of large numbers, with probability one we have
$$\frac{S_n}{n} \to \mathbb{E}[X_1] \quad \text{as } n \to \infty.$$
Note that $N(t) \to \infty$ as $t \to \infty$ (since $S_n < \infty$ for all $n$). In particular, with probability one we have
$$\frac{S_{N(t)}}{N(t)} \to \mathbb{E}[X_1] \quad \text{as } t \to \infty.$$
It follows that
$$\frac{S_{N(t)}}{N(t)} \le \frac{t}{N(t)} < \frac{S_{N(t)+1}}{N(t)} = \frac{S_{N(t)+1}}{N(t)+1}\cdot\frac{N(t)+1}{N(t)}.$$
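Letting $t \to \infty$ in this sandwich yields $t/N(t) \to \mathbb{E}[X_1]$, i.e. $N(t)/t \to 1/\mathbb{E}[X_1]$ a.s. A quick simulation of this conclusion (my own sketch, assuming numpy and $\exp(1)$ lifetimes so that $\mathbb{E}[X_1] = 1$):

```python
import numpy as np

# With exp(1) lifetimes, N(t)/t should approach 1/E[X_1] = 1 as t grows.
rng = np.random.default_rng(1)
for t in (10, 100, 1_000, 10_000):
    lifetimes = rng.exponential(1.0, size=3 * t + 100)  # enough so S_n exceeds t
    S = np.cumsum(lifetimes)
    N_t = np.searchsorted(S, t, side="right")           # N(t) = #{n : S_n <= t}
    print(f"t = {t:6d}:  N(t)/t = {N_t / t:.4f}")
```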
In addition, we know that $S_n \stackrel{d}{=} \gamma(n,\lambda)$. Therefore,
$$\begin{aligned}
\mathbb{P}(N(t) = n) &= \mathbb{P}(S_n \le t,\ S_n + X_{n+1} > t)\\
&= \int_0^{\infty}\mathbb{P}(S_n \le t,\ S_n + X_{n+1} > t \mid S_n = x)\,f_{S_n}(x)\,dx\\
&= \int_0^t \mathbb{P}(X_{n+1} > t - x \mid S_n = x)\cdot f_{S_n}(x)\,dx\\
&= \int_0^t \mathbb{P}(X_{n+1} > t - x)\,f_{S_n}(x)\,dx\\
&= \int_0^t e^{-\lambda(t-x)}\,\frac{\lambda^n x^{n-1}}{(n-1)!}\,e^{-\lambda x}\,dx\\
&= \frac{\lambda^n e^{-\lambda t}}{(n-1)!}\int_0^t x^{n-1}\,dx\\
&= \frac{(\lambda t)^n e^{-\lambda t}}{n!}.
\end{aligned}$$
This shows that $N(t)$ is a Poisson random variable with parameter $\lambda t$.
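This Poisson claim is easy to test by simulation; a sketch assuming numpy, with the values $\lambda = 0.5$ and $t = 8$ chosen arbitrarily:

```python
import numpy as np

# With exp(lambda) lifetimes, N(t) should be Poisson(lambda * t).
rng = np.random.default_rng(2)
lam, t, trials = 0.5, 8.0, 50_000
lifetimes = rng.exponential(1 / lam, size=(trials, 80))  # 80 lifetimes >> enough
S = np.cumsum(lifetimes, axis=1)
N_t = (S <= t).sum(axis=1)

print("empirical mean:", N_t.mean(), " vs  lambda*t =", lam * t)
print("empirical var :", N_t.var(), " vs  lambda*t =", lam * t)
```

Mean and variance both close to $\lambda t$ is exactly the Poisson signature.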
(3-i) By the definition of $N(t)$, we have
$$S_{N(t)} \le t < S_{N(t)+1}.$$
In other words, there are precisely $N(t)$ replacements up to time $t$. The $(N(t)+1)$-th object is alive at time $t$, and it dies at time $S_{N(t)+1} > t$. Its lifetime is $X_{N(t)+1}$. Since the sequence $X_n$ takes values in $\{0,1\}$, we see that $X_{N(t)+1}$ is always equal to $1$, for otherwise
$$S_{N(t)+1} = S_{N(t)} + 0 \le t,$$
which is a contradiction. In particular, we have
takes values in $\mathbb{N} = \{0, 1, 2, \cdots\}$. At the next move it either remains at $S_n$ (if $X_{n+1} = 0$) or jumps to $S_n + 1$ (if $X_{n+1} = 1$). For each position $m \in \mathbb{N}$, we define
$$K_m \triangleq \#\{n : S_n = m\}$$
to be the number of arrivals at the position m. Since N (t) counts the total number
of arrivals up to time t, it is apparent that
$$N(t) = \sum_{m=0}^{[t]} K_m.$$
For each $m \ge 1$ and $r \ge 1$, once the walk arrives at the position $m$, it stays there for exactly $r$ time indices precisely when the next $r-1$ increments are $0$ and the following increment is $1$, so that
$$\mathbb{P}(K_m = r) = (1-p)^{r-1}\,p.$$
Here we have used the simple observation that $N(m-1) \ge m-1$. This shows that $K_m - 1$ is a geometric random variable with parameter $p$. To see their independence, let $m \ge 1$, $r_0 \ge 0$, and $r_1, \cdots, r_m \ge 1$. Then
$$\begin{aligned}
&\mathbb{P}(K_0 = r_0, K_1 = r_1, \cdots, K_m = r_m)\\
&\quad= \mathbb{P}\big(X_1 = \cdots = X_{r_0} = 0,\ X_{r_0+1} = 1,\\
&\qquad\quad X_{r_0+2} = \cdots = X_{r_0+r_1} = 0,\ X_{r_0+r_1+1} = 1,\ \cdots,\\
&\qquad\quad X_{r_0+\cdots+r_{m-1}+1} = 1,\ X_{r_0+\cdots+r_{m-1}+2} = \cdots = X_{r_0+\cdots+r_{m-1}+r_m} = 0,\\
&\qquad\quad X_{r_0+\cdots+r_{m-1}+r_m+1} = 1\big)\\
&\quad= \big((1-p)^{r_0}p\big)\cdot\big((1-p)^{r_1-1}p\big)\cdots\big((1-p)^{r_m-1}p\big)\\
&\quad= \mathbb{P}(K_0 = r_0)\,\mathbb{P}(K_1 = r_1)\cdots\mathbb{P}(K_m = r_m).
\end{aligned}$$
The above notation looks complicated but the intuition is not hard if one draws
a picture of dots representing the stream of arrivals. This gives the desired inde-
pendence property.
To conclude, we obtain that
$$N(t) = K_0 + [t] + \sum_{m=1}^{[t]}(K_m - 1) \stackrel{d}{=} [t] + \mathrm{NegativeBinomial}([t]+1,\ p),$$
i.e. $N(t)$ is the sum of the deterministic constant $[t]$ and a negative binomial random variable with parameters $([t]+1, p)$ (recall that the sum of independent geometric random variables is negative binomial).
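The decomposition can be confirmed empirically; below is a sketch assuming numpy, with arbitrary choices $p = 0.3$ and $t = 10.5$ (numpy's negative_binomial(n, p) counts failures before the n-th success, which matches the convention above):

```python
import numpy as np

# Bernoulli(p) lifetimes: N(t) =d [t] + NegativeBinomial([t] + 1, p).
rng = np.random.default_rng(3)
p, t, trials = 0.3, 10.5, 20_000
ft = int(t)                                      # [t]

steps = 250                                      # enough steps for S_n to exceed t
X = (rng.random((trials, steps)) < p).astype(int)
S = np.cumsum(X, axis=1)
N_t = (S <= t).sum(axis=1)                       # N(t) = #{n : S_n <= t}

ref = ft + rng.negative_binomial(ft + 1, p, size=trials)
print("simulated N(t): mean", N_t.mean(), " var", N_t.var())
print("reference     : mean", ref.mean(), " var", ref.var())
```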
Assignment Two: Solutions
(3-i) Suppose that $\{X_n : n \ge 1\}$ are i.i.d. random variables with distribution
$$\mathbb{P}(X_n = 0) = \mathbb{P}(X_n = 2) = \frac{1}{2}.$$
Show that the random series $X = \sum_{n=1}^{\infty}\frac{X_n}{3^n}$ is convergent almost surely. What is the distribution function of $X$?
(3-ii) Show that each of the two components on the right hand side of equation (E) is the characteristic function of a random variable whose distribution function $F$ is continuous but $F'(x) = 0$ for almost every $x \in \mathbb{R}$. Such a random variable is called a singular random variable. Note that a singular random variable can never have a probability density function. This example shows that the sum of two independent singular random variables may have a density function.
Solution. (1) The two characteristic functions are given by
$$f_X(t) = \frac{1}{2}\int_{-1}^{1} e^{itx}\,dx = \frac{1}{2}\int_{-1}^{1}\cos tx\,dx = \frac{\sin t}{t}$$
for the uniform distribution on $[-1,1]$, and
$$f_Y(t) = \frac{1}{2}\big(e^{it} + e^{-it}\big) = \cos t$$
for the symmetric Bernoulli distribution.
(2) Note that $\cos\frac{t}{2^n}$ is the characteristic function of $\frac{1}{2^n}X_n$, where $X_n$ is a symmetric Bernoulli random variable. In view of Part (1), to solve the problem it is sufficient to show that a uniform random variable $X \stackrel{d}{=} U[-1,1]$ admits a representation
$$X = \sum_{n=1}^{\infty}\frac{X_n}{2^n},$$
where $\{X_n : n \ge 1\}$ are i.i.d. symmetric Bernoulli random variables. To this end, we recall that $\xi \stackrel{d}{=} U[0,1]$ admits a binary expansion
$$\xi = \sum_{n=1}^{\infty}\frac{\xi_n}{2^n},$$
where $\{\xi_n : n \ge 1\}$ are i.i.d. Bernoulli random variables with parameter $1/2$. It follows that
$$2\xi - 1 = 2\sum_{n=1}^{\infty}\frac{\xi_n}{2^n} - 1 = \sum_{n=1}^{\infty}\frac{2\xi_n - 1}{2^n}.$$
This is exactly the desired representation, since $2\xi - 1 \stackrel{d}{=} U[-1,1]$ and $\{2\xi_n - 1 : n \ge 1\}$ are i.i.d. symmetric Bernoulli random variables.
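Equivalently, Parts (1)–(2) amount to the product formula $\frac{\sin t}{t} = \prod_{n=1}^{\infty}\cos\frac{t}{2^n}$, which can be checked numerically with a truncated product (my own sketch, assuming numpy):

```python
import numpy as np

# Check sin(t)/t against the truncated product of cos(t / 2^n).
t = np.array([0.5, 1.0, 2.0, 5.0])
product = np.ones_like(t)
for n in range(1, 40):
    product *= np.cos(t / 2.0 ** n)

print("product of cosines:", product)
print("sin(t)/t          :", np.sin(t) / t)
```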
(3-i) It is clear from Kolmogorov’s two series theorem that the series $X \triangleq \sum_{n=1}^{\infty}\frac{X_n}{3^n}$ is convergent almost surely. To find its distribution, the crucial observation is that $X$ takes values in the Cantor set $C$! Let $G : [0,1] \to [0,1]$ be the Cantor function, and extend it to a distribution function on $\mathbb{R}$ by setting $G(x) \triangleq 0$ if $x < 0$ and $G(x) \triangleq 1$ if $x > 1$. From the explicit expression of $G$ on the Cantor set, we know that
$$G(X) = \sum_{n=1}^{\infty}\frac{X_n/2}{2^n}.$$
Observe that $\{X_n/2 : n \ge 1\}$ is an i.i.d. sequence of standard Bernoulli random variables. Therefore, $G(X)$ is a binary expansion with i.i.d. Bernoulli digits. Equivalently, we have $G(X) \stackrel{d}{=} U(0,1)$.
To proceed further, we need to recall a fact from elementary probability theory. If $F$ is a distribution function and $U \stackrel{d}{=} U(0,1)$, then $F^{-1}(U)$ has distribution function $F$, where $F^{-1}$ is the generalised inverse of $F$ defined by
$$F^{-1}(u) \triangleq \inf\{x \in \mathbb{R} : F(x) \ge u\}, \quad u \in (0,1).$$
This generalised notion of inverse is only needed when $F$ fails to be strictly increasing. Returning to our problem, it is tempting to conclude directly that the distribution function of $X$ is the Cantor function $G$, from the seemingly apparent relations $X = G^{-1}(G(X))$ and $G(X) \stackrel{d}{=} U[0,1]$. Some technical care is needed here, because it is not true in general that $G^{-1}(G(x)) = x$! By definition it is clear that $G^{-1}(G(x)) \le x$; however, if $G$ is constant on an open interval $I$ and $x \in I$, then $G^{-1}(G(x)) < x$ (why?).
To get around the above issue, in the construction of the Cantor set $C$, for each $n \ge 1$ let $I_n$ denote the union of the open intervals being removed at step $n$. Let $D \triangleq \cup_{n=1}^{\infty}\partial I_n$ be the collection of all the endpoints of the intervals in the $I_n$'s. We first claim that, for any $x \in C\backslash D$ we have $G^{-1}(G(x)) = x$. Indeed, if $x \in C$ is not an endpoint, by the construction of $G$ we know that there is a sequence $x_n \uparrow x$ such that $G(x_n) < G(x)$. This ensures that $G^{-1}(G(x)) \ge x$ (why?) and thus $G^{-1}(G(x)) = x$ for such $x$. The next observation is that
$$\mathbb{P}(X \in D) = 0.$$
This follows from the fact that $X \in D$ if and only if “$X_n = 0$ eventually or $X_n = 2$ eventually”, both of which have zero probability. Therefore, with probability one we have $X \in C\backslash D$. In other words,
$$X = G^{-1}(G(X)) \quad \text{almost surely}.$$
Since we know that the distribution function of $G^{-1}(G(X))$ is $G$, we conclude that $G$ is also the distribution function of $X$.
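This conclusion lends itself to a direct numerical check: sampling the ternary digits and truncating the series at 40 terms, one can verify both that $G(X) \stackrel{d}{=} U(0,1)$ and, for instance, that $\mathbb{P}(X \le 1/3) \approx G(1/3) = 1/2$ (a sketch of mine, assuming numpy):

```python
import numpy as np

# Sample X = sum_n X_n / 3^n with digits X_n in {0, 2} (truncated at 40 digits)
# and evaluate G(X) = sum_n (X_n / 2) / 2^n, which should be ~ U(0, 1).
rng = np.random.default_rng(4)
digits = 2 * rng.integers(0, 2, size=(100_000, 40))
X = (digits * 3.0 ** -np.arange(1, 41)).sum(axis=1)
G_X = ((digits / 2) * 2.0 ** -np.arange(1, 41)).sum(axis=1)

print("mean of G(X)    :", G_X.mean())            # ~ 1/2
print("P(G(X) <= 0.25) :", np.mean(G_X <= 0.25))  # ~ 0.25
print("P(X <= 1/3)     :", np.mean(X <= 1 / 3))   # ~ G(1/3) = 1/2
```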
(3-ii) The characteristic function
$$\prod_{n=1}^{\infty}\cos\frac{t}{2^{2n}}$$
corresponds to the distribution of $X \triangleq \sum_{n=1}^{\infty}\frac{X_n}{2^{2n}}$, where $\{X_n : n \ge 1\}$ are i.i.d. symmetric Bernoulli random variables. The characteristic function $\prod_{n=1}^{\infty}\cos\frac{t}{2^{2n-1}}$ corresponds to the distribution of $2X$. Therefore, it is enough to consider $X$ only.
The idea is to observe that $X$ has a Cantor type distribution. To be more precise, let $Y_n \triangleq \frac{3}{2}(X_n + 1)$. The distribution of $Y_n$ is given by
$$\mathbb{P}(Y_n = 0) = \mathbb{P}(Y_n = 3) = \frac{1}{2}.$$
We can then write $X_n = \frac{2}{3}Y_n - 1$ and thus
$$X = \sum_{n=1}^{\infty}\frac{1}{2^{2n}}\Big(\frac{2}{3}Y_n - 1\Big) = \frac{2}{3}\sum_{n=1}^{\infty}\frac{Y_n}{4^n} - \frac{1}{3}.$$
intervals, and in the natural increasing order the values of $G(x)$ on these intervals are given by
$$\frac{1}{2^n},\ \frac{3}{2^n},\ \frac{5}{2^n},\ \cdots,\ \frac{2^n-1}{2^n}$$
respectively. This clearly specifies the function $G$ on $[0,1]\backslash C = \cup_{n=1}^{\infty} I_n$ (i.e. on all the open intervals being removed). Exactly as in the base 3 case, this function $G$ extends to a continuous function on $[0,1]$. For elements $x \in C$, we have
$$G(x) = \sum_{n=1}^{\infty}\frac{x_n/3}{2^n}, \tag{E1}$$
where
$$x = 0.x_1 x_2 x_3 \cdots$$
denotes the expansion of $x$ under base 4. The function $G$ satisfies $G(0) = 0$, $G(1) = 1$, and it is non-decreasing on $[0,1]$. More importantly, note that the Lebesgue measure of the union of all the open sub-intervals being removed, denoted as $I \triangleq \cup_{n=1}^{\infty} I_n$, is equal to one. This is seen from the following simple calculation:
$$|I| = \sum_{n=1}^{\infty} 2^{n-1}\times\frac{2}{4^n} = 1.$$
Since $G$ is constant on each of the open sub-intervals being removed, we see that $G'(x) = 0$ on $I$, and thus $G' = 0$ at almost every point of $[0,1]$.
We extend $G$ to a distribution function by letting $G(x) = 0$ for $x < 0$ and $G(x) = 1$ for $x > 1$. To complete the proof, it remains to see that the distribution function of $Y \triangleq \sum_{n=1}^{\infty}\frac{Y_n}{4^n}$ is $G(x)$. Since $Y_n = 0$ or $3$, we know that $Y$ takes values in $C$. The fact that $G$ is the distribution function of $Y$ follows from exactly the same argument as in the base 3 case. As a consequence, $Y$, and thus the original $X$, is a singular random variable.
Alternative solution. There is a more direct and neater solution that is es-
sentially due to two students. They looked at expansions in different contexts,
but the essence of their approaches was quite similar. I now summarise their
approaches based on my understanding.
We wish to show that the random series $X = \sum_{n=1}^{\infty}\frac{X_n}{2^{2n-1}}$ is a singular random variable, where $\{X_n : n \ge 1\}$ is an i.i.d. sequence of Bernoulli random variables. In terms of binary expansion, $X$ has the expansion
$$X = 0.X_1\,0\,X_2\,0\,X_3\,0\cdots.$$
From the above expansion, it is clear that the distribution function of $X$ is continuous. Indeed, given a generic real number $x \in (0,1)$, the event $\{X = x\}$ uniquely specifies the values of all the $X_n$'s, since the expansion is unique (I leave the reader to think about how to deal with the situation at the points where the binary expansion is not unique). But any particular specification of the values of the sequence $\{X_n\}$ has zero probability. Therefore, $\mathbb{P}(X = x) = 0$ for every $x$, showing that the distribution function of $X$ is continuous.
Next, we show that the distribution function of $X$ has zero derivative almost everywhere. Since $X \in [0,1]$, it is enough to restrict our attention to the unit interval. The crucial observation (partly inspired by the construction of the Cantor set) is that $X$ takes values in the complement of a countable union of disjoint intervals. To figure out what these intervals (to be removed) are, we look at the expansion grouped in the following manner:
The first two digits ($X_1\,0$) tell us that $X$ cannot take values in
Similarly, the next two digits ($X_2\,0$) tell us that $X$ cannot take values in
$$\rho(2t) = \rho(t)^2.$$
(3) Use Part (2) to show that $\rho(t) = 1$ for all $t$ and hence conclude that
$$f(2t) = f(t)^4.$$
(4) Use Part (3) to show that $f(t) = e^{-t^2/2}$, and thus conclude that $X \stackrel{d}{=} Y \stackrel{d}{=} N(0,1)$.
Solution. (1) Since $X + Y$ and $X - Y$ are independent, and $2X = (X+Y) + (X-Y)$, we have
$$f(2t) = \mathbb{E}\big[e^{it(X+Y)}\big]\cdot\mathbb{E}\big[e^{it(X-Y)}\big] = f(t)^2\cdot f(t)f(-t) = f(t)^3 f(-t).$$
If $f(t_0) = 0$ for some $t_0$, using the above relation we know that at least one of $f(t_0/2)$ or $f(-t_0/2)$ is zero. By iterating this argument, we find a sequence $t_n \to 0$ such that $f(t_n) = 0$. But this is impossible, since $f(0) = 1 \ne 0$ and $f(t)$ is continuous at $t = 0$.
(2) Since $f(t) \ne 0$ for all $t$, the function $\rho(t) \triangleq \frac{f(t)}{f(-t)}$ is well defined. In addition, according to Part (i) we have
$$f(t) = 1 + o(t) \quad \text{as } t \to 0.$$
It is helpful to call the above $o(t)$ some function $\varepsilon(t)$. Then we also have
$$f(-t) = 1 + \varepsilon(-t).$$
It follows that
$$\rho(t) = \frac{f(t)}{f(-t)} = \frac{1+\varepsilon(t)}{1+\varepsilon(-t)} = 1 + \frac{\varepsilon(t)-\varepsilon(-t)}{1+\varepsilon(-t)}.$$
Note that the function $\frac{\varepsilon(t)-\varepsilon(-t)}{1+\varepsilon(-t)}$ is also $o(t)$ as $t \to 0$. Therefore,
$$\rho(t) = 1 + o(t) \quad \text{as } t \to 0.$$
Now using Part (ii), we have, for every $n$,
$$\rho(t) = \rho\Big(\frac{t}{2}\Big)^2 = \rho\Big(\frac{t}{4}\Big)^4 = \cdots = \rho\Big(\frac{t}{2^n}\Big)^{2^n} = \Big(1 + o\Big(\frac{t}{2^n}\Big)\Big)^{2^n} = \exp\Big(2^n\log\Big(1 + o\Big(\frac{t}{2^n}\Big)\Big)\Big) \to 1$$
as $n \to \infty$. Therefore $\rho(t) = 1$ for all $t$, i.e. $f(-t) = f(t)$. Combining this with the relation from Part (1), we obtain
$$f(2t) = f(t)^3 f(-t) = f(t)^4.$$
By differentiation, we get
$$f'(2t) = 2f(t)^3 f'(t).$$
In particular, if we define
$$g(t) \triangleq \frac{f'(t)}{f(t)},$$
then we have
$$g(2t) = 2g(t).$$
As a consequence, for fixed $t$ we have
$$g(t) = 2g\Big(\frac{t}{2}\Big) = 2^2 g\Big(\frac{t}{4}\Big) = \cdots = 2^k g\Big(\frac{t}{2^k}\Big)$$
for all $k \ge 1$. Since $X$ has mean zero and unit variance, we have $f(0) = 1$, $f'(0) = 0$ and $f''(0) = -1$. Therefore, $g(0) = 0$ and
$$g'(0) = \frac{f''(0)f(0) - f'(0)^2}{f(0)^2} = -1.$$
On the other hand, we also have
$$g'(0) = \lim_{k\to\infty}\frac{g(t/2^k) - g(0)}{t/2^k} = \lim_{k\to\infty}\frac{2^k g(t/2^k)}{t} = \lim_{k\to\infty}\frac{g(t)}{t} = \frac{g(t)}{t}.$$
As a result, we conclude that
$$\frac{g(t)}{t} = -1, \quad\text{i.e. } g(t) = -t.$$
By the definition of $g(t)$, we have
$$f'(t) = -t f(t).$$
Using $f(0) = 1$, the unique solution to the above ODE is given by $f(t) = e^{-t^2/2}$.
Remark. Based on the relation $g(2t) = 2g(t)$, a few students observed that $g'(2t) = g'(t)$, and thus
$$g'(t) = g'(t/2) = g'(t/4) = \cdots = g'(t/2^n) \to g'(0) = -1.$$
Therefore, $g(t) = -t$.
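Both conclusions are immediate to verify numerically for $f(t) = e^{-t^2/2}$: it satisfies the functional equation $f(2t) = f(t)^4$ and the ODE $f'(t) = -tf(t)$ (a small check of mine, assuming numpy):

```python
import numpy as np

# Verify that f(t) = exp(-t^2/2) satisfies f(2t) = f(t)^4 and f'(t) = -t f(t).
f = lambda t: np.exp(-t ** 2 / 2)
t = np.linspace(-3.0, 3.0, 13)

print(np.allclose(f(2 * t), f(t) ** 4))            # functional equation

h = 1e-6                                           # central finite difference
f_prime = (f(t + h) - f(t - h)) / (2 * h)
print(np.allclose(f_prime, -t * f(t), atol=1e-5))  # ODE f' = -t f
```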
Problem 3. Let $\{X_n : n \ge 1\}$ be an independent sequence of random variables.
(1) In the following two cases, show that $\{X_n\}$ satisfies Lyapunov’s central limit theorem.
(1-i) $X_n \stackrel{d}{=} U[-n, n]$ (uniform distribution on $[-n, n]$).
(1-ii)
$$X_n = \begin{cases} \pm n^{\alpha}, & \text{with probability } \frac{1}{6n^{2(\alpha-1)}} \text{ each};\\ 0, & \text{with probability } 1 - \frac{1}{3n^{2(\alpha-1)}}. \end{cases}$$
(2) Suppose that $X_n \stackrel{d}{=} N(0, 2^{n-1})$. Show that Lindeberg’s condition fails for $\{X_n\}$, but the central limit theorem still holds:
$$\frac{S_n}{\sqrt{\mathrm{Var}[S_n]}} \to N(0,1), \quad \text{weakly}.$$
Solution. (1-i) Note that $X_n$ has mean zero. For the second moment, we have
$$\mathbb{E}[X_n^2] = \frac{1}{2n}\int_{-n}^{n} x^2\,dx = \frac{n^2}{3},$$
and thus
$$\Sigma_n^2 = \sum_{k=1}^{n}\mathrm{Var}[X_k] = \frac{1}{3}\sum_{k=1}^{n}k^2 = \frac{n(n+1)(2n+1)}{18}$$
and
$$\Gamma_n \triangleq \sum_{k=1}^{n}\mathbb{E}[|X_k|^3] = \frac{1}{4}\sum_{k=1}^{n}k^3 \le \frac{n^4}{4} = O(n^4).$$
It follows that
$$\frac{\Gamma_n}{\Sigma_n^3} = O\Big(\frac{n^4}{n^{9/2}}\Big) = O(n^{-1/2}) \to 0$$
as $n \to \infty$. Therefore, Lyapunov’s central limit theorem holds.
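A simulation makes the conclusion tangible: after standardising by $\Sigma_n$, the sample of $S_n/\Sigma_n$ should look standard normal (my own sketch, assuming numpy):

```python
import numpy as np

# X_k ~ U[-k, k]; S_n / Sigma_n should be approximately N(0, 1) for large n.
rng = np.random.default_rng(5)
n, trials = 300, 10_000
k = np.arange(1, n + 1)
X = rng.uniform(-k, k, size=(trials, n))         # each row: X_1, ..., X_n
Z = X.sum(axis=1) / np.sqrt((k ** 2 / 3).sum())  # Sigma_n^2 = sum_k k^2 / 3

print("mean:", Z.mean(), " var:", Z.var())       # ~ 0 and ~ 1
print("P(Z <= 1):", np.mean(Z <= 1))             # ~ Phi(1) = 0.8413
```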
(1-ii) Note that $\mathbb{E}[X_n] = 0$. For the second and third moments, we have
$$\mathbb{E}[X_n^2] = \frac{n^2}{3}, \qquad \mathbb{E}[|X_n|^3] = \frac{1}{3}n^{\alpha+2}.$$
Therefore, $\Sigma_n = O(n^{3/2})$ and
$$\Gamma_n \triangleq \sum_{k=1}^{n}\mathbb{E}[|X_k|^3] = \frac{1}{3}\sum_{k=1}^{n}k^{\alpha+2} \le \frac{1}{3}n^{\alpha+3} = O(n^{\alpha+3}).$$
It follows that
$$\frac{\Gamma_n}{\Sigma_n^3} = O\Big(\frac{n^{\alpha+3}}{n^{9/2}}\Big) = O\Big(\frac{1}{n^{3/2-\alpha}}\Big).$$
Since $\alpha < 3/2$, we see that $\frac{\Gamma_n}{\Sigma_n^3} \to 0$ and thus Lyapunov’s central limit theorem holds.
(2) We have seen that Lindeberg’s condition implies
$$r_n \triangleq \frac{\max_{1\le m\le n}\sigma_m}{\Sigma_n} \to 0$$
as $n \to \infty$. Therefore, it is enough to show that $r_n$ does not converge to zero in this example. By the assumption, we have $\sigma_m^2 = 2^{m-1}$ and
$$\Sigma_n^2 = \sum_{m=1}^{n} 2^{m-1} = 2^n - 1.$$
It follows that
$$r_n^2 = \frac{2^{n-1}}{2^n - 1} \to \frac{1}{2} \ne 0.$$
As a consequence, the sequence $\{X_n\}$ does not satisfy Lindeberg’s condition. However, since each $X_n$ is Gaussian and they are independent, we know that $\frac{S_n}{\sqrt{\mathrm{Var}[S_n]}} \stackrel{d}{=} N(0,1)$. Therefore, the central limit theorem holds trivially for the sequence $\{X_n\}$.
Problem 4. A little girl is painting on a blank sheet of paper. Suppose that there are $N$ available colors in total. At each time she selects one color randomly and paints with it on the paper. It is possible that she picks a color that she has already used before. Different selections are assumed to be independent. Let $T$ be the number of selections until the little girl picks a color that she has obtained previously.
(i) For each k ∈ N, what is P(T > k)?
(ii) By using the formula
$$\mathbb{E}[T] = \sum_{k=0}^{\infty}\mathbb{P}(T > k),$$
which is true for any non-negative integer-valued random variable, show that
$$\mathbb{E}[T] = \frac{N!}{N^N}\sum_{j=0}^{N}\frac{N^j}{j!}.$$
(iii) By using the central limit theorem in a suitable context, show that
$$\lim_{N\to\infty} e^{-N}\sum_{j=0}^{N}\frac{N^j}{j!} = \frac{1}{2}.$$
(iv) Let $\varphi(N) = \mathbb{E}[T]$ (as a function of $N$). Conclude from Part (iii) as well as Stirling’s formula that
$$\lim_{N\to\infty}\frac{\varphi(N)}{\sqrt{N}} = \sqrt{\frac{\pi}{2}}.$$
This result shows that when $N$ is large, on average it takes about $\sqrt{\frac{\pi N}{2}}$ selections to see a first repeated choice. It also gives a probabilistic way of simulating $\pi$.
Solution. (i) The possible values of $T$ are $\{2, 3, \cdots, N+1\}$. For $1 \le k \le N+1$, the event $\{T > k\}$ means that there are no repeated choices among the first $k$ selections. Therefore, we have
$$\mathbb{P}(T > k) = \frac{N\cdot(N-1)\cdots(N-k+1)}{N^k}.$$
Note that $\mathbb{P}(T > k) = 1$ if $k = 0, 1$, and $\mathbb{P}(T > k) = 0$ if $k \ge N+1$.
(ii) By using Part (i) and the given summation formula for $\mathbb{E}[T]$, we have
$$\begin{aligned}
\mathbb{E}[T] &= 1 + \sum_{k=1}^{N}\frac{N\cdot(N-1)\cdots(N-k+1)}{N^k}\\
&= \sum_{k=0}^{N}\frac{N!}{(N-k)!\,N^k}\\
&= \sum_{j=0}^{N}\frac{N!}{j!\,N^{N-j}} \quad\text{(change of index: } j = N-k)\\
&= \frac{N!}{N^N}\sum_{j=0}^{N}\frac{N^j}{j!}.
\end{aligned}$$