
Assignment One: Solutions

Problem 1. (1) Let $\{X_n : n \ge 1\}$ be a sequence of random variables. Suppose that there exist constants $r, C > 0$ such that
$$\mathbb{E}[|X_n|^r] \le C \quad \text{for all } n.$$
Show that the family $\{X_n : n \ge 1\}$ of random variables is tight.


(2) Let $X_n$ ($n \ge 1$) be normally distributed with mean $a_n$ and variance $\sigma_n^2$. Show that the family $\{X_n : n \ge 1\}$ is tight if and only if the real sequences $\{a_n : n \ge 1\}$ and $\{\sigma_n^2 : n \ge 1\}$ are both bounded.

Solution. (1) By Chebyshev's inequality, for any $n \ge 1$ and $M > 0$, we have
$$\mathbb{P}(|X_n| > M) \le \frac{\mathbb{E}[|X_n|^r]}{M^r} \le \frac{C}{M^r}.$$
Given $\varepsilon > 0$, there exists $M > 0$ (independent of $n$) such that the right-hand side is smaller than $\varepsilon$. Equivalently, for such $M$ we have
$$\mathbb{P}(-M \le X_n \le M) \ge 1 - \varepsilon \quad \text{for all } n.$$
This gives the tightness property.


(2) Sufficiency. Suppose that the real sequences $\{a_n\}$ and $\{\sigma_n^2\}$ are both bounded. Since
$$\mathbb{E}[X_n^2] = (\mathbb{E}[X_n])^2 + \mathrm{Var}[X_n] = a_n^2 + \sigma_n^2,$$
we know that the real sequence $\{\mathbb{E}[X_n^2]\}$ is bounded. According to Part (1) with $r = 2$, we conclude that the sequence $\{X_n\}$ of random variables is tight.
Necessity. Suppose that the family $\{X_n\}$ is tight. Then there exists $M > 0$ such that
$$\mathbb{P}(|X_n| \le M) \ge \frac{3}{4} \quad \text{for all } n.$$

We claim that $|a_n| \le M$ for all $n$. Indeed, if this is not the case, say $a_n > M$ for some $n$ (the case $a_n < -M$ is symmetric). Then
$$\frac{1}{2} = \mathbb{P}(X_n \ge a_n) \le \mathbb{P}(X_n > M) \le \frac{1}{4},$$
which is a contradiction. In addition, we also have
$$\frac{3}{4} \le \mathbb{P}(|X_n| \le M) = \frac{1}{\sqrt{2\pi}\,\sigma_n} \int_{-M}^{M} e^{-\frac{(x-a_n)^2}{2\sigma_n^2}}\, dx = \frac{1}{\sqrt{2\pi}} \int_{\frac{-M-a_n}{\sigma_n}}^{\frac{M-a_n}{\sigma_n}} e^{-\frac{x^2}{2}}\, dx \le \frac{1}{\sqrt{2\pi}} \int_{-\frac{2M}{\sigma_n}}^{\frac{2M}{\sigma_n}} e^{-\frac{x^2}{2}}\, dx.$$
This implies that the sequence $\{\sigma_n\}$ is bounded. For otherwise, if $\sigma_n \uparrow \infty$ along a subsequence, then the right-hand side tends to zero along this subsequence, which is again a contradiction.
Problem 2. Let $\{X_n : n \ge 1\}$ be a sequence of independent and identically distributed random variables, each following the exponential distribution $\exp(1)$.
(1) Let $\alpha > 0$ be an arbitrary number. Compute the probability that "$X_n \ge \alpha \log n$ for infinitely many $n$".
(2) Show that
$$\limsup_{n\to\infty} \frac{X_n}{\log n} = 1 \quad \text{a.s.}$$
(3) Let
$$M_n = \max_{1\le i\le n} X_i - \log n.$$
Show that $M_n$ is weakly convergent and find the weak limiting distribution of $M_n$.
Solution. (1) By the assumption, we have
$$\mathbb{P}(X_n \ge \alpha \log n) = e^{-\alpha \log n} = 1/n^{\alpha}.$$
For each $\alpha > 0$, define
$$A_\alpha = \{X_n \ge \alpha \log n \text{ for infinitely many } n\}.$$
Note that the series $\sum_n \frac{1}{n^\alpha}$ is convergent if and only if $\alpha > 1$. According to the Borel–Cantelli lemma, we conclude that
$$\mathbb{P}(A_\alpha) = \begin{cases} 0, & \alpha > 1; \\ 1, & 0 < \alpha \le 1. \end{cases}$$
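Remark. The dichotomy in $\mathbb{P}(A_\alpha)$ can be seen heuristically by counting how many indices $n \le N$ satisfy $X_n > \alpha \log n$: the count keeps growing with $N$ when $\alpha \le 1$, while it stabilises at a small value when $\alpha > 1$. A minimal Python sketch (my own illustration, not part of the solution):

```python
import numpy as np

# Count exceedances X_n > alpha * log(n) among n = 1, ..., N for a
# sub-critical and a super-critical value of alpha.
rng = np.random.default_rng(0)
N = 10**6
X = rng.exponential(scale=1.0, size=N)
n = np.arange(1, N + 1)
for alpha in (0.5, 1.5):
    count = np.count_nonzero(X > alpha * np.log(n))
    print(f"alpha = {alpha}: {count} exceedances up to N = {N}")
# The expected count is sum_n n^(-alpha): it diverges for alpha <= 1 and
# converges for alpha > 1, matching the Borel-Cantelli dichotomy.
```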

(2) Let
$$L = \limsup_{n\to\infty} \frac{X_n}{\log n}.$$
Since $\mathbb{P}(A_1) = 1$, we know that $L \ge 1$ almost surely. Moreover,
$$\{L > 1\} \subseteq \bigcup_{k=1}^{\infty} \Big\{L > 1 + \frac{1}{k}\Big\} \subseteq \bigcup_{k=1}^{\infty} A_{1+\frac{1}{k}}.$$
Since $\mathbb{P}(A_{1+\frac{1}{k}}) = 0$ for all $k$, we see that $\mathbb{P}(L > 1) = 0$. Therefore, $L = 1$ almost surely.
(3) For each $x \in \mathbb{R}$, we have
$$\mathbb{P}(M_n \le x) = \mathbb{P}\Big(\max_{1\le i\le n} X_i \le x + \log n\Big) = \big(1 - e^{-x-\log n}\big)^n = \Big(1 - \frac{e^{-x}}{n}\Big)^n,$$
provided that $x + \log n \ge 0$. By taking $n \to \infty$, we obtain that
$$\lim_{n\to\infty} \mathbb{P}(M_n \le x) = e^{-e^{-x}}.$$
Apparently, the function $F(x) \triangleq e^{-e^{-x}}$ defines a continuous cumulative distribution function on $\mathbb{R}$. Therefore, $M_n$ converges weakly to the distribution given by $F$.
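Remark. The limit $F(x) = e^{-e^{-x}}$ is the standard Gumbel distribution. A short Monte Carlo sketch (my own illustration, not part of the solution) comparing the empirical distribution of $M_n$ with $F$:

```python
import numpy as np

# Compare the empirical CDF of M_n = max(X_1, ..., X_n) - log(n) with the
# Gumbel limit F(x) = exp(-exp(-x)) at a few test points.
rng = np.random.default_rng(42)
n, trials = 5000, 5000
samples = np.array([rng.exponential(size=n).max() for _ in range(trials)]) - np.log(n)
for x in (-1.0, 0.0, 1.0, 2.0):
    print(f"x = {x:+.1f}: empirical {np.mean(samples <= x):.4f}"
          f" vs limit {np.exp(-np.exp(-x)):.4f}")
```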

Problem 3. (1) Let $X_1, \cdots, X_n$ be independent random variables, each following the exponential distribution with parameter $\lambda$. Let $S_n \triangleq X_1 + \cdots + X_n$. Use induction to show that $S_n \stackrel{d}{=} \gamma(n, \lambda)$, namely, the probability density function of $S_n$ is given by
$$f_n(x) = \begin{cases} \dfrac{\lambda^n x^{n-1}}{(n-1)!}\, e^{-\lambda x}, & x > 0; \\ 0, & x \le 0. \end{cases}$$
(2) Let $f : [0, \infty) \to \mathbb{R}$ be a bounded continuous function. The Laplace transform of $f$ is the function $g$ defined by
$$g(\lambda) \triangleq \int_0^{\infty} e^{-\lambda x} f(x)\, dx, \quad \lambda > 0.$$
For each $n \ge 1$ and $\lambda > 0$, we use $S_n(\lambda)$ to denote a Gamma distributed random variable with parameters $n$ and $\lambda$.
(2-i) Show that
$$\mathbb{E}[f(S_n(\lambda))] = \frac{(-1)^{n-1}\lambda^n}{(n-1)!}\, g^{(n-1)}(\lambda),$$
where $g^{(k)}$ denotes the $k$-th derivative of $g$.
(2-ii) Given $x > 0$, show that $S_n(n/x)$ converges to $x$ in probability as $n \to \infty$.
(2-iii) Use Part (2-ii) to prove the following inversion formula for the Laplace transform:
$$f(x) = \lim_{n\to\infty} \frac{(-1)^{n-1}}{(n-1)!} \Big(\frac{n}{x}\Big)^n g^{(n-1)}\Big(\frac{n}{x}\Big), \quad x > 0.$$
Solution. (1) We use $f_n$ to denote the probability density function of $S_n$. When $n = 1$, the formula is just the exponential density. Suppose that the claim is true for $S_n$, i.e. $S_n \stackrel{d}{=} \gamma(n, \lambda)$. Since $S_n$ and $X_{n+1}$ are independent, by the convolution formula, for $x > 0$ we have
$$\begin{aligned}
f_{n+1}(x) &= \int_{-\infty}^{\infty} f_n(x-y) f_1(y)\, dy
= \int_0^x \frac{\lambda^n (x-y)^{n-1}}{(n-1)!}\, e^{-\lambda(x-y)} \cdot \lambda e^{-\lambda y}\, dy \\
&= \frac{\lambda^{n+1}}{(n-1)!}\, e^{-\lambda x} \int_0^x (x-y)^{n-1}\, dy
= \frac{\lambda^{n+1}}{n!}\, x^n e^{-\lambda x}.
\end{aligned}$$
(2-i) Since $S_n(\lambda) \stackrel{d}{=} \gamma(n, \lambda)$, we have
$$\mathbb{E}[f(S_n(\lambda))] = \int_0^{\infty} f(x) \cdot \frac{\lambda^n x^{n-1}}{(n-1)!}\, e^{-\lambda x}\, dx.$$
On the other hand, by differentiating $g(\lambda)$ in its defining formula, we have
$$g^{(n-1)}(\lambda) = (-1)^{n-1} \int_0^{\infty} x^{n-1} e^{-\lambda x} f(x)\, dx.$$
Therefore,
$$\mathbb{E}[f(S_n(\lambda))] = \frac{(-1)^{n-1}\lambda^n}{(n-1)!}\, g^{(n-1)}(\lambda).$$
(2-ii) By Chebyshev's inequality, we have
$$\mathbb{P}\Big(\Big|S_n\Big(\frac{n}{x}\Big) - x\Big| > \varepsilon\Big) \le \frac{1}{\varepsilon^2}\, \mathbb{E}\Big[\Big(S_n\Big(\frac{n}{x}\Big) - x\Big)^2\Big].$$
Note that $S_n(n/x)$ can be viewed as the sum of $n$ independent $\exp(n/x)$-random variables. Therefore,
$$\mathbb{E}\Big[S_n\Big(\frac{n}{x}\Big)\Big] = n \cdot \frac{x}{n} = x,$$
and
$$\mathbb{E}\Big[\Big(S_n\Big(\frac{n}{x}\Big) - x\Big)^2\Big] = \mathrm{Var}\Big[S_n\Big(\frac{n}{x}\Big)\Big] = n \cdot \frac{x^2}{n^2} = \frac{x^2}{n}.$$
Therefore,
$$\mathbb{P}\Big(\Big|S_n\Big(\frac{n}{x}\Big) - x\Big| > \varepsilon\Big) \le \frac{x^2}{n\varepsilon^2} \to 0$$
as $n \to \infty$. This shows that $S_n(n/x)$ converges to $x$ in probability.
(2-iii) In particular, $S_n(n/x)$ converges to $x$ weakly. Since $f$ is bounded and continuous, it follows that
$$\mathbb{E}\Big[f\Big(S_n\Big(\frac{n}{x}\Big)\Big)\Big] \to \mathbb{E}[f(x)] = f(x).$$
But from Part (2-i) we know that
$$\mathbb{E}\Big[f\Big(S_n\Big(\frac{n}{x}\Big)\Big)\Big] = \frac{(-1)^{n-1}}{(n-1)!} \Big(\frac{n}{x}\Big)^n g^{(n-1)}\Big(\frac{n}{x}\Big).$$
This proves the desired inversion formula.
Remark. Similar to the proof of Theorem 3.2 in the lecture notes of Topic 1, it
is not hard to see that the convergence in the inversion formula holds uniformly
over compact intervals.
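Remark. As a concrete numerical sanity check (my own sketch, not part of the assignment), take $f(x) = e^{-x}$, whose Laplace transform is $g(\lambda) = \frac{1}{\lambda+1}$ with $g^{(n-1)}(\lambda) = \frac{(-1)^{n-1}(n-1)!}{(\lambda+1)^n}$. The $n$-th approximant in the inversion formula then simplifies to $\big(\frac{n}{n+x}\big)^n$, which indeed converges to $e^{-x}$:

```python
import numpy as np

# Laplace inversion for f(x) = exp(-x), g(lambda) = 1/(lambda + 1).
# The approximant (-1)^(n-1)/(n-1)! * (n/x)^n * g^(n-1)(n/x) reduces to
# (n / (n + x))^n, which should approach exp(-x) as n grows.
x = 2.0
for n in (1, 10, 100, 1000, 10000):
    approx = (n / (n + x)) ** n
    print(f"n = {n:>5}: approximant {approx:.6f}  (target {np.exp(-x):.6f})")
```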
Problem 4. Let $\{X_n : n \ge 1\}$ be a sequence of non-negative, independent and identically distributed random variables. Define $S_n \triangleq X_1 + \cdots + X_n$. For each $t \ge 0$, let $N(t)$ be the random variable defined by
$$N(t) \triangleq \sup\{n \ge 1 : S_n \le t\}.$$
We often interpret $X_n$ as the lifetime of the $n$-th object (a light bulb/battery etc.) in a sequence. If we assume that as soon as one object dies the next one replaces it immediately, then $S_n$ denotes the total lifetime of the first $n$ objects (or the time of the $n$-th replacement), and $N(t)$ records the total number of replacements up to time $t$.

(1) Suppose that $\mathbb{E}[X_1] < \infty$ and $\mathbb{P}(X_1 = 0) < 1$.
(1-i) Use the definition of limit to show that
$$\mathbb{P}\Big(\lim_{n\to\infty} X_n = 0\Big) = 0,$$
and hence conclude that $S_n \to \infty$ a.s.
(1-ii) Use the strong law of large numbers to show that
$$\frac{N(t)}{t} \to \frac{1}{\mathbb{E}[X_1]} \quad \text{a.s.}$$
as $t \to \infty$.
(2) Suppose that $X_n \stackrel{d}{=} \exp(\lambda)$ for each $n$. What is the distribution of $N(t)$?
(3) Suppose that $X_n \stackrel{d}{=} \mathrm{Bernoulli}(p)$, where $p \in (0,1)$ is a fixed parameter.
(3-i) Fix $t > 0$. Explain why the random variable $X_{N(t)+1}$ represents the lifetime of the object that is alive at time $t$. Is it true that $\mathbb{E}[X_{N(t)+1}] = \mathbb{E}[X_1] = p$?
(3-ii) What is the distribution of $N(0)$?
(3-iii) [Hard] For $t > 0$, what is the distribution of $N(t)$?
Solution. (1-i) By the definition of limit, we have
$$\Big\{\lim_{n\to\infty} X_n = 0\Big\} = \bigcap_{\varepsilon>0} \bigcup_{N\ge1} \bigcap_{n\ge N} \{X_n \le \varepsilon\}.$$
Let $F$ denote the common distribution function of the $X_n$'s; the assumption gives $F(0) = \mathbb{P}(X_1 = 0) < 1$. By the right continuity of $F$, there exists $\eta \in (0,1)$ such that $F(\varepsilon) \le 1 - \eta$ when $\varepsilon$ is small. Therefore, for each fixed $N$ we have
$$\begin{aligned}
\mathbb{P}\Big(\bigcap_{n=N}^{\infty} \{X_n \le \varepsilon\}\Big)
&= \lim_{M\to\infty} \prod_{n=N}^{M} \mathbb{P}(X_n \le \varepsilon) \quad \text{(by independence)} \\
&\le \lim_{M\to\infty} (1-\eta)^{M-N} = 0.
\end{aligned}$$
This implies that
$$\mathbb{P}\Big(\lim_{n\to\infty} X_n = 0\Big) = \lim_{\varepsilon\to0} \mathbb{P}\Big(\bigcup_{N\ge1} \bigcap_{n\ge N} \{X_n \le \varepsilon\}\Big) \le \lim_{\varepsilon\to0} \sum_{N=1}^{\infty} \mathbb{P}\Big(\bigcap_{n\ge N} \{X_n \le \varepsilon\}\Big) = 0.$$

Since
$$\sum_{n=1}^{\infty} X_n < \infty \implies X_n \to 0,$$
we immediately obtain that
$$\mathbb{P}\Big(\sum_{n=1}^{\infty} X_n < \infty\Big) = 0.$$
As the $X_n$ are non-negative, $S_n$ is non-decreasing, so this is equivalent to $S_n \to \infty$ a.s.
(1-ii) According to the strong law of large numbers, with probability one we have
$$\frac{S_n}{n} \to \mathbb{E}[X_1] \quad \text{as } n \to \infty.$$
Note that $N(t) \to \infty$ as $t \to \infty$ (since $S_n < \infty$ for all $n$). In particular, with probability one we have
$$\frac{S_{N(t)}}{N(t)} \to \mathbb{E}[X_1] \quad \text{as } t \to \infty.$$
On the other hand, by the definition of $N(t)$ we have
$$S_{N(t)} \le t < S_{N(t)+1}.$$
It follows that
$$\frac{S_{N(t)}}{N(t)} \le \frac{t}{N(t)} < \frac{S_{N(t)+1}}{N(t)} = \frac{S_{N(t)+1}}{N(t)+1} \cdot \frac{N(t)+1}{N(t)}.$$
Therefore, with probability one we have
$$\lim_{t\to\infty} \frac{t}{N(t)} = \mathbb{E}[X_1], \quad \text{i.e.} \quad \frac{N(t)}{t} \to \frac{1}{\mathbb{E}[X_1]} \quad \text{a.s.}$$
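Remark. A minimal Monte Carlo sketch of this renewal limit (my own illustration; the lifetime distribution below is an arbitrary choice):

```python
import numpy as np

# Lifetimes uniform on [0, 2], so E[X_1] = 1 and N(t)/t should be close to 1.
rng = np.random.default_rng(1)
t = 10_000.0
lifetimes = rng.uniform(0.0, 2.0, size=20_000)          # enough to exceed time t
arrival_times = np.cumsum(lifetimes)                    # S_1, S_2, ...
N_t = np.searchsorted(arrival_times, t, side="right")   # number of S_n <= t
print(f"N(t)/t = {N_t / t:.4f}   (limit 1/E[X_1] = 1)")
```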

(2) Note that
$$\{N(t) = n\} = \{S_n \le t,\ S_{n+1} > t\} = \{S_n \le t,\ S_n + X_{n+1} > t\}.$$
In addition, we know that $S_n \stackrel{d}{=} \gamma(n, \lambda)$. Therefore,
$$\begin{aligned}
\mathbb{P}(N(t) = n) &= \mathbb{P}(S_n \le t,\ S_n + X_{n+1} > t) \\
&= \int_0^{\infty} \mathbb{P}(S_n \le t,\ S_n + X_{n+1} > t \mid S_n = x)\, f_{S_n}(x)\, dx \\
&= \int_0^t \mathbb{P}(X_{n+1} > t - x \mid S_n = x) \cdot f_{S_n}(x)\, dx \\
&= \int_0^t \mathbb{P}(X_{n+1} > t - x)\, f_{S_n}(x)\, dx \\
&= \int_0^t e^{-\lambda(t-x)}\, \frac{\lambda^n x^{n-1}}{(n-1)!}\, e^{-\lambda x}\, dx \\
&= \frac{\lambda^n e^{-\lambda t}}{(n-1)!} \int_0^t x^{n-1}\, dx = \frac{(\lambda t)^n e^{-\lambda t}}{n!}.
\end{aligned}$$
This shows that $N(t)$ is a Poisson random variable with parameter $\lambda t$.
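Remark. A quick simulation (my own sketch, not part of the solution) confirming that the renewal count with $\exp(\lambda)$ lifetimes matches the Poisson($\lambda t$) probabilities:

```python
import numpy as np
from math import exp, factorial

# Simulate N(t) = #{n : S_n <= t} with exp(lambda) inter-arrival times and
# compare the empirical distribution with the Poisson(lambda * t) pmf.
rng = np.random.default_rng(2)
lam, t, trials = 1.5, 4.0, 50_000
counts = np.empty(trials, dtype=int)
for i in range(trials):
    total, n = 0.0, 0
    while True:
        total += rng.exponential(1.0 / lam)
        if total > t:
            break
        n += 1
    counts[i] = n
for n in range(3, 9):
    empirical = np.mean(counts == n)
    poisson = exp(-lam * t) * (lam * t) ** n / factorial(n)
    print(f"P(N(t) = {n}): empirical {empirical:.4f} vs Poisson {poisson:.4f}")
```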
(3-i) By the definition of $N(t)$, we have
$$S_{N(t)} \le t < S_{N(t)+1} = S_{N(t)} + X_{N(t)+1}.$$
In other words, there are precisely $N(t)$ replacements up to time $t$. The $(N(t)+1)$-th object is alive at time $t$, and it dies at time $S_{N(t)+1} > t$. Its lifetime is $X_{N(t)+1}$. Since the sequence $X_n$ takes values in $\{0, 1\}$, we see that $X_{N(t)+1}$ is always equal to $1$, for otherwise
$$S_{N(t)+1} = S_{N(t)} + 0 \le t,$$
which is a contradiction. In particular, we have
$$\mathbb{E}[X_{N(t)+1}] = 1 > \mathbb{E}[X_1] = p.$$
This property is generally known as the inspection paradox.


(3-ii) We have
$$\mathbb{P}(N(0) = n) = \mathbb{P}(X_1 = 0, \cdots, X_n = 0,\ X_{n+1} = 1) = (1-p)^n p.$$
Therefore, $N(0)$ is a geometric random variable with parameter $p$.


(3-iii) Think of the sequence $\{S_n : n \ge 1\}$ as a stream of arrivals. Note that $S_n$ takes values in $\mathbb{N} = \{0, 1, 2, \cdots\}$. At the next move it either remains at $S_n$ (if $X_{n+1} = 0$) or jumps to $S_n + 1$ (if $X_{n+1} = 1$). For each position $m \in \mathbb{N}$, we define
$$K_m \triangleq \#\{n : S_n = m\}$$
to be the number of arrivals at the position $m$. Since $N(t)$ counts the total number of arrivals up to time $t$, it is apparent that
$$N(t) = \sum_{m=0}^{[t]} K_m,$$
where $[t]$ denotes the integer part of $t$.


The crucial point is that $\{K_0, K_1 - 1, K_2 - 1, \cdots\}$ is a sequence of independent and identically distributed random variables, each following the geometric distribution with parameter $p$. From Part (3-ii), it is clear that $K_0 = N(0) \stackrel{d}{=} \mathrm{Geo}(p)$. For $m \ge 1$, we have
$$\begin{aligned}
\mathbb{P}(K_m = r) &= \sum_{l \ge m-1} \mathbb{P}(N(m-1) = l,\ K_m = r) \\
&= \sum_{l \ge m-1} \mathbb{P}(S_l = m-1,\ X_{l+1} = 1,\ X_{l+2} = \cdots = X_{l+r} = 0,\ X_{l+r+1} = 1) \\
&= \sum_{l \ge m-1} (1-p)^{r-1} p \cdot \mathbb{P}(S_l = m-1,\ X_{l+1} = 1) \\
&= (1-p)^{r-1} p \sum_{l \ge m-1} \mathbb{P}(N(m-1) = l) = (1-p)^{r-1} p.
\end{aligned}$$
Here we have used the simple observation that $N(m-1) \ge m-1$. This shows that $K_m - 1$ is a geometric random variable with parameter $p$. To see their independence, let $m \ge 1$, $r_0 \ge 0$, and $r_1, \cdots, r_m \ge 1$. Then
$$\begin{aligned}
&\mathbb{P}(K_0 = r_0,\ K_1 = r_1,\ \cdots,\ K_m = r_m) \\
&\quad = \mathbb{P}\big(X_1 = \cdots = X_{r_0} = 0,\ X_{r_0+1} = 1, \\
&\qquad\quad X_{r_0+2} = \cdots = X_{r_0+r_1} = 0,\ X_{r_0+r_1+1} = 1,\ \cdots, \\
&\qquad\quad X_{r_0+\cdots+r_{m-1}+1} = 1,\ X_{r_0+\cdots+r_{m-1}+2} = \cdots = X_{r_0+\cdots+r_{m-1}+r_m} = 0, \\
&\qquad\quad X_{r_0+\cdots+r_{m-1}+r_m+1} = 1\big) \\
&\quad = \big((1-p)^{r_0} p\big) \cdot \big((1-p)^{r_1-1} p\big) \cdots \big((1-p)^{r_m-1} p\big) \\
&\quad = \mathbb{P}(K_0 = r_0)\, \mathbb{P}(K_1 = r_1) \cdots \mathbb{P}(K_m = r_m).
\end{aligned}$$

The above notation looks complicated, but the intuition is not hard if one draws a picture of dots representing the stream of arrivals. This gives the desired independence property.
To conclude, we obtain that
$$N(t) = K_0 + [t] + \sum_{m=1}^{[t]} (K_m - 1) \stackrel{d}{=} [t] + \mathrm{NegativeBinomial}([t]+1,\ p),$$
i.e. $N(t)$ is the sum of the deterministic constant $[t]$ and a negative binomial random variable with parameters $([t]+1, p)$ (recall that the sum of independent geometric random variables is negative binomial).
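Remark. The conclusion can be checked numerically (my own sketch; the parameters below are arbitrary): with Bernoulli($p$) lifetimes, $N(t) - [t]$ should match a negative binomial sample with parameters $([t]+1, p)$, in the convention counting failures before the $([t]+1)$-th success.

```python
import numpy as np

rng = np.random.default_rng(3)
p, t, trials = 0.4, 5.7, 50_000
floor_t = int(t)
samples = np.empty(trials, dtype=int)
for i in range(trials):
    s, n = 0, 0
    while s <= t:          # run until S_n > t; the last n with S_n <= t is n - 1
        n += 1
        s += rng.binomial(1, p)
    samples[i] = n - 1     # this is N(t)
nb = rng.negative_binomial(floor_t + 1, p, size=trials)
print("mean of N(t) - [t]:       ", (samples - floor_t).mean())
print("mean of NegBin([t]+1, p): ", nb.mean())
print("theoretical mean:         ", (floor_t + 1) * (1 - p) / p)
```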

Assignment Two: Solutions

Problem 1. (1) Compute the characteristic function of $X$ in the following two cases:
(1-i) $X \stackrel{d}{=} U[-1, 1]$;
(1-ii) $X$ is a symmetric Bernoulli random variable, i.e.
$$\mathbb{P}(X = 1) = \mathbb{P}(X = -1) = \frac{1}{2}.$$
(2) By using properties of the characteristic function in a suitable context, establish the following formula:
$$\frac{\sin t}{t} = \prod_{n=1}^{\infty} \cos\frac{t}{2^n} \triangleq \lim_{N\to\infty} \prod_{n=1}^{N} \cos\frac{t}{2^n}.$$

(3) [Optional] Let us rewrite the formula in Part (2) as
$$\frac{\sin t}{t} = \Big(\prod_{n=1}^{\infty} \cos\frac{t}{2^{2n-1}}\Big) \times \Big(\prod_{n=1}^{\infty} \cos\frac{t}{2^{2n}}\Big). \tag{E}$$
(3-i) Suppose that $X_n$ ($n \ge 1$) are i.i.d. random variables with distribution
$$\mathbb{P}(X_n = 0) = \mathbb{P}(X_n = 2) = \frac{1}{2}.$$
Show that the random series $X = \sum_{n=1}^{\infty} \frac{X_n}{3^n}$ is convergent almost surely. What is the distribution function of $X$?
(3-ii) Show that each of the two factors on the right-hand side of equation (E) is the characteristic function of a random variable whose distribution function $F$ is continuous but $F'(x) = 0$ for almost every $x \in \mathbb{R}$. Such a random variable is called a singular random variable. Note that a singular random variable can never have a probability density function. This example shows that the sum of two independent singular random variables may have a density function.

Solution. (1) The two characteristic functions are given by
$$f_X(t) = \frac{1}{2}\int_{-1}^{1} e^{itx}\, dx = \frac{1}{2}\int_{-1}^{1} \cos tx\, dx = \frac{\sin t}{t},$$
and
$$f_X(t) = \frac{1}{2}e^{it} + \frac{1}{2}e^{-it} = \cos t,$$
respectively.
(2) Note that $\cos\frac{t}{2^n}$ is the characteristic function of $\frac{1}{2^n}X_n$, where $X_n$ is a symmetric Bernoulli random variable. In view of Part (1), to solve the problem it is sufficient to show that a uniform random variable $X \stackrel{d}{=} U[-1, 1]$ admits a representation
$$X = \sum_{n=1}^{\infty} \frac{X_n}{2^n},$$
where $\{X_n : n \ge 1\}$ are i.i.d. symmetric Bernoulli random variables. To this end, we recall that $\xi \stackrel{d}{=} U[0, 1]$ admits a binary expansion
$$\xi = \sum_{n=1}^{\infty} \frac{\xi_n}{2^n},$$
where $\{\xi_n : n \ge 1\}$ are i.i.d. Bernoulli random variables with parameter $1/2$. It follows that
$$2\xi - 1 = 2\sum_{n=1}^{\infty} \frac{\xi_n}{2^n} - 1 = \sum_{n=1}^{\infty} \frac{2\xi_n - 1}{2^n}.$$
This is exactly the desired representation, since $2\xi - 1 \stackrel{d}{=} U[-1, 1]$ and $\{2\xi_n - 1 : n \ge 1\}$ are i.i.d. symmetric Bernoulli random variables.
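Remark. The product converges quickly, so the identity is easy to check numerically (my own sketch, not part of the solution):

```python
import numpy as np

# Truncate the infinite product prod_{n>=1} cos(t / 2^n) and compare with sin(t)/t.
t = 2.7
partial = 1.0
for n in range(1, 40):
    partial *= np.cos(t / 2.0**n)
print(f"truncated product: {partial:.12f}")
print(f"sin(t)/t:          {np.sin(t) / t:.12f}")
```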
(3-i) It is clear from Kolmogorov's two-series theorem that the series $X \triangleq \sum_{n=1}^{\infty} \frac{X_n}{3^n}$ is convergent almost surely. To find its distribution, the crucial observation is that $X$ takes values in the Cantor set $C$! Let $G : [0,1] \to [0,1]$ be the Cantor function, and extend it to a distribution function on $\mathbb{R}$ by setting $G(x) \triangleq 0$ if $x < 0$ and $G(x) \triangleq 1$ if $x > 1$. From the explicit expression of $G$ on the Cantor set, we know that
$$G(X) = \sum_{n=1}^{\infty} \frac{X_n/2}{2^n}.$$
Observe that $\{X_n/2 : n \ge 1\}$ is an i.i.d. sequence of standard Bernoulli random variables. Therefore, $G(X)$ is a binary expansion with i.i.d. Bernoulli digits. Equivalently, we have $G(X) \stackrel{d}{=} U(0,1)$.

To proceed further, we need to recall a fact from elementary probability theory. If $F$ is a distribution function and $U \stackrel{d}{=} U(0,1)$, then $F^{-1}(U)$ has distribution function $F$, where $F^{-1}$ is the generalised inverse of $F$ defined by
$$F^{-1}(a) \triangleq \inf\{x : F(x) \ge a\}, \quad a \in (0,1).$$
This generalised notion of inverse is only needed when $F$ fails to be strictly increasing. Returning to our problem, it is tempting to conclude directly that the distribution function of $X$ is the Cantor function $G$, from the seemingly apparent relations $X = G^{-1}(G(X))$ and $G(X) \stackrel{d}{=} U[0,1]$. Some technical care is needed here, because it is not true in general that $G^{-1}(G(x)) = x$! By definition it is clear that $G^{-1}(G(x)) \le x$; however, if $G$ is constant on an open interval $I$ and $x \in I$, then $G^{-1}(G(x)) < x$ (why?).
To get around the above issue, in the construction of the Cantor set $C$, for each $n \ge 1$ let $I_n$ denote the union of the open intervals being removed at step $n$. Let $D \triangleq \bigcup_{n=1}^{\infty} \partial I_n$ be the collection of all the endpoints of the intervals in the $I_n$'s. We first claim that for any $x \in C \setminus D$ we have $G^{-1}(G(x)) = x$. Indeed, if $x \in C$ is not an endpoint, by the construction of $G$ we know that there is a sequence $x_n \uparrow x$ such that $G(x_n) < G(x)$. This ensures that $G^{-1}(G(x)) \ge x$ (why?) and thus $G^{-1}(G(x)) = x$ for such $x$. The next observation is that
$$\mathbb{P}(X \in D) = 0.$$
This follows from the fact that $X \in D$ if and only if "$X_n = 0$ eventually or $X_n = 2$ eventually", both of which have zero probability. Therefore, with probability one we have $X \in C \setminus D$. In other words,
$$X = G^{-1}(G(X)) \quad \text{with probability one}.$$
Since we know that the distribution function of $G^{-1}(G(X))$ is $G$, we conclude that $G$ is also the distribution function of $X$.
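Remark. The conclusion can also be checked by simulation (my own sketch, not part of the solution): sampling the truncated series $\sum_n X_n/3^n$ and evaluating the Cantor function by reading ternary digits, the empirical CDF of $X$ should match $G$:

```python
import numpy as np

rng = np.random.default_rng(4)
depth, trials = 40, 100_000
digits = 2 * rng.integers(0, 2, size=(trials, depth))       # ternary digits 0 or 2
X = (digits / 3.0 ** np.arange(1, depth + 1)).sum(axis=1)   # truncated series

def cantor_G(x, depth=40):
    """Evaluate the Cantor function via the ternary digits of x in [0, 1]."""
    g, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        d = int(x)
        x -= d
        if d == 1:             # x lies in a removed middle third: G is constant
            return g + scale
        g += scale * (d // 2)  # digit 2 contributes a binary digit 1
        scale /= 2
    return g

for q in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"P(X <= {q}): empirical {np.mean(X <= q):.4f}  vs  G({q}) = {cantor_G(q):.4f}")
```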
(3-ii) The characteristic function
$$\prod_{n=1}^{\infty} \cos\frac{t}{2^{2n}}$$
corresponds to the distribution of
$$X = \sum_{n=1}^{\infty} \frac{X_n}{2^{2n}},$$
where $\{X_n : n \ge 1\}$ are i.i.d. symmetric Bernoulli random variables. The characteristic function $\prod_{n=1}^{\infty} \cos\frac{t}{2^{2n-1}}$ corresponds to the distribution of $2X$. Therefore, it is enough to consider $X$ only.
The idea is to observe that $X$ has a Cantor-type distribution. To be more precise, let $Y_n \triangleq \frac{3}{2}(X_n + 1)$. The distribution of $Y_n$ is given by
$$\mathbb{P}(Y_n = 0) = \mathbb{P}(Y_n = 3) = \frac{1}{2}.$$
We can then write $X_n = \frac{2}{3}Y_n - 1$ and thus
$$X = \sum_{n=1}^{\infty} \frac{1}{2^{2n}}\Big(\frac{2}{3}Y_n - 1\Big) = \frac{2}{3}\sum_{n=1}^{\infty} \frac{Y_n}{4^n} - \frac{1}{3}.$$
It is sufficient to show that the random series $\sum_{n=1}^{\infty} \frac{Y_n}{4^n}$ has a singular distribution. The reason we express $X$ in terms of $Y_n$ is that, in this form, the series is exactly an expansion of a random number in base 4. Since $Y_n = 0$ or $3$, the distribution function of $\sum_{n=1}^{\infty} \frac{Y_n}{4^n}$ is of Cantor type, and thus has zero derivative almost everywhere (with respect to the Lebesgue measure). We explain this point more precisely in what follows.
We first consider the following procedure of constructing a Cantor-type subset $C$ of $[0,1]$ under base 4, which is relevant to us. In the first step, we divide $[0,1]$ into four equal subintervals of length $1/4$ and remove the open interval $(1/4, 3/4)$. In the next step, we apply the same removal procedure to each of the remaining closed subintervals from the previous step. If we continue the removal procedure sequentially, what is left is the definition of $C$. If we let $C_n$ be the closed subset obtained after the $n$-th step of removal, then $C = \bigcap_{n=1}^{\infty} C_n$. This is exactly the construction of the standard Cantor set, but we now perform it under base 4 instead of 3. Note that for any given
$$x = 0.x_1 x_2 x_3 \cdots \quad (x_n = 0, 1, 2, 3),$$
we have $x \in C$ if and only if $x_n = 0$ or $3$ for every $n$. This is clear from the construction of $C$, since being in $C$ means that at each step $x$ falls in the first ($x_n = 0$) or the last ($x_n = 3$) subinterval, so that it is not removed.
The corresponding Cantor function $G(x)$ can be described easily, as the analogue of the base 3 case. To be precise, let $I_n$ denote the union of the open intervals removed at the $n$-th step (each of length $2 \times \frac{1}{4^n}$). There are a total of $2^{n-1}$ such intervals in the formation of $I_n$. The function $G(x)$ is constant on each of these intervals, and in the natural increasing order the values of $G(x)$ on these intervals are given by
$$\frac{1}{2^n},\ \frac{3}{2^n},\ \frac{5}{2^n},\ \cdots,\ \frac{2^n-1}{2^n},$$
respectively. This clearly specifies the function $G$ on $[0,1] \setminus C = \bigcup_{n=1}^{\infty} I_n$ (i.e. on all the open intervals being removed). Exactly as in the base 3 case, this function $G$ extends to a continuous function on $[0,1]$. For elements $x \in C$, we have
$$G(x) = \sum_{n=1}^{\infty} \frac{x_n/3}{2^n}, \tag{E1}$$
where
$$x = 0.x_1 x_2 x_3 \cdots$$
denotes the expansion of $x$ under base 4. The function $G$ satisfies $G(0) = 0$, $G(1) = 1$, and it is non-decreasing on $[0,1]$. More importantly, note that the Lebesgue measure of the union of all the open subintervals being removed, denoted by $I \triangleq \bigcup_{n=1}^{\infty} I_n$, is equal to one. This is seen from the following simple calculation:
$$|I| = \sum_{n=1}^{\infty} 2^{n-1} \times \frac{2}{4^n} = 1.$$
Since $G$ is constant on each of the open subintervals being removed, we see that $G'(x) = 0$ on $I$ and thus $G' = 0$ at almost every point of $[0,1]$.
We extend $G$ to a distribution function by letting $G(x) = 0$ for $x < 0$ and $G(x) = 1$ for $x > 1$. To complete the proof, it remains to see that the distribution function of $Y \triangleq \sum_{n=1}^{\infty} \frac{Y_n}{4^n}$ is $G(x)$. Since $Y_n = 0$ or $3$, we know that $Y$ takes values in $C$. The fact that $G$ is the distribution function of $Y$ follows from exactly the same argument as in the base 3 case. As a consequence, $Y$, and thus the original $X$, is a singular random variable.
Alternative solution. There is a more direct and neater solution that is essentially due to two students. They looked at expansions in different contexts, but the essence of their approaches was quite similar. I now summarise their approaches based on my understanding.
We wish to show that the random series $X = \sum_{n=1}^{\infty} \frac{X_n}{2^{2n-1}}$ is a singular random variable, where $\{X_n : n \ge 1\}$ is an i.i.d. sequence of Bernoulli random variables. In terms of binary expansion, $X$ has the expansion
$$X = 0.X_1 0 X_2 0 X_3 0 X_4 0 \cdots.$$

From the above expansion, it is clear that the distribution function of $X$ is continuous. Indeed, given a generic real number $x \in (0,1)$, the event $\{X = x\}$ uniquely specifies the values of all the $X_n$'s, since the expansion is unique (I leave the reader to think about how to deal with the situation at the points where the binary expansion is not unique). But any particular specification of the values of the sequence $\{X_n\}$ has zero probability. Therefore, $\mathbb{P}(X = x) = 0$ for every $x$, showing that the distribution function of $X$ is continuous.
Next, we show that the distribution function of $X$ has zero derivative almost everywhere. Since $X \in [0,1]$, it is enough to restrict our attention to the unit interval. The crucial observation (partly inspired by the construction of the Cantor set) is that $X$ takes values in the complement of a countable union of disjoint intervals. To figure out what these intervals (to be removed) are, we look at the expansion grouped in the following manner:
$$X = (X_1 0)(X_2 0)(X_3 0)(X_4 0) \cdots.$$
The first two digits $(X_1 0)$ tell us that $X$ cannot take values in
$$I_1 \triangleq (1/4, 1/2) \cup (3/4, 1).$$
Similarly, the next two digits $(X_2 0)$ tell us that $X$ cannot take values in
$$I_2 \triangleq (1/16, 2/16) \cup (3/16, 4/16) \cup (9/16, 10/16) \cup (11/16, 12/16).$$
Inductively, $X$ cannot take values in $I \triangleq \bigcup_{n=1}^{\infty} I_n$, where $I_n$ consists of $2^n$ intervals of length $\frac{1}{2^{2n}}$. It is plain to check that the total Lebesgue measure of $I$ is one. In addition, not being able to take values in a particular interval clearly implies that the distribution function is constant and thus has zero derivative on that interval. As a consequence, we conclude that the distribution function of $X$ has zero derivative almost everywhere.
Problem 2. Let $X, Y$ be independent and identically distributed, both having zero mean and unit variance. Suppose that $X + Y$ and $X - Y$ are independent. Let $f(t)$ be the characteristic function of $X$.
(1) Show that
$$f(2t) = f(t)^3 f(-t),$$
and use this to show that $f(t) \ne 0$ for all $t \in \mathbb{R}$.
(2) Define $\rho(t) \triangleq \frac{f(t)}{f(-t)}$. Show that
$$\rho(2t) = \rho(t)^2.$$
(3) Use Part (2) to show that $\rho(t) = 1$ for all $t$, and hence conclude that
$$f(2t) = f(t)^4.$$
(4) Use Part (3) to show that $f(t) = e^{-t^2/2}$, and thus conclude that $X \stackrel{d}{=} Y \stackrel{d}{=} N(0,1)$.
Solution. (1) Since $X + Y$ and $X - Y$ are independent, and $2X = (X+Y) + (X-Y)$, we have
$$f(2t) = f_{2X}(t) = f_{X+Y}(t) \cdot f_{X-Y}(t) = f(t) \cdot f(t) \cdot f(t) \cdot f(-t) = f(t)^3 f(-t).$$
If $f(t_0) = 0$ for some $t_0$, using the above relation we know that at least one of $f(t_0/2)$ or $f(-t_0/2)$ is zero. By iterating this argument, we find a sequence $t_n \to 0$ such that $f(t_n) = 0$. But this is impossible, since $f(0) = 1 \ne 0$ and $f(t)$ is continuous at $t = 0$.
(2) Since $f(t) \ne 0$ for all $t$, the function $\rho(t) \triangleq \frac{f(t)}{f(-t)}$ is well defined. In addition, according to Part (1) we have
$$\rho(2t) = \frac{f(2t)}{f(-2t)} = \frac{f(t)^3 f(-t)}{f(-t)^3 f(t)} = \frac{f(t)^2}{f(-t)^2} = \rho(t)^2.$$

(3) Since $X$ has zero mean, we know that
$$f(t) = 1 + o(t) \quad \text{as } t \to 0.$$
It is helpful to call the above $o(t)$ some function $\varepsilon(t)$. Then we also have
$$f(-t) = 1 + \varepsilon(-t).$$
It follows that
$$\rho(t) = \frac{f(t)}{f(-t)} = \frac{1 + \varepsilon(t)}{1 + \varepsilon(-t)} = 1 + \frac{\varepsilon(t) - \varepsilon(-t)}{1 + \varepsilon(-t)}.$$
Note that the function $\frac{\varepsilon(t) - \varepsilon(-t)}{1 + \varepsilon(-t)}$ is also $o(t)$ as $t \to 0$. Therefore,
$$\rho(t) = 1 + o(t) \quad \text{as } t \to 0.$$
Now using Part (2), we have, for every $n$,
$$\rho(t) = \rho\Big(\frac{t}{2}\Big)^2 = \rho\Big(\frac{t}{4}\Big)^4 = \cdots = \rho\Big(\frac{t}{2^n}\Big)^{2^n} = \Big(1 + o\Big(\frac{t}{2^n}\Big)\Big)^{2^n} = \exp\Big(2^n \log\Big(1 + o\Big(\frac{t}{2^n}\Big)\Big)\Big) \to 1$$
as $n \to \infty$. Therefore, we conclude that $\rho(t) = 1$, or equivalently $f(t) = f(-t)$. It follows from Part (1) that
$$f(2t) = f(t)^4 \quad \text{for all } t.$$
(4) Since $X$ has mean zero and unit variance, we can write
$$f(t) = 1 - \frac{1}{2}t^2 + o(t^2) \quad \text{as } t \to 0.$$
According to Part (3), for fixed $t \in \mathbb{R}$ we have, for every $n$,
$$f(t) = f\Big(\frac{t}{2}\Big)^4 = f\Big(\frac{t}{4}\Big)^{4^2} = \cdots = f\Big(\frac{t}{2^n}\Big)^{4^n} = \Big(1 - \frac{1}{2}\cdot\frac{t^2}{4^n} + o\Big(\frac{t^2}{4^n}\Big)\Big)^{4^n} = \exp\Big(4^n \log\Big(1 - \frac{1}{2}\cdot\frac{t^2}{4^n} + o\Big(\frac{t^2}{4^n}\Big)\Big)\Big) \to e^{-t^2/2}$$
as $n \to \infty$. Therefore, $f(t) = e^{-t^2/2}$ and thus $X \stackrel{d}{=} Y \stackrel{d}{=} N(0,1)$.
Alternative solution. There is a more elegant solution that is due to several students independently. I summarise the essence of this method in what follows.
Recall that we have obtained the relation
$$f(2t) = f(t)^4.$$
By differentiation, we get
$$f'(2t) = 2f(t)^3 f'(t).$$
In particular, if we define
$$g(t) \triangleq \frac{f'(t)}{f(t)},$$
then we have
$$g(2t) = 2g(t).$$
As a consequence, for fixed $t$ we have
$$g(t) = 2g\Big(\frac{t}{2}\Big) = 2^2 g\Big(\frac{t}{4}\Big) = \cdots = 2^k g\Big(\frac{t}{2^k}\Big)$$
for all $k \ge 1$.
Since $X$ has mean zero and unit variance, we have $f(0) = 1$, $f'(0) = 0$ and $f''(0) = -1$. Therefore, $g(0) = 0$ and
$$g'(0) = \frac{f''(0)f(0) - f'(0)^2}{f(0)^2} = -1.$$
On the other hand, we also have
$$g'(0) = \lim_{k\to\infty} \frac{g(t/2^k) - g(0)}{t/2^k} = \lim_{k\to\infty} \frac{2^k g(t/2^k)}{t} = \lim_{k\to\infty} \frac{g(t)}{t} = \frac{g(t)}{t}.$$
As a result, we conclude that
$$\frac{g(t)}{t} = -1, \quad \text{i.e. } g(t) = -t.$$
By the definition of $g(t)$, we have
$$f'(t) = -tf(t).$$
Using $f(0) = 1$, the unique solution to the above ODE is given by $f(t) = e^{-t^2/2}$.
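Remark. The last step can be confirmed with a computer algebra system (my own sketch, not part of the solution):

```python
import sympy as sp

# Verify that f'(t) = -t f(t) with f(0) = 1 has the unique solution exp(-t**2/2).
t = sp.symbols("t")
f = sp.Function("f")
solution = sp.dsolve(sp.Eq(f(t).diff(t), -t * f(t)), f(t), ics={f(0): 1})
print(solution)   # Eq(f(t), exp(-t**2/2))
```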
Remark. Based on the relation $g(2t) = 2g(t)$, a few students observed that $g'(2t) = g'(t)$, and thus
$$g'(t) = g'(t/2) = g'(t/4) = \cdots = g'(t/2^n) \to g'(0) = -1.$$
Therefore, $g(t) = -t$.
Problem 3. Let $\{X_n : n \ge 1\}$ be an independent sequence of random variables.
(1) In the following two cases, show that $\{X_n\}$ satisfies Lyapunov's central limit theorem.
(1-i) $X_n \stackrel{d}{=} U[-n, n]$ (uniform distribution on $[-n, n]$).
(1-ii)
$$X_n = \begin{cases} \pm n^{\alpha}, & \text{with probability } \dfrac{1}{6n^{2(\alpha-1)}} \text{ each}; \\ 0, & \text{with probability } 1 - \dfrac{1}{3n^{2(\alpha-1)}}. \end{cases}$$
(2) Suppose that $X_n \stackrel{d}{=} N(0, 2^{n-1})$. Show that Lindeberg's condition fails for $\{X_n\}$, but the central limit theorem still holds:
$$\frac{S_n}{\sqrt{\mathrm{Var}[S_n]}} \to N(0,1) \quad \text{weakly}.$$

Solution. (1-i) Note that $X_n$ has mean zero. For the second moment, we have
$$\mathbb{E}[X_n^2] = \frac{1}{2n}\int_{-n}^{n} x^2\, dx = \frac{n^2}{3},$$
and thus
$$\Sigma_n^2 = \sum_{k=1}^{n} \mathrm{Var}[X_k] = \frac{1}{3}\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{18},$$
so that $\Sigma_n = O(n^{3/2})$. Similarly, for the third moment we have
$$\mathbb{E}[|X_n|^3] = \frac{1}{2n}\int_{-n}^{n} |x|^3\, dx = \frac{n^3}{4},$$
and
$$\Gamma_n \triangleq \sum_{k=1}^{n} \mathbb{E}[|X_k|^3] = \frac{1}{4}\sum_{k=1}^{n} k^3 \le \frac{n^4}{4} = O(n^4).$$
It follows that
$$\frac{\Gamma_n}{\Sigma_n^3} = O\Big(\frac{n^4}{n^{9/2}}\Big) = O(n^{-1/2}) \to 0$$
as $n \to \infty$. Therefore, Lyapunov's central limit theorem holds.
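Remark. A Monte Carlo sketch of this central limit theorem (my own illustration, not part of the solution): the normalised sums $S_n/\Sigma_n$ should be approximately $N(0,1)$ for moderately large $n$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials = 200, 20_000
k = np.arange(1, n + 1)
X = rng.uniform(-1.0, 1.0, size=(trials, n)) * k   # X_k uniform on [-k, k]
Sigma_n = np.sqrt((k**2 / 3.0).sum())              # sqrt of the summed variances
Z = X.sum(axis=1) / Sigma_n
print("sample mean:", round(Z.mean(), 4), " sample variance:", round(Z.var(), 4))
for x in (-1.0, 0.0, 1.0):
    print(f"P(Z <= {x:+.0f}): empirical {np.mean(Z <= x):.4f}")
# Compare with Phi(-1) = 0.1587, Phi(0) = 0.5, Phi(1) = 0.8413.
```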
(1-ii) Note that $\mathbb{E}[X_n] = 0$. For the second and third moments, we have
$$\mathbb{E}[X_n^2] = \frac{n^2}{3}, \qquad \mathbb{E}[|X_n|^3] = \frac{1}{3}n^{\alpha+2}.$$
Therefore, $\Sigma_n = O(n^{3/2})$ and
$$\Gamma_n \triangleq \sum_{k=1}^{n} \mathbb{E}[|X_k|^3] = \frac{1}{3}\sum_{k=1}^{n} k^{\alpha+2} \le \frac{1}{3}n^{\alpha+3} = O(n^{\alpha+3}).$$
It follows that
$$\frac{\Gamma_n}{\Sigma_n^3} = O\Big(\frac{n^{\alpha+3}}{n^{9/2}}\Big) = O\Big(\frac{1}{n^{3/2-\alpha}}\Big).$$
Since $\alpha < 3/2$, we see that $\frac{\Gamma_n}{\Sigma_n^3} \to 0$ and thus Lyapunov's central limit theorem holds.
(2) We have seen that Lindeberg's condition implies
$$r_n \triangleq \frac{\max_{1\le m\le n} \sigma_m}{\Sigma_n} \to 0$$
as $n \to \infty$. Therefore, it is enough to show that $r_n$ does not converge to zero in this example. By the assumption, we have $\sigma_m^2 = 2^{m-1}$ and
$$\Sigma_n^2 = \sum_{m=1}^{n} 2^{m-1} = 2^n - 1.$$
It follows that
$$r_n^2 = \frac{2^{n-1}}{2^n - 1} \to \frac{1}{2} \ne 0.$$
As a consequence, the sequence $\{X_n\}$ does not satisfy Lindeberg's condition. However, since each $X_n$ is Gaussian and they are independent, we know that $\frac{S_n}{\sqrt{\mathrm{Var}[S_n]}} \stackrel{d}{=} N(0,1)$ for every $n$. Therefore, the central limit theorem holds trivially for the sequence $\{X_n\}$.

Problem 4. A little girl is painting on a blank sheet of paper. Suppose that there are $N$ available colors in total. Each time she selects one color at random and paints on the paper. It is possible that she picks a color that she has already used before. Different selections are assumed to be independent. Let $T$ be the number of selections until the little girl picks a color that she has obtained previously.
(i) For each $k \in \mathbb{N}$, what is $\mathbb{P}(T > k)$?
(ii) By using the formula
$$\mathbb{E}[T] = \sum_{k=0}^{\infty} \mathbb{P}(T > k),$$
which is true for any non-negative integer-valued random variable, show that
$$\mathbb{E}[T] = \frac{N!}{N^N} \sum_{j=0}^{N} \frac{N^j}{j!}.$$
(iii) By using the central limit theorem in a suitable context, show that
$$\lim_{N\to\infty} e^{-N} \sum_{j=0}^{N} \frac{N^j}{j!} = \frac{1}{2}.$$
(iv) Let $\phi(N) = \mathbb{E}[T]$ (as a function of $N$). Conclude from Part (iii) as well as Stirling's formula that
$$\lim_{N\to\infty} \frac{\phi(N)}{\sqrt{N}} = \sqrt{\frac{\pi}{2}}.$$
This result shows that when $N$ is large, on average it takes about $\sqrt{\frac{\pi N}{2}}$ selections to see a first repeated choice. It also gives a probabilistic way of simulating $\pi$.
Solution. (i) The possible values of $T$ are $\{2, 3, \cdots, N+1\}$. For $1 \le k \le N+1$, the event $\{T > k\}$ means that there are no repeated choices among the first $k$ selections. Therefore, we have
$$\mathbb{P}(T > k) = \frac{N(N-1)\cdots(N-k+1)}{N^k}.$$
Note that $\mathbb{P}(T > k) = 1$ if $k = 0, 1$, and $\mathbb{P}(T > k) = 0$ if $k \ge N+1$.
(ii) By using Part (i) and the given summation formula for $\mathbb{E}[T]$, we have
$$\begin{aligned}
\mathbb{E}[T] &= 1 + \sum_{k=1}^{N} \frac{N(N-1)\cdots(N-k+1)}{N^k} = \sum_{k=0}^{N} \frac{N!}{(N-k)!\,N^k} \\
&= \sum_{j=0}^{N} \frac{N!}{j!\,N^{N-j}} \quad \text{(change of index: } j = N-k) \\
&= \frac{N!}{N^N} \sum_{j=0}^{N} \frac{N^j}{j!}.
\end{aligned}$$

(iii) Let $\{X_n : n \ge 1\}$ be an independent sequence of Poisson(1)-distributed random variables. Then $S_n = X_1 + \cdots + X_n$ is a Poisson($n$)-distributed random variable. Note that $\mathbb{E}[S_n] = \mathrm{Var}[S_n] = n$. According to the central limit theorem, we have
$$\mathbb{P}(S_N \le N) = \mathbb{P}\Big(\frac{S_N - N}{\sqrt{N}} \le 0\Big) \to \Phi(0) = \frac{1}{2},$$
where $\Phi(x)$ is the distribution function of $N(0,1)$. On the other hand, we also have
$$\mathbb{P}(S_N \le N) = e^{-N} \sum_{j=0}^{N} \frac{N^j}{j!}.$$
Therefore, the result follows.
(iv) Recall from Stirling's formula that
$$N! \sim \sqrt{2\pi N}\,\Big(\frac{N}{e}\Big)^N,$$
where the notation $a_N \sim b_N$ means $\frac{a_N}{b_N} \to 1$ as $N \to \infty$. It follows from Part (iii) that
$$\phi(N) = \frac{N!}{N^N} \sum_{j=0}^{N} \frac{N^j}{j!} = \frac{N!\,e^N}{N^N} \cdot e^{-N} \sum_{j=0}^{N} \frac{N^j}{j!} \sim \sqrt{2\pi N} \cdot \frac{N^N}{e^N} \cdot \frac{e^N}{N^N} \cdot \frac{1}{2} = \sqrt{\frac{\pi N}{2}}.$$
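Remark. Part (iv) can indeed be turned into a crude probabilistic estimator of $\pi$: since $\mathbb{E}[T] \approx \sqrt{\pi N/2}$ for large $N$, we have $\pi \approx 2\,\mathbb{E}[T]^2/N$. A minimal sketch (my own illustration, not part of the solution):

```python
import numpy as np

# Simulate T (the first time a colour repeats among N colours) and estimate pi
# via pi ~ 2 * E[T]^2 / N.
rng = np.random.default_rng(6)
N, trials = 10_000, 20_000
T_samples = np.empty(trials)
for i in range(trials):
    seen = set()
    k = 0
    while True:
        k += 1
        colour = int(rng.integers(N))
        if colour in seen:
            break
        seen.add(colour)
    T_samples[i] = k
mean_T = T_samples.mean()
print("mean of T:    ", mean_T)
print("sqrt(pi*N/2): ", np.sqrt(np.pi * N / 2))
print("pi estimate:  ", 2 * mean_T**2 / N)
```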
