Course: Theory of Probability I
Term: Fall 2013
Instructor: Gordan Zitkovic

Lecture 8
Characteristic Functions
First properties
A characteristic function is simply the Fourier transform, in probabilistic language. Since we will be integrating complex-valued functions, we define (both integrals on the right need to exist)
$$\int f\, d\mu = \int \Re f\, d\mu + i \int \Im f\, d\mu,$$
where $\Re f$ and $\Im f$ denote the real and the imaginary part of a function $f : \mathbb{R} \to \mathbb{C}$. The reader will easily figure out which properties of the integral transfer from the real case.
Definition 8.1. The characteristic function of a probability measure $\mu$ on $\mathcal{B}(\mathbb{R})$ is the function $\varphi = \varphi_\mu : \mathbb{R} \to \mathbb{C}$ given by
$$\varphi(t) = \int_{\mathbb{R}} e^{itx}\, \mu(dx).$$
When we speak of the characteristic function $\varphi_X$ of a random variable $X$, we have the characteristic function $\varphi_{\mu_X}$ of its distribution $\mu_X$ in mind. Note, moreover, that
$$\varphi_X(t) = E[e^{itX}].$$
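As a concrete numerical illustration (a Python sketch, not part of the original notes; the distribution, sample size, and grid of $t$-values are arbitrary choices), one can estimate $\varphi_X(t) = E[e^{itX}]$ by Monte Carlo and compare with the known closed form for $X \sim N(0,1)$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)          # samples of X ~ N(0, 1)

for t in (-2.0, -1.0, 0.0, 1.0, 2.0):
    emp = np.mean(np.exp(1j * t * x))     # Monte Carlo estimate of E[e^{itX}]
    exact = np.exp(-t**2 / 2)             # characteristic function of N(0, 1)
    print(f"t={t:+.1f}  empirical={complex(emp):.4f}  exact={exact:.4f}")
```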
While difficult to visualize, characteristic functions can be used to
learn a lot about the random variables they correspond to. We start
with some properties which follow directly from the definition:
Proposition 8.2. Let $X$, $Y$ and $\{X_n\}_{n\in\mathbb{N}}$ be random variables.

1. $\varphi_X(0) = 1$ and $|\varphi_X(t)| \le 1$, for all $t$.

2. $\varphi_X(-t) = \overline{\varphi_X(t)}$, where the bar denotes complex conjugation.

3. $\varphi_X$ is uniformly continuous.

4. If $X$ and $Y$ are independent, then $\varphi_{X+Y} = \varphi_X \varphi_Y$.
5. For all $t_1 < t_2 < \dots < t_n$, the matrix $A = (a_{jk})_{1\le j,k\le n}$ given by
$$a_{jk} = \varphi_X(t_j - t_k)$$
is Hermitian and positive semi-definite, i.e., $A = A^*$ and $\bar{\xi}^T A \xi \ge 0$, for any $\xi \in \mathbb{C}^n$.
Proof.

1. Immediate.

2. $e^{-itx} = \overline{e^{itx}}$.

3. We have $|\varphi_X(t) - \varphi_X(s)| = \big| \int (e^{itx} - e^{isx})\, \mu_X(dx) \big| \le h(t-s)$, where $h(u) = \int |e^{iux} - 1|\, \mu_X(dx)$. Since $|e^{iux} - 1| \le 2$, the dominated convergence theorem implies that $\lim_{u\to 0} h(u) = 0$, and, so, $\varphi_X$ is uniformly continuous.

4. Independence of $X$ and $Y$ implies the independence of $\exp(itX)$ and $\exp(itY)$. Therefore,
$$\varphi_{X+Y}(t) = E[e^{it(X+Y)}] = E[e^{itX} e^{itY}] = E[e^{itX}]\, E[e^{itY}] = \varphi_X(t)\, \varphi_Y(t).$$

5. The matrix $A$ is Hermitian by 2. above. To see that it is positive semi-definite, note that $a_{jk} = E[e^{it_j X} e^{-it_k X}]$, and so
$$\sum_{j=1}^{n} \sum_{k=1}^{n} \xi_j \bar{\xi}_k a_{jk} = E\Big[ \Big( \sum_{j=1}^{n} \xi_j e^{it_j X} \Big) \overline{\Big( \sum_{k=1}^{n} \xi_k e^{it_k X} \Big)} \Big] = E\Big[ \Big| \sum_{j=1}^{n} \xi_j e^{it_j X} \Big|^2 \Big] \ge 0.$$
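Property 4. is easy to test numerically. The following sketch (again not part of the notes; the choice of $\mathrm{Exp}(1)$ and $U[-1,1]$, and the sample size, are arbitrary) compares the empirical characteristic function of $X+Y$ with the product of the individual ones:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.exponential(scale=1.0, size=n)    # X ~ Exp(1)
y = rng.uniform(-1.0, 1.0, size=n)        # Y ~ U[-1, 1], independent of X

def ecf(samples, t):
    """Empirical characteristic function at t."""
    return np.mean(np.exp(1j * t * samples))

for t in (0.5, 1.0, 2.0):
    lhs = ecf(x + y, t)                   # phi_{X+Y}(t)
    rhs = ecf(x, t) * ecf(y, t)           # phi_X(t) * phi_Y(t)
    print(t, abs(lhs - rhs))              # small, up to Monte Carlo error
```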
Theorem 8.3 (Inversion theorem). Let $\mu$ be a probability measure on $\mathcal{B}(\mathbb{R})$, and let $\varphi = \varphi_\mu$ be its characteristic function. Then, for $a < b$,
$$\mu((a,b)) + \tfrac{1}{2}\mu(\{a,b\}) = \lim_{T\to\infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it}\, \varphi(t)\, dt. \tag{8.1}$$

Proof. Note first that
$$\frac{e^{-ita} - e^{-itb}}{it} = \int_a^b e^{-ity}\, dy,$$
so the integrand in (8.1) is bounded by $b - a$. Set
$$F(a,b,T) = \int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it}\, \varphi(t)\, dt.$$
By Fubini's theorem, applied on $[-T,T] \times \mathbb{R}$ (justified by the boundedness of the integrand and the finiteness of both measures),
$$F(a,b,T) = \int_{\mathbb{R}} f(a,b,T;x)\, \mu(dx), \quad\text{where}\quad f(a,b,T;x) = \int_{-T}^{T} \frac{1}{it}\big( e^{it(x-a)} - e^{it(x-b)} \big)\, dt.$$
Set
$$K(T;c) = \int_0^T \frac{\sin(ct)}{t}\, dt,$$
and note that, since $\cos$ is an even and $\sin$ an odd function, we have
$$f(a,b,T;x) = 2\int_0^T \frac{\sin((x-a)t)}{t}\, dt - 2\int_0^T \frac{\sin((x-b)t)}{t}\, dt = 2K(T;\, x-a) - 2K(T;\, x-b).$$
Since
$$K(T;c) = \int_0^T \frac{\sin(ct)}{ct}\, d(ct) = \int_0^{cT} \frac{\sin(s)}{s}\, ds = \begin{cases} K(cT;1), & c > 0, \\ 0, & c = 0, \\ -K(|c|T;1), & c < 0, \end{cases}$$
and $\int_0^\infty \frac{\sin(s)}{s}\, ds = \frac{\pi}{2}$, we have
$$\lim_{T\to\infty} K(T;c) = \begin{cases} \frac{\pi}{2}, & c > 0, \\ 0, & c = 0, \\ -\frac{\pi}{2}, & c < 0, \end{cases}$$
and so
$$\lim_{T\to\infty} f(a,b,T;x) = \begin{cases} 0, & x \in [a,b]^c, \\ \pi, & x = a \text{ or } x = b, \\ 2\pi, & a < x < b. \end{cases}$$
Since $f$ is bounded, uniformly in $T$ (the sine-integral function $K(\cdot\,;1)$ is bounded), the dominated convergence theorem yields
$$\lim_{T\to\infty} \frac{1}{2\pi} \int_{\mathbb{R}} f(a,b,T;x)\, \mu(dx) = \frac{1}{2\pi} \int_{\mathbb{R}} \lim_{T\to\infty} f(a,b,T;x)\, \mu(dx) = \mu((a,b)) + \tfrac{1}{2}\mu(\{a,b\}).$$
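The limit in (8.1) can be checked numerically. The sketch below (not part of the notes) assumes the $\mathrm{Exp}(1)$ distribution, whose characteristic function $1/(1-it)$ appears in Example 8.6 below; the truncation level $T$ and the integration grid are arbitrary choices, and the kernel is assigned its limiting value $b-a$ near $t=0$:

```python
import numpy as np

def phi(t):
    return 1.0 / (1.0 - 1j * t)           # characteristic function of Exp(1)

a, b, T = 0.5, 2.0, 200.0
t = np.linspace(-T, T, 400_001)
dt = t[1] - t[0]

# kernel (e^{-ita} - e^{-itb}) / (it); its limit at t = 0 equals b - a
with np.errstate(divide="ignore", invalid="ignore"):
    kernel = (np.exp(-1j * t * a) - np.exp(-1j * t * b)) / (1j * t)
kernel[np.abs(t) < dt / 2] = b - a

approx = (kernel * phi(t)).sum().real * dt / (2 * np.pi)
exact = np.exp(-a) - np.exp(-b)           # P(a < X < b); Exp(1) has no atoms
print(approx, exact)                      # agreement improves as T grows
```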
Theorem 8.5. Suppose that the characteristic function $\varphi = \varphi_\mu$ of a probability measure $\mu$ on $\mathcal{B}(\mathbb{R})$ is integrable, i.e., $\int_{\mathbb{R}} |\varphi(t)|\, dt < \infty$. Then $\mu$ is absolutely continuous with respect to the Lebesgue measure, and $\frac{d\mu}{d\lambda} = f$ is a bounded and continuous function given by
$$f(x) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \varphi(t)\, dt.$$
Proof. Since $\varphi$ is integrable and $|e^{-itx}| = 1$, $f$ is well defined. For $a < b$ we have
$$\int_a^b f(x)\, dx = \frac{1}{2\pi} \int_a^b \int_{\mathbb{R}} e^{-itx} \varphi(t)\, dt\, dx = \frac{1}{2\pi} \int_{\mathbb{R}} \varphi(t) \int_a^b e^{-itx}\, dx\, dt$$
$$= \frac{1}{2\pi} \int_{\mathbb{R}} \frac{e^{-ita} - e^{-itb}}{it}\, \varphi(t)\, dt = \lim_{T\to\infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it}\, \varphi(t)\, dt$$
$$= \mu((a,b)) + \tfrac{1}{2}\mu(\{a,b\}), \tag{8.2}$$
by Theorem 8.3, where the use of Fubini's theorem above is justified by the fact that the function $(t,x) \mapsto e^{-itx} \varphi(t)$ is integrable on $[a,b] \times \mathbb{R}$, for all $a < b$. For $a$, $b$ such that $\mu(\{a\}) = \mu(\{b\}) = 0$, equation (8.2) implies that $\mu((a,b)) = \int_a^b f(x)\, dx$. The claim now follows by the $\pi$-$\lambda$ theorem.
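As a sketch of Theorem 8.5 in action (not part of the notes; the grid is an arbitrary choice, justified by the rapid decay of $\varphi$), one can recover the density of $N(0,1)$ from its characteristic function $e^{-t^2/2}$ by discretizing the inversion integral:

```python
import numpy as np

def phi(t):
    return np.exp(-t**2 / 2)              # characteristic function of N(0, 1)

t = np.linspace(-40.0, 40.0, 80_001)      # phi is negligible outside this range
dt = t[1] - t[0]

for x in (0.0, 0.5, 1.0, 2.0):
    f = (np.exp(-1j * t * x) * phi(t)).sum().real * dt / (2 * np.pi)
    exact = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    print(x, f, exact)                    # the two columns agree closely
```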
Example 8.6. Here is a list of some common distributions and the corresponding characteristic functions:

1. Continuous distributions (density $f_X(x)$, characteristic function $\varphi_X(t)$):

   - Uniform, parameters $a < b$: $f_X(x) = \frac{1}{b-a} \mathbf{1}_{[a,b]}(x)$, $\varphi_X(t) = \frac{e^{itb} - e^{ita}}{it(b-a)}$.
   - Symmetric uniform, parameter $a > 0$: $f_X(x) = \frac{1}{2a} \mathbf{1}_{[-a,a]}(x)$, $\varphi_X(t) = \frac{\sin(at)}{at}$.
   - Normal, parameters $\mu \in \mathbb{R}$, $\sigma > 0$: $f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\big( -\frac{(x-\mu)^2}{2\sigma^2} \big)$, $\varphi_X(t) = \exp(i\mu t - \tfrac{1}{2}\sigma^2 t^2)$.
   - Exponential, parameter $\lambda > 0$: $f_X(x) = \lambda \exp(-\lambda x) \mathbf{1}_{[0,\infty)}(x)$, $\varphi_X(t) = \frac{\lambda}{\lambda - it}$.
   - Double exponential, parameter $\lambda > 0$: $f_X(x) = \frac{\lambda}{2} \exp(-\lambda |x|)$, $\varphi_X(t) = \frac{\lambda^2}{\lambda^2 + t^2}$.
   - Cauchy, parameters $\mu \in \mathbb{R}$, $\gamma > 0$: $f_X(x) = \frac{\gamma}{\pi(\gamma^2 + (x-\mu)^2)}$, $\varphi_X(t) = \exp(i\mu t - \gamma|t|)$.

2. Discrete distributions (distribution $\mu_X$, characteristic function $\varphi_X(t)$):

   - Dirac, parameter $c \in \mathbb{R}$: $\mu_X = \delta_c$, $\varphi_X(t) = \exp(itc)$.
   - Biased coin-toss, parameter $p \in (0,1)$: $\mu_X = p\delta_1 + (1-p)\delta_{-1}$, $\varphi_X(t) = p e^{it} + (1-p) e^{-it}$.
   - Geometric, parameter $p \in (0,1)$: $\mu_X = \sum_{n\in\mathbb{N}_0} p^n (1-p)\, \delta_n$, $\varphi_X(t) = \frac{1-p}{1 - p e^{it}}$.
   - Poisson, parameter $\lambda > 0$: $\mu_X = \sum_{n\in\mathbb{N}_0} e^{-\lambda} \frac{\lambda^n}{n!}\, \delta_n$, $\varphi_X(t) = \exp(\lambda(e^{it} - 1))$.

3. A singular distribution:

   - Cantor: $\varphi_X(t) = e^{it/2} \prod_{k=1}^{\infty} \cos\big( \frac{t}{3^k} \big)$.
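Two of the table entries, checked by Monte Carlo (a sketch with arbitrarily chosen parameters, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000

c = 1.0 + 2.0 * rng.standard_cauchy(n)    # Cauchy with mu = 1, gamma = 2
p = rng.poisson(3.0, n)                   # Poisson with lambda = 3

for t in (0.5, 1.5):
    print("Cauchy :", np.mean(np.exp(1j * t * c)),
          np.exp(1j * 1.0 * t - 2.0 * abs(t)))
    print("Poisson:", np.mean(np.exp(1j * t * p)),
          np.exp(3.0 * (np.exp(1j * t) - 1)))
```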
Tail behavior
We continue by describing several methods one can use to extract useful information about the tails of the underlying probability distribution from a characteristic function.
Proposition 8.7. Let $X$ be a random variable. If $E[|X|^n] < \infty$, then $\frac{d^n}{dt^n} \varphi_X(t)$ exists for all $t$ and
$$\frac{d^n}{dt^n} \varphi_X(t) = E[(iX)^n e^{itX}].$$
In particular,
$$E[X^n] = (-i)^n \frac{d^n}{dt^n} \varphi_X(0).$$
Proof. We give the proof in the case $n = 1$ and leave the general case to the reader:
$$\lim_{h\to 0} \frac{\varphi(h) - \varphi(0)}{h} = \lim_{h\to 0} \int_{\mathbb{R}} \frac{e^{ihx} - 1}{h}\, \mu(dx) = \int_{\mathbb{R}} \lim_{h\to 0} \frac{e^{ihx} - 1}{h}\, \mu(dx) = \int_{\mathbb{R}} ix\, \mu(dx),$$
where the passage of the limit under the integral sign is justified by the dominated convergence theorem which, in turn, can be used since
$$\Big| \frac{e^{ihx} - 1}{h} \Big| \le |x|, \quad\text{and}\quad \int_{\mathbb{R}} |x|\, \mu(dx) = E[|X|] < \infty.$$
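A quick numerical check of Proposition 8.7 (a sketch, not part of the notes), assuming the $\mathrm{Exp}(\lambda)$ characteristic function from Example 8.6 and approximating the derivatives at $0$ by finite differences with an arbitrarily chosen step $h$:

```python
import numpy as np

lam = 2.0

def phi(t):
    return lam / (lam - 1j * t)           # characteristic function of Exp(lam)

h = 1e-4
d1 = (phi(h) - phi(-h)) / (2 * h)                 # approximates phi'(0)
d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h**2     # approximates phi''(0)

print("E[X]  :", ((-1j) * d1).real, "expected", 1 / lam)            # 0.5
print("E[X^2]:", (((-1j) ** 2) * d2).real, "expected", 2 / lam**2)  # 0.5
```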
Remark 8.8.

1. It can be shown that, for $n$ even, the existence of $\frac{d^n}{dt^n} \varphi_X(0)$ implies $E[|X|^n] < \infty$. For $n$ odd, this implication can fail (Problem 8.6 below provides an example with $n = 1$).
Finer estimates of the tails of a probability distribution can be obtained from a finer analysis of the behavior of $\varphi$ around $0$:

Proposition 8.9. Let $\mu$ be a probability measure on $\mathcal{B}(\mathbb{R})$ and let $\varphi = \varphi_\mu$ be its characteristic function. Then, for $\varepsilon > 0$ we have
$$\mu\big( [-\tfrac{2}{\varepsilon}, \tfrac{2}{\varepsilon}]^c \big) \le \frac{1}{\varepsilon} \int_{-\varepsilon}^{\varepsilon} \big( 1 - \varphi(t) \big)\, dt.$$

Proof. By Fubini's theorem (the integrand is bounded and the measures are finite),
$$\frac{1}{\varepsilon} \int_{-\varepsilon}^{\varepsilon} \big( 1 - \varphi(t) \big)\, dt = \frac{1}{\varepsilon}\, E\Big[ \int_{-\varepsilon}^{\varepsilon} \big( 1 - e^{itX} \big)\, dt \Big] = 2\, E\Big[ 1 - \frac{\sin(\varepsilon X)}{\varepsilon X} \Big],$$
with the convention $\frac{\sin(0)}{0} = 1$. Since $\frac{\sin(x)}{x} \le 1$ for all $x$, and $\frac{\sin(x)}{x} \le \frac{1}{|x|} \le \frac{1}{2}$ for $|x| \ge 2$, we have
$$2\, E\Big[ 1 - \frac{\sin(\varepsilon X)}{\varepsilon X} \Big] \ge 2\, E\big[ \tfrac{1}{2} \mathbf{1}_{\{|\varepsilon X| \ge 2\}} \big] = P[|\varepsilon X| \ge 2] \ge \mu\big( [-\tfrac{2}{\varepsilon}, \tfrac{2}{\varepsilon}]^c \big).$$
The estimate of Proposition 8.9 is particularly useful when applied along a sequence $\{\mu_n\}_{n\in\mathbb{N}}$ of probability measures. If the characteristic functions $\varphi_n = \varphi_{\mu_n}$ converge pointwise to a function $\varphi$ which is continuous at $0$, then, by the dominated convergence theorem (note that $|1 - \varphi_n(t)| \le 2$),
$$\lim_n \int_{-\varepsilon}^{\varepsilon} \big( 1 - \varphi_n(t) \big)\, dt = \int_{-\varepsilon}^{\varepsilon} \big( 1 - \varphi(t) \big)\, dt, \quad \text{for each } \varepsilon > 0.$$
Since $\varphi(0) = 1$ and $\varphi$ is continuous at $0$, the right-hand side is $o(\varepsilon)$ as $\varepsilon \to 0$; combined with Proposition 8.9, this shows that the family $\{\mu_n\}_{n\in\mathbb{N}}$ is tight.
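A numerical sketch of the bound in Proposition 8.9 (not part of the notes; the choice $\mu = N(0,1)$ and $\varepsilon = 0.5$ is arbitrary), with the exact tail computed via the error function:

```python
import numpy as np
from math import erf, sqrt

def phi(t):
    return np.exp(-t**2 / 2)              # characteristic function of N(0, 1)

eps = 0.5
t = np.linspace(-eps, eps, 100_001)
dt = t[1] - t[0]
bound = (1 - phi(t)).sum() * dt / eps     # (1/eps) * integral of 1 - phi

tail = 1 - erf((2 / eps) / sqrt(2))       # P(|X| >= 2/eps) for X ~ N(0, 1)
print(tail, "<=", bound)                  # ~6e-05 <= ~0.08
```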
Additional Problems
Problem 8.5 (Atoms from the characteristic function). Let $\mu$ be a probability measure on $\mathcal{B}(\mathbb{R})$, and let $\varphi = \varphi_\mu$ be its characteristic function.

1. Show that $\mu(\{a\}) = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} e^{-ita} \varphi(t)\, dt$.

2. Show that if $\lim_{t\to\infty} |\varphi(t)| = \lim_{t\to-\infty} |\varphi(t)| = 0$, then $\mu$ has no atoms.
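Part 1. can be illustrated numerically (a sketch, not part of the notes). It uses the biased coin-toss distribution of Example 8.6, whose atoms at $\pm 1$ carry masses $p$ and $1-p$; the truncation $T$ and the grid are arbitrary:

```python
import numpy as np

p = 0.3

def phi(t):   # biased coin-toss: p*delta_1 + (1-p)*delta_{-1}
    return p * np.exp(1j * t) + (1 - p) * np.exp(-1j * t)

T = 2_000.0
t = np.linspace(-T, T, 800_001)
dt = t[1] - t[0]

for a in (1.0, -1.0, 0.5):
    atom = (np.exp(-1j * t * a) * phi(t)).sum() * dt / (2 * T)
    print(a, atom.real)                   # approximately 0.3, 0.7 and 0.0
```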
Problem 8.6 (Moments and derivatives of the characteristic function). Show by example that $\frac{d}{dt}\varphi_X(0)$ may exist even when $E[|X|] = \infty$. Hint: Consider the probability measure
$$\mu = C \sum_{k=3}^{\infty} \frac{1}{k^2 \log(k)} \cdot \tfrac{1}{2}\big( \delta_{-k} + \delta_k \big),$$
where $C$ is the appropriate normalizing constant, and show that
$$\lim_{h\to 0} \frac{1}{h} \sum_{k=3}^{\infty} \frac{\cos(hk) - 1}{k^2 \log(k)} = 0.$$
Then split the sum at $k$ close to $2/h$ and use (and prove) the inequality $|\cos(x) - 1| \le \min(x^2/2, |x|)$. Bounding sums by integrals may help, too.
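The limit in the hint can be observed numerically with a truncated sum (a sketch, not part of the notes; the cut-off and the values of $h$ are arbitrary). The decay is logarithmically slow, consistent with the splitting argument suggested above:

```python
import numpy as np

def s_over_h(h, kmax=10**6):
    """(1/h) * sum_{k=3}^{kmax} (cos(hk) - 1) / (k^2 log k), truncated at kmax."""
    k = np.arange(3, kmax, dtype=np.float64)
    return np.sum((np.cos(h * k) - 1.0) / (k**2 * np.log(k))) / h

for h in (1e-1, 1e-2, 1e-3):
    print(h, s_over_h(h))                 # magnitudes shrink as h -> 0
```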
Problem 8.7 (Characteristic functions of random vectors). The characteristic function of a random vector $X = (X_1, \dots, X_n)$ is the function $\varphi_X : \mathbb{R}^n \to \mathbb{C}$ given by
$$\varphi_X(t_1, \dots, t_n) = E\Big[ \exp\Big( i \sum_{k=1}^{n} t_k X_k \Big) \Big].$$
We will also use the shortcut $t$ for $(t_1, \dots, t_n)$ and $t \cdot X$ for the random variable $\sum_{k=1}^{n} t_k X_k$. Prove the following statements:

1. Random variables $X$ and $Y$ are independent if and only if $\varphi_{(X,Y)}(t_1, t_2) = \varphi_X(t_1)\, \varphi_Y(t_2)$ for all $t_1, t_2 \in \mathbb{R}$.

2. Random vectors $X^1$ and $X^2$ have the same distribution if and only if the random variables $t \cdot X^1$ and $t \cdot X^2$ have the same distribution for all $t \in \mathbb{R}^n$. (This fact is known as Wald's device.)
5. Construct a random vector $(X, Y)$ such that both $X$ and $Y$ are normally distributed, but the vector $(X, Y)$ is not Gaussian.

6. Let $X = (X_1, X_2, \dots, X_n)$ be a random vector consisting of $n$ independent random variables with $X_i \sim N(0,1)$. Let $\Sigma \in \mathbb{R}^{n\times n}$ be a given positive semi-definite symmetric matrix, and $\mu \in \mathbb{R}^n$ a given vector. Show that there exists an affine transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ such that the random vector $T(X)$ is Gaussian with $T(X) \sim N(\mu, \Sigma)$. (A numerical sketch of one such construction follows this problem.)

7. Find a necessary and sufficient condition on $\mu$ and $\Sigma$ such that the converse of the previous part holds true: for a Gaussian random vector $X \sim N(\mu, \Sigma)$, there exists an affine transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ such that $T(X)$ has independent components with the $N(0,1)$-distribution (i.e., $T(X) \sim N(0, I)$, where $I$ is the identity matrix).
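For part 6., one concrete choice is the affine map $T(x) = \mu + Ax$ with $A A^T = \Sigma$. The sketch below (not part of the notes; the particular $\mu$, $\Sigma$ and sample size are arbitrary) builds $A$ from the symmetric square root of $\Sigma$, which exists for every positive semi-definite $\Sigma$, and verifies the mean and covariance of $T(X)$ empirically:

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 0.5],
                  [0.0, 0.5, 1.0]])      # symmetric positive semi-definite

# A with A @ A.T = sigma, via the symmetric square root; this works even
# when sigma is only positive SEMI-definite (Cholesky may then fail)
w, v = np.linalg.eigh(sigma)
a = v @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ v.T

x = rng.standard_normal((100_000, 3))    # i.i.d. N(0, 1) components
tx = mu + x @ a.T                        # T(X) = mu + A X, row by row

print(np.round(tx.mean(axis=0), 2))      # ~ mu
print(np.round(np.cov(tx.T), 2))         # ~ sigma
```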
Problem 8.8 (Slutsky's Theorem). Let $X$, $Y$, $\{X_n\}_{n\in\mathbb{N}}$ and $\{Y_n\}_{n\in\mathbb{N}}$ be random variables defined on the same probability space, such that
$$X_n \xrightarrow{D} X \quad\text{and}\quad Y_n \xrightarrow{D} Y. \tag{8.3}$$
Show that $X_n + Y_n \xrightarrow{D} X + Y$, provided that $Y$ is almost surely constant.