
18.440: Lecture 31

Central limit theorem

Scott Sheffield

MIT
Outline

Central limit theorem

Proving the central limit theorem

Recall: DeMoivre-Laplace limit theorem

▶ Let X_i be an i.i.d. sequence of random variables. Write
  S_n = X_1 + X_2 + ... + X_n.
▶ Suppose each X_i is 1 with probability p and 0 with probability
  q = 1 − p.
▶ DeMoivre-Laplace limit theorem (a numerical check follows this list):

  lim_{n→∞} P{a ≤ (S_n − np)/√(npq) ≤ b} = Φ(b) − Φ(a).

▶ Here Φ(b) − Φ(a) = P{a ≤ Z ≤ b} when Z is a standard
  normal random variable.
▶ (S_n − np)/√(npq) describes the "number of standard deviations
  that S_n is above or below its mean".
▶ Question: Does a similar statement hold if the X_i are i.i.d. but
  have some other probability distribution?
▶ Central limit theorem: Yes, if they have finite variance.
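A minimal simulation sketch of this limit, assuming Python with NumPy and
SciPy is available (the values of n, p, a, b are illustrative choices):

    import numpy as np
    from scipy.stats import norm

    # Simulate many copies of S_n for Bernoulli(p) trials and compare
    # P{a <= (S_n - np)/sqrt(npq) <= b} with Phi(b) - Phi(a).
    rng = np.random.default_rng(0)
    n, p, trials = 10_000, 0.3, 100_000
    q = 1 - p
    S = rng.binomial(n, p, size=trials)       # each entry is one S_n
    B = (S - n * p) / np.sqrt(n * p * q)      # standard-deviation units
    a, b = -1.0, 2.0
    print(np.mean((a <= B) & (B <= b)))       # empirical probability
    print(norm.cdf(b) - norm.cdf(a))          # Phi(b) - Phi(a)
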
Example

▶ Say we roll 10^6 ordinary dice independently of each other.
▶ Let X_i be the number on the ith die. Let X = X_1 + X_2 + ... + X_n,
  with n = 10^6, be the total of the numbers rolled.
▶ What is E[X]?
▶ What is Var[X]?
▶ How about SD[X]?
▶ What is the probability that X is less than a standard
  deviations above its mean?
▶ Central limit theorem: should be about
  (1/√(2π)) ∫_{−∞}^{a} e^{−x²/2} dx = Φ(a) (computed below).
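Working out the answers, a sketch assuming Python with NumPy and SciPy
(the threshold a = 1 is an illustrative choice):

    import numpy as np
    from scipy.stats import norm

    # One fair die has mean 3.5 and variance 35/12; sums of independent
    # dice add means and variances.
    n = 10**6
    EX = n * 3.5                  # E[X] = 3,500,000
    VarX = n * 35 / 12            # Var[X] ~ 2,916,667
    SDX = np.sqrt(VarX)           # SD[X] ~ 1708
    a = 1.0
    print(EX, VarX, SDX)
    print(norm.cdf(a))            # P{X < E[X] + a*SD[X]} ~ Phi(1) ~ 0.8413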

Example

▶ Suppose earthquakes in some region are a Poisson point
  process with rate λ equal to 1 per year.
▶ Let X be the number of earthquakes that occur over a
  ten-thousand-year period. Then X should be a Poisson random
  variable with rate 10000.
▶ What is E[X]?
▶ What is Var[X]?
▶ How about SD[X]?
▶ What is the probability that X is less than a standard
  deviations above its mean?
▶ Central limit theorem: should be about
  (1/√(2π)) ∫_{−∞}^{a} e^{−x²/2} dx = Φ(a) (checked below).
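A sketch of the comparison, assuming Python with NumPy and SciPy (again
a = 1 is an illustrative choice):

    import numpy as np
    from scipy.stats import norm, poisson

    # X ~ Poisson(10000): E[X] = Var[X] = 10000, so SD[X] = 100.
    lam = 10_000
    SDX = np.sqrt(lam)
    a = 1.0
    # Exact Poisson probability vs. the CLT approximation Phi(a).
    print(poisson.cdf(lam + a * SDX - 1, lam))  # P{X < mean + a*SD}
    print(norm.cdf(a))                          # ~ 0.8413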

General statement

▶ Let X_i be an i.i.d. sequence of random variables with finite
  mean µ and variance σ².
▶ Write S_n = X_1 + X_2 + ... + X_n. So E[S_n] = nµ, Var[S_n] = nσ²,
  and SD[S_n] = σ√n.
▶ Write B_n = (X_1 + X_2 + ... + X_n − nµ)/(σ√n) = (S_n − nµ)/(σ√n).
  Then B_n is the difference between S_n and its expectation, measured
  in standard deviation units.
▶ Central limit theorem (illustrated by the sketch below):

  lim_{n→∞} P{a ≤ B_n ≤ b} = Φ(b) − Φ(a).
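A simulation sketch of the general statement, assuming Python with NumPy
and SciPy; exponential(1) summands (µ = σ = 1) are an illustrative choice
of a non-normal distribution:

    import numpy as np
    from scipy.stats import norm

    # Sample many copies of B_n for exponential(1) summands and compare
    # P{a <= B_n <= b} with Phi(b) - Phi(a).
    rng = np.random.default_rng(0)
    n, trials = 100, 50_000
    mu = sigma = 1.0
    X = rng.exponential(1.0, size=(trials, n))
    Bn = (X.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))
    a, b = -1.0, 1.0
    print(np.mean((a <= Bn) & (Bn <= b)))   # empirical probability
    print(norm.cdf(b) - norm.cdf(a))        # ~ 0.6827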

Outline

Central limit theorem

Proving the central limit theorem

8
18.440 Lecture 31
Outline

Central limit theorem

Proving the central limit theorem

9
18.440 Lecture 31
Recall: characteristic functions

▶ Let X be a random variable.
▶ The characteristic function of X is defined by
  φ(t) = φ_X(t) := E[e^{itX}]. Like M(t) except with i thrown in.
  (A Monte Carlo sketch follows this list.)
▶ Recall that by definition e^{it} = cos(t) + i sin(t).
▶ Characteristic functions are similar to moment generating
  functions in some ways.
▶ For example, φ_{X+Y} = φ_X φ_Y, just as M_{X+Y} = M_X M_Y, if X
  and Y are independent.
▶ And φ_{aX}(t) = φ_X(at) just as M_{aX}(t) = M_X(at).
▶ And if X has an mth moment then E[X^m] = i^{−m} φ_X^{(m)}(0).
▶ Characteristic functions are well defined at all t for all random
  variables X.
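A Monte Carlo sketch, assuming Python with NumPy; Bernoulli(p) is an
illustrative choice, for which the closed form is φ(t) = q + p e^{it}:

    import numpy as np

    # Estimate phi_X(t) = E[exp(itX)] by averaging complex exponentials.
    rng = np.random.default_rng(0)
    p, t, trials = 0.3, 1.7, 200_000
    X = rng.binomial(1, p, size=trials)
    print(np.mean(np.exp(1j * t * X)))       # Monte Carlo estimate
    print((1 - p) + p * np.exp(1j * t))      # exact: q + p*e^{it}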

Rephrasing the theorem

▶ Let X be a random variable and X_n a sequence of random
  variables.
▶ Say X_n converge in distribution or converge in law to X if
  lim_{n→∞} F_{X_n}(x) = F_X(x) at all x ∈ R at which F_X is
  continuous. (A sketch after this list compares the two sides.)
▶ Recall: the weak law of large numbers can be rephrased as the
  statement that A_n = (X_1 + X_2 + ... + X_n)/n converges in law to µ
  (i.e., to the random variable that is equal to µ with probability one)
  as n → ∞.
▶ The central limit theorem can be rephrased as the statement
  that B_n = (X_1 + X_2 + ... + X_n − nµ)/(σ√n) converges in law to a
  standard normal random variable as n → ∞.
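A sketch comparing F_{B_n}(x) with Φ(x) at a few points, assuming Python
with NumPy and SciPy; uniform(0,1) summands are an illustrative choice:

    import numpy as np
    from scipy.stats import norm

    # Empirical CDF of B_n vs. the standard normal CDF.
    rng = np.random.default_rng(0)
    n, trials = 200, 20_000
    mu, sigma = 0.5, np.sqrt(1 / 12)         # mean/SD of Uniform(0,1)
    S = rng.random((trials, n)).sum(axis=1)
    Bn = (S - n * mu) / (sigma * np.sqrt(n))
    for x in (-1.0, 0.0, 1.0):
        print(x, np.mean(Bn <= x), norm.cdf(x))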

Continuity theorems

▶ Lévy's continuity theorem (see Wikipedia): if

  lim_{n→∞} φ_{X_n}(t) = φ_X(t)

  for all t, then X_n converge in law to X.
▶ By this theorem, we can prove the central limit theorem by
  showing lim_{n→∞} φ_{B_n}(t) = e^{−t²/2} for all t.
▶ Moment generating function continuity theorem: if
  moment generating functions M_{X_n}(t) are defined for all t and
  n and lim_{n→∞} M_{X_n}(t) = M_X(t) for all t, then X_n converge in
  law to X.
▶ By this theorem, we can prove the central limit theorem by
  showing lim_{n→∞} M_{B_n}(t) = e^{t²/2} for all t. (A numerical
  check of this limit in a simple case follows.)
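A numerical sketch, assuming Python with NumPy; standardized
Bernoulli(1/2) summands are an illustrative choice, since then Y = ±1
with equal probability and M_Y(t) = cosh(t):

    import numpy as np

    # M_{B_n}(t) = M_Y(t/sqrt(n))**n = cosh(t/sqrt(n))**n; check the limit.
    t = 1.5
    for n in (10, 100, 10_000):
        print(n, np.cosh(t / np.sqrt(n)) ** n)
    print("limit:", np.exp(t**2 / 2))        # e^{t^2/2} ~ 3.0802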

Proof of central limit theorem with moment generating functions

▶ Write Y = (X − µ)/σ. Then Y has mean zero and variance 1.
▶ Write M_Y(t) = E[e^{tY}] and g(t) = log M_Y(t). So
  M_Y(t) = e^{g(t)}.
▶ We know g(0) = 0. Also M_Y'(0) = E[Y] = 0 and
  M_Y''(0) = E[Y²] = Var[Y] = 1.
▶ Chain rule: M_Y'(0) = g'(0)e^{g(0)} = g'(0) = 0 and
  M_Y''(0) = g''(0)e^{g(0)} + g'(0)² e^{g(0)} = g''(0) = 1.
▶ So g is a nice function with g(0) = g'(0) = 0 and g''(0) = 1.
  Taylor expansion: g(t) = t²/2 + o(t²) for t near zero. (A symbolic
  check follows this list.)
▶ Now B_n is (1/√n) times the sum of n independent copies of Y.
▶ So M_{B_n}(t) = M_Y(t/√n)^n = e^{n g(t/√n)}.
▶ But e^{n g(t/√n)} ≈ e^{n (t/√n)²/2} = e^{t²/2}, in the sense that the
  LHS tends to e^{t²/2} as n tends to infinity.
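A symbolic check of the key Taylor step, assuming Python with SymPy;
Y = ±1 with equal probability (so M_Y(t) = cosh(t)) is an illustrative
choice of a mean-zero, variance-one Y:

    import sympy as sp

    t, n = sp.symbols('t n', positive=True)
    g = sp.log(sp.cosh(t))                   # g(t) = log M_Y(t)
    print(sp.series(g, t, 0, 4))             # t**2/2 + O(t**4)
    # n*g(t/sqrt(n)) -> t**2/2, so M_{B_n}(t) -> e^{t**2/2}:
    print(sp.limit(n * g.subs(t, t / sp.sqrt(n)), n, sp.oo))
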
Proof of central limit theorem with characteristic functions

▶ The moment generating function proof only applies if the moment
  generating function of X exists.
▶ But the proof can be repeated almost verbatim using
  characteristic functions instead of moment generating
  functions.
▶ Then it applies for any X with finite variance.

Almost verbatim: replace M_Y(t) with φ_Y(t)

▶ Write φ_Y(t) = E[e^{itY}] and g(t) = log φ_Y(t). So
  φ_Y(t) = e^{g(t)}.
▶ We know g(0) = 0. Also φ_Y'(0) = iE[Y] = 0 and
  φ_Y''(0) = i²E[Y²] = −Var[Y] = −1.
▶ Chain rule: φ_Y'(0) = g'(0)e^{g(0)} = g'(0) = 0 and
  φ_Y''(0) = g''(0)e^{g(0)} + g'(0)² e^{g(0)} = g''(0) = −1.
▶ So g is a nice function with g(0) = g'(0) = 0 and
  g''(0) = −1. Taylor expansion: g(t) = −t²/2 + o(t²) for t
  near zero.
▶ Now B_n is (1/√n) times the sum of n independent copies of Y.
▶ So φ_{B_n}(t) = φ_Y(t/√n)^n = e^{n g(t/√n)}.
▶ But e^{n g(t/√n)} ≈ e^{−n (t/√n)²/2} = e^{−t²/2}, in the sense that
  the LHS tends to e^{−t²/2} as n tends to infinity. (A numerical
  check follows.)
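A numerical check, assuming Python with NumPy; for Y = ±1 with equal
probability, φ_Y(t) = cos(t), so φ_{B_n}(t) = cos(t/√n)^n:

    import numpy as np

    # cos(t/sqrt(n))**n should approach e^{-t^2/2} as n grows.
    t = 1.5
    for n in (10, 100, 10_000):
        print(n, np.cos(t / np.sqrt(n)) ** n)
    print("limit:", np.exp(-t**2 / 2))       # ~ 0.3247
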
Perspective

▶ The central limit theorem is actually fairly robust. Variants of
  the theorem still apply if you allow the X_i not to be identically
  distributed, or not to be completely independent.
▶ We won't formulate these variants precisely in this course.
▶ But, roughly speaking, if you have a lot of little random terms
  that are "mostly independent", and no single term contributes
  more than a "small fraction" of the total sum, then the total
  sum should be "approximately" normal.
▶ Example: if height is determined by lots of little mostly
  independent factors, then people's heights should be normally
  distributed.
▶ Not quite true: certain factors by themselves can cause a
  person to be a whole lot shorter or taller. Also, individual
  factors are not really independent of each other.
▶ Kind of true for a homogeneous population, ignoring outliers.
MIT OpenCourseWare
http://ocw.mit.edu

18.440 Probability and Random Variables


Spring 2014

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
