Notes04 PDF
Contents
1 Continuous Random Variables
  1.1 Defining the Probability Density Function
  1.2 Expected Values
  1.3 Percentiles and Medians
  1.4 Worksheet
2 Normal Distribution
  2.1 The Standard Normal Distribution
    2.1.1 Cumulative Distribution Function
  2.2 zα Values and Percentiles
  2.3 Approximating Distributions of Discrete Random Variables
    2.3.1 Example: Approximating the Binomial Distribution
Thursday 16 December 2010
p(x) = P(X = x)    (1.1)
Note that for a discrete random variable, the cdf is constant for a range of values, and then
jumps when we reach one of the possible values for the rv.
For a continuous rv X, we know that the probability of X equalling exactly any specific x is zero, so clearly p(x) = P(X = x) doesn't make any sense. However, F(x) = P(X ≤ x) is still a perfectly good definition. Now, though, F(x) should be a continuous function with no "jumps", since a jump corresponds to a finite probability for some specific value:
Note that in both cases, if xmin and xmax are the minimum and maximum possible values of X, then F(x) = 0 when x < xmin, and F(x) = 1 when x ≥ xmax. We can also write
F (−∞) = 0 (1.3a)
F (∞) = 1 (1.3b)
Just as before, we can calculate the probability for X to lie in some finite interval from
a to b, using the fact that
(X ≤ b) = (X ≤ a) ∪ (a < X ≤ b) (1.4)
We can’t talk about the probability for X to be exactly some value x, but we can consider
the probability for X to lie in some little interval of width ∆x centered at x:
We know that if we make ∆x smaller and smaller, the difference between F(x + ∆x/2) and F(x − ∆x/2) has to get smaller and smaller. If we define the probability density function (pdf) as the derivative of the cdf, f(x) = F′(x), then for small ∆x,
P(x − ∆x/2 < X ≤ x + ∆x/2) = F(x + ∆x/2) − F(x − ∆x/2) ≈ f(x) ∆x
If we want to calculate the probability for X to lie in some interval, we can use the pdf f(x) or the cdf F(x). If the interval is small enough that the cdf F(x) can be approximated as a straight line, the change in F(x) over that interval is approximately the width of the interval times the derivative f(x) = F′(x):
where
xi+1 − xi = ∆x    (1.11)
and
x1 − ∆x/2 = a    (1.12a)
xN + ∆x/2 = b    (1.12b)
Of course, this is also a consequence of the Second Fundamental Theorem of Calculus:
if f(x) = F′(x), then ∫_a^b f(x) dx = F(b) − F(a)    (1.14)
The normalization condition, which for a discrete random variable is Σ_x p(x) = 1, becomes for a continuous random variable
∫_{−∞}^{∞} f(x) dx = 1    (1.15)
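Normalization is easy to verify numerically for any concrete pdf. Here is a minimal sketch using the hypothetical triangular pdf f(x) = 2x on [0, 1] (an invented example, not from the text) and a midpoint-rule sum:

```python
# Midpoint-rule check of the normalization condition (1.15):
# the pdf should integrate to 1 over its whole range.
# f(x) = 2x on [0, 1] is a hypothetical example pdf.

def f(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

N = 10_000
dx = 1.0 / N
total = sum(f((i + 0.5) * dx) * dx for i in range(N))
print(total)   # ≈ 1.0
```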
In the case of a continuous random variable, we can use the same construction as before: divide the whole range from xmin to xmax into little intervals of width ∆x, centered at points xi, and define a discrete random variable X^∆x which is just X rounded off to the middle of whatever interval it's in. The pmf for this discrete random variable is
p^∆x(xi) = P(xi − ∆x/2 < X < xi + ∆x/2) ≈ f(xi) ∆x    (1.17)
X^∆x and X will differ by at most ∆x/2, so we can write, approximately,
E(X) ≈ E(X^∆x) = Σ_i xi p^∆x(xi) ≈ Σ_i xi f(xi) ∆x    (1.18)
In the limit that we divide up into more and more intervals and ∆x → 0, this becomes
E(X) = ∫_{xmin}^{xmax} x f(x) dx = ∫_{−∞}^{∞} x f(x) dx    (1.19)
The same argument works for the expected value of any function of X:
E(h(X)) = ∫_{−∞}^{∞} h(x) f(x) dx    (1.20)
In particular,
E(X²) = ∫_{−∞}^{∞} x² f(x) dx    (1.21)
We can also define the variance, and as usual, the same shortcut formula applies:
σ_X² = E((X − µ_X)²) = E(X²) − (E(X))²    (1.22)
where
µX = E(X) (1.23)
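The shortcut formula can be illustrated numerically. This sketch evaluates E(X) and E(X²) by midpoint-rule integration for a hypothetical uniform rv on [0, 1], where the exact answers are 1/2, 1/3, and variance 1/12:

```python
# Numerical check of the shortcut formula (1.22) for a uniform rv on [0, 1]
# (a hypothetical example): sigma^2 = E(X^2) - E(X)^2 = 1/3 - 1/4 = 1/12.

def f(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

N = 100_000
dx = 1.0 / N
xs = [(i + 0.5) * dx for i in range(N)]
EX = sum(x * f(x) * dx for x in xs)          # eq. (1.19)
EX2 = sum(x * x * f(x) * dx for x in xs)     # eq. (1.21)
var = EX2 - EX ** 2                          # eq. (1.22)
print(EX, EX2, var)   # ≈ 0.5, 0.3333, 0.08333
```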
1.3 Percentiles and Medians
Practice Problems
4.1, 4.5, 4.11, 4.13, 4.17, 4.21
2 Normal Distribution
On this worksheet, you considered one simple pdf for a continuous random variable, a uniform
distribution. Now we’re going to consider one which is more complicated, but very useful, the
normal distribution, sometimes called the Gaussian distribution. The distribution depends on two parameters, µ and σ,¹ and if X follows a normal distribution with those parameters, we write X ∼ N(µ, σ²), and the pdf of X is
f(x; µ, σ) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}    (2.1)
The shape of the normal distribution is the classic “bell curve”:
¹ µ can be positive, negative, or zero, but σ has to be positive.
Some things we can notice about the normal distribution:
• It is non-zero for all x, so xmin = −∞ and xmax = ∞
• The pdf is, in fact, normalized. This is kind of tricky to show, since it relies on the definite integral result
∫_{−∞}^{∞} e^{−z²/2} dz = √(2π)    (2.2)
This integral can't really be done by the usual means, since there is no ordinary function whose derivative is e^{−z²/2}, and actually only the definite integral from −∞ to ∞ (or from 0 to ∞) has a nice value. The proof of this identity is cute, but unfortunately it requires the use of a two-dimensional integral in polar coördinates, so it's beyond our scope at the moment.
• The parameter µ, as you might guess from the name, is the mean of the distribution.
This is not so hard to see from the graph: the pdf is clearly symmetric about x = µ,
so that point should be both the mean and median of the distribution. It’s also not so
hard to show from the equations that
∫_{−∞}^{∞} x e^{−(x−µ)²/(2σ²)} dx = µ ∫_{−∞}^{∞} e^{−(x−µ)²/(2σ²)} dx    (2.3)
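Although the polar-coördinates proof of (2.2) is beyond our scope, the identity is easy to check numerically; this sketch truncates the integral at ±8, where the integrand is negligibly small:

```python
import math

# Numerical check of eq. (2.2): the integral of exp(-z^2/2) over the real
# line equals sqrt(2*pi).  The tails beyond |z| = 8 contribute ~1e-15.
N = 100_000
lo, hi = -8.0, 8.0
dz = (hi - lo) / N
total = sum(math.exp(-((lo + (i + 0.5) * dz) ** 2) / 2) * dz for i in range(N))
print(total, math.sqrt(2 * math.pi))   # both ≈ 2.5066
```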
Now, because the expected values for continuous random variables still have the same behavior under linear transformations as for discrete rvs, i.e., E(aX + b) = aE(X) + b and V(aX + b) = a²V(X), the standardized rv Z = (X − µ)/σ has mean 0 and standard deviation 1, and if X ∼ N(µ, σ²), then Z follows the standard normal distribution N(0, 1).
The problem is that, as we noted before, there's no ordinary function whose derivative is e^{−z²/2}. Still, the cumulative distribution function of a standard normal random variable is a perfectly sensible thing, and we could, for example, numerically integrate to find an approximate value for any z. So we define a function Φ(z) to be equal to that integral:
Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−u²/2} du    (definition)    (2.10)
We can now write down the cdf for the original normal rv X:
F(x; µ, σ) = P(X ≤ x) = P(Z ≤ (x − µ)/σ) = Φ((x − µ)/σ)    (2.11)
and from the cdf we can find the probability for X to lie in any interval:
P(a < X ≤ b) = F(b; µ, σ) − F(a; µ, σ) = Φ((b − µ)/σ) − Φ((a − µ)/σ)    (2.12)
This identification of Φ(z) as a cdf tells us a few values exactly:
Φ(−∞) = 0    (2.14a)
Φ(0) = 1/2    (2.14b)
Φ(∞) = 1    (2.14c)
We know Φ(∞) from the fact that the normal distribution is normalized. We can deduce
Φ(0) from the fact that it’s symmetric about its mean value, and therefore the mean value
is also the median. For any other value of z, Φ(z) can in principle be calculated numerically.
Here’s what a plot of Φ(z) looks like:
Of course, since Φ(z) is an important function (like, for example, sin θ or e^x), its value has been tabulated. Table A.3, in the back of the book, collects some of those values. Now, in the old days trigonometry books also had tables of values of trig functions, but now we don't need those because our calculators have sin and cos and e^x buttons. Unfortunately, our calculators don't have Φ buttons on them yet, so we have the table. Note that if you're using a mathematical computing environment like scipy or Mathematica or matlab which provides the "error function" erf(x), you can evaluate
Φ(z) = (1/2)(1 + erf(z/√2))    (2.15)
(That’s how I made the plot of Φ(z).)
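For example, eq. (2.15) with Python's stdlib math.erf reproduces the exact values (2.14) and the tabulated value Φ(1):

```python
import math

# Eq. (2.15): Phi(z) = (1/2) * (1 + erf(z / sqrt(2))).
def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(Phi(0.0))   # 0.5, eq. (2.14b)
print(Phi(1.0))   # ≈ 0.8413, matching Table A.3
print(Phi(8.0))   # ≈ 1.0, consistent with Phi(infinity) = 1
```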
Some useful sample values of Φ(z) are
P(X ≤ µ + σ) = Φ(1) ≈ .8413    (2.16a)
P(X ≤ µ + 2σ) = Φ(2) ≈ .9772    (2.16b)
P(X ≤ µ + 3σ) = Φ(3) ≈ .9987    (2.16c)
P(µ − σ < X ≤ µ + σ) = Φ(1) − Φ(−1) ≈ .6827    (2.16d)
P(µ − 2σ < X ≤ µ + 2σ) = Φ(2) − Φ(−2) ≈ .9545    (2.16e)
P(µ − 3σ < X ≤ µ + 3σ) = Φ(3) − Φ(−3) ≈ .9973    (2.16f)
The last few in particular are useful to keep in mind: a given value drawn from a normal distribution has
• about a 2/3 (actually 68%) chance of being within one sigma of the mean
• about a 95% chance of being within two sigma of the mean
• about a 99.7% chance of being within three sigma of the mean
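These coverage figures follow from the symmetry Φ(−k) = 1 − Φ(k), so the two-sided probability is 2Φ(k) − 1; a quick sketch using the erf relation (2.15):

```python
import math

def Phi(z):
    # Standard normal cdf via eq. (2.15)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probability of falling within k sigma of the mean: Phi(k) - Phi(-k) = 2*Phi(k) - 1.
for k in (1, 2, 3):
    print(k, 2 * Phi(k) - 1)   # ≈ 0.6827, 0.9545, 0.9973
```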
2.3 Approximating Distributions of Discrete Random Variables
Suppose we have a discrete random variable X with pmf p(x) which can take on values which are ∆x apart.² A continuous random variable with pdf f(x) is a good approximation of X if p(x) ≈ f(x) ∆x for each possible value x.
One thing to watch out for is the discreteness of the original random variable X. For
instance, if we want to estimate its cdf, P (X ≤ x), we should consider that the value X = x
corresponds to the range x − ∆x/2 < X < x + ∆x/2, so
P(X ≤ x) ≈ P(X < x + ∆x/2) = ∫_{−∞}^{x+∆x/2} f(y) dy    (2.20)
E(X) = µ    (2.21a)
V(X) = σ²    (2.21b)
(Did anyone forget what q is? I did. q = 1 − p.) But if we can approximate X as a normally-distributed random variable with µ = E(X) = np and σ² = V(X) = np(1 − p) = npq, life is pretty simple (here ∆x = 1 because x can take on integer values):
P(X ≤ x) = B(x; n, p) ≈ Φ((x + .5 − np)/√(npq))    (2.24)
At the very least, we only have to look in one table, rather than in a two-parameter family
of tables!
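The quality of the approximation (2.24) is easy to see in code. This sketch compares the exact binomial cdf with the continuity-corrected normal approximation for a hypothetical Bin(20, 0.5) example:

```python
import math

def Phi(z):
    # Standard normal cdf via eq. (2.15)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical example: X ~ Bin(n=20, p=0.5).  Compare the exact cdf
# B(x; n, p) with the continuity-corrected approximation of eq. (2.24).
n, p, x = 20, 0.5, 12
q = 1 - p
exact = sum(math.comb(n, k) * p**k * q**(n - k) for k in range(x + 1))
approx = Phi((x + 0.5 - n * p) / math.sqrt(n * p * q))
print(exact, approx)   # ≈ 0.8684 vs ≈ 0.8682
```

Even at n = 20 the approximation agrees with the exact cdf to about three decimal places.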
² This is made even simpler if the discrete rv can take on integer values, so that ∆x = 1, but it also works for other examples, like SAT scores, for which ∆x = 10.
Practice Problems
4.29, 4.31, 4.35, 4.47, 4.53, 4.55
• The exponential distribution occurs in the context of a Poisson process. It is the pdf
of the time we have to wait for a Poisson event to occur.
– The gamma distribution is a variant of the exponential distribution with a mod-
ified shape. It describes a physical situation where the expected waiting time
changes depending on how long we’ve been waiting. (The Weibull distribution,
which is in section 4.5, is a different variation on the exponential distribution.)
• The chi-squared distribution is the pdf of the sum of the squares of one or more inde-
pendent standard normal random variables. This pdf turns out to be a special case of
the gamma distribution.
Part of the complexity of the formulas for these various random variables arises from
several things which are not essential to the fundamental shape of the distribution. Consider
a normal distribution
f(x; µ, σ) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}    (3.1)
First, notice that the factor of 1/(σ√(2π)) does not depend on the value x of the random variable X. It's a constant (which happens to depend on the parameter σ) which is just needed to normalize the distribution. So we can write
f(x; µ, σ) = N(σ) e^{−(x−µ)²/(2σ²)}    (3.2)
or
f(x; µ, σ) ∝ e^{−(x−µ)²/(2σ²)}    (3.3)
We will try to separate the often complicated forms of the normalization constants from the
often simpler forms of the x-dependent parts of the distributions.
Also, notice that the parameters µ and σ affect the location and scale of the pdf, but not the shape, and if we define the rv Z = (X − µ)/σ, the pdf is
f(z) ∝ e^{−z²/2}    (3.4)
BTW, what I mean by “doesn’t change the shape” is that if I plot a normal pdf with different
parameters, I can always change the ranges of the axes to make it look like any other one.
For example:
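The rescaling claim can also be checked directly: for any µ and σ, the combination σ·f(µ + σz; µ, σ) is the same function of z, namely the standard normal pdf. A minimal sketch (the parameter choices are arbitrary examples):

```python
import math

def normal_pdf(x, mu, sigma):
    # eq. (2.1)/(3.1)
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# For every (mu, sigma), sigma * f(mu + sigma*z; mu, sigma) is the same
# function of z: the standard normal pdf.  So all normal curves have one shape.
results = []
for mu, sigma in [(0.0, 1.0), (5.0, 2.0), (-3.0, 0.5)]:
    vals = [sigma * normal_pdf(mu + sigma * z, mu, sigma) for z in (-1.0, 0.0, 1.0)]
    results.append(vals)
    print(vals)   # same list for every (mu, sigma): ≈ [0.2420, 0.3989, 0.2420]
```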
If we isolate the x dependence of the exponential distribution, we can write the slightly simpler
f(x; λ) ∝ e^{−λx}
In any event, we see that changing λ just changes the scale of the x variable, so we can draw one shape:
Note that α (which need not be an integer) really does influence the shape of the distribution,
and α = 1 reduces to the exponential distribution.
Practice Problems
4.59, 4.61, 4.63, 4.65, 4.67, 4.69, 4.71