
Continuous Random Variables

(Devore Chapter Four)


1016-345-01
Probability and Statistics for Engineers∗
Winter 2010-2011

Contents
1 Continuous Random Variables
1.1 Defining the Probability Density Function
1.2 Expected Values
1.3 Percentiles and Medians
1.4 Worksheet

2 Normal Distribution
2.1 The Standard Normal Distribution
2.1.1 Cumulative Distribution Function
2.2 zα Values and Percentiles
2.3 Approximating Distributions of Discrete Random Variables
2.3.1 Example: Approximating the Binomial Distribution

3 Other Continuous Distributions
3.1 Exponential Distribution
3.1.1 Gamma Distribution
3.1.2 The Gamma Function
3.2 Chi-Squared Distribution

Copyright 2011, John T. Whelan, and all that

Thursday 16 December 2010

1 Continuous Random Variables


Recall that when we introduced the concept of random variables last chapter, we focused on discrete random variables, which could take on one of a set of specific values, either finite or countably infinite. Now we turn our attention to continuous random variables, which can take on any value in one or more intervals, but for which there is zero probability at any single value.

1.1 Defining the Probability Density Function


Devore starts with the definition of the probability density function, and deduces the cu-
mulative distribution function for a continuous random variable from that. We will follow
a complementary presentation, starting by extending the cdf to a continuous rv, and then
deriving the pdf from that.
Recall that if X is a discrete random variable, we can define the probability mass function

p(x) = P (X = x) (1.1)

and the cumulative distribution function


F(x) = P(X ≤ x) = Σ_{y≤x} p(y)    (1.2)

Note that for a discrete random variable, the cdf is constant for a range of values, and then
jumps when we reach one of the possible values for the rv.
For a continuous rv X, we know that the probability of X equalling exactly any specific
x is zero, so clearly p(x) = P (X = x) doesn’t make any sense. However, F (x) = P (X ≤ x)
is still a perfectly good definition. Now, though, F (x) should be a continuous function with
no “jumps”, since a jump corresponds to a finite probability for some specific value:

Note that in both cases, if x_min and x_max are the minimum and maximum possible values of X, then F(x) = 0 when x < x_min, and F(x) = 1 when x ≥ x_max. We can also write

F (−∞) = 0 (1.3a)
F (∞) = 1 (1.3b)

Just as before, we can calculate the probability for X to lie in some finite interval from
a to b, using the fact that

(X ≤ b) = (X ≤ a) ∪ (a < X ≤ b) (1.4)

(where b > a):

P (a < X ≤ b) = P (X ≤ b) − P (X ≤ a) = F (b) − F (a) . (1.5)

But now, since P (X = a) = 0 and P (X = b) = 0, we don’t have to worry about whether or


not the endpoints are included:

P (a ≤ X ≤ b) = P (a < X ≤ b) = P (a ≤ X < b) (X a cts rv) (1.6)

We can’t talk about the probability for X to be exactly some value x, but we can consider
the probability for X to lie in some little interval of width ∆x centered at x:

P (x − ∆x/2 < X < x + ∆x/2) = F (x + ∆x/2) − F (x − ∆x/2) (1.7)

We know that if we make ∆x smaller and smaller, the difference between F (x + ∆x/2) and
F (x − ∆x/2) has to get smaller and smaller. If we define

f(x) = lim_{∆x→0} P(x − ∆x/2 < X < x + ∆x/2)/∆x
     = lim_{∆x→0} [F(x + ∆x/2) − F(x − ∆x/2)]/∆x = dF/dx = F′(x)    (1.8)
this is just the derivative of the cdf. This derivative f (x) is called the probability density
function.

If we want to calculate the probability for X to lie in some interval, we can use the pdf
f (x) or the cdf F (x). If the interval is small enough that the cdf F (x) can be approximated
as a straight line, the change in F (x) over that interval is approximately the width of the
interval times the derivative f (x) = F 0 (x):

P(x − ∆x/2 < X < x + ∆x/2) = F(x + ∆x/2) − F(x − ∆x/2) ≈ F′(x) ∆x = f(x) ∆x    (1.9)
If the interval is not small enough, we can always divide it up into pieces that are. If the ith
piece is ∆x wide and centered on xi ,
P(a < X < b) = F(b) − F(a) ≈ Σ_{i=1}^{N} f(x_i) ∆x    (1.10)

where
x_{i+1} − x_i = ∆x    (1.11)
and
x_1 − ∆x/2 = a    (1.12a)
x_N + ∆x/2 = b    (1.12b)

In the limit that N → ∞, so that ∆x → 0, the sum becomes an integral, so that


P(a < X < b) = F(b) − F(a) = ∫_a^b f(x) dx    (1.13)

Of course, this is also a consequence of the Second Fundamental Theorem of Calculus:
if f(x) = F′(x), then ∫_a^b f(x) dx = F(b) − F(a)    (1.14)
The normalization condition, which for a discrete random variable is Σ_x p(x) = 1, becomes for a continuous random variable

∫_{−∞}^{∞} f(x) dx = 1    (1.15)
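As a quick sanity check, here is a short Python/scipy sketch using a made-up pdf f(x) = 2x on [0, 1] (so F(x) = x²), verifying the normalization condition (1.15) and computing an interval probability both ways, as in (1.13):

```python
# A numerical sketch (the pdf f(x) = 2x on [0, 1] is a made-up example):
# check normalization (1.15) and the interval probability (1.13).
from scipy.integrate import quad

f = lambda x: 2.0 * x          # pdf on [0, 1]
F = lambda x: x**2             # its cdf

total, _ = quad(f, 0.0, 1.0)
print(total)                   # 1.0, so f is normalized

a, b = 0.25, 0.75
prob, _ = quad(f, a, b)
print(prob, F(b) - F(a))       # both 0.5: integral of the pdf = F(b) - F(a)
```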

1.2 Expected Values


Recall that for a discrete random variable, the expected value is
E(X) = Σ_x x p(x)    (X a discrete rv)    (1.16)

In the case of a continuous random variable, we can use the same construction as before: divide the whole range from x_min to x_max into little intervals of width ∆x, centered at points x_i, and define a discrete random variable X^∆x which is just X rounded off to the middle of whatever interval it's in. The pmf for this discrete random variable is

p^∆x(x_i) = P(x_i − ∆x/2 < X < x_i + ∆x/2) ≈ f(x_i) ∆x    (1.17)
X^∆x and X will differ by at most ∆x/2, so we can write, approximately,

E(X) ≈ E(X^∆x) = Σ_i x_i p^∆x(x_i) ≈ Σ_i x_i f(x_i) ∆x    (1.18)

In the limit that we divide up into more and more intervals and ∆x → 0, this becomes
E(X) = ∫_{x_min}^{x_max} x f(x) dx = ∫_{−∞}^{∞} x f(x) dx    (1.19)

The same argument works for the expected value of any function of X:
E(h(X)) = ∫_{−∞}^{∞} h(x) f(x) dx    (1.20)

In particular,

E(X²) = ∫_{−∞}^{∞} x² f(x) dx    (1.21)
We can also define the variance, and as usual, the same shortcut formula applies:
σ_X² = E[(X − µ_X)²] = E(X²) − [E(X)]²    (1.22)

where

µ_X = E(X)    (1.23)
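Continuing the same made-up example (f(x) = 2x on [0, 1]), a short sketch verifying (1.19), (1.21), and the shortcut formula (1.22) numerically:

```python
# A sketch with the made-up pdf f(x) = 2x on [0, 1]: compute E(X) and
# E(X^2) by (1.19) and (1.21), then check the shortcut formula (1.22)
# against the direct definition of the variance.
from scipy.integrate import quad

f = lambda x: 2.0 * x
mu, _ = quad(lambda x: x * f(x), 0.0, 1.0)          # E(X) = 2/3
ex2, _ = quad(lambda x: x**2 * f(x), 0.0, 1.0)      # E(X^2) = 1/2
var_direct, _ = quad(lambda x: (x - mu)**2 * f(x), 0.0, 1.0)
print(mu, ex2 - mu**2, var_direct)                  # 0.667, 0.0556, 0.0556
```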

1.3 Percentiles and Medians

The (100p)th percentile of a continuous distribution is the value η(p) satisfying F(η(p)) = p, and the median µ̃ is the 50th percentile, i.e., the solution of F(µ̃) = .5.
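Numerically, finding a percentile just means solving F(η(p)) = p; here is a sketch with the same made-up pdf f(x) = 2x on [0, 1], using a general-purpose root finder:

```python
# A sketch (made-up pdf, F(x) = x**2 on [0, 1]): a percentile is the
# solution of F(eta) = p, found here with a bracketing root finder.
from scipy.optimize import brentq

F = lambda x: x**2
percentile = lambda p: brentq(lambda x: F(x) - p, 0.0, 1.0)
print(percentile(0.5))    # median: sqrt(0.5) ~ 0.707
print(percentile(0.9))    # 90th percentile: sqrt(0.9) ~ 0.949
```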
Practice Problems
4.1, 4.5, 4.11, 4.13, 4.17, 4.21

Tuesday 4 January 2011


1.4 Worksheet
• worksheet
• solutions

2 Normal Distribution
On this worksheet, you considered one simple pdf for a continuous random variable, a uniform
distribution. Now we’re going to consider one which is more complicated, but very useful, the
normal distribution, sometimes called the Gaussian distribution. The distribution depends
on two parameters, µ and σ,¹ and if X follows a normal distribution with those parameters, we write X ∼ N(µ, σ²), and the pdf of X is

f(x; µ, σ) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}    (2.1)
The shape of the normal distribution is the classic “bell curve”:

¹ µ can be positive, negative, or zero, but σ has to be positive.

Some things we can notice about the normal distribution:
• It is non-zero for all x, so x_min = −∞ and x_max = ∞
• The pdf is, in fact, normalized. This is kind of tricky to show, since it relies on the definite integral result

∫_{−∞}^{∞} e^{−z²/2} dz = √(2π)    (2.2)
This integral can't really be done by the usual means, since there is no ordinary function whose derivative is e^{−z²/2}, and actually only the definite integral from −∞ to ∞ (or
from 0 to ∞) has a nice value. The proof of this identity is cute, but unfortunately it
requires the use of a two-dimensional integral in polar coördinates, so it’s beyond our
scope at the moment.
• The parameter µ, as you might guess from the name, is the mean of the distribution.
This is not so hard to see from the graph: the pdf is clearly symmetric about x = µ,
so that point should be both the mean and median of the distribution. It’s also not so
hard to show from the equations that
∫_{−∞}^{∞} x e^{−(x−µ)²/(2σ²)} dx = µ ∫_{−∞}^{∞} e^{−(x−µ)²/(2σ²)} dx    (2.3)

(and thus E(X) = µ) by using the change of variables y = x − µ.


• The parameter σ turns out to be the standard deviation. You can’t really tell this
from the graph, but it is possible to show that
V(X) = E((X − µ)²) = (1/(σ√(2π))) ∫_{−∞}^{∞} (x − µ)² e^{−(x−µ)²/(2σ²)} dx = σ²    (2.4)
This is harder, though, since the integral
∫_{−∞}^{∞} (x − µ)² e^{−(x−µ)²/(2σ²)} dx = σ² ∫_{−∞}^{∞} e^{−(x−µ)²/(2σ²)} dx    (2.5)

has to be done using integration by parts.


• Although the pdf depends on two parameters, neither one of them fundamentally
changes the shape. Changing µ just slides the pdf back and forth. Changing σ just
changes the scale of the horizontal axis. If we increase σ, the curve is stretched hori-
zontally, and it has to be squeezed by the same factor vertically, in order to keep the
area underneath it equal to 1 (since the pdf is normalized).

2.1 The Standard Normal Distribution


The fact that, for X ∼ N (µ, σ 2 ), changing the parameters µ and σ just slides and stretches
the x axis is the motivation for constructing a new random variable Z:
Z = (X − µ)/σ    (2.6)

Now, because the expected values for continuous random variables still have the same be-
havior under linear transformations as for discrete rvs, i.e.,

E(aX + b) = aE(X) + b (2.7a)


V (aX + b) = a2 V (X) (2.7b)

we can see that


E(Z) = E(X)/σ − µ/σ = µ/σ − µ/σ = 0    (2.8a)
V(Z) = V(X)/σ² = σ²/σ² = 1    (2.8b)
In fact, Z is itself normally distributed: Z ∼ N(0, 1). We call this special case of the normal distribution the standard normal distribution.
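A quick simulation sketch of this standardization (the values µ = 3 and σ = 2 are arbitrary choices):

```python
# Standardizing X ~ N(mu, sigma^2) via Z = (X - mu)/sigma gives a
# sample mean near 0 and a sample variance near 1, as in (2.8).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 3.0, 2.0
x = rng.normal(mu, sigma, size=100_000)
z = (x - mu) / sigma
print(z.mean(), z.var())    # ~ 0 and ~ 1
```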
The standard normal distribution is often useful in calculating the probability that a
normally distributed random variable falls in a certain finite interval:

P(a < X ≤ b) = P((a − µ)/σ < Z ≤ (b − µ)/σ) = (1/√(2π)) ∫_{(a−µ)/σ}^{(b−µ)/σ} e^{−z²/2} dz    (2.9)

The problem is that, as we noted before, there's no ordinary function whose derivative is e^{−z²/2}. Still, the cumulative distribution function of a standard normal random variable
is a perfectly sensible thing, and we could, for example, numerically integrate to find an
approximate value for any z. So we define a function Φ(z) to be equal to that integral:
Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−u²/2} du    (definition)    (2.10)
We can now write down the cdf for the original normal rv X:

F(x; µ, σ) = P(X ≤ x) = P(Z ≤ (x − µ)/σ) = Φ((x − µ)/σ)    (2.11)

and from the cdf we can find the probability for X to lie in any interval:

P(a < X ≤ b) = F(b; µ, σ) − F(a; µ, σ) = Φ((b − µ)/σ) − Φ((a − µ)/σ)    (2.12)

2.1.1 Cumulative Distribution Function


The function Φ(z) has been defined so that it equals the cdf of Z:
F(z; 0, 1) = P(Z ≤ z) = ∫_{−∞}^{z} f(u; 0, 1) du = Φ(z)    (2.13)

This identification of Φ(z) as a cdf tells us a few values exactly:

Φ(−∞) = 0 (2.14a)
1
Φ(0) = (2.14b)
2
Φ(∞) = 1 (2.14c)

We know Φ(∞) from the fact that the normal distribution is normalized. We can deduce
Φ(0) from the fact that it’s symmetric about its mean value, and therefore the mean value
is also the median. For any other value of z, Φ(z) can in principle be calculated numerically.
Here’s what a plot of Φ(z) looks like:

Of course, since Φ(z) is an important function (like, for example, sin θ or e^x), its value has been tabulated. Table A.3, in the back of the book, collects some of those values. Now, in the old days trigonometry books also had tables of values of trig functions, but now we don't need those because our calculators have sin and cos and e^x buttons. Unfortunately, our calculators don't have Φ buttons on them yet, so we have the table. Note that if you're using a mathematical computing environment like scipy or Mathematica or matlab which provides the "error function" erf(x), you can evaluate

Φ(z) = (1/2)[1 + erf(z/√2)]    (2.15)
(That’s how I made the plot of Φ(z).)

Some useful sample values of Φ(z) are
P (X ≤ µ + σ) = Φ(1) ≈ .8413 (2.16a)
P (X ≤ µ + 2σ) = Φ(2) ≈ .9772 (2.16b)
P (X ≤ µ + 3σ) = Φ(3) ≈ .9987 (2.16c)
P (µ − σ < X ≤ µ + σ) = Φ(1) − Φ(−1) ≈ .6827 (2.16d)
P (µ − 2σ < X ≤ µ + 2σ) = Φ(2) − Φ(−2) ≈ .9545 (2.16e)
P (µ − 3σ < X ≤ µ + 3σ) = Φ(3) − Φ(−3) ≈ .9973 (2.16f)
The last few in particular are useful to keep in mind: a given value drawn from a normal distribution has
• about a 2/3 (actually 68%) chance of being within one sigma of the mean
• about a 95% chance of being within two sigma of the mean
• about a 99.7% chance of being within three sigma of the mean
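These one/two/three-sigma probabilities are easy to reproduce with scipy's Φ (norm.cdf):

```python
# Reproduce the probabilities (2.16d)-(2.16f) from the standard
# normal cdf.
from scipy.stats import norm

for k in (1, 2, 3):
    print(k, norm.cdf(k) - norm.cdf(-k))
# prints approximately 0.6827, 0.9545, 0.9973
```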

2.2 zα Values and Percentiles


Often you want to go the other way, and ask, for example, what’s the value that a normal
random variable will only exceed 10% of the time. To get this, you basically need the inverse
of the function Φ(z). The notation is to define zα such that α (which can be between 0 and
1) is the probability that Z > zα :
P (Z > zα ) = 1 − Φ(zα ) = α (2.17)
We know a few exact values:
z0 = ∞ (2.18a)
z.5 = 0 (2.18b)
z1 = −∞ (2.18c)
The rest can be deduced from the table of Φ(z) values; a few of them are in Table 4.1
of Devore (on page 148). Note that his argument about z.05 actually only shows that it's between 1.64 and 1.65. A more accurate value turns out to be z.05 ≈ 1.64485, which means that, somewhat coïncidentally, z.05 ≈ 1.645 is accurate to four significant figures. (But to three significant figures, z.05 ≈ 1.64.)
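Going the other way numerically just means asking for the inverse cdf, which scipy calls the "percent point function" ppf; since z_α is the (1 − α) quantile of the standard normal:

```python
# z_alpha is the (1 - alpha) quantile of the standard normal, i.e.
# the inverse cdf evaluated at 1 - alpha.
from scipy.stats import norm

for alpha in (0.10, 0.05, 0.01):
    print(alpha, norm.ppf(1.0 - alpha))
# 0.05 gives 1.6448536..., matching z_.05 ~ 1.64485 above
```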

2.3 Approximating Distributions of Discrete Random Variables


We described some aspects of a continuous random variable as a limiting case of a discrete
random variable as the spacing between possible values went to zero. It turns out that
continuous random variables, and normally-distributed ones in particular, can be good ap-
proximations of discrete random variables when the numbers involved are large. In this case,
∆x doesn’t go to zero, but it’s small compared to the other numbers in the problem.

Suppose we have a discrete random variable X with pmf p(x) which can take on values which are ∆x apart.² A continuous random variable Y with pdf f(x) is a good approximation of X if

p(x) = P(X = x) ≈ P(x − ∆x/2 < Y < x + ∆x/2) ≈ f(x) ∆x    (2.19)

One thing to watch out for is the discreteness of the original random variable X. For instance, if we want to estimate its cdf, P(X ≤ x), we should consider that the value X = x corresponds to the range x − ∆x/2 < Y < x + ∆x/2, so

P(X ≤ x) ≈ P(Y < x + ∆x/2) = ∫_{−∞}^{x+∆x/2} f(y) dy    (2.20)

Often the continuous random variable used in the approximation is a normally-distributed


rv with the same mean and standard deviation as the original discrete rv. I.e., if

E(X) = µ (2.21a)
V (X) = σ 2 (2.21b)

it is often a good approximation to take


F(x) = P(X ≤ x) ≈ ∫_{−∞}^{x+∆x/2} f(y; µ, σ) dy = Φ((x + .5(∆x) − µ)/σ)    (2.22)

2.3.1 Example: Approximating the Binomial Distribution


As an example, consider a binomial random variable X ∼ Bin(n, p). Remember that for
large values of n it was kind of a pain to find the cdf
P(X ≤ x) = B(x; n, p) = Σ_{y=0}^{x} C(n, y) p^y (1 − p)^{n−y} = Σ_{y=0}^{x} C(n, y) p^y q^{n−y}    (2.23)

(Here C(n, y) is the binomial coefficient.)

(Did anyone forget what q is? I did. q = 1 − p.) But if we can approximate X as a normally-
distributed random variable with µ = E(X) = np and σ 2 = V (X) = np(1 − p) = npq, life is
pretty simple (here ∆x = 1 because x can take on integer values):

P(X ≤ x) = B(x; n, p) ≈ Φ((x + .5 − np)/√(npq))    (2.24)

At the very least, we only have to look in one table, rather than in a two-parameter family
of tables!
² This is made even simpler if the discrete rv can take on integer values, so that ∆x = 1, but it also works for other examples, like SAT scores, for which ∆x = 10.
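Here is a sketch of the approximation (2.24) in action, with made-up values n = 100 and p = .3:

```python
# The normal approximation to the binomial cdf, with the +.5
# continuity correction of (2.24), versus the exact binomial cdf.
from math import sqrt
from scipy.stats import binom, norm

n, p = 100, 0.3
q = 1.0 - p
mu, sigma = n * p, sqrt(n * p * q)

for x in (20, 25, 30, 35, 40):
    exact = binom.cdf(x, n, p)
    approx = norm.cdf((x + 0.5 - mu) / sigma)
    print(x, exact, approx)    # the two columns track each other closely
```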

Practice Problems
4.29, 4.31, 4.35, 4.47, 4.53, 4.55

Thursday 6 January 2011

3 Other Continuous Distributions


Sections 4.4 and 4.5 unleash upon us a dizzying array of other distribution functions. (I
count six or seven.) In this course, we’ll skip section 4.5, and therefore only have to contend
with the details of a few of them. Still, it’s handy to put them into some context so we can
understand how they fit together. First, a qualitative overview:

• The exponential distribution occurs in the context of a Poisson process. It is the pdf
of the time we have to wait for a Poisson event to occur.
– The gamma distribution is a variant of the exponential distribution with a mod-
ified shape. It describes a physical situation where the expected waiting time
changes depending on how long we’ve been waiting. (The Weibull distribution,
which is in section 4.5, is a different variation on the exponential distribution.)
• The chi-squared distribution is the pdf of the sum of the squares of one or more inde-
pendent standard normal random variables. This pdf turns out to be a special case of
the gamma distribution.

Part of the complexity of the formulas for these various random variables arises from
several things which are not essential to the fundamental shape of the distribution. Consider
a normal distribution

f(x; µ, σ) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}    (3.1)
First, notice that the factor of 1/(σ√(2π)) does not depend on the value x of the random variable X. It's a constant (which happens to depend on the parameter σ) which is just needed to normalize the distribution. So we can write

f(x; µ, σ) = N(σ) e^{−(x−µ)²/(2σ²)}    (3.2)

or
f(x; µ, σ) ∝ e^{−(x−µ)²/(2σ²)}    (3.3)
We will try to separate the often complicated forms of the normalization constants from the
often simpler forms of the x-dependent parts of the distributions.
Also, notice that the parameters µ and σ affect the location and scale of the pdf, but not the shape, and if we define the rv Z = (X − µ)/σ, the pdf is

f(z) ∝ e^{−z²/2}    (3.4)

BTW, what I mean by “doesn’t change the shape” is that if I plot a normal pdf with different
parameters, I can always change the ranges of the axes to make it look like any other one.
For example:

3.1 Exponential Distribution


Suppose we have a Poisson process with rate λ, so that the probability mass function of the
number of events occurring in any period of time of duration t is
P(n events in t) = ((λt)^n / n!) e^{−λt}    (3.5)
In particular, evaluating the pmf at n = 0 we get
P(no events in t) = ((λt)^0 / 0!) e^{−λt} = e^{−λt}    (3.6)
Now pick some arbitrary moment and let X be the random variable describing how long we
have to wait for the next event. If X ≤ t that means there are one or more events occurring
in the interval of length t starting at that moment, so the probability of this is
P(X ≤ t) = 1 − P(no events in t) = 1 − e^{−λt}    (3.7)

But that is the definition of the cumulative distribution function, so

F(x; λ) = P(X ≤ x) = 1 − e^{−λx}    (3.8)
Note that it doesn’t make sense for X to take on negative values, which is good, since
F (0) = 0. This means that technically,
F(x; λ) = 0,            x < 0
F(x; λ) = 1 − e^{−λx},  x ≥ 0    (3.9)

We can differentiate (with respect to x) to get the pdf


f(x; λ) = 0,           x < 0
f(x; λ) = λ e^{−λx},   x ≥ 0    (3.10)
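A quick simulation sketch of the waiting-time argument (the rate λ = 2 and window t = 1 are arbitrary choices): the waiting time is at most t exactly when the Poisson count of events in the window is nonzero.

```python
# The next event arrives within t exactly when the Poisson(lam * t)
# count of events in [0, t] is nonzero, as in (3.7).
import numpy as np

rng = np.random.default_rng(1)
lam, t = 2.0, 1.0
counts = rng.poisson(lam * t, size=100_000)
print((counts > 0).mean())       # empirical P(X <= t)
print(1.0 - np.exp(-lam * t))    # F(t) = 1 - exp(-lam t) ~ 0.8647
```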

If we isolate the x dependence, we can write the slightly simpler

f(x; λ) ∝ e^{−λx}    (x ≥ 0)    (3.11)

In any event, we see that changing λ just changes the scale of the x variable, so we can draw
one shape:

You can show, by using integration by parts to do the integrals, that


E(X) = ∫_0^∞ x λ e^{−λx} dx = 1/λ    (3.12)
and

E(X²) = ∫_0^∞ x² λ e^{−λx} dx = 2/λ²    (3.13)
so

V(X) = E(X²) − [E(X)]² = 1/λ²    (3.14)
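These moments are easy to check by numerical integration (λ = 2 is an arbitrary choice):

```python
# Verify (3.12)-(3.14) for the exponential distribution numerically.
import numpy as np
from scipy.integrate import quad

lam = 2.0
pdf = lambda x: lam * np.exp(-lam * x)
mean, _ = quad(lambda x: x * pdf(x), 0.0, np.inf)
ex2, _ = quad(lambda x: x**2 * pdf(x), 0.0, np.inf)
print(mean, 1.0 / lam)                 # both 0.5
print(ex2 - mean**2, 1.0 / lam**2)     # both 0.25
```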

3.1.1 Gamma Distribution


Sometimes you also see the parameter in an exponential distribution written as β = 1/λ
rather than λ, so that
f(x; β) ∝ e^{−x/β}    (x ≥ 0)    (3.15)
The gamma distribution is a generalization which adds an additional parameter α > 0,
so that

f(x; α, β) ∝ (x/β)^{α−1} e^{−x/β}    (x ≥ 0)    (gamma distribution)    (3.16)

Note that α (which need not be an integer) really does influence the shape of the distribution,
and α = 1 reduces to the exponential distribution.

3.1.2 The Gamma Function


Most of the complication associated with the gamma distribution in particular is associated
with finding the normalization constant. If we write

f(x; α, β) = N(α, β) (x/β)^{α−1} e^{−x/β}    (3.17)
then

1/N(α, β) = ∫_0^∞ (x/β)^{α−1} e^{−x/β} dx = β ∫_0^∞ u^{α−1} e^{−u} du = β/N(α, 1)    (3.18)
This integral

1/N(α, 1) = ∫_0^∞ x^{α−1} e^{−x} dx = Γ(α)    (3.19)
is defined as the “Gamma function”, which has lots of nifty properties:
• Γ(α) = (α − 1)Γ(α − 1) if α > 1 (which can be shown by integration by parts)
• Γ(1) = ∫_0^∞ e^{−x} dx = 1, from which it follows that Γ(n) = (n − 1)! if n is a positive integer
• Γ(1/2) = √π, which actually follows from ∫_{−∞}^{∞} e^{−z²/2} dz = √(2π)

3.2 Chi-Squared Distribution


Consider the square of a standard normal random variable, X = Z². Its cdf is

F(x) = P(X ≤ x) = P(Z² ≤ x) = P(−√x ≤ Z ≤ √x) = Φ(√x) − Φ(−√x)    (3.20)
which we can differentiate to get the pdf

f(x) = F′(x) = (dP/dz)(dz/dx) = (2/√(2π)) e^{−x/2} · (1/(2√x)) ∝ x^{−1/2} e^{−x/2}    (3.21)
This is a special case of the chi-squared distribution, with one degree of freedom. It turns out that if you take the sum of the squares of ν independent standard normal random variables, X = Σ_{k=1}^{ν} Z_k², it obeys a chi-squared distribution with ν degrees of freedom,

f(x) ∝ x^{(ν/2)−1} e^{−x/2}    (3.22)
But if we compare this to equation (3.16) we see that the chi-squared distribution with ν
degrees of freedom is just a Gamma distribution with α = ν/2 and β = 2.
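A quick numerical sketch of that identification (ν = 3 is an arbitrary choice; β corresponds to scipy's "scale" parameter):

```python
# The chi-squared pdf with nu degrees of freedom matches the gamma pdf
# with shape alpha = nu/2 and scale beta = 2 at every point checked.
import numpy as np
from scipy.stats import chi2, gamma

nu = 3
x = np.linspace(0.5, 10.0, 5)
print(chi2.pdf(x, df=nu))
print(gamma.pdf(x, a=nu / 2.0, scale=2.0))    # same values
```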

Practice Problems
4.59, 4.61, 4.63, 4.65, 4.67, 4.69, 4.71
