Common Probability Distributionsi Math 217/218 Probability and Statistics
1 Introduction.
I summarize here some of the more common distributions used in probability and statistics.
Some are more important than others, and not all of them are used in all fields.
I’ve identified four sources of these distributions, although there are more than these:
• the uniform distributions, both discrete and continuous: Uniform(n) and Uniform(a, b).
• those that come from the Bernoulli process or sampling with or without replacement: Bernoulli(p), Binomial(n, p), Geometric(p), NegativeBinomial(p, r), and Hypergeometric(N, M, n).
• those that come from the Poisson process: Poisson(λt), Exponential(λ), Gamma(λ, r), and Beta(α, β).
• those that come from the normal distribution or sampling distributions: Normal(µ, σ²), ChiSquared(ν), T(ν), and F(ν1, ν2).
For each distribution, I give the name of the distribution along with one or two parameters
and indicate whether it is a discrete distribution or a continuous one. Then I describe an
example interpretation for a random variable X having that distribution. Each discrete
distribution is determined by a probability mass function f which gives the probabilities for
the various outcomes, so that f (x) = P (X=x), the probability that a random variable X
with that distribution takes on the value x. Each continuous distribution is determined by a
probability density function f , which, when integrated from a to b gives you the probability
P (a ≤ X ≤ b). Next, I list the mean µ = E(X) and variance σ² = E((X − µ)²) = E(X²) − µ²
for the distribution, and for most of the distributions I include the moment generating function
m(t) = E(e^(tX)). Finally, I indicate how some of the distributions may be used.
The gamma function, Γ(x), is defined for any real number x, except for 0 and negative
integers, by the integral
Γ(x) = ∫₀^∞ t^(x−1) e^(−t) dt.
The Γ function generalizes the factorial function in the sense that when x is a positive integer,
Γ(x) = (x − 1)!. This follows from the recursion formula, Γ(x + 1) = xΓ(x), and the fact that
Γ(1) = 1, both of which can be easily proved by methods of calculus.
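Both facts are easy to check numerically. Here is a short Python sketch (my own illustration, not part of the notes) using the standard library’s math.gamma:

```python
import math

# The recursion Gamma(x + 1) = x * Gamma(x), checked at a non-integer point
x = 2.5
recursion_holds = math.isclose(math.gamma(x + 1), x * math.gamma(x))

# The factorial identity Gamma(n) = (n - 1)! for positive integers n
factorial_holds = all(
    math.isclose(math.gamma(n), math.factorial(n - 1)) for n in range(1, 10)
)
```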
The values of the Γ function at half integers are important for some of the distributions
mentioned below. A bit of clever work shows that Γ(1/2) = √π, so by the recursion formula,
Γ(n + 1/2) = ((2n − 1)(2n − 3) · · · 3 · 1 / 2^n) √π.
The beta function, B(α, β), is a function of two positive real arguments. Like the Γ
function, it is defined by an integral, but it can also be defined in terms of the Γ function:
B(α, β) = ∫₀¹ t^(α−1) (1 − t)^(β−1) dt = Γ(α)Γ(β) / Γ(α + β).
The beta function is related to binomial coefficients, for when α and β are positive integers,
B(α, β) = (α − 1)! (β − 1)! / (α + β − 1)!.
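A quick numeric check of that identity, again in Python (my own sketch, using math.gamma and math.factorial from the standard library):

```python
import math

def B(alpha, beta):
    # Beta function computed via the Gamma-function identity
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)

# For positive integers, B(a, b) = (a-1)! (b-1)! / (a+b-1)!
a, b = 3, 5
integer_formula = (
    math.factorial(a - 1) * math.factorial(b - 1) / math.factorial(a + b - 1)
)
identity_holds = math.isclose(B(a, b), integer_formula)
```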
2.1 Discrete uniform distribution.
Uniform(n). Discrete.
A discrete uniform random variable X takes each of the values 1, 2, . . . , n with equal probability.
Uniform(n)
f(x) = 1/n, for x = 1, 2, . . . , n
µ = (n + 1)/2.   σ² = (n² − 1)/12
m(t) = e^t(1 − e^(nt)) / (n(1 − e^t))
Example. A typical example is tossing a fair cubic die. n = 6, µ = 3.5, and σ² = 35/12.
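The die example can be verified directly from the pmf. A short Python sketch (my own illustration):

```python
# Mean and variance of Uniform(n) computed directly from the pmf f(x) = 1/n
n = 6
xs = range(1, n + 1)
mu = sum(x / n for x in xs)                      # should equal (n + 1)/2 = 3.5
var = sum(x * x / n for x in xs) - mu ** 2      # should equal (n^2 - 1)/12 = 35/12
```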
2.2 Continuous uniform distribution.
Uniform(a, b). Continuous.
In general, a continuous uniform variable X takes values on a curve, surface, or higher
dimensional region, but here I only consider the case when X takes values in an interval [a, b].
Being uniform, the probability that X lies in a subinterval is proportional to the length of
that subinterval.
Uniform(a, b)
f(x) = 1/(b − a), for x ∈ [a, b]
µ = (a + b)/2.   σ² = (b − a)²/12
m(t) = (e^(bt) − e^(at)) / (t(b − a))
Note. Most computer programming languages have a built-in pseudorandom number gen-
erator which generates numbers X in the unit interval [0, 1]. Random number generators for
any other distribution can then be computed by applying the inverse of the cumulative distri-
bution function for that distribution to X.
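Here is a sketch of that inverse-CDF method in Python for the exponential distribution (my own illustration; the function name sample_exponential is not from the notes):

```python
import math
import random

def sample_exponential(lam, rng=random):
    # Inverse-transform sampling: the exponential CDF is F(x) = 1 - e^(-lam*x),
    # so applying its inverse F^{-1}(u) = -ln(1 - u)/lam to a Uniform(0, 1)
    # variate u yields an Exponential(lam) variate.
    u = rng.random()
    return -math.log(1.0 - u) / lam

random.seed(0)
lam = 2.0
samples = [sample_exponential(lam) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)   # should be close to 1/lam = 0.5
```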
3.1 Bernoulli distribution.
Bernoulli(p). Discrete.
The parameter p is a real number between 0 and 1. The random variable X takes on two
values: 1 (success) or 0 (failure). The probability of success, P (X=1), is the parameter p.
The symbol q is often used for 1 − p, the probability of failure, P (X=0).
Bernoulli(p)
f (0) = 1 − p, f (1) = p
µ = p. σ 2 = p(1 − p) = pq
m(t) = pet + q
3.2 Binomial distribution.
Binomial(n, p). Discrete.
When n independent Bernoulli trials are performed, each with probability p of success, the number of successes X has a binomial distribution.
Binomial(n, p)
f(x) = C(n, x) p^x (1 − p)^(n−x), for x = 0, 1, . . . , n, where C(n, x) = n!/(x!(n − x)!) is a binomial coefficient
µ = np.   σ² = np(1 − p) = npq
m(t) = (pe^t + q)^n
Sampling with replacement. Sampling with replacement occurs when a set of N elements
has a subset of M “preferred” elements, and n elements are chosen at random, but the n
elements don’t have to be distinct. In other words, after one element is chosen, it’s put
back in the set so it can be chosen again. Selecting a preferred element is success, and that
happens with probability p = M/N , while selecting a nonpreferred element is failure, and
that happens with probability q = 1 − p = (N − M )/N . Thus, sampling with replacement is
a Bernoulli process.
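Sampling with replacement is easy to simulate. This Python sketch (my own, with made-up values of N, M, and n) counts preferred elements over many repetitions and compares the average to the binomial mean np:

```python
import random

random.seed(1)
N, M, n = 50, 20, 10        # population size, preferred subset, number of draws
p = M / N                   # success probability per draw

def count_preferred():
    # Each draw is an independent Bernoulli(p) trial, because the chosen
    # element is put back before the next draw.
    return sum(1 for _ in range(n) if random.randrange(N) < M)

trials = [count_preferred() for _ in range(20_000)]
empirical_mean = sum(trials) / len(trials)   # binomial mean is n * p = 4
```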
A bit of history. The first serious development in the theory of probability was in the
1650s when Pascal and Fermat investigated the binomial distribution in the special case
p = 1/2. Pascal published the resulting theory of binomial coefficients and properties of what
we now call Pascal’s triangle.
In the very early 1700s Jacob Bernoulli extended these results to general values of p.
3.3 Geometric distribution.
Geometric(p). Discrete.
When independent Bernoulli trials are repeated, each with probability p of success, the
number of trials X it takes to get the first success has a geometric distribution.
Geometric(p)
f(x) = q^(x−1) p, for x = 1, 2, . . .
µ = 1/p.   σ² = (1 − p)/p²
m(t) = pe^t / (1 − qe^t)
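The geometric mean and variance can be checked numerically by summing the pmf over enough terms for the tail to be negligible (my own Python sketch):

```python
# Truncated sums of the geometric pmf f(x) = q^(x-1) * p; with q = 0.7 the
# terms beyond x = 500 are astronomically small.
p = 0.3
q = 1 - p
terms = range(1, 500)
mean = sum(x * q ** (x - 1) * p for x in terms)                 # should be 1/p
second_moment = sum(x * x * q ** (x - 1) * p for x in terms)
variance = second_moment - mean ** 2                            # should be q/p^2
```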
3.4 Negative binomial distribution.
NegativeBinomial(p, r). Discrete.
When independent Bernoulli trials are repeated, each with probability p of success, the number of trials X it takes to get the rth success has a negative binomial distribution.
NegativeBinomial(p, r)
f(x) = C(x − 1, r − 1) p^r q^(x−r), for x = r, r + 1, . . .
µ = r/p.   σ² = rq/p²
m(t) = (pe^t / (1 − qe^t))^r
3.5 Hypergeometric distribution.
Hypergeometric(N, M, n). Discrete.
Hypergeometric(N, M, n)
f(x) = C(M, x) C(N − M, n − x) / C(N, n), for x = 0, 1, . . . , n
µ = np.   σ² = npq (N − n)/(N − 1), where p = M/N and q = 1 − p
Sampling without replacement. Sampling without replacement occurs when a set of N
elements has a subset of M “preferred” elements, and n distinct elements are chosen at
random; once an element is chosen, it is not put back, so it cannot be chosen again. The
hypergeometric distribution answers the question: of the n chosen elements, how many are
preferred?
When n is a small fraction of N , then sampling without replacement is almost the same
as sampling with replacement, and the hypergeometric distribution is almost a binomial
distribution.
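That approximation can be seen by comparing the two pmfs directly. A Python sketch (my own, with made-up parameter values) using math.comb for binomial coefficients:

```python
import math

def hypergeom_pmf(N, M, n, x):
    # P(X = x): x preferred elements out of n drawn without replacement
    return math.comb(M, x) * math.comb(N - M, n - x) / math.comb(N, n)

def binom_pmf(n, p, x):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# When n is a small fraction of N, the two pmfs nearly agree.
N, M, n = 10_000, 3_000, 10
p = M / N
max_gap = max(
    abs(hypergeom_pmf(N, M, n, x) - binom_pmf(n, p, x)) for x in range(n + 1)
)
```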
Many surveys have questions with only two responses, such as yes/no, for/against, or
candidate A/B. Surveys are actually sampling without replacement, since the same person
won’t be asked to respond twice. But because only a small portion of the population is
sampled, the analysis of surveys can be treated as sampling with replacement rather than
sampling without replacement.
4.1 Poisson distribution.
Poisson(λt). Discrete.
When events occur uniformly at random over time at a rate of λ events per unit time, the number of events X that occur in a time interval of length t has a Poisson distribution.
Poisson(λt)
f(x) = (λt)^x e^(−λt) / x!, for x = 0, 1, . . .
µ = λt.   σ² = λt
m(s) = e^(λt(e^s − 1))
4.2 Exponential distribution.
Exponential(λ). Continuous.
When events occur uniformly at random over time at a rate of λ events per unit time,
then the random variable X giving the time to the first event has an exponential distribution.
Exponential(λ)
f(x) = λe^(−λx), for x ∈ [0, ∞)
µ = 1/λ.   σ² = 1/λ²
m(t) = (1 − t/λ)^(−1)
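The link between the exponential and Poisson distributions can be checked by simulation: summing exponential inter-arrival times and counting the arrivals that land in a window of length t should give a Poisson(λt) count. A Python sketch (my own, using the standard library’s random.expovariate):

```python
import random

random.seed(42)
lam, t = 3.0, 2.0   # rate of 3 events per unit time, window of length 2

def events_in_window():
    # Accumulate exponential inter-arrival times until they exceed t;
    # the number of arrivals before that is Poisson(lam * t) distributed.
    total, count = 0.0, 0
    while True:
        total += random.expovariate(lam)
        if total > t:
            return count
        count += 1

counts = [events_in_window() for _ in range(20_000)]
empirical_mean = sum(counts) / len(counts)   # Poisson mean is lam * t = 6
```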
4.3 Gamma distribution.
Gamma(λ, r). Continuous.
When events occur uniformly at random over time at a rate of λ events per unit time, the time X it takes for r events to occur has a gamma distribution. An alternative parameterization uses α = r and β = 1/λ.
Gamma(λ, r)
f(x) = λ^r x^(r−1) e^(−λx) / Γ(r) = x^(α−1) e^(−x/β) / (β^α Γ(α)), for x ∈ [0, ∞)
µ = r/λ = αβ.   σ² = r/λ² = αβ²
m(t) = (1 − t/λ)^(−r) = (1 − βt)^(−α)
4.4 Beta distribution.
Beta(α, β). Continuous.
A beta random variable X takes values in the interval [0, 1].
Beta(α, β)
f(x) = x^(α−1) (1 − x)^(β−1) / B(α, β), for x ∈ [0, 1]
µ = α/(α + β).   σ² = αβ / ((α + β)²(α + β + 1))
Application to Bayesian statistics. Beta distributions are used in Bayesian statistics
as conjugate priors for the distributions in the Bernoulli process. Indeed, that’s just what
Thomas Bayes (1702–1761) did. In Beta(α, β), α counts the number of successes observed
while β keeps track of the failures observed.
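The conjugate-prior update can be illustrated in a few lines of Python (my own sketch; the observed counts are hypothetical):

```python
# Conjugate Beta prior for a Bernoulli success probability: after observing
# s successes and f failures, Beta(a, b) updates to Beta(a + s, b + f),
# so the posterior mean is (a + s) / (a + b + s + f).
a, b = 1.0, 1.0          # uniform prior Beta(1, 1)
s, f = 7, 3              # hypothetical data: 7 successes, 3 failures
post_a, post_b = a + s, b + f
posterior_mean = post_a / (post_a + post_b)   # (1 + 7) / (2 + 10) = 2/3
```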
5.1 Normal distribution.
Normal(µ, σ²). Continuous.
Normal(µ, σ²)
f(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²)), for x ∈ R
µ = µ.   σ² = σ²
m(t) = exp(µt + t²σ²/2)
The standard normal distribution has µ = 0 and σ² = 1. Thus, its density function is
f(x) = (1/√(2π)) e^(−x²/2), and its moment generating function is m(t) = e^(t²/2).
Applications. Normal distributions are used in statistics to make inferences about the
population mean when the sample size n is large.
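As a sanity check on the standard normal density, a simple midpoint-rule integration (my own Python sketch) confirms that it integrates to 1 and has mean 0:

```python
import math

def phi(x):
    # Standard normal density f(x) = e^(-x^2/2) / sqrt(2*pi)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Midpoint-rule integration over [-8, 8]; the tails beyond are negligible.
steps = 100_000
h = 16 / steps
points = [-8 + (i + 0.5) * h for i in range(steps)]
total = sum(phi(x) * h for x in points)        # should be very close to 1
mean = sum(x * phi(x) * h for x in points)     # should be very close to 0
```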
5.2 χ2 -distribution.
ChiSquared(ν). Continuous.
The parameter ν, the number of “degrees of freedom,” is a positive integer; I generally
use the Greek letter ν (nu) for degrees of freedom. This is the distribution of the sum of the
squares of ν independent standard normal random variables.
A χ²-distribution is actually a special case of a gamma distribution with a fractional value
for r: ChiSquared(ν) = Gamma(λ, r) where λ = 1/2 and r = ν/2.
ChiSquared(ν)
f(x) = x^(ν/2−1) e^(−x/2) / (2^(ν/2) Γ(ν/2)), for x ≥ 0
µ = ν.   σ² = 2ν
m(t) = (1 − 2t)^(−ν/2)
5.3 T-distribution.
T(ν). Continuous.
If Z is a standard normal random variable and Y is an independent ChiSquared(ν) random variable, then X = Z/√(Y/ν) has a t-distribution with ν degrees of freedom.
T(ν)
f(x) = Γ((ν + 1)/2) / (√(πν) Γ(ν/2) (1 + x²/ν)^((ν+1)/2)), for x ∈ R
µ = 0.   σ² = ν/(ν − 2), when ν > 2
5.4 F-distribution.
F(ν1, ν2). Continuous.
F(ν1, ν2)
f(x) = (ν1/ν2)^(ν1/2) x^(ν1/2−1) / (B(ν1/2, ν2/2) (1 + (ν1/ν2)x)^((ν1+ν2)/2)), for x > 0
µ = ν2/(ν2 − 2), when ν2 > 2.   σ² = 2ν2²(ν1 + ν2 − 2) / (ν1(ν2 − 2)²(ν2 − 4)), when ν2 > 4
Applications. F -distributions are used in statistics when comparing variances of two pop-
ulations.
Table of Discrete and Continuous distributions