
Stat 110, Lecture 7

Continuous Probability Distributions

[email protected]

Stat 110 [email protected]


Statistics

No data → Probability
Some data → Inferential Statistics
Way too much data → Descriptive Statistics

Probability: general notions, then discrete distributions and continuous distributions.


Classical continuous probability distributions

• uniform
• exponential

• normal or Gaussian distribution

• Weibull
• gamma

• beta



probability on a continuous space

Discrete: the sample space is {0,1,2,…}. p(k) = P(X=k) > 0 is sometimes called the probability mass function, with Σ k in {0,1,…} p( k ) = 1.

Continuous: the sample space is [0,∞) or (–∞, ∞). P(X=x) = 0, but P(x–δ < X ≤ x+δ) > 0 for small values of δ. Define f(y) such that ∫[x–δ,x+δ) f(y)dy = P(x–δ < X ≤ x+δ); f is called the probability density function (pdf).
definitions

CDF: The cumulative distribution function for a random variable is such that F(y0) = P(Y ≤ y0).

A continuous random variable has three properties:
1. The sample space is (–∞, ∞).
2. Its associated CDF is continuous.
3. The probability that it equals any particular value y0 is zero.

The probability density function (pdf) is f(y) = dF(y)/dy.
Key properties of a pdf:

1. f(y) ≥ 0.
2. ∫(–∞, ∞) f(y)dy = 1.
3. ∫[a,b) f(y)dy = P( a < Y ≤ b ) = F(b) – F(a).


A few general properties with CDFs

For n independent, identically distributed observations from the same CDF F(x):
1. Maximum: P( max{ X1, X2, …, Xn } ≤ x ) = F(x)^n.
2. Minimum: P( min{ X1, X2, …, Xn } ≤ x ) = 1 – [1 – F(x)]^n.
3. When P(X ≥ 0) = 1, E(X) = ∫[0,∞) [1 – F(x)]dx.
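The maximum and minimum formulas are easy to sanity-check by simulation. A minimal sketch in Python (the helper name and the choice of U(0,1), where F(x) = x, are mine, not from the slides):

```python
import random

def empirical_cdf_of_max(n, x, trials=100_000, seed=0):
    """Estimate P( max of n iid U(0,1) draws <= x ) by simulation."""
    rng = random.Random(seed)
    hits = sum(max(rng.random() for _ in range(n)) <= x for _ in range(trials))
    return hits / trials

# For U(0,1), F(x) = x, so the maximum formula predicts F(x)^n = 0.8**3 = 0.512.
print(round(empirical_cdf_of_max(3, 0.8), 2))
```

The same harness with `min` in place of `max` checks the 1 – [1 – F(x)]^n formula.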


Exponential distribution

Sample space: [0, ∞)
pdf: (1/λ) exp(–t/λ)
cdf: 1 – exp(–t/λ)
Moments: E(Y) = λ (MTTF); Var(Y) = λ^2
Key application: lifetime data, waiting times

[Figure: pdf of the exp(1) distribution, 0 to 7]
Examples with pdfs:
Exponential, with parameter λ. Substituting t = y/λ:

1 = ∫(–∞, ∞) f(y)dy = ∫[0, ∞) f(y)dy
  = ∫[0, ∞) (1/λ) exp(–y/λ)dy = ∫[0, ∞) exp(–t)dt = –exp(–t)|[0, ∞)
  = 0 – (–1) = 1

Mean = ∫[0, ∞) y f(y)dy = ∫[0, ∞) y (1/λ) exp(–y/λ)dy
  = λ ∫[0, ∞) t exp(–t)dt = λ[ –t exp(–t)|[0, ∞) + ∫[0, ∞) exp(–t)dt ]  (integrating by parts)
  = λ[ 0 + 1 ] = λ
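Both integrals can be checked numerically. A sketch (the trapezoid helper and the 40λ truncation point are my choices; at 40λ the integrand tail is negligible):

```python
import math

def trapezoid(f, a, b, n=100_000):
    """Crude trapezoid-rule integral of f over [a, b]."""
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)

lam = 2.0
pdf = lambda y: math.exp(-y / lam) / lam   # (1/λ) exp(-y/λ)

total = trapezoid(pdf, 0.0, 40 * lam)                 # should be 1
mean = trapezoid(lambda y: y * pdf(y), 0.0, 40 * lam)  # should be λ
print(round(total, 3), round(mean, 3))   # 1.0 2.0
```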


Key property of the exponential

P( X > t ) = 1 – (1 – exp( –t/λ )) = exp( –t/λ )

P( X > t+u | X > u ) = exp( –(t+u)/λ ) / exp( –u/λ ) = exp( –t/λ )

i.e. memoryless: no wearout or improvement.

…and a curious one:

P( X > λ ) = P( X > E(X) ) = exp( –1 ) = 0.368, so there is a 63% chance of failing by the MTTF.
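The memoryless property shows up directly in simulation: the survival probability past t more hours is the same whether or not the unit has already survived. A sketch (function name and parameter choices are mine):

```python
import math, random

def survival(t, lam, condition_on=0.0, trials=100_000, seed=1):
    """Estimate P( X > condition_on + t | X > condition_on ), X ~ exponential with mean λ."""
    rng = random.Random(seed)
    draws = [rng.expovariate(1 / lam) for _ in range(trials)]
    alive = [x for x in draws if x > condition_on]
    return sum(x > condition_on + t for x in alive) / len(alive)

lam, t = 2.0, 1.0
print(round(survival(t, lam), 2),                    # unconditional P(X > 1)
      round(survival(t, lam, condition_on=3.0), 2),  # given survival past 3
      round(math.exp(-t / lam), 2))                  # both ≈ exp(-1/2) ≈ 0.61
```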


uniform distribution

Sample space: [0, 1] or [a, b]
pdf: 1/(b–a)
Moments: E(Y) = (a+b)/2; Var(Y) = (b–a)^2/12
Key property: easy to generate random ones. For X with cdf F(x), F(X) is uniform.

[Figure: pdf of U(0.5, 2.0)]
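The key property runs both ways: if F(X) is uniform, then F^(–1)(U) with U ~ Uniform(0,1) has cdf F, which is how random values from other distributions are generated. A sketch (the function name and the exponential example with λ = 2 are mine):

```python
import math, random

def inv_exponential_cdf(p, lam):
    """Solves p = 1 - exp(-t/λ) for t: the inverse of the exponential cdf."""
    return -lam * math.log(1 - p)

# Draw exponential(λ = 2) values by pushing uniforms through the inverse cdf.
rng = random.Random(0)
draws = [inv_exponential_cdf(rng.random(), 2.0) for _ in range(100_000)]
print(round(sum(draws) / len(draws), 1))   # sample mean ≈ λ = 2.0
```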


uniform examples

Measurement error induced by rounding.

Material, when selected using narrow specifications from a process with rather wide variation.


Checking for the uniform distribution

Distribution of hourly counts given count > 0. Log counts appear uniformly distributed.

Plotting ranked counts vs (rank / n).

[Figure: empirical CDF vs count on a log scale, 1 to 100000]


normal or Gaussian distribution

Sample space: (–∞, ∞)
Parameters:
μ, the mean and median
σ, the standard deviation
Probability distribution:
pdf(x) = exp{ –[(x–μ)/σ]^2/2 } / (2πσ^2)^(1/2) = φ((x–μ)/σ)/σ
Moments: E(X) = μ; Var(X) = σ^2
Key property: central limit theorem


Central Limit Theorem

Simulated averages of n rows drawn from a standard Gaussian (N) and from an exponential with pdf λexp(–λx), mean 1 (E):

Rows averaged      N mean      N std dev   E mean      E std dev
1  (16000 avgs)   –0.005831   1.0024338   0.9968311   0.9968924
4  (4000 avgs)    –0.005831   0.50137     0.9968311   0.4954521
16 (1000 avgs)    –0.005831   0.2499766   0.9968311   0.2474526

[Figure: histograms of the averages; Gaussian panels on –3 to 3, exponential panels on 0 to 5]
The Three Effects of Averaging

Comparable estimates: The distribution of averages has the same center as the raw values.

Precision effect: The width of the distribution of averages is reduced. With uncorrelated readings, the standard deviation of averages of n raw readings becomes σ/√n.

Central limit theorem: The shape of the distribution of averages becomes normal or Gaussian.


Why Gaussian?
Many observations are accumulations of several
phenomena, including measurement error.
Manufacturing usually involves several steps; each
step contributes some variation to the final
product properties.
Extreme values from each source can combine
together with positive probability, but become
progressively less likely.
In general, distributions of sums and averages
tend to converge in distribution to the normal
distribution.
Key properties of the Normal Distribution

Unimodal, symmetric.

Completely describable by μ, σ^2, which allows for standardization: if X is Gaussian with μ, σ^2, then Z = ( X – μ ) / σ is called the standard normal. Its probabilities are tabulated in Appendix II, Table 4.


Problem 5.31

Standard fluorescent: mean = 7000 h, s = 1000 h
Compact fluorescent: mean = 7500 h, s = 1200 h

Prob lifetimes ≥ 9000 h? ≤ 5000 h?


z = (9000 – 7000)/1000 = 2
From Table 4, Appendix II, P(0 < z ≤ 2) = 0.4772,
so P(z > 2) = 0.5 – 0.4772 = 0.0228.

z = (5000 – 7500)/1200 = –2.083
From Table 4, Appendix II, P(0 < z ≤ 2.08) = 0.4812 and P(0 < z ≤ 2.09) = 0.4817,
so by linear interpolation P(0 < z ≤ 2.083) ≈ 0.4814,
and P(z < –2.083) = 0.5 – 0.4814 = 0.0186.

[Figure: standard normal curves on –5 to 5 with the tail areas shaded]
z = (9000 – 7000)/1000 = 2
From Excel, NORMSDIST(2) = P(z ≤ 2) = 0.9772,
so P(z > 2) = 1 – 0.9772 = 0.0228.

spec'd life   mean   stdev      z      Pr(z<)   Pr(z>)
9000          7000   1000    2.0000    0.9772   0.0228
9000          7000   1200    1.6667    0.9522   0.0478
9000          7500   1000    1.5000    0.9332   0.0668
9000          7500   1200    1.2500    0.8944   0.1056
5000          7000   1000   –2.0000    0.0228   0.9772
5000          7000   1200   –1.6667    0.0478   0.9522
5000          7500   1000   –2.5000    0.0062   0.9938
5000          7500   1200   –2.0833    0.0186   0.9814
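The whole table can be reproduced in Python; `statistics.NormalDist().cdf` plays the role of Excel's NORMSDIST (a sketch of one way to lay it out, not the course's code):

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal

print("spec'd life  mean  stdev        z   Pr(z<)   Pr(z>)")
for spec in (9000, 5000):
    for mean in (7000, 7500):
        for sd in (1000, 1200):
            z = (spec - mean) / sd        # standardize the spec'd life
            p = Z.cdf(z)
            print(f"{spec:10d} {mean:5d} {sd:6d} {z:8.4f} {p:8.4f} {1 - p:8.4f}")
```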
z-scores:

CDF(z) = area under the curve to the left: Φ(z) = P(Z ≤ z), plotted vs z.

The inverse function Φ–1(p) takes a probability and calculates the associated z value.

e.g. Φ(2) = 0.9772, so Φ–1(0.9772) = +2 and Φ–1(0.0228) = –2.

[Figure: Φ(z) = P(Z ≤ z) vs z, –5 to 5]
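Both Φ and Φ–1 can be built from the error function alone. A sketch (the bisection inverse is my choice for illustration; library routines are faster):

```python
import math

def phi(z):
    """Standard normal cdf: Φ(z) = (1 + erf(z/√2)) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi_inv(p):
    """Invert Φ by bisection on [-10, 10]; assumes 0 < p < 1."""
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(phi(2.0), 4), round(phi_inv(0.9772), 2), round(phi_inv(0.0228), 2))
# 0.9772 2.0 -2.0
```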


Aside: the 4th root of an exponential looks Gaussian

• y → y^(1/4)

[Figure: histograms of exponential data E and of sqrt E, cube root E, 4th root E, and log E]

How Gaussian is any particular set of data?
Checking for normality

Data (CPU times):
1.17  1.61  1.16  1.38  3.53
1.23  3.76  1.94  0.96  4.75
0.15  2.41  0.71  0.02  1.59
0.19  0.82  0.47  2.16  2.01
0.92  0.75  2.59  3.07  1.40

…sorting by value:
0.02  0.75  1.17  1.61  2.59
0.15  0.82  1.23  1.94  3.07
0.19  0.92  1.38  2.01  3.53
0.47  0.96  1.40  2.16  3.76
0.71  1.16  1.59  2.41  4.75

Stem    Leaf             count
4. 5-9  75               1
4. 0-4
3. 5-9  53,76            2
3. 0-4  07               1
2. 5-9  59               1
2. 0-4  01,16,41         3
1. 5-9  59,61,94         3
1. 0-4  16,17,23,38,40   5
0. 5-9  71,75,82,92,96   5
0. 0-4  02,15,19,47      4
Normal probability plot

Mean rank method: for an obs X(r) with rank r of n, calculate p as
( r – 0.375 )/( n + 0.25 ).
Calculate then from the normal distribution the zp value such that
P( Z ≤ zp ) = p.
Plot zp = Φ–1(p) vs X(r).

(r+a)/(n+b) is symmetric when 1 + 2a = b; e.g. 1 – 2·0.375 = 1 – 0.75 = 0.25.

Linear: Gaussian fits. Concave here: right-skewness.

[Figure: zp (z scale) vs CPU times (data scale), 0 to 5]
calculation: p mean rank = (r – 0.375)/(n + 0.25); z mean rank = NORMSINV( p mean rank );
p median rank = BETAINV( 0.5, r, n+1–r ); z median rank = NORMSINV( p median rank )

CPU times  rank  p mean rank  z mean rank  p median rank  z median rank
0.02        1     0.0248      –1.9642      0.0273         –1.9213
0.15        2     0.0644      –1.5192      0.0662         –1.5045
0.19        3     0.1040      –1.2593      0.1055         –1.2506
0.47        4     0.1436      –1.0644      0.1449         –1.0585
0.71        5     0.1832      –0.9034      0.1843         –0.8989
…           …
3.53       23     0.8960       1.2593      0.8945          1.2506
3.76       24     0.9356       1.5192      0.9338          1.5045
4.75       25     0.9752       1.9642      0.9727          1.9213
Comments
• Linear pattern: consistent with Gaussian.
• Concave: relatively right-skewed.
• Outliers: values to the right.
• AT&T convention: y-axis = data, x-axis = z-scores.
• Most continuous distributions admit such a plot to assess goodness-of-fit.
• It is essentially a QQ plot in which one quantile comes from a theoretical distribution.


Normal probability plot for (exponential)^(1/4)

Fits rather well, pretty linear. Slightly less tendency to extreme values than a Gaussian.

[Figure: zp vs 4th root, 0.2 to 1.6]


CPU times vs exponential distribution

F( t ) = 1 – exp(–t/λ) or
p = 1 – exp(–t/λ) or
1 – p = exp(–t/λ) or
–log(1–p) = t/λ, so
plot: –log(1–p) vs t ( slope = 1/λ ).

[Figure: –log(1–p) vs CPU times, 0 to 5]
Weibull distribution

Sample space: [0, ∞)
CDF: 1 – exp( –(t/λ)^α )
Parameters:
λ = characteristic life
α = “shape”
Moments:
E(Y) = λ Γ((1+α)/α)
Var(Y) = λ^2 [ Γ((2+α)/α) – Γ((1+α)/α)^2 ]

Key applications: lifetime data, generalizes the exponential, modeling extreme values
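The moment formulas reduce to the exponential's when α = 1, which makes a convenient check. A sketch using `math.gamma` (the helper name is mine):

```python
import math

def weibull_moments(lam, alpha):
    """E(Y) = λΓ((1+α)/α); Var(Y) = λ²[Γ((2+α)/α) − Γ((1+α)/α)²]."""
    g1 = math.gamma((1 + alpha) / alpha)
    g2 = math.gamma((2 + alpha) / alpha)
    return lam * g1, lam ** 2 * (g2 - g1 ** 2)

# α = 1 reduces to the exponential: mean λ, variance λ².
mean, var = weibull_moments(2.0, 1.0)
print(round(mean, 3), round(var, 3))   # 2.0 4.0
```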
Key properties of the Weibull

• The Weibull is merely a power transformation of the exponential.

• P( Y ≤ λ ) = 1 – exp(–1) = 0.63, which admits an estimate of λ.

• P( Y > (1+u)y | Y > y ) = P( Y > (1+u)cy | Y > cy ). “The chance of living 10% longer is always the same, …of living 20% longer, always the same.”

[Figure: Weibull pdfs for α = 1/4, 1, 4; the α = 1 curve is the exponential]
Checking Weibull fit

F( t ) = 1 – exp( –(t/λ)^α ) or
p = 1 – exp( –(t/λ)^α ) or
1 – p = exp( –(t/λ)^α ) or
–log(1–p) = (t/λ)^α or
log(–log(1–p)) = α[ log(t) – log(λ) ], so
plot log(–log(1–p)) vs log(t) ( slope = α ).

[Figure: log(–log(1–p)) vs CPU times, 0 to 5]
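The slope of that plot can be computed rather than read off by eye. A sketch combining the mean-rank positions above with an ordinary least-squares slope (helper name and sample sizes are mine; this is an illustration, not a recommended estimator):

```python
import math, random

def weibull_plot_slope(data):
    """OLS slope of log(-log(1-p)) on log(t), p from mean ranks.
    For Weibull data this estimates the shape α."""
    n = len(data)
    xs, ys = [], []
    for r, t in enumerate(sorted(data), start=1):
        p = (r - 0.375) / (n + 0.25)
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - p)))
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

rng = random.Random(0)
sample = [rng.weibullvariate(1.5, 2.0) for _ in range(2000)]   # scale λ = 1.5, shape α = 2
print(round(weibull_plot_slope(sample), 1))   # ≈ 2.0
```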


Power Transformations

• y → y^λ
• y → (y^λ – 1)/λ: slope(@y=1) = 1

[Figure: (y^λ – 1)/λ vs y]

[Figure: histograms of exponential data E and of sqrt E, cube root E, 4th root E, and log E]


Summary of relationships

• hypergeometric: sampling without replacement
• binomial: sampling with replacement; hypergeometric → binomial as D/N → p, n/N → 0
• negative binomial: sampling till result
• Poisson: binomial → Poisson as np → λ, n → ∞
