
Stat 110, Lecture 7

Continuous Probability Distributions

[email protected]

Stat 110 [email protected]


Statistics

No data → Probability
Some data → Inferential Statistics
Way too much data → Descriptive Statistics

Probability: general notions, then discrete distributions and continuous distributions.


Classical continuous probability distributions

• uniform
• exponential

• normal or Gaussian distribution

• Weibull
• gamma

• beta



probability on a continuous space

Discrete: the sample space is {0,1,2,…}. p(k) = P(X=k) > 0 is sometimes called the probability mass function, with Σ k in {0,1,…} p( k ) = 1.

Continuous: the sample space is [0,∞) or (–∞, ∞). P(X=x) = 0, but P(x–δ < X ≤ x+δ) > 0 for small values of δ. Define f(y) such that ∫[x–δ,x+δ) f(y)dy = P(x–δ < X ≤ x+δ); f is called the probability density function (pdf).
definitions

CDF: The cumulative distribution function for a random variable is such that F(y0) = P(Y ≤ y0).

A continuous random variable has three properties:
1. The sample space is (–∞, ∞).
2. Its associated CDF is continuous.
3. The probability that it equals any particular value y0 is zero.

The probability density function (pdf) is f(y) = dF(y)/dy.
Key properties of a pdf:

1. f(y) ≥ 0.
2. ∫(–∞, ∞) f(y)dy = 1.
3. ∫[a,b) f(y)dy = P( a < Y ≤ b ) = F(b) – F(a).


A few general properties with CDFs

For n independent, identically distributed observations from the same CDF F(x):
1. Maximum: P( max{ X1, X2, …, Xn } ≤ x ) = F(x)^n.
2. Minimum: P( min{ X1, X2, …, Xn } ≤ x ) = 1 – [1 – F(x)]^n.
3. When P(X ≥ 0) = 1, E(X) = ∫[0,∞) [1 – F(x)]dx.
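The maximum and minimum formulas are easy to sanity-check by simulation. A minimal sketch in Python (the helper name and the choice of U(0,1), where F(x) = x, are mine, not from the slides):

```python
import random

def empirical_cdf_of_max(n, x, trials=100_000, seed=0):
    """Estimate P( max of n iid U(0,1) draws <= x ) by simulation."""
    rng = random.Random(seed)
    hits = sum(max(rng.random() for _ in range(n)) <= x for _ in range(trials))
    return hits / trials

# For U(0,1), F(x) = x, so the maximum formula predicts F(x)^n = 0.8**3 = 0.512.
print(round(empirical_cdf_of_max(3, 0.8), 2))
```

The same harness with `min` in place of `max` checks the 1 – [1 – F(x)]^n formula.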


Exponential distribution

Sample space: [0, ∞)
pdf: (1/λ) exp(–t/λ)
cdf: 1 – exp(–t/λ)
Moments: E(Y) = λ (MTTF); Var(Y) = λ^2
Key application: lifetime data, waiting times

[Figure: pdf of the exp(1) distribution, 0 to 7]
Examples with pdfs:
Exponential, with parameter λ. Substituting t = y/λ:

1 = ∫(–∞, ∞) f(y)dy = ∫[0, ∞) f(y)dy
  = ∫[0, ∞) (1/λ) exp(–y/λ)dy = ∫[0, ∞) exp(–t)dt = –exp(–t)|[0, ∞)
  = 0 – (–1) = 1

Mean = ∫[0, ∞) y f(y)dy = ∫[0, ∞) y (1/λ) exp(–y/λ)dy
  = λ ∫[0, ∞) t exp(–t)dt = λ[ –t exp(–t)|[0, ∞) + ∫[0, ∞) exp(–t)dt ]  (integrating by parts)
  = λ[ 0 + 1 ] = λ
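Both integrals can be checked numerically. A sketch (the trapezoid helper and the 40λ truncation point are my choices; at 40λ the integrand tail is negligible):

```python
import math

def trapezoid(f, a, b, n=100_000):
    """Crude trapezoid-rule integral of f over [a, b]."""
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)

lam = 2.0
pdf = lambda y: math.exp(-y / lam) / lam   # (1/λ) exp(-y/λ)

total = trapezoid(pdf, 0.0, 40 * lam)                 # should be 1
mean = trapezoid(lambda y: y * pdf(y), 0.0, 40 * lam)  # should be λ
print(round(total, 3), round(mean, 3))   # 1.0 2.0
```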


Key property of the exponential

P( X > t ) = 1 – (1 – exp( –t/λ )) = exp( –t/λ )

P( X > t+u | X > u ) = exp( –(t+u)/λ ) / exp( –u/λ ) = exp( –t/λ )

i.e. memoryless: no wearout or improvement.

…and a curious one:

P( X > λ ) = P( X > E(X) ) = exp( –1 ) = 0.368, so there is a 63% chance of failing by the MTTF.
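The memoryless property shows up directly in simulation: the survival probability past t more hours is the same whether or not the unit has already survived. A sketch (function name and parameter choices are mine):

```python
import math, random

def survival(t, lam, condition_on=0.0, trials=100_000, seed=1):
    """Estimate P( X > condition_on + t | X > condition_on ), X ~ exponential with mean λ."""
    rng = random.Random(seed)
    draws = [rng.expovariate(1 / lam) for _ in range(trials)]
    alive = [x for x in draws if x > condition_on]
    return sum(x > condition_on + t for x in alive) / len(alive)

lam, t = 2.0, 1.0
print(round(survival(t, lam), 2),                    # unconditional P(X > 1)
      round(survival(t, lam, condition_on=3.0), 2),  # given survival past 3
      round(math.exp(-t / lam), 2))                  # both ≈ exp(-1/2) ≈ 0.61
```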


uniform distribution

Sample space: [0, 1] or [a, b]
pdf: 1/(b–a)
Moments: E(Y) = (a+b)/2; Var(Y) = (b–a)^2/12
Key property: easy to generate random ones. For X with cdf F(x), F(X) is uniform.

[Figure: pdf of U(0.5, 2.0)]
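The key property runs both ways: if F(X) is uniform, then F^(–1)(U) with U ~ Uniform(0,1) has cdf F, which is how random values from other distributions are generated. A sketch (the function name and the exponential example with λ = 2 are mine):

```python
import math, random

def inv_exponential_cdf(p, lam):
    """Solves p = 1 - exp(-t/λ) for t: the inverse of the exponential cdf."""
    return -lam * math.log(1 - p)

# Draw exponential(λ = 2) values by pushing uniforms through the inverse cdf.
rng = random.Random(0)
draws = [inv_exponential_cdf(rng.random(), 2.0) for _ in range(100_000)]
print(round(sum(draws) / len(draws), 1))   # sample mean ≈ λ = 2.0
```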


uniform examples

Measurement error induced by rounding.

Material, when selected using narrow specifications from a process with rather wide variation.


Checking for the uniform distribution

Distribution of hourly counts given count > 0. Log counts appear uniformly distributed.

Plotting ranked counts vs (rank / n).

[Figure: empirical CDF vs count on a log scale, 1 to 100000]


normal or Gaussian distribution

Sample space: (–∞, ∞)
Parameters:
μ, the mean and median
σ, the standard deviation
Probability distribution:
pdf(x) = exp{ –[(x–μ)/σ]^2/2 } / (2πσ^2)^(1/2) = φ((x–μ)/σ)/σ
Moments: E(X) = μ; Var(X) = σ^2
Key property: central limit theorem


Central Limit Theorem

Simulated averages of n rows drawn from a standard Gaussian (N) and from an exponential with pdf λexp(–λx), mean 1 (E):

Rows averaged      N mean      N std dev   E mean      E std dev
1  (16000 avgs)   –0.005831   1.0024338   0.9968311   0.9968924
4  (4000 avgs)    –0.005831   0.50137     0.9968311   0.4954521
16 (1000 avgs)    –0.005831   0.2499766   0.9968311   0.2474526

[Figure: histograms of the averages; Gaussian panels on –3 to 3, exponential panels on 0 to 5]
The Three Effects of Averaging

Comparable estimates: The distribution of averages has the same center as the raw values.

Precision effect: The width of the distribution of averages is reduced. With uncorrelated readings, the standard deviation of averages of n raw readings becomes σ/√n.

Central limit theorem: The shape of the distribution of averages becomes normal or Gaussian.


Why Gaussian?
Many observations are accumulations of several
phenomena, including measurement error.
Manufacturing usually involves several steps; each
step contributes some variation to the final
product properties.
Extreme values from each source can combine
together with positive probability, but become
progressively less likely.
In general, distributions of sums and averages
tend to converge in distribution to the normal
distribution.
Key properties of the Normal Distribution

Unimodal, symmetric.

Completely describable by μ, σ^2, which allows for standardization: if X is Gaussian with μ, σ^2, then Z = ( X – μ ) / σ is called the standard normal. Its probabilities are tabulated in Appendix II, Table 4.


Problem 5.31

Standard fluorescent: mean = 7000 h, s = 1000 h
Compact fluorescent: mean = 7500 h, s = 1200 h

Prob lifetimes ≥ 9000 h? ≤ 5000 h?


z = (9000 – 7000)/1000 = 2
From Table 4, Appendix II, P(0 < z ≤ 2) = 0.4772,
so P(z > 2) = 0.5 – 0.4772 = 0.0228.

z = (5000 – 7500)/1200 = –2.083
From Table 4, Appendix II, P(0 < z ≤ 2.08) = 0.4812 and P(0 < z ≤ 2.09) = 0.4817,
so by linear interpolation P(0 < z ≤ 2.083) ≈ 0.4814,
and P(z < –2.083) = 0.5 – 0.4814 = 0.0186.

[Figure: standard normal curves on –5 to 5 with the tail areas shaded]
z = (9000 – 7000)/1000 = 2
From Excel, NORMSDIST(2) = P(z ≤ 2) = 0.9772,
so P(z > 2) = 1 – 0.9772 = 0.0228.

spec'd life   mean   stdev      z      Pr(z<)   Pr(z>)
9000          7000   1000    2.0000    0.9772   0.0228
9000          7000   1200    1.6667    0.9522   0.0478
9000          7500   1000    1.5000    0.9332   0.0668
9000          7500   1200    1.2500    0.8944   0.1056
5000          7000   1000   –2.0000    0.0228   0.9772
5000          7000   1200   –1.6667    0.0478   0.9522
5000          7500   1000   –2.5000    0.0062   0.9938
5000          7500   1200   –2.0833    0.0186   0.9814
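The whole table can be reproduced in Python; `statistics.NormalDist().cdf` plays the role of Excel's NORMSDIST (a sketch of one way to lay it out, not the course's code):

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal

print("spec'd life  mean  stdev        z   Pr(z<)   Pr(z>)")
for spec in (9000, 5000):
    for mean in (7000, 7500):
        for sd in (1000, 1200):
            z = (spec - mean) / sd        # standardize the spec'd life
            p = Z.cdf(z)
            print(f"{spec:10d} {mean:5d} {sd:6d} {z:8.4f} {p:8.4f} {1 - p:8.4f}")
```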
z-scores:

CDF(z) = area under the curve to the left: Φ(z) = P(Z ≤ z), plotted vs z.

The inverse function Φ–1(p) takes a probability and calculates the associated z value.

e.g. Φ(2) = 0.9772, so Φ–1(0.9772) = +2 and Φ–1(0.0228) = –2.

[Figure: Φ(z) = P(Z ≤ z) vs z, –5 to 5]
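Both Φ and Φ–1 can be built from the error function alone. A sketch (the bisection inverse is my choice for illustration; library routines are faster):

```python
import math

def phi(z):
    """Standard normal cdf: Φ(z) = (1 + erf(z/√2)) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi_inv(p):
    """Invert Φ by bisection on [-10, 10]; assumes 0 < p < 1."""
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(phi(2.0), 4), round(phi_inv(0.9772), 2), round(phi_inv(0.0228), 2))
# 0.9772 2.0 -2.0
```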


Aside: the 4th root of an exponential looks Gaussian

• y → y^(1/4)

[Figure: histograms of exponential data E and of sqrt E, cube root E, 4th root E, and log E]

How Gaussian is any particular set of data?
Checking for normality

Data (CPU times):
1.17  1.61  1.16  1.38  3.53
1.23  3.76  1.94  0.96  4.75
0.15  2.41  0.71  0.02  1.59
0.19  0.82  0.47  2.16  2.01
0.92  0.75  2.59  3.07  1.40

…sorting by value:
0.02  0.75  1.17  1.61  2.59
0.15  0.82  1.23  1.94  3.07
0.19  0.92  1.38  2.01  3.53
0.47  0.96  1.40  2.16  3.76
0.71  1.16  1.59  2.41  4.75

Stem    Leaf             count
4. 5-9  75               1
4. 0-4
3. 5-9  53,76            2
3. 0-4  07               1
2. 5-9  59               1
2. 0-4  01,16,41         3
1. 5-9  59,61,94         3
1. 0-4  16,17,23,38,40   5
0. 5-9  71,75,82,92,96   5
0. 0-4  02,15,19,47      4
Normal probability plot

Mean rank method: for an obs X(r) with rank r of n, calculate p as
( r – 0.375 )/( n + 0.25 ).
Calculate then from the normal distribution the zp value such that
P( Z ≤ zp ) = p.
Plot zp = Φ–1(p) vs X(r).

(r+a)/(n+b) is symmetric when 1 + 2a = b; e.g. 1 – 2·0.375 = 1 – 0.75 = 0.25.

Linear: Gaussian fits. Concave here: right-skewness.

[Figure: zp (z scale) vs CPU times (data scale), 0 to 5]
calculation: p mean rank = (r – 0.375)/(n + 0.25); z mean rank = NORMSINV( p mean rank );
p median rank = BETAINV( 0.5, r, n+1–r ); z median rank = NORMSINV( p median rank )

CPU times  rank  p mean rank  z mean rank  p median rank  z median rank
0.02        1     0.0248      –1.9642      0.0273         –1.9213
0.15        2     0.0644      –1.5192      0.0662         –1.5045
0.19        3     0.1040      –1.2593      0.1055         –1.2506
0.47        4     0.1436      –1.0644      0.1449         –1.0585
0.71        5     0.1832      –0.9034      0.1843         –0.8989
…           …
3.53       23     0.8960       1.2593      0.8945          1.2506
3.76       24     0.9356       1.5192      0.9338          1.5045
4.75       25     0.9752       1.9642      0.9727          1.9213
Comments
• Linear pattern: consistent with Gaussian.
• Concave: relatively right-skewed.
• Outliers: values to the right.
• AT&T convention: y-axis = data, x-axis = z-scores.
• Most continuous distributions admit such a plot to assess goodness-of-fit.
• It is essentially a QQ plot in which one quantile comes from a theoretical distribution.


Normal probability plot for (exponential)^(1/4)

Fits rather well, pretty linear. Slightly less tendency to extreme values than a Gaussian.

[Figure: zp vs 4th root, 0.2 to 1.6]


CPU times vs exponential distribution

F( t ) = 1 – exp(–t/λ) or
p = 1 – exp(–t/λ) or
1 – p = exp(–t/λ) or
–log(1–p) = t/λ, so
plot: –log(1–p) vs t ( slope = 1/λ ).

[Figure: –log(1–p) vs CPU times, 0 to 5]
Weibull distribution

Sample space: [0, ∞)
CDF: 1 – exp( –(t/λ)^α )
Parameters:
λ = characteristic life
α = “shape”
Moments:
E(Y) = λ Γ((1+α)/α)
Var(Y) = λ^2 [ Γ((2+α)/α) – Γ((1+α)/α)^2 ]

Key applications: lifetime data, generalizes the exponential, modeling extreme values
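The moment formulas reduce to the exponential's when α = 1, which makes a convenient check. A sketch using `math.gamma` (the helper name is mine):

```python
import math

def weibull_moments(lam, alpha):
    """E(Y) = λΓ((1+α)/α); Var(Y) = λ²[Γ((2+α)/α) − Γ((1+α)/α)²]."""
    g1 = math.gamma((1 + alpha) / alpha)
    g2 = math.gamma((2 + alpha) / alpha)
    return lam * g1, lam ** 2 * (g2 - g1 ** 2)

# α = 1 reduces to the exponential: mean λ, variance λ².
mean, var = weibull_moments(2.0, 1.0)
print(round(mean, 3), round(var, 3))   # 2.0 4.0
```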
Key properties of the Weibull

• The Weibull is merely a power transformation of the exponential.

• P( Y ≤ λ ) = 1 – exp(–1) = 0.63, which admits an estimate of λ.

• P( Y > (1+u)y | Y > y ) = P( Y > (1+u)cy | Y > cy ). “The chance of living 10% longer is always the same, …of living 20% longer, always the same.”

[Figure: Weibull pdfs for α = 1/4, 1, 4; the α = 1 curve is the exponential]
Checking Weibull fit

F( t ) = 1 – exp( –(t/λ)^α ) or
p = 1 – exp( –(t/λ)^α ) or
1 – p = exp( –(t/λ)^α ) or
–log(1–p) = (t/λ)^α or
log(–log(1–p)) = α[ log(t) – log(λ) ], so
plot log(–log(1–p)) vs log(t) ( slope = α ).

[Figure: log(–log(1–p)) vs CPU times, 0 to 5]
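The slope of that plot can be computed rather than read off by eye. A sketch combining the mean-rank positions above with an ordinary least-squares slope (helper name and sample sizes are mine; this is an illustration, not a recommended estimator):

```python
import math, random

def weibull_plot_slope(data):
    """OLS slope of log(-log(1-p)) on log(t), p from mean ranks.
    For Weibull data this estimates the shape α."""
    n = len(data)
    xs, ys = [], []
    for r, t in enumerate(sorted(data), start=1):
        p = (r - 0.375) / (n + 0.25)
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - p)))
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

rng = random.Random(0)
sample = [rng.weibullvariate(1.5, 2.0) for _ in range(2000)]   # scale λ = 1.5, shape α = 2
print(round(weibull_plot_slope(sample), 1))   # ≈ 2.0
```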


Power Transformations

• y → y^λ
• y → (y^λ – 1)/λ: slope(@y=1) = 1

[Figure: (y^λ – 1)/λ vs y]

[Figure: histograms of exponential data E and of sqrt E, cube root E, 4th root E, and log E]


Summary of relationships

• hypergeometric: sampling without replacement
• binomial: sampling with replacement; hypergeometric → binomial as D/N → p, n/N → 0
• negative binomial: sampling till result
• Poisson: binomial → Poisson as np → λ, n → ∞
