Stat 110, Lecture 7 Continuous Probability Distributions: Bheavlin@stat - Stanford.edu
Discrete distributions
Continuous distributions
• uniform
• exponential
• Weibull
• gamma
• beta
1. f(y) ≥ 0.
3. ∫(–∞, ∞) f(y)dy =1
Exponential distribution
pdf: exp(–t/λ)(1/λ)
cdf: 1–exp(–t/λ)
Moments:
E(y) = λ (MTTF)
Var(y) = λ²
Key application:
Lifetime data, waiting times
[Figure: pdf of exp(1), plotted for t from 0 to 7]
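The pdf, cdf, and moments listed for the exponential can be checked numerically. The following is a minimal sketch, assuming the slide's parameterization in which λ is the mean (MTTF); the function names `exp_pdf`, `exp_cdf`, and `sample_moments` are illustrative and not part of the lecture.

```python
import math
import random

def exp_pdf(t, lam):
    """Density exp(-t/lam)/lam for t >= 0, where lam is the mean (MTTF)."""
    return math.exp(-t / lam) / lam if t >= 0 else 0.0

def exp_cdf(t, lam):
    """P(Y <= t) = 1 - exp(-t/lam) for t >= 0."""
    return 1.0 - math.exp(-t / lam) if t >= 0 else 0.0

def sample_moments(lam, n=100_000, seed=0):
    """Simulated mean and variance; these should be near lam and lam**2."""
    rng = random.Random(seed)
    # random.expovariate takes the rate, which is 1/lam in this parameterization
    ys = [rng.expovariate(1.0 / lam) for _ in range(n)]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return mean, var

print(exp_pdf(0.0, 1.0), exp_cdf(1.0, 1.0))
print(sample_moments(2.0))
```

Note that Python's `random.expovariate` is parameterized by the rate 1/λ rather than by the mean, so the conversion matters when comparing with the slide's formulas.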
Examples with pdfs:
Exponential, with parameter λ. Substituting t = y/λ:
∫(–∞, ∞) f(y)dy = ∫[0, ∞) f(y)dy
= ∫[0, ∞)(1/λ)exp(–y/λ)dy = ∫[0, ∞)exp(–t)dt = –exp(–t)|[0, ∞)
= –exp(–∞) + exp(–0) = 0 + 1 = 1
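The same normalization can be confirmed numerically. This is a rough sketch using the trapezoid rule on a truncated range; the helper name `integrate_exp_pdf` and the cutoff of 50λ are assumptions made for illustration only.

```python
import math

def integrate_exp_pdf(lam, upper_multiple=50.0, steps=200_000):
    """Trapezoid-rule integral of exp(-y/lam)/lam over [0, upper_multiple*lam].

    The tail beyond upper_multiple*lam has mass exp(-upper_multiple),
    which is negligible for the default cutoff.
    """
    upper = upper_multiple * lam
    h = upper / steps

    def f(y):
        return math.exp(-y / lam) / lam

    total = 0.5 * (f(0.0) + f(upper))
    for i in range(1, steps):
        total += f(i * h)
    return total * h

print(integrate_exp_pdf(1.0))
print(integrate_exp_pdf(3.5))
```

The result should be close to 1 for any positive λ, since the substitution t = y/λ removes λ from the integral.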
Uniform distribution
Sample space:
[0, 1] or [a,b]
pdf: 1/(b–a)
Moments:
Var(y) = (b–a)²/12
Key property:
Easy to generate random ones.
If X has a continuous cdf F(x), then F(X) is uniform on [0, 1].
[Figure: pdf of U(0.5, 2.0)]
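The key property can be illustrated in both directions: applying F to draws of X gives values that behave like U(0, 1), and applying the inverse of F to uniform draws generates X. The sketch below assumes an exponential X with mean λ as the example; `pit_summary` and `inverse_transform_mean` are hypothetical helper names.

```python
import math
import random

def exp_cdf(x, lam):
    """F(x) = 1 - exp(-x/lam) for x >= 0."""
    return 1.0 - math.exp(-x / lam)

def exp_inv_cdf(u, lam):
    """Inverse of F: the x with F(x) = u, for 0 <= u < 1."""
    return -lam * math.log(1.0 - u)

def pit_summary(lam=2.0, n=50_000, seed=1):
    """Mean and variance of F(X); a U(0,1) variable has mean 1/2 and variance 1/12."""
    rng = random.Random(seed)
    xs = [rng.expovariate(1.0 / lam) for _ in range(n)]
    us = [exp_cdf(x, lam) for x in xs]
    mean_u = sum(us) / n
    var_u = sum((u - mean_u) ** 2 for u in us) / (n - 1)
    return mean_u, var_u

def inverse_transform_mean(lam=2.0, n=50_000, seed=2):
    """Generate exponential draws from uniforms and return their sample mean."""
    rng = random.Random(seed)
    xs = [exp_inv_cdf(rng.random(), lam) for _ in range(n)]
    return sum(xs) / n

print(pit_summary())
print(inverse_transform_mean())
```

Comparing only the first two moments is a weak check of uniformity; a plot of the sorted F(X) values against rank/n would be a fuller one.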
Distribution of hourly counts given count > 0.
Log counts appear uniformly distributed.
Plot: ranked counts vs (rank / n).
[Figure: CDF (0 to 1) vs count, with count on a log scale from 1 to 100000]
Normal (Gaussian) distribution
Probability distribution:
pdf(x) = exp{–[(x–μ)/σ]²/2}/(2πσ²)^(1/2) = φ((x–μ)/σ)/σ
Moments:
E(x) = μ, Var(x) = σ²
Key property:
central limit theorem
Distributions of means, for columns N and E:

N Rows=4          N           E
Mean              -0.005831   0.9968311
Std Dev           0.50137     0.4954521
Std Err Mean      0.0079274   0.0078338
upper 95% Mean    0.0097106   1.0121896
lower 95% Mean    -0.021373   0.9814725
N                 4000        4000

N Rows=16         N           E
Mean              -0.005831   0.9968311
Std Dev           0.2499766   0.2474526
Std Err Mean      0.007905    0.0078251
upper 95% Mean    0.0096808   1.0121866
lower 95% Mean    -0.021344   0.9814755
N                 1000        1000

[Figure: histograms for each case; N plotted on –3 to 3, E plotted on 0 to 5]
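A simulation in the spirit of the output above can be sketched as follows. It assumes, as the reported sample sizes (4000 means of 4, 1000 means of 16) and the near-0 and near-1 means suggest, that N holds 16000 standard normal draws and E holds 16000 exp(1) draws; that reading, the seed, and the function names are assumptions rather than details given in the lecture.

```python
import random
import statistics

def group_means(draws, k):
    """Means of consecutive, non-overlapping groups of k draws."""
    return [sum(draws[i:i + k]) / k for i in range(0, len(draws) - k + 1, k)]

def clt_demo(total=16_000, seed=110):
    """Return {(name, k): (count, mean, std dev)} for group sizes k = 4 and 16."""
    rng = random.Random(seed)
    normal = [rng.gauss(0.0, 1.0) for _ in range(total)]
    expo = [rng.expovariate(1.0) for _ in range(total)]
    out = {}
    for k in (4, 16):
        for name, draws in (("N", normal), ("E", expo)):
            means = group_means(draws, k)
            out[(name, k)] = (len(means),
                              statistics.mean(means),
                              statistics.stdev(means))
    return out

for key, summary in clt_demo().items():
    print(key, summary)
```

Because both parent distributions have standard deviation 1, the standard deviation of the group means should be near 1/√4 = 0.5 and 1/√16 = 0.25, which is the pattern in the Std Dev rows.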
Unimodal, symmetric.
z = (5000–7500)/1200 = –2.083
From table 4, appendix II,
P(0<z≤+2.08) = 0.4812
P(0<z≤+2.09) = 0.4817
so P(0<z≤+2.083) = 0.48137 by linear interpolation,
and P(z≤–2.083) = P(z>+2.083) = 0.5 – 0.4814 = 0.0186
[Figure: standard normal pdf, z from –5 to 5]
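The table lookup with linear interpolation can be compared against a direct evaluation of the normal cdf. This is a small sketch using Python's `statistics.NormalDist`; the helper `interpolate` and the variable names are illustrative.

```python
from statistics import NormalDist

def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation between the points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

z = (5000 - 7500) / 1200  # standardized value, about -2.083

# Table entries quoted on the slide: P(0 < Z <= 2.08) and P(0 < Z <= 2.09)
table_area = interpolate(abs(z), 2.08, 0.4812, 2.09, 0.4817)
tail_from_table = 0.5 - table_area   # P(Z <= z), using symmetry
tail_exact = NormalDist().cdf(z)     # direct evaluation of the same tail

print(round(z, 3), round(table_area, 5))
print(tail_from_table, tail_exact)
```

The two tail values agree to about four decimal places; the small difference comes from rounding in the four-digit table entries.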
z = (9000–7000)/1000 = 2
From Excel NORMSDIST(z)
P(z≤2) = 0.9772
so P(z>2) = 1–0.9772 = 0.0228
Φ⁻¹(0.9772) = +2
Φ⁻¹(0.0228) = –2
[Figure: standard normal curve, z from –5 to 5, with area 0.9772 at or below z = 2]
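Outside Excel, the same cdf and inverse-cdf values can be obtained with Python's standard library. A minimal sketch, in which `NormalDist().cdf` plays the role of NORMSDIST and `NormalDist().inv_cdf` the role of Φ⁻¹:

```python
from statistics import NormalDist

std_normal = NormalDist()           # mean 0, standard deviation 1

z = (9000 - 7000) / 1000            # standardized value
p_below = std_normal.cdf(z)         # P(Z <= 2)
p_above = 1.0 - p_below             # P(Z > 2)

# Inverse cdf: recover z from each probability
z_upper = std_normal.inv_cdf(p_below)
z_lower = std_normal.inv_cdf(p_above)

print(round(p_below, 4), round(p_above, 4))
print(z_upper, z_lower)
```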
Mean rank method:
for an obs X(r) with rank r of n, calculate p as
p mean rank = (r – 0.375)/(n + 0.25)
Calculate then from the normal distribution the zp value such that
P( Z ≤ zp ) = p
Plot zp=Φ⁻¹(p) vs X(r).
(r+a)/(n+b) is symmetric when 1+2a=b,
e.g. 1–2•0.375 = 1–0.75 = 0.25
[Figure: z mean rank (constructed scale, –2.0 to 1.5) vs CPU times (data scale, 0 to 5)]
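The plotting positions and their normal scores can be computed as follows. This is a sketch assuming n = 25 for the demonstration; the function names are illustrative, and the defaults a = –0.375, b = 0.25 correspond to the formula (r – 0.375)/(n + 0.25).

```python
from statistics import NormalDist

def plotting_positions(n, a=-0.375, b=0.25):
    """p_r = (r + a)/(n + b) for ranks r = 1..n."""
    return [(r + a) / (n + b) for r in range(1, n + 1)]

def normal_scores(n, a=-0.375, b=0.25):
    """z_p = inverse standard normal cdf of each plotting position."""
    inv = NormalDist().inv_cdf
    return [inv(p) for p in plotting_positions(n, a, b)]

n = 25
ps = plotting_positions(n)
zs = normal_scores(n)

print(round(ps[0], 4), round(zs[0], 4))
# Symmetry when 1 + 2a = b: p_r + p_(n+1-r) = 1, so the z scores mirror each other
print(ps[0] + ps[-1], zs[0] + zs[-1])
```

To draw the plot, pair each score in `zs` with the sorted observations X(1) ≤ … ≤ X(n).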
Calculation:
p mean rank = (r – 0.375)/(n + 0.25), z mean rank = NORMSINV( p mean rank )
p median rank = BETAINV( 0.5, r, n+1–r ), z median rank = NORMSINV( p median rank )

CPU times   rank   p mean rank   z mean rank   p median rank   z median rank
0.02        1      0.0248        -1.9642       0.0273          -1.9213
0.15        2      0.0644        -1.5192       0.0662          -1.5045
0.19        3      0.1040        -1.2593       0.1055          -1.2506
0.47        4      0.1436        -1.0644       0.1449          -1.0585
0.71        5      0.1832        -0.9034       0.1843          -0.8989
…           …
3.53        23     0.8960        1.2593        0.8945          1.2506
3.76        24     0.9356        1.5192        0.9338          1.5045
4.75        25     0.9752        1.9642        0.9727          1.9213
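The median rank column can be reproduced without a beta-quantile routine. For integer r, the cdf of a Beta(r, n+1–r) variable at p equals P(Binomial(n, p) ≥ r), so the median can be found by bisection. The sketch below relies on that identity; the names `beta_cdf_int` and `median_rank` are hypothetical, and this is a stand-in for BETAINV rather than Excel's own algorithm.

```python
import math
from statistics import NormalDist

def beta_cdf_int(p, r, n):
    """CDF at p of Beta(r, n+1-r) for integer r, via P(Binomial(n, p) >= r)."""
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(r, n + 1))

def median_rank(r, n, tol=1e-12):
    """Solve beta_cdf_int(p, r, n) = 0.5 for p by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if beta_cdf_int(mid, r, n) < 0.5:
            lo = mid   # cdf too small, so the median lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0

n = 25
for r in (1, 2, 25):
    p = median_rank(r, n)
    print(r, round(p, 4), round(NormalDist().inv_cdf(p), 4))
```

For the three ranks shown, the printed values can be compared with the p median rank and z median rank columns of the table.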
Comments
• Linear pattern: consistent with Gaussian
• Concave: relatively right-skewed
• Outliers: values to the right
Gaussian.
Stem-and-leaf of CPU times:
stem   leaf range   leaves           count
1.     0-4          16,17,23,38,40   5
0.     5-9          71,75,82,92,96   5
0.     0-4          02,15,19,47      4
[Figure: –log(1–p) (0.0 to 2.0) vs CPU times (0 to 5)]
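A plot of –log(1–p) against the data checks an exponential fit, because the exponential cdf gives –log(1–F(t)) = t/λ, a straight line through the origin with slope 1/λ. The sketch below uses simulated data, not the lecture's CPU times (which are only partly listed), with the (r – 0.375)/(n + 0.25) plotting positions; the sample size, seed, and helper names are assumptions.

```python
import math
import random

def exponential_plot_points(times):
    """Pairs (t, -log(1 - p)) with p = (r - 0.375)/(n + 0.25) for sorted times."""
    ts = sorted(times)
    n = len(ts)
    points = []
    for r, t in enumerate(ts, start=1):
        p = (r - 0.375) / (n + 0.25)
        points.append((t, -math.log(1.0 - p)))
    return points

def slope_through_origin(points):
    """Least-squares slope of y on x for a line forced through (0, 0)."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx

rng = random.Random(7)
lam = 1.5                                   # hypothetical true mean
sim = [rng.expovariate(1.0 / lam) for _ in range(20_000)]
slope = slope_through_origin(exponential_plot_points(sim))

print(slope, 1.0 / lam)   # the fitted slope should be near 1/lam
```

With a small sample such as n = 25 the points scatter more, so the visual judgment of linearity matters more than the fitted slope.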
Weibull distribution
Sample space: [0, ∞ )
P( Y > (1+u)y | Y > y ) = P( Y > (1+u)cy | Y > cy ).
“The chance of living 10% longer is always the same, …of living 20% longer, always the same.”
[Figure: pdf, horizontal axis from 0 to 3.5]
Checking Weibull fit
F( t ) = 1–exp(–(t/λ)^α) or
p = 1–exp(–(t/λ)^α) or
1–p = exp(–(t/λ)^α) or
–log(1–p) = (t/λ)^α or
log(–log(1–p)) = α[log(t)–log(λ)]
so plot log(–log(1–p)) vs log(t) (slope = α)
[Figure: log(–log(1–p)) (–4 to 2) vs CPU times (0 to 5)]
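The linearization above suggests estimating α from the slope of the plot and λ from its intercept, since the intercept equals –α·log(λ). The following sketch does this on simulated Weibull data with ordinary least squares; the true parameter values, the seed, the plotting positions (r – 0.375)/(n + 0.25), and the function names are all assumptions made for illustration.

```python
import math
import random

def weibull_plot_points(times):
    """Pairs (log t, log(-log(1 - p))) for sorted positive times."""
    ts = sorted(times)
    n = len(ts)
    return [(math.log(t), math.log(-math.log(1.0 - (r - 0.375) / (n + 0.25))))
            for r, t in enumerate(ts, start=1)]

def least_squares_line(points):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

rng = random.Random(11)
shape, scale = 1.8, 2.0                     # hypothetical alpha and lambda
# random.weibullvariate takes (scale, shape), in that order
sim = [rng.weibullvariate(scale, shape) for _ in range(5_000)]

slope, intercept = least_squares_line(weibull_plot_points(sim))
alpha_hat = slope                           # slope estimates alpha
lam_hat = math.exp(-intercept / slope)      # intercept = -alpha * log(lambda)

print(alpha_hat, lam_hat)
```

A slope near 1 would indicate that the exponential, the α = 1 special case, fits about as well.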
[Figure: vertical axis from 0.00 to 0.15; horizontal axis k from 0 to 25]