Probability Theory - Formula Sheet


Cheat Sheet

2DF20

2023 / 2024

1 Discrete Distributions

E[X] = \sum_{k=0}^{\infty} P(X = k) \cdot k = \sum_{k=0}^{\infty} p_k \cdot k

E[X^2] = \sum_{k=0}^{\infty} p_k \cdot k^2

Var[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2 = \sum_{k=0}^{\infty} p_k \cdot k^2 - \left( \sum_{k=0}^{\infty} p_k \cdot k \right)^2
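A minimal numerical sketch of these definitions (Python; the pmf of a fair die relabeled to {0, ..., 5} is a made-up example):

```python
# Sketch: E[X], E[X^2] and Var[X] computed directly from a discrete pmf.
# The pmf below (a fair die relabeled to 0..5) is a made-up example.
pmf = {k: 1 / 6 for k in range(6)}                 # p_k = P(X = k)

mean = sum(p * k for k, p in pmf.items())          # E[X]   = sum p_k * k
second = sum(p * k ** 2 for k, p in pmf.items())   # E[X^2] = sum p_k * k^2
var = second - mean ** 2                           # Var[X] = E[X^2] - E[X]^2

print(mean, var)                                   # 2.5, ~2.9167
```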

1.1 Bernoulli Distribution


X ∼ Bernoulli(p) - number of successes X in one trial with success probability p.
P(X = 1) = p, and P(X = 0) = 1 − p
E[X] = p
Var[X] = p · (1 − p)

1.2 Binomial Distribution


X ∼ Bin(n, p) - number of successes X in n trials with success probability p.

P(X = k) = \binom{n}{k} \cdot p^k \cdot (1 - p)^{n-k} = \frac{n!}{k! \cdot (n - k)!} \cdot p^k \cdot (1 - p)^{n-k}
E[X] = n · p
Var[X] = n · p · (1 − p)

1.2.1 Properties
X_1 + ... + X_n ∼ Bin\left( \sum_i m_i, p \right), for independent X_i ∼ Bin(m_i, p)
X_1 + ... + X_n ∼ Bin(n, p), for independent X_i ∼ Bernoulli(p)
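A quick sketch of these formulas and the sum property (numpy/scipy assumed available; the numbers are made-up examples):

```python
# Sketch: check E[X], Var[X] and the sum property of the binomial.
import numpy as np
from scipy import stats

n, p = 10, 0.3
X = stats.binom(n, p)
print(X.mean(), n * p)              # both 3.0
print(X.var(), n * p * (1 - p))     # both 2.1

# Sum of independent Bin(4, p) and Bin(6, p) samples ~ Bin(10, p).
rng = np.random.default_rng(0)
s = rng.binomial(4, p, 100_000) + rng.binomial(6, p, 100_000)
print(s.mean(), s.var())            # close to 3.0 and 2.1
```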

1.3 Poisson Distribution


X ∼ Pois(λ) - number of events X occurring in a fixed interval with rate λ.
P(X = k) = e^{-λ} \cdot \frac{λ^k}{k!}
E[X] = λ
Var[X] = λ

1.3.1 Properties
X_1 + ... + X_n ∼ Pois\left( \sum_i λ_i \right), for independent X_i ∼ Pois(λ_i)

\lim_{n→∞} Bin\left( n, \frac{λ}{n} \right) ≡ Pois(λ)
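The Poisson limit can be seen numerically; a small sketch (scipy assumed; λ = 2 and k = 3 are arbitrary choices):

```python
# Sketch: the Bin(n, lam/n) pmf approaches the Pois(lam) pmf as n grows.
from scipy import stats

lam, k = 2.0, 3
for n in (10, 100, 10_000):
    print(n, stats.binom.pmf(k, n, lam / n))  # approaches the Poisson value
print(stats.poisson.pmf(k, lam))              # e^-2 * 2^3 / 3! ~ 0.1804
```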

1.4 Geometric Distribution
X ∼ Geo(p) - number of successes X before the first fail with success probability p
P(X = k) = p^k \cdot (1 - p)

E[X] = \frac{p}{1 - p}

Var[X] = \frac{p}{(1 - p)^2}

1.4.1 Properties
P(X ≥ k + a|X ≥ a) = P(X ≥ k) (memoryless property)

1.4.2 Special Cases


P(X = k) = p \cdot (1 - p)^k - number of fails X before the first success
P(X = k) = p^{k-1} \cdot (1 - p) - number of trials X until the first fail
(note the resemblance to the terms of the Bin(n = k, p) pmf, without the binomial coefficient)
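A simulation sketch of the first parametrization above (number of successes before the first fail); p = 0.7 is an arbitrary choice:

```python
# Sketch: simulate the sheet's Geo(p) and check E[X] = p / (1 - p).
import numpy as np

rng = np.random.default_rng(1)
p = 0.7

def draw():
    k = 0
    while rng.random() < p:   # success with probability p
        k += 1                # count successes until the first fail
    return k

xs = np.array([draw() for _ in range(100_000)])
print(xs.mean(), p / (1 - p))         # both ~ 2.33
print(xs.var(), p / (1 - p) ** 2)     # both ~ 7.78
```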

2 Probability Generating Function (PGF)

G_X(z) = \sum_{k=0}^{\infty} P(X = k) \cdot z^k = \sum_{k=0}^{\infty} p_k \cdot z^k = p_0 + p_1 \cdot z + p_2 \cdot z^2 + p_3 \cdot z^3 + ...

G_X(z) = E[z^X]

2.0.1 Properties
From the PGF the following can be shown:

P(X = k) = \frac{1}{k!} G_X^{(k)}(0) = \frac{1}{k!} \left. \frac{d^k}{dz^k} G_X(z) \right|_{z=0}

E[X] = G'_X(1) = \left. \frac{d}{dz} G_X(z) \right|_{z=1} = \left. \left( p_1 + 2p_2 z + 3p_3 z^2 + ... \right) \right|_{z=1} = p_1 + 2p_2 + 3p_3 + ... = \sum_{k=0}^{\infty} p_k \cdot k

More generally:

E[X(X - 1)(X - 2)...(X - k + 1)] = G_X^{(k)}(1) = \left. \frac{d^k}{dz^k} G_X(z) \right|_{z=1}

Suppose X = Y + Z with Y, Z independent,
then G_X(z) = G_Y(z) \cdot G_Z(z)

Suppose X is a mixture: X = Y with probability p and X = Z with probability 1 - p,
then G_X(z) = p \cdot G_Y(z) + (1 - p) \cdot G_Z(z)

2.1 Common PGFs


X ∼ Bin(n, p): G_X(z) = (1 - p + pz)^n

X ∼ Pois(λ): G_X(z) = \exp(λ(z - 1))

X ∼ Geo(p): G_X(z) = \frac{1 - p}{1 - pz}
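The derivative properties above can be checked symbolically; a sketch with sympy (assumed available), using the Pois(λ) PGF:

```python
# Sketch: recover E[X] and Var[X] from a PGF by differentiating at z = 1.
import sympy as sp

z, lam = sp.symbols('z lam', positive=True)
G = sp.exp(lam * (z - 1))               # PGF of Pois(lam)

EX = sp.diff(G, z).subs(z, 1)           # G'(1) = E[X] = lam
EX_fact2 = sp.diff(G, z, 2).subs(z, 1)  # G''(1) = E[X(X-1)] = lam**2
var = EX_fact2 + EX - EX ** 2           # Var = E[X(X-1)] + E[X] - E[X]^2
print(EX, sp.simplify(var))             # lam, lam
```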

3 Continuous Distributions
F(x) = P(X ≤ x)

f(x) = \frac{d}{dx} F(x)

E[X] = \int_{-\infty}^{\infty} x \cdot f(x) \, dx

E[X^2] = \int_{-\infty}^{\infty} x^2 \cdot f(x) \, dx

Var[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2 = \int_{-\infty}^{\infty} x^2 \cdot f(x) \, dx - \left( \int_{-\infty}^{\infty} x \cdot f(x) \, dx \right)^2

(The integrals run over the support of X; for the nonnegative distributions below the lower limit is 0.)

3.1 Uniform Distribution


X ∼ Uniform(a, b) - all values of X between a and b equally likely to occur.

f(x) = \frac{1}{b - a}, for a < x < b, else f(x) = 0

E[X] = \frac{1}{2}(a + b)

Var[X] = \frac{1}{12}(b - a)^2

3.2 Exponential Distribution


X ∼ Exp(λ) ≡ Gamma(1, λ) - time X before an event occurs with rate λ.
f (x) = λ · exp(−λx), for x ≥ 0, else f (x) = 0
F (x) = P(X ≤ x) = 1 − exp(−λx)
E[X] = \frac{1}{λ}

Var[X] = \frac{1}{λ^2}

3.2.1 Properties
P(X > k + a | X > a) = P(X > k) (memoryless property)

\min(X_1, ..., X_n) ∼ Exp\left( \sum_i λ_i \right), for independent X_i ∼ Exp(λ_i)

P(X_1 < X_2) = \frac{λ_1}{λ_1 + λ_2}, for independent X_i ∼ Exp(λ_i)

P(X_i = \min(X_1, ..., X_n)) = \frac{λ_i}{λ_1 + ... + λ_n}, for independent X_i ∼ Exp(λ_i)
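A simulation sketch of the min-properties (numpy assumed; λ_1 = 1, λ_2 = 3 are arbitrary choices):

```python
# Sketch: min(X1, X2) ~ Exp(l1 + l2) and P(X1 < X2) = l1 / (l1 + l2)
# for independent exponentials, checked by simulation.
import numpy as np

rng = np.random.default_rng(2)
l1, l2, n = 1.0, 3.0, 200_000
x1 = rng.exponential(1 / l1, n)   # numpy parametrizes by the mean 1/lambda
x2 = rng.exponential(1 / l2, n)

print(np.minimum(x1, x2).mean(), 1 / (l1 + l2))  # both ~ 0.25
print((x1 < x2).mean(), l1 / (l1 + l2))          # both ~ 0.25
```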

3.3 Erlang Distribution


X ∼ Erlang(k, λ) ≡ Gamma(k, λ), where k ∈ Z+ - time X before k events occur with rate λ.
f(x) = \frac{x^{k-1}}{(k - 1)!} \cdot λ^k \cdot e^{-λx}, for x ≥ 0, else f(x) = 0

E[X] = k \cdot \frac{1}{λ}

Var[X] = k \cdot \frac{1}{λ^2}

3.3.1 Properties
X_1 + ... + X_n ∼ Erlang(n, λ), for independent X_i ∼ Exp(λ)

3.4 Gamma Distribution


X ∼ Gamma(α, β) - time X before α (can be non-integer) events occur with rate β.
f(x) = \frac{x^{α-1}}{Γ(α)} \cdot β^α \cdot e^{-βx}, where Γ(α) = \int_0^{\infty} x^{α-1} e^{-x} \, dx

E[X] = \frac{α}{β}

Var[X] = \frac{α}{β^2}

3.4.1 Properties
Γ(α) = (α − 1)!, for α ∈ Z+

\frac{Γ(α + 1)}{Γ(α)} = α → Γ(α) \cdot α = Γ(α + 1)

3.5 Normal Distribution


X ∼ N (µ, σ 2 ).
f(x) = \frac{1}{\sqrt{2πσ^2}} \cdot \exp\left( -\frac{(x - µ)^2}{2σ^2} \right)
E[X] = µ
Var[X] = σ 2

3.5.1 Properties
X_1 ± ... ± X_n ∼ N\left( \sum_i ±µ_i, \sum_i σ_i^2 \right), for independent X_i ∼ N(µ_i, σ_i^2)

3.6 Lognormal Distribution


X ∼ LN (µ, σ 2 ), where ln(X) ∼ N (µ, σ 2 )

f(x) = \frac{1}{x \sqrt{2πσ^2}} \cdot \exp\left( -\frac{(\ln(x) - µ)^2}{2σ^2} \right)

E[X] = \exp\left( µ + \frac{1}{2} σ^2 \right)

E[X^2] = \exp(2µ + 2σ^2)

Var[X] = \exp(2µ + 2σ^2) - \exp(2µ + σ^2) = (\exp(σ^2) - 1) \cdot \exp(2µ + σ^2)

4 Moment Generating Function (MGF)
M_X(t) = \int_{-\infty}^{\infty} e^{tu} \cdot f(u) \, du = E[e^{tX}]

M_X(t) = 1 + t E[X] + \frac{t^2}{2!} E[X^2] + \frac{t^3}{3!} E[X^3] + ... + \frac{t^n}{n!} E[X^n] + ...

4.0.1 Properties
From the MGF the following can be shown:

E[X] = M'_X(0) = \left. \frac{d}{dt} M_X(t) \right|_{t=0} = \left. \left( E[X] + E[X^2] t + E[X^3] \frac{t^2}{2!} + ... \right) \right|_{t=0} = E[X]

More generally:

E[X^k] = M_X^{(k)}(0) = \left. \frac{d^k}{dt^k} M_X(t) \right|_{t=0}

Suppose Y = aX + b,
then M_Y(t) = M_X(at) \cdot e^{bt}

Suppose X = Y + Z with Y, Z independent,
then M_X(t) = M_Y(t) \cdot M_Z(t)

Suppose X is a mixture: X = Y with probability p and X = Z with probability 1 - p,
then M_X(t) = p \cdot M_Y(t) + (1 - p) \cdot M_Z(t)

4.1 Common MGFs


X ∼ Uniform(a, b): M_X(t) = \frac{e^{tb} - e^{ta}}{t(b - a)}

X ∼ Exp(λ): M_X(t) = \frac{λ}{λ - t}

X ∼ Gamma(α, β): M_X(t) = \left( \frac{β}{β - t} \right)^α

X ∼ N(µ, σ^2): M_X(t) = \exp\left( µt + \frac{1}{2} σ^2 t^2 \right)
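As with PGFs, the moments fall out by differentiating at t = 0; a sympy sketch using the Exp(λ) MGF above:

```python
# Sketch: read moments off an MGF by differentiating at t = 0.
import sympy as sp

t, lam = sp.symbols('t lam', positive=True)
M = lam / (lam - t)                   # MGF of Exp(lam)

EX = sp.diff(M, t).subs(t, 0)         # M'(0)  = E[X]   = 1/lam
EX2 = sp.diff(M, t, 2).subs(t, 0)     # M''(0) = E[X^2] = 2/lam**2
print(EX, sp.simplify(EX2 - EX**2))   # 1/lam, 1/lam**2 (the variance)
```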

4.2 Laplace-Stieltjes Transforms (LSTs)


ϕ_X(s) = \int_{-\infty}^{\infty} e^{-su} \cdot f(u) \, du = E[e^{-sX}]

ϕ_X(s) = M_X(-s)

E[X] = -ϕ'_X(0) = -\left. \frac{d}{ds} ϕ_X(s) \right|_{s=0}

E[X^k] = (-1)^k ϕ_X^{(k)}(0) = (-1)^k \left. \frac{d^k}{ds^k} ϕ_X(s) \right|_{s=0}

5 Sums of Random Variables
Sn = X1 + X2 + ... + Xn ,
where, e.g., Xi is the payment for claim i, n is the number of claims, and Sn is the total claim
amount.

E[Sn ] = E[X1 + ... + Xn ] = E[X1 ] + ... + E[Xn ]


Var[Sn ] = Var[X1 + ... + Xn ] = Var[X1 ] + ... + Var[Xn ], (because Xi are independent)
M_{S_n}(t) = M_{X_1 + ... + X_n}(t) = E[e^{t(X_1 + ... + X_n)}] = \prod_{i=1}^{n} E[e^{t X_i}] = \prod_{i=1}^{n} M_{X_i}(t), (again because the X_i are independent)

5.1 Convolution Formula


P(X_1 + X_2 = k) = \sum_{j=0}^{k} P(X_1 = j) \cdot P(X_2 = k - j), for independent, nonnegative integer-valued X_1 and X_2
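For pmfs stored as arrays, this sum is exactly np.convolve; a sketch with two fair dice as a made-up example:

```python
# Sketch: pmf of X1 + X2 via the convolution formula (= np.convolve).
import numpy as np

pmf = np.zeros(7)
pmf[1:] = 1 / 6                    # fair die: P(X = k) = 1/6, k = 1..6
pmf_sum = np.convolve(pmf, pmf)    # P(X1 + X2 = k), k = 0..12
print(pmf_sum[7], 6 / 36)          # P(sum = 7) = 1/6, both ~ 0.1667
```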

5.2 Stochastic Sums of Random Variables


SN (t) = X1 + X2 + ... + XN (t) ,
where, e.g., Xi is the payment for claim i, N (t) is the (uncertain) number of claims in (0, t], and
SN (t) is the total claim amount.

If N(t) ∼ Discrete(...) with PGF G_{N(t)}(z), and the X_i are i.i.d. with MGF M_{X_i}(a), independent of N(t), then:

M_{S_{N(t)}}(a) = E[e^{a S_{N(t)}}]

M_{S_{N(t)}}(a) = \sum_{n=0}^{\infty} P(N(t) = n) \cdot E[e^{a(X_1 + X_2 + ... + X_n)}]

M_{S_{N(t)}}(a) = \sum_{n=0}^{\infty} P(N(t) = n) \cdot \left( E[e^{a X_1}] \right)^n

M_{S_{N(t)}}(a) = \sum_{n=0}^{\infty} P(N(t) = n) \cdot (M_{X_1}(a))^n

M_{S_{N(t)}}(a) = G_{N(t)}(M_{X_1}(a))

E[SN (t) ] = E[X1 ] · E[N (t)]


Var[SN (t) ] = Var[X1 ] · E[N (t)] + Var[N (t)] · (E[X1 ])2
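A simulation sketch of these two identities for a compound sum (N ∼ Pois(4) and X_i ∼ Exp(2) are arbitrary choices):

```python
# Sketch: simulate S = X1 + ... + X_N and compare with the formulas above.
import numpy as np

rng = np.random.default_rng(3)
lam, mu, runs = 4.0, 2.0, 100_000
counts = rng.poisson(lam, runs)                            # N ~ Pois(lam)
s = np.array([rng.exponential(1 / mu, k).sum() for k in counts])

print(s.mean(), lam * (1 / mu))                            # both ~ 2.0
print(s.var(), (1 / mu ** 2) * lam + lam * (1 / mu) ** 2)  # both ~ 2.0
```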

5.3 Counting Process


A stochastic process is called a counting process when N (t) represents the number of ’events’ that
have occurred.
All counting processes have to satisfy the following properties:
(1) N (0) = 0

(2) N (t) ∈ Z+
(3) N (b) ≥ N (a), for b > a
(4) N (b) − N (a) is the number of events in (a, b]

Independent Increments: This means N (b)−N (a) and N (d)−N (c) are independent if the intervals
(a, b] and (c, d] do not overlap.

Stationary Increments: This means that N (b) − N (a) and N (b + u) − N (a + u) are identically
distributed.

5.4 Poisson Process


A stochastic process {N (t), t ≥ 0} is called a Poisson process, PP(λ), if it follows the properties:
(1) Counting Process
(2) Independent Increments
(3) Stationary Increments
(4) P(N(t) = n) = \exp(-λt) \cdot \frac{(λt)^n}{n!}, hence N(t) ∼ PP(λ) ≡ Pois(λt)

5.4.1 Properties
N_1(t) + ... + N_m(t) ∼ PP(λ_1 + ... + λ_m) ≡ Pois((λ_1 + ... + λ_m) \cdot t), for independent N_i ∼ PP(λ_i) ≡ Pois(λ_i \cdot t)
Splitting: if each event of X ∼ PP(λ) is independently assigned to Y_1 with probability p and to Y_2 otherwise, then Y_1 ∼ PP(pλ) and Y_2 ∼ PP((1 - p)λ), independent of each other

P(T_1 > t) = P(N(t) = 0) = \exp(-λt)

The interarrival times satisfy T_i ∼ Exp(λ), with the same λ as in Pois(λt)

Total time until the n-th event:
V_n = T_1 + T_2 + ... + T_n

M_{V_n}(t) = \left( \frac{λ}{λ - t} \right)^n

So, V_n ∼ Gamma(n, λ) ≡ Erlang(n, λ)
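A simulation sketch: sums of i.i.d. Exp(λ) interarrival times match the Erlang/Gamma distribution (scipy's gamma uses shape a = n and scale = 1/λ):

```python
# Sketch: V_n = T_1 + ... + T_n with T_i ~ Exp(lam) behaves like Erlang(n, lam).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
lam, n = 2.0, 5
v = rng.exponential(1 / lam, size=(100_000, n)).sum(axis=1)

print(v.mean(), n / lam)                       # both ~ 2.5
print(v.var(), n / lam ** 2)                   # both ~ 1.25
print(stats.gamma(a=n, scale=1 / lam).mean())  # 2.5, matching Gamma(n, lam)
```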

5.5 Compound Poisson Process


A stochastic process {S(t), t ≥ 0} is said to be a compound Poisson process if it can be represented
by:
S(t) = SN (t) = X1 + X2 + ... + XN (t) , where {N (t), t ≥ 0} is a Poisson process with rate λ.

E[SN (t) ] = λt · E[X]


Var[SN (t) ] = λt · E[X 2 ]

Since G_{N(t)}(z) = \exp(λt(z - 1)), it follows (here with X_i ∼ Exp(λ), so M_{X_1}(u) = \frac{λ}{λ - u}):

M_{S(t)}(u) = \exp\left( λt \left( \frac{λ}{λ - u} - 1 \right) \right)

6 Theorems
6.1 Markov Inequality
P(X ≥ a) ≤ \frac{E[X]}{a} - for nonnegative X, the probability that X is at least a is at most \frac{1}{a} times E[X]

6.2 Chebyshev Inequality

P(|X - E[X]| ≥ a) ≤ \frac{Var[X]}{a^2} - the prob. that X deviates from its mean by at least a is at most \frac{1}{a^2} times Var[X]

6.3 Weak Law of Large Numbers


\lim_{n→∞} P\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - E[X] \right| > ϵ \right) = 0 - the prob. that X̄_n and E[X] differ by more than ϵ vanishes as n → ∞

\lim_{n→∞} \frac{1}{n} \sum_{i=1}^{n} X_i = E[X] - in the limit, the sample mean X̄_n equals E[X]

6.4 Central Limit Theorem (CLT)


Let S_n = \sum_{i=1}^{n} X_i, and X̄_n = \frac{S_n}{n}

Then Z_n = \frac{S_n - n \cdot E[X]}{\sqrt{n \cdot Var[X]}}, which is S_n normalized for the mean and variance

Then Z_n = \frac{X̄_n - E[X]}{\sqrt{Var[X]}} \cdot \sqrt{n}

Then Z_n = (X̄_n - E[X]) \cdot \sqrt{\frac{n}{σ^2}}

\lim_{n→∞} P(Z_n ≤ z) = Φ(z), where Φ(z) is the CDF of N(0, 1)
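A CLT simulation sketch with Uniform(0, 1) summands (mean 1/2, variance 1/12; the sizes are arbitrary choices):

```python
# Sketch: standardized sums of i.i.d. Uniform(0, 1) samples look N(0, 1).
import numpy as np

rng = np.random.default_rng(5)
n, runs = 1_000, 50_000
x = rng.random((runs, n))                        # Uniform(0, 1) samples
z = (x.sum(axis=1) - n * 0.5) / np.sqrt(n / 12)  # Z_n as defined above

print(z.mean(), z.std())                         # ~ 0.0 and ~ 1.0
print((z <= 1.96).mean())                        # ~ 0.975 = Phi(1.96)
```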

7 Calculating Rules
7.1 Expected Value and Variance
E[b] = b
E[aX] = a E[X]
E[X + Y] = E[X] + E[Y]
E[E[X]] = E[X]
E[X \cdot Y] = E[X] \cdot E[Y], if X, Y are independent

Var[b] = 0
Var[aX] = a^2 Var[X]
Var[X + Y] = Var[X] + Var[Y] + 2(E[X \cdot Y] - E[X] \cdot E[Y])
Var[X + Y] = Var[X] + Var[Y], if X, Y are independent

7.2 Sum Operator


7.2.1 Linearity
\sum_{k=1}^{n} c = n \cdot c

\sum_{k=0}^{n} c \cdot a_k = c \cdot \sum_{k=0}^{n} a_k

\sum_{k=0}^{n} (a_k + b_k) = \sum_{k=0}^{n} a_k + \sum_{k=0}^{n} b_k

7.2.2 Sums of Powers


\sum_{k=0}^{n} k = \frac{n(n + 1)}{2}

\sum_{k=0}^{n} k^2 = \frac{n(n + 1)(2n + 1)}{6}

\sum_{k=0}^{n} k^3 = \frac{n^2 (n + 1)^2}{4}

7.2.3 Power Series / Convergent Series



\sum_{k=0}^{\infty} \frac{1}{k!} = e

\sum_{k=0}^{\infty} \frac{x^k}{k!} = e^x, more generally (Taylor series)

\sum_{k=0}^{\infty} \frac{1}{n^k} = \frac{n}{n - 1}, for n > 1 (geometric series)

7.3 Derivative
7.3.1 Linearity
h(x) = a \cdot f(x) + b \cdot g(x)
h'(x) = a \cdot f'(x) + b \cdot g'(x)

7.3.2 Product Rule


h(x) = f (x) · g(x)
h′ (x) = f ′ (x) · g(x) + f (x) · g ′ (x)

7.3.3 Chain Rule


h(x) = f (g(x))
h′ (x) = f ′ (g(x)) · g ′ (x)

7.3.4 Quotient Rule


h(x) = \frac{f(x)}{g(x)}

h'(x) = \frac{f'(x) \cdot g(x) - f(x) \cdot g'(x)}{g(x)^2}

7.3.5 Reciprocal Rule


h(x) = \frac{1}{g(x)}

h'(x) = \frac{-g'(x)}{g(x)^2} \left( = \frac{[1]' \cdot g(x) - 1 \cdot g'(x)}{g(x)^2} \right)

7.3.6 Useful Derivatives


f (x) = a, f ′ (x) = 0
f (x) = x5 , f ′ (x) = 5x4
f (x) = ax , f ′ (x) = ax · ln(a)
f (x) = ex , f ′ (x) = ex
f(x) = \ln(x), f'(x) = \frac{1}{x}

7.4 Combination Operator


 
\binom{n}{x} = \frac{n!}{x! \cdot (n - x)!} - "n choose x"

7.5 Binomial Theorem


\sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} = (a + b)^n, for any real a, b

12
7.6 Lemma

E[X] = \sum_{k=0}^{\infty} P(X > k), for X taking values in {0, 1, 2, ...}

This means you can find the average value of X by adding up the probability that X is greater than 0, then the probability that X is greater than 1, then the probability that X is greater than 2, and so on, all the way to ∞.

So the expected value is the sum of the "tail probabilities" of X: the chances that X exceeds 0, 1, 2, ..., which together add up to the overall average value of X.
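A numerical sketch of the lemma (scipy assumed; Pois(3) truncated at a large cutoff is a made-up example):

```python
# Sketch: E[X] = sum over k of P(X > k) for a nonnegative integer-valued X.
import numpy as np
from scipy import stats

lam, cutoff = 3.0, 200
ks = np.arange(cutoff)
tail_sum = stats.poisson.sf(ks, lam).sum()  # sf(k) = P(X > k)
print(tail_sum)                             # ~ 3.0 = E[X] for Pois(3)
```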
