Probability Theory - Formula Sheet


Cheat Sheet

2DF20

2023 / 2024

1 Discrete Distributions

E[X] = \sum_{k=0}^{\infty} P(X = k) \cdot k = \sum_{k=0}^{\infty} p_k \cdot k

E[X^2] = \sum_{k=0}^{\infty} p_k \cdot k^2

Var[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2 = \sum_{k=0}^{\infty} p_k \cdot k^2 - \left( \sum_{k=0}^{\infty} p_k \cdot k \right)^2
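A minimal numerical sketch of these definitions (Python; the pmf of a fair die relabeled to {0, ..., 5} is a made-up example):

```python
# Sketch: E[X], E[X^2] and Var[X] computed directly from a discrete pmf.
# The pmf below (a fair die relabeled to 0..5) is a made-up example.
pmf = {k: 1 / 6 for k in range(6)}                 # p_k = P(X = k)

mean = sum(p * k for k, p in pmf.items())          # E[X]   = sum p_k * k
second = sum(p * k ** 2 for k, p in pmf.items())   # E[X^2] = sum p_k * k^2
var = second - mean ** 2                           # Var[X] = E[X^2] - E[X]^2

print(mean, var)                                   # 2.5, ~2.9167
```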

1.1 Bernoulli Distribution


X ∼ Bernoulli(p) - number of successes X in one trial with success probability p.
P(X = 1) = p, and P(X = 0) = 1 − p
E[X] = p
Var[X] = p · (1 − p)

1.2 Binomial Distribution


X ∼ Bin(n, p) - number of successes X in n trials with success probability p.

P(X = k) = \binom{n}{k} \cdot p^k \cdot (1 - p)^{n-k} = \frac{n!}{k! \cdot (n - k)!} \cdot p^k \cdot (1 - p)^{n-k}
E[X] = n · p
Var[X] = n · p · (1 − p)

1.2.1 Properties
X_1 + ... + X_n ∼ Bin\left( \sum_i m_i, p \right), for independent X_i ∼ Bin(m_i, p)
X_1 + ... + X_n ∼ Bin(n, p), for independent X_i ∼ Bernoulli(p)
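A quick sketch of these formulas and the sum property (numpy/scipy assumed available; the numbers are made-up examples):

```python
# Sketch: check E[X], Var[X] and the sum property of the binomial.
import numpy as np
from scipy import stats

n, p = 10, 0.3
X = stats.binom(n, p)
print(X.mean(), n * p)              # both 3.0
print(X.var(), n * p * (1 - p))     # both 2.1

# Sum of independent Bin(4, p) and Bin(6, p) samples ~ Bin(10, p).
rng = np.random.default_rng(0)
s = rng.binomial(4, p, 100_000) + rng.binomial(6, p, 100_000)
print(s.mean(), s.var())            # close to 3.0 and 2.1
```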

1.3 Poisson Distribution


X ∼ Pois(λ) - number of events X occurring in a fixed interval with rate λ.
P(X = k) = e^{-λ} \cdot \frac{λ^k}{k!}
E[X] = λ
Var[X] = λ

1.3.1 Properties
X_1 + ... + X_n ∼ Pois\left( \sum_i λ_i \right), for independent X_i ∼ Pois(λ_i)

\lim_{n→∞} Bin\left( n, \frac{λ}{n} \right) ≡ Pois(λ)
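The Poisson limit can be seen numerically; a small sketch (scipy assumed; λ = 2 and k = 3 are arbitrary choices):

```python
# Sketch: the Bin(n, lam/n) pmf approaches the Pois(lam) pmf as n grows.
from scipy import stats

lam, k = 2.0, 3
for n in (10, 100, 10_000):
    print(n, stats.binom.pmf(k, n, lam / n))  # approaches the Poisson value
print(stats.poisson.pmf(k, lam))              # e^-2 * 2^3 / 3! ~ 0.1804
```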

1.4 Geometric Distribution
X ∼ Geo(p) - number of successes X before the first fail with success probability p
P(X = k) = p^k \cdot (1 - p)

E[X] = \frac{p}{1 - p}

Var[X] = \frac{p}{(1 - p)^2}

1.4.1 Properties
P(X ≥ k + a|X ≥ a) = P(X ≥ k) (memoryless property)

1.4.2 Special Cases


P(X = k) = p \cdot (1 - p)^k - number of fails X before the first success
P(X = k) = p^{k-1} \cdot (1 - p) - number of trials X until the first fail
(note the resemblance to the terms of the Bin(n = k, p) pmf, without the binomial coefficient)
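A simulation sketch of the first parametrization above (number of successes before the first fail); p = 0.7 is an arbitrary choice:

```python
# Sketch: simulate the sheet's Geo(p) and check E[X] = p / (1 - p).
import numpy as np

rng = np.random.default_rng(1)
p = 0.7

def draw():
    k = 0
    while rng.random() < p:   # success with probability p
        k += 1                # count successes until the first fail
    return k

xs = np.array([draw() for _ in range(100_000)])
print(xs.mean(), p / (1 - p))         # both ~ 2.33
print(xs.var(), p / (1 - p) ** 2)     # both ~ 7.78
```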

2 Probability Generating Function (PGF)

G_X(z) = \sum_{k=0}^{\infty} P(X = k) \cdot z^k = \sum_{k=0}^{\infty} p_k \cdot z^k = p_0 + p_1 \cdot z + p_2 \cdot z^2 + p_3 \cdot z^3 + ...

G_X(z) = E[z^X]

2.0.1 Properties
From the PGF the following can be shown:

P(X = k) = \frac{1}{k!} G_X^{(k)}(0) = \frac{1}{k!} \left. \frac{d^k}{dz^k} G_X(z) \right|_{z=0}

E[X] = G'_X(1) = \left. \frac{d}{dz} G_X(z) \right|_{z=1} = \left. \left( p_1 + 2p_2 z + 3p_3 z^2 + ... \right) \right|_{z=1} = p_1 + 2p_2 + 3p_3 + ... = \sum_{k=0}^{\infty} p_k \cdot k

More generally:

E[X(X - 1)(X - 2)...(X - k + 1)] = G_X^{(k)}(1) = \left. \frac{d^k}{dz^k} G_X(z) \right|_{z=1}

Suppose X = Y + Z with Y, Z independent,
then G_X(z) = G_Y(z) \cdot G_Z(z)

Suppose X is a mixture: X = Y with probability p and X = Z with probability 1 - p,
then G_X(z) = p \cdot G_Y(z) + (1 - p) \cdot G_Z(z)

2.1 Common PGFs


X ∼ Bin(n, p): G_X(z) = (1 - p + pz)^n

X ∼ Pois(λ): G_X(z) = \exp(λ(z - 1))

X ∼ Geo(p): G_X(z) = \frac{1 - p}{1 - pz}
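The derivative properties above can be checked symbolically; a sketch with sympy (assumed available), using the Pois(λ) PGF:

```python
# Sketch: recover E[X] and Var[X] from a PGF by differentiating at z = 1.
import sympy as sp

z, lam = sp.symbols('z lam', positive=True)
G = sp.exp(lam * (z - 1))               # PGF of Pois(lam)

EX = sp.diff(G, z).subs(z, 1)           # G'(1) = E[X] = lam
EX_fact2 = sp.diff(G, z, 2).subs(z, 1)  # G''(1) = E[X(X-1)] = lam**2
var = EX_fact2 + EX - EX ** 2           # Var = E[X(X-1)] + E[X] - E[X]^2
print(EX, sp.simplify(var))             # lam, lam
```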

3 Continuous Distributions
F(x) = P(X ≤ x)

f(x) = \frac{d}{dx} F(x)

E[X] = \int_{-\infty}^{\infty} x \cdot f(x) \, dx

E[X^2] = \int_{-\infty}^{\infty} x^2 \cdot f(x) \, dx

Var[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2 = \int_{-\infty}^{\infty} x^2 \cdot f(x) \, dx - \left( \int_{-\infty}^{\infty} x \cdot f(x) \, dx \right)^2

(The integrals run over the support of X; for the nonnegative distributions below the lower limit is 0.)

3.1 Uniform Distribution


X ∼ Uniform(a, b) - all values of X between a and b equally likely to occur.

f(x) = \frac{1}{b - a}, for a < x < b, else f(x) = 0

E[X] = \frac{1}{2}(a + b)

Var[X] = \frac{1}{12}(b - a)^2

3.2 Exponential Distribution


X ∼ Exp(λ) ≡ Gamma(1, λ) - time X before an event occurs with rate λ.
f (x) = λ · exp(−λx), for x ≥ 0, else f (x) = 0
F (x) = P(X ≤ x) = 1 − exp(−λx)
E[X] = \frac{1}{λ}

Var[X] = \frac{1}{λ^2}

3.2.1 Properties
P(X > k + a | X > a) = P(X > k) (memoryless property)

\min(X_1, ..., X_n) ∼ Exp\left( \sum_i λ_i \right), for independent X_i ∼ Exp(λ_i)

P(X_1 < X_2) = \frac{λ_1}{λ_1 + λ_2}, for independent X_i ∼ Exp(λ_i)

P(X_i = \min(X_1, ..., X_n)) = \frac{λ_i}{λ_1 + ... + λ_n}, for independent X_i ∼ Exp(λ_i)
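A simulation sketch of the min-properties (numpy assumed; λ_1 = 1, λ_2 = 3 are arbitrary choices):

```python
# Sketch: min(X1, X2) ~ Exp(l1 + l2) and P(X1 < X2) = l1 / (l1 + l2)
# for independent exponentials, checked by simulation.
import numpy as np

rng = np.random.default_rng(2)
l1, l2, n = 1.0, 3.0, 200_000
x1 = rng.exponential(1 / l1, n)   # numpy parametrizes by the mean 1/lambda
x2 = rng.exponential(1 / l2, n)

print(np.minimum(x1, x2).mean(), 1 / (l1 + l2))  # both ~ 0.25
print((x1 < x2).mean(), l1 / (l1 + l2))          # both ~ 0.25
```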

3.3 Erlang Distribution


X ∼ Erlang(k, λ) ≡ Gamma(k, λ), where k ∈ Z+ - time X before k events occur with rate λ.
f(x) = \frac{x^{k-1}}{(k - 1)!} \cdot λ^k \cdot e^{-λx}, for x ≥ 0, else f(x) = 0

E[X] = k \cdot \frac{1}{λ}

Var[X] = k \cdot \frac{1}{λ^2}

3.3.1 Properties
X_1 + ... + X_n ∼ Erlang(n, λ), for independent X_i ∼ Exp(λ)

3.4 Gamma Distribution


X ∼ Gamma(α, β) - time X before α (can be non-integer) events occur with rate β.
f(x) = \frac{x^{α-1}}{Γ(α)} \cdot β^α \cdot e^{-βx}, where Γ(α) = \int_0^{\infty} x^{α-1} e^{-x} \, dx

E[X] = \frac{α}{β}

Var[X] = \frac{α}{β^2}

3.4.1 Properties
Γ(α) = (α − 1)!, for α ∈ Z+

\frac{Γ(α + 1)}{Γ(α)} = α → Γ(α) \cdot α = Γ(α + 1)

3.5 Normal Distribution


X ∼ N (µ, σ 2 ).
f(x) = \frac{1}{\sqrt{2πσ^2}} \cdot \exp\left( -\frac{(x - µ)^2}{2σ^2} \right)
E[X] = µ
Var[X] = σ 2

3.5.1 Properties
X_1 ± ... ± X_n ∼ N\left( \sum_i ±µ_i, \sum_i σ_i^2 \right), for independent X_i ∼ N(µ_i, σ_i^2)

3.6 Lognormal Distribution


X ∼ LN (µ, σ 2 ), where ln(X) ∼ N (µ, σ 2 )

f(x) = \frac{1}{x \sqrt{2πσ^2}} \cdot \exp\left( -\frac{(\ln(x) - µ)^2}{2σ^2} \right)

E[X] = \exp\left( µ + \frac{1}{2} σ^2 \right)

E[X^2] = \exp(2µ + 2σ^2)

Var[X] = \exp(2µ + 2σ^2) - \exp(2µ + σ^2) = (\exp(σ^2) - 1) \cdot \exp(2µ + σ^2)

4 Moment Generating Function (MGF)
M_X(t) = \int_{-\infty}^{\infty} e^{tu} \cdot f(u) \, du = E[e^{tX}]

M_X(t) = 1 + t E[X] + \frac{t^2}{2!} E[X^2] + \frac{t^3}{3!} E[X^3] + ... + \frac{t^n}{n!} E[X^n] + ...

4.0.1 Properties
From the MGF the following can be shown:

E[X] = M'_X(0) = \left. \frac{d}{dt} M_X(t) \right|_{t=0} = \left. \left( E[X] + E[X^2] t + E[X^3] \frac{t^2}{2!} + ... \right) \right|_{t=0} = E[X]

More generally:

E[X^k] = M_X^{(k)}(0) = \left. \frac{d^k}{dt^k} M_X(t) \right|_{t=0}

Suppose Y = aX + b,
then M_Y(t) = M_X(at) \cdot e^{bt}

Suppose X = Y + Z with Y, Z independent,
then M_X(t) = M_Y(t) \cdot M_Z(t)

Suppose X is a mixture: X = Y with probability p and X = Z with probability 1 - p,
then M_X(t) = p \cdot M_Y(t) + (1 - p) \cdot M_Z(t)

4.1 Common MGFs


X ∼ Uniform(a, b): M_X(t) = \frac{e^{tb} - e^{ta}}{t(b - a)}

X ∼ Exp(λ): M_X(t) = \frac{λ}{λ - t}

X ∼ Gamma(α, β): M_X(t) = \left( \frac{β}{β - t} \right)^α

X ∼ N(µ, σ^2): M_X(t) = \exp\left( µt + \frac{1}{2} σ^2 t^2 \right)
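As with PGFs, the moments fall out by differentiating at t = 0; a sympy sketch using the Exp(λ) MGF above:

```python
# Sketch: read moments off an MGF by differentiating at t = 0.
import sympy as sp

t, lam = sp.symbols('t lam', positive=True)
M = lam / (lam - t)                   # MGF of Exp(lam)

EX = sp.diff(M, t).subs(t, 0)         # M'(0)  = E[X]   = 1/lam
EX2 = sp.diff(M, t, 2).subs(t, 0)     # M''(0) = E[X^2] = 2/lam**2
print(EX, sp.simplify(EX2 - EX**2))   # 1/lam, 1/lam**2 (the variance)
```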

4.2 Laplace-Stieltjes Transforms (LSTs)


ϕ_X(s) = \int_{-\infty}^{\infty} e^{-su} \cdot f(u) \, du = E[e^{-sX}]

ϕ_X(s) = M_X(-s)

E[X] = -ϕ'_X(0) = -\left. \frac{d}{ds} ϕ_X(s) \right|_{s=0}

E[X^k] = (-1)^k ϕ_X^{(k)}(0) = (-1)^k \left. \frac{d^k}{ds^k} ϕ_X(s) \right|_{s=0}

5 Sums of Random Variables
Sn = X1 + X2 + ... + Xn ,
where, e.g., Xi is the payment for claim i, n is the number of claims, and Sn is the total claim
amount.

E[Sn ] = E[X1 + ... + Xn ] = E[X1 ] + ... + E[Xn ]


Var[Sn ] = Var[X1 + ... + Xn ] = Var[X1 ] + ... + Var[Xn ], (because Xi are independent)
M_{S_n}(t) = M_{X_1 + ... + X_n}(t) = E[e^{t(X_1 + ... + X_n)}] = \prod_{i=1}^{n} E[e^{t X_i}] = \prod_{i=1}^{n} M_{X_i}(t), (again because the X_i are independent)

5.1 Convolution Formula


P(X_1 + X_2 = k) = \sum_{j=0}^{k} P(X_1 = j) \cdot P(X_2 = k - j), for independent, nonnegative integer-valued X_1 and X_2
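For pmfs stored as arrays, this sum is exactly np.convolve; a sketch with two fair dice as a made-up example:

```python
# Sketch: pmf of X1 + X2 via the convolution formula (= np.convolve).
import numpy as np

pmf = np.zeros(7)
pmf[1:] = 1 / 6                    # fair die: P(X = k) = 1/6, k = 1..6
pmf_sum = np.convolve(pmf, pmf)    # P(X1 + X2 = k), k = 0..12
print(pmf_sum[7], 6 / 36)          # P(sum = 7) = 1/6, both ~ 0.1667
```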

5.2 Stochastic Sums of Random Variables


SN (t) = X1 + X2 + ... + XN (t) ,
where, e.g., Xi is the payment for claim i, N (t) is the (uncertain) number of claims in (0, t], and
SN (t) is the total claim amount.

If N(t) ∼ Discrete(...) with PGF G_{N(t)}(z), and the X_i are i.i.d. with MGF M_{X_i}(a), independent of N(t), then:

M_{S_{N(t)}}(a) = E[e^{a S_{N(t)}}]

M_{S_{N(t)}}(a) = \sum_{n=0}^{\infty} P(N(t) = n) \cdot E[e^{a(X_1 + X_2 + ... + X_n)}]

M_{S_{N(t)}}(a) = \sum_{n=0}^{\infty} P(N(t) = n) \cdot \left( E[e^{a X_1}] \right)^n

M_{S_{N(t)}}(a) = \sum_{n=0}^{\infty} P(N(t) = n) \cdot (M_{X_1}(a))^n

M_{S_{N(t)}}(a) = G_{N(t)}(M_{X_1}(a))

E[SN (t) ] = E[X1 ] · E[N (t)]


Var[SN (t) ] = Var[X1 ] · E[N (t)] + Var[N (t)] · (E[X1 ])2
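A simulation sketch of these two identities for a compound sum (N ∼ Pois(4) and X_i ∼ Exp(2) are arbitrary choices):

```python
# Sketch: simulate S = X1 + ... + X_N and compare with the formulas above.
import numpy as np

rng = np.random.default_rng(3)
lam, mu, runs = 4.0, 2.0, 100_000
counts = rng.poisson(lam, runs)                            # N ~ Pois(lam)
s = np.array([rng.exponential(1 / mu, k).sum() for k in counts])

print(s.mean(), lam * (1 / mu))                            # both ~ 2.0
print(s.var(), (1 / mu ** 2) * lam + lam * (1 / mu) ** 2)  # both ~ 2.0
```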

5.3 Counting Process


A stochastic process is called a counting process when N (t) represents the number of ’events’ that
have occurred.
All counting processes have to satisfy the following properties:
(1) N (0) = 0

(2) N (t) ∈ Z+
(3) N (b) ≥ N (a), for b > a
(4) N (b) − N (a) is the number of events in (a, b]

Independent Increments: This means N (b)−N (a) and N (d)−N (c) are independent if the intervals
(a, b] and (c, d] do not overlap.

Stationary Increments: This means that N (b) − N (a) and N (b + u) − N (a + u) are identically
distributed.

5.4 Poisson Process


A stochastic process {N (t), t ≥ 0} is called a Poisson process, PP(λ), if it follows the properties:
(1) Counting Process
(2) Independent Increments
(3) Stationary Increments
(4) P(N(t) = n) = \exp(-λt) \cdot \frac{(λt)^n}{n!}, hence N(t) ∼ PP(λ) ≡ Pois(λt)

5.4.1 Properties
N_1(t) + ... + N_m(t) ∼ PP(λ_1 + ... + λ_m) ≡ Pois((λ_1 + ... + λ_m) \cdot t), for independent N_i ∼ PP(λ_i) ≡ Pois(λ_i \cdot t)
Splitting: if each event of X ∼ PP(λ) is independently assigned to Y_1 with probability p and to Y_2 otherwise, then Y_1 ∼ PP(pλ) and Y_2 ∼ PP((1 - p)λ), independent of each other

P(T_1 > t) = P(N(t) = 0) = \exp(-λt)

The interarrival times satisfy T_i ∼ Exp(λ), with the same λ as in Pois(λt)

Total time until the n-th event:
V_n = T_1 + T_2 + ... + T_n

M_{V_n}(t) = \left( \frac{λ}{λ - t} \right)^n

So, V_n ∼ Gamma(n, λ) ≡ Erlang(n, λ)
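A simulation sketch: sums of i.i.d. Exp(λ) interarrival times match the Erlang/Gamma distribution (scipy's gamma uses shape a = n and scale = 1/λ):

```python
# Sketch: V_n = T_1 + ... + T_n with T_i ~ Exp(lam) behaves like Erlang(n, lam).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
lam, n = 2.0, 5
v = rng.exponential(1 / lam, size=(100_000, n)).sum(axis=1)

print(v.mean(), n / lam)                       # both ~ 2.5
print(v.var(), n / lam ** 2)                   # both ~ 1.25
print(stats.gamma(a=n, scale=1 / lam).mean())  # 2.5, matching Gamma(n, lam)
```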

5.5 Compound Poisson Process


A stochastic process {S(t), t ≥ 0} is said to be a compound Poisson process if it can be represented
by:
S(t) = SN (t) = X1 + X2 + ... + XN (t) , where {N (t), t ≥ 0} is a Poisson process with rate λ.

E[SN (t) ] = λt · E[X]


Var[SN (t) ] = λt · E[X 2 ]

Since G_{N(t)}(z) = \exp(λt(z - 1)), it follows (here with X_i ∼ Exp(λ), so M_{X_1}(u) = \frac{λ}{λ - u}):

M_{S(t)}(u) = \exp\left( λt \left( \frac{λ}{λ - u} - 1 \right) \right)

6 Theorems
6.1 Markov Inequality
P(X ≥ a) ≤ \frac{E[X]}{a} - for nonnegative X, the probability that X is at least a is at most \frac{1}{a} times E[X]

6.2 Chebyshev Inequality

P(|X - E[X]| ≥ a) ≤ \frac{Var[X]}{a^2} - the prob. that X deviates from its mean by at least a is at most \frac{1}{a^2} times Var[X]

6.3 Weak Law of Large Numbers


\lim_{n→∞} P\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - E[X] \right| > ϵ \right) = 0 - the prob. that X̄_n and E[X] differ by more than ϵ vanishes as n → ∞

\lim_{n→∞} \frac{1}{n} \sum_{i=1}^{n} X_i = E[X] - in the limit, the sample mean X̄_n equals E[X]

6.4 Central Limit Theorem (CLT)


Let S_n = \sum_{i=1}^{n} X_i, and X̄_n = \frac{S_n}{n}

Then Z_n = \frac{S_n - n \cdot E[X]}{\sqrt{n \cdot Var[X]}}, which is S_n normalized for the mean and variance

Then Z_n = \frac{X̄_n - E[X]}{\sqrt{Var[X]}} \cdot \sqrt{n}

Then Z_n = (X̄_n - E[X]) \cdot \sqrt{\frac{n}{σ^2}}

\lim_{n→∞} P(Z_n ≤ z) = Φ(z), where Φ(z) is the CDF of N(0, 1)
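A CLT simulation sketch with Uniform(0, 1) summands (mean 1/2, variance 1/12; the sizes are arbitrary choices):

```python
# Sketch: standardized sums of i.i.d. Uniform(0, 1) samples look N(0, 1).
import numpy as np

rng = np.random.default_rng(5)
n, runs = 1_000, 50_000
x = rng.random((runs, n))                        # Uniform(0, 1) samples
z = (x.sum(axis=1) - n * 0.5) / np.sqrt(n / 12)  # Z_n as defined above

print(z.mean(), z.std())                         # ~ 0.0 and ~ 1.0
print((z <= 1.96).mean())                        # ~ 0.975 = Phi(1.96)
```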

7 Calculating Rules
7.1 Expected Value and Variance
E[b] = b
E[aX] = a E[X]
E[X + Y] = E[X] + E[Y]
E[E[X]] = E[X]
E[X \cdot Y] = E[X] \cdot E[Y], if X, Y are independent

Var[b] = 0
Var[aX] = a^2 Var[X]
Var[X + Y] = Var[X] + Var[Y] + 2(E[X \cdot Y] - E[X] \cdot E[Y])
Var[X + Y] = Var[X] + Var[Y], if X, Y are independent

7.2 Sum Operator


7.2.1 Linearity
\sum_{k=1}^{n} c = n \cdot c

\sum_{k=0}^{n} c \cdot a_k = c \cdot \sum_{k=0}^{n} a_k

\sum_{k=0}^{n} (a_k + b_k) = \sum_{k=0}^{n} a_k + \sum_{k=0}^{n} b_k

7.2.2 Sums of Powers


\sum_{k=0}^{n} k = \frac{n(n + 1)}{2}

\sum_{k=0}^{n} k^2 = \frac{n(n + 1)(2n + 1)}{6}

\sum_{k=0}^{n} k^3 = \frac{n^2 (n + 1)^2}{4}

7.2.3 Power Series / Convergent Series



\sum_{k=0}^{\infty} \frac{1}{k!} = e

\sum_{k=0}^{\infty} \frac{x^k}{k!} = e^x, more generally (Taylor series)

\sum_{k=0}^{\infty} \frac{1}{n^k} = \frac{n}{n - 1}, for n > 1 (geometric series)

7.3 Derivative
7.3.1 Linearity
h(x) = a \cdot f(x) + b \cdot g(x)
h'(x) = a \cdot f'(x) + b \cdot g'(x)

7.3.2 Product Rule


h(x) = f (x) · g(x)
h′ (x) = f ′ (x) · g(x) + f (x) · g ′ (x)

7.3.3 Chain Rule


h(x) = f (g(x))
h′ (x) = f ′ (g(x)) · g ′ (x)

7.3.4 Quotient Rule


h(x) = \frac{f(x)}{g(x)}

h'(x) = \frac{f'(x) \cdot g(x) - f(x) \cdot g'(x)}{g(x)^2}

7.3.5 Reciprocal Rule


h(x) = \frac{1}{g(x)}

h'(x) = \frac{-g'(x)}{g(x)^2} \left( = \frac{[1]' \cdot g(x) - 1 \cdot g'(x)}{g(x)^2} \right)

7.3.6 Useful Derivatives


f (x) = a, f ′ (x) = 0
f (x) = x5 , f ′ (x) = 5x4
f (x) = ax , f ′ (x) = ax · ln(a)
f (x) = ex , f ′ (x) = ex
f(x) = \ln(x), f'(x) = \frac{1}{x}

7.4 Combination Operator


 
\binom{n}{x} = \frac{n!}{x! \cdot (n - x)!} - "n choose x"

7.5 Binomial Theorem


\sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} = (a + b)^n, for any real a, b

12
7.6 Lemma

E[X] = \sum_{k=0}^{\infty} P(X > k), for X taking values in {0, 1, 2, ...}

This means you can find the average value of X by adding up the probability that X is greater than 0, then the probability that X is greater than 1, then the probability that X is greater than 2, and so on, all the way to ∞.

So the expected value is the sum of the "tail probabilities" of X: the chances that X exceeds 0, 1, 2, ..., which together add up to the overall average value of X.
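A numerical sketch of the lemma (scipy assumed; Pois(3) truncated at a large cutoff is a made-up example):

```python
# Sketch: E[X] = sum over k of P(X > k) for a nonnegative integer-valued X.
import numpy as np
from scipy import stats

lam, cutoff = 3.0, 200
ks = np.arange(cutoff)
tail_sum = stats.poisson.sf(ks, lam).sum()  # sf(k) = P(X > k)
print(tail_sum)                             # ~ 3.0 = E[X] for Pois(3)
```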
