Probability Theory - Formula Sheet
2DF20
2023 / 2024
1 Discrete Distributions
E[X] = Σ_{k=0}^∞ P(X = k) · k = Σ_{k=0}^∞ p_k · k

E[X²] = Σ_{k=0}^∞ p_k · k²

Var[X] = E[(X − E[X])²] = E[X²] − (E[X])² = Σ_{k=0}^∞ p_k · k² − (Σ_{k=0}^∞ p_k · k)²
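For example, for X ∼ Bernoulli(p) (p_0 = 1 − p, p_1 = p):
E[X] = 0 · (1 − p) + 1 · p = p,  E[X²] = 0² · (1 − p) + 1² · p = p,  so Var[X] = p − p² = p(1 − p).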
1.2.1 Properties
X1 + ... + Xn ∼ Bin(m1 + ... + mn, p), for independent Xi ∼ Bin(mi, p)

X1 + ... + Xn ∼ Bin(n, p), for independent Xi ∼ Bernoulli(p)
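For example, for independent X1 ∼ Bin(3, p) and X2 ∼ Bin(5, p), X1 + X2 ∼ Bin(8, p); equivalently, a Bin(8, p) variable can be viewed as the sum of 8 independent Bernoulli(p) variables.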
1.3.1 Properties
X1 + ... + Xn ∼ Pois(λ1 + ... + λn), for independent Xi ∼ Pois(λi)

lim_{n→∞} Bin(n, λ/n) ≡ Pois(λ)
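For example, for independent X1 ∼ Pois(1) and X2 ∼ Pois(3), X1 + X2 ∼ Pois(4); and Bin(100, 0.02) is well approximated by Pois(2), since n = 100 is large and λ/n = 0.02 is small.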
1.4 Geometric Distribution
X ∼ Geo(p) - the number of successes X before the first failure, with success probability p per trial
P(X = k) = p^k · (1 − p)

E[X] = p / (1 − p)

Var[X] = p / (1 − p)²
1.4.1 Properties
P(X ≥ k + a|X ≥ a) = P(X ≥ k) (memoryless property)
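For example, with p = 1/2: P(X = k) = (1/2)^(k+1), E[X] = (1/2)/(1 − 1/2) = 1, Var[X] = (1/2)/(1/2)² = 2, and P(X ≥ 3 | X ≥ 1) = p³/p = p² = P(X ≥ 2).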
2 Probability Generating Function (PGF)
G_X(z) = Σ_{k=0}^∞ P(X = k) · z^k = Σ_{k=0}^∞ p_k · z^k = p_0 + p_1 · z + p_2 · z² + p_3 · z³ + ...

G_X(z) = E[z^X]
2.0.1 Properties
From the PGF the following can be shown:

P(X = k) = (1/k!) · G_X^(k)(0) = (1/k!) · d^k/dz^k G_X(z) |_{z=0}
More generally:
E[X(X − 1)(X − 2)...(X − k + 1)] = G_X^(k)(1) = d^k/dz^k G_X(z) |_{z=1}
Suppose X = Y + Z with Y, Z independent, then G_X(z) = G_Y(z) · G_Z(z)
Suppose X is a mixture of Y and Z (X = Y with probability p, X = Z with probability 1 − p), then G_X(z) = p · G_Y(z) + (1 − p) · G_Z(z)
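For example, for X ∼ Pois(λ): G_X(z) = Σ_{k=0}^∞ e^(−λ) (λ^k / k!) · z^k = exp(λ(z − 1)), so E[X] = G_X'(1) = λ and E[X(X − 1)] = G_X''(1) = λ², giving Var[X] = λ² + λ − λ² = λ.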
3 Continuous Distributions
F(x) = P(X ≤ x)

f(x) = d/dx F(x)

For a nonnegative continuous random variable X:

E[X] = ∫_0^∞ x · f(x) dx

E[X²] = ∫_0^∞ x² · f(x) dx

Var[X] = E[(X − E[X])²] = E[X²] − (E[X])² = ∫_0^∞ x² · f(x) dx − (∫_0^∞ x · f(x) dx)²
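For example, for X ∼ Exp(λ) with f(x) = λe^(−λx) for x ≥ 0: E[X] = ∫_0^∞ x · λe^(−λx) dx = 1/λ, E[X²] = ∫_0^∞ x² · λe^(−λx) dx = 2/λ², so Var[X] = 2/λ² − 1/λ² = 1/λ².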
3.2.1 Properties
P(X > k + a | X > a) = P(X > k) (memoryless property)

min(X1, ..., Xn) ∼ Exp(λ1 + ... + λn), for independent Xi ∼ Exp(λi)

P(X1 < X2) = λ1 / (λ1 + λ2), for independent Xi ∼ Exp(λi)

P(Xi = min(X1, ..., Xn)) = λi / (λ1 + ... + λn), for independent Xi ∼ Exp(λi)
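For example, for independent X1 ∼ Exp(1) and X2 ∼ Exp(2): min(X1, X2) ∼ Exp(3), P(X1 < X2) = 1/(1 + 2) = 1/3 and P(X2 = min(X1, X2)) = 2/3.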
For X ∼ Erlang(k, λ):

E[X] = k · (1/λ)

Var[X] = k · (1/λ²)
3.3.1 Properties
X1 + ... + Xn ∼ Erlang(n, λ), for independent Xi ∼ Exp(λ)
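For example, the sum of three independent Exp(λ) variables is Erlang(3, λ) distributed, with E[X] = 3/λ and Var[X] = 3/λ².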
3.4.1 Properties
Γ(α) = (α − 1)!, for α ∈ Z+
Γ(α + 1) / Γ(α) = α  →  Γ(α) · α = Γ(α + 1)
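For example, Γ(5) = 4! = 24, consistent with Γ(5) = 4 · Γ(4) = 4 · 3! = 24.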
3.5.1 Properties
X1 ± ... ± Xn ∼ N(±µ1 ± ... ± µn, σ1² + ... + σn²), for independent Xi ∼ N(µi, σi²)
For a lognormal random variable X, i.e. ln(X) ∼ N(µ, σ²):

f(x) = 1 / (x · √(2πσ²)) · exp(−(ln(x) − µ)² / (2σ²))

E[X] = exp(µ + σ²/2)

E[X²] = exp(2µ + 2σ²)

Var[X] = exp(2µ + 2σ²) − exp(2µ + σ²) = (exp(σ²) − 1) · exp(2µ + σ²)
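For example, with µ = 0 and σ² = 1: E[X] = exp(1/2) ≈ 1.65 and Var[X] = (e − 1) · e ≈ 4.67.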
4 Moment Generating Function (MGF)
M_X(t) = ∫_{−∞}^∞ e^{tu} · f(u) du = E[e^{tX}]

M_X(t) = 1 + t · E[X] + (t²/2!) · E[X²] + (t³/3!) · E[X³] + ... + (t^n/n!) · E[X^n] + ...
4.0.1 Properties
From the MGF the following can be shown:

E[X] = M_X'(0) = d/dt M_X(t) |_{t=0} = (E[X] + E[X²] · t + E[X³] · t²/2! + ...) |_{t=0} = E[X]
More generally:
E[X^k] = M_X^(k)(0) = d^k/dt^k M_X(t) |_{t=0}
Suppose Y = aX + b, then M_Y(t) = M_X(at) · e^{bt}

Suppose X = Y + Z with Y, Z independent, then M_X(t) = M_Y(t) · M_Z(t)

Suppose X is a mixture of Y and Z (X = Y with probability p, X = Z with probability 1 − p), then M_X(t) = p · M_Y(t) + (1 − p) · M_Z(t)
For the Laplace-Stieltjes transform ϕ_X(s) = E[e^{−sX}]:

E[X] = −ϕ_X'(0) = − d/ds ϕ_X(s) |_{s=0}

E[X^k] = (−1)^k · ϕ_X^(k)(0) = (−1)^k · d^k/ds^k ϕ_X(s) |_{s=0}
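For example, for X ∼ Exp(λ): M_X(t) = λ/(λ − t) for t < λ and ϕ_X(s) = λ/(λ + s), so E[X] = M_X'(0) = 1/λ = −ϕ_X'(0) and E[X²] = M_X''(0) = 2/λ².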
5 Sums of Random Variables
Sn = X1 + X2 + ... + Xn,
where, e.g., Xi is the payment for claim i, n is the number of claims, and Sn is the total claim amount.
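For example, with n = 3 claims of sizes X1 = 100, X2 = 250 and X3 = 75, the total claim amount is S3 = 100 + 250 + 75 = 425.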
(2) N(t) ∈ Z+
(3) N(b) ≥ N(a), for b > a
(4) N(b) − N(a) is the number of events in (a, b]
Independent Increments: This means N(b) − N(a) and N(d) − N(c) are independent if the intervals (a, b] and (c, d] do not overlap.
Stationary Increments: This means that N(b) − N(a) and N(b + u) − N(a + u) are identically distributed.
5.4.1 Properties
N1(t) + ... + Nm(t) ∼ PP(λ1 + ... + λm) ≡ Pois((λ1 + ... + λm) · t), for independent Ni ∼ PP(λi) ≡ Pois(λi · t)

If each event of X ∼ PP(λ) is independently marked type 1 with probability p and type 2 with probability 1 − p, then the type-1 events form Y1 ∼ PP(pλ) and the type-2 events form Y2 ∼ PP((1 − p)λ) (thinning)
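For example, if arrivals at two independent counters follow PP(2) and PP(3), the combined arrival stream is PP(5), so the number of arrivals in (0, t] is Pois(5t) distributed; conversely, if each arrival of a PP(λ) stream is of type 1 with probability p = 0.3, the type-1 arrivals form a PP(0.3λ) stream.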
Total time:
Vn = T1 + T2 + ... + Tn

M_{Vn}(t) = (λ / (λ − t))^n

So, Vn ∼ Gamma(n, λ) ≡ Erlang(n, λ)
Since G_{N(t)}(z) = exp(λt(z − 1)), it follows that M_{S(t)}(u) = G_{N(t)}(M_X(u)) = exp(λt(M_X(u) − 1)); for claim sizes X ∼ Exp(λ) this gives

M_{S(t)}(u) = exp(λt(λ/(λ − u) − 1))
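As a check (assuming the compound-Poisson form above), differentiating M_{S(t)}(u) at u = 0 gives E[S(t)] = λt · E[X], which equals λt · (1/λ) = t for Exp(λ) claim sizes.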
6 Theorems
6.1 Markov Inequality
P(X ≥ a) ≤ E[X] / a, for X ≥ 0 and a > 0 - the probability that X is at least a is at most 1/a times E[X]
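For example, if E[X] = 10 then P(X ≥ 50) ≤ 10/50 = 0.2, regardless of the distribution of X (as long as X ≥ 0).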
7 Calculating Rules
7.1 Expected Value and Variance
E[b] = b
E[aX] = aE[X]
E[X + Y ] = E[X] + E[Y ]
E[E[X]] = E[X]
E[X · Y ] = E[X] · E[Y ], if X, Y are independent
Var[b] = 0
Var[aX] = a2 Var[X]
Var[X + Y] = Var[X] + Var[Y] + 2(E[X · Y] − E[X] · E[Y])
Var[X + Y ] = Var[X] + Var[Y ], if X, Y are independent
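For example, E[2X + 3] = 2E[X] + 3 and Var[2X + 3] = 4 · Var[X]; and for independent X, Y: Var[X − Y] = Var[X] + Var[Y].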
7.3 Derivative
7.3.1 Linearity
h(x) = a · f (x) + b · g(x)
h′(x) = a · f′(x) + b · g′(x)
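For example, for h(x) = 3x² + 2 sin(x): h′(x) = 3 · 2x + 2 · cos(x) = 6x + 2 cos(x).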
7.6 Lemma
E[X] = Σ_{k=0}^∞ P(X > k)
This means that, for X taking values in {0, 1, 2, ...}, you can find the average value of X by adding the probability that X is greater than 0, then the probability that X is greater than 1, then the probability that X is greater than 2, and so on.
So the expected value is the sum of the "tail probabilities" of X. These tail probabilities represent the chances that X exceeds the values 0, 1, 2, ..., and adding up all these chances gives the overall average value of X.
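For example, for the geometric distribution of Section 1.4: P(X > k) = p^(k+1), so E[X] = Σ_{k=0}^∞ p^(k+1) = p/(1 − p), matching the earlier formula.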