Math Review For ML
Fall 2018
ECE – Carnegie Mellon University
Outline
1. Linear Algebra
2. Calculus and Optimization
3. Probability
4. Review on Statistics
Linear Algebra
Vector spaces – definition

Vector Space (V, +, ·) over a field F
A set of elements (vectors) with two operations, vector addition x + y
and scalar multiplication αx, satisfying:

1. ∃ 0 ∈ V : x + 0 = x,
2. ∀ x ∈ V ∃ −x : x + (−x) = 0,
3. ∃ ζ ∈ F : ζx = x; we denote ζ = 1,
4. Commutativity: x + y = y + x,
5. Associativity: (x + y) + z = x + (y + z) and α(βx) = (αβ)x,
6. Distributivity: α(x + y) = αx + αy and (α + β)x = αx + βx.
Vector spaces

Linear Independence
x1, x2, ..., xn ∈ V are linearly independent if

    ∑_{i=1}^n αi xi = 0  =⇒  α1 = ... = αn = 0.

Span
The span of x1, x2, ..., xn in V is

    L{x1, x2, ..., xn} = {x ∈ V : ∃ α1, ..., αn ∈ F : ∑_{i=1}^n αi xi = x}.
Vector spaces

Basis
B = {x1, ..., xn} is a basis of a vector space V if

    ∀ x ∈ V ∃ α1, ..., αn ∈ F : ∑_{i=1}^n αi xi = x,

with the coefficients αi unique (equivalently, B spans V and is linearly independent).
Normed spaces

Norm
Let V be a real vector space. A norm is a function ‖·‖ : V → R that
satisfies:

1. ‖x‖ ≥ 0, with ‖x‖ = 0 ⟺ x = 0,
2. ‖αx‖ = |α| ‖x‖,
3. Triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Examples (norms in Rⁿ):

• ‖x‖1 = ∑_{i=1}^n |xi|,
• ‖x‖p = (∑_{i=1}^n |xi|^p)^{1/p}, p ≥ 1,
• ‖x‖∞ = max_{1≤i≤n} |xi|.
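These example norms can be evaluated directly; a minimal sketch in NumPy (the vector is a made-up example, not from the slides):

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

# l1 norm: sum of absolute values
l1 = np.sum(np.abs(x))

# lp norm with p = 2 (the Euclidean norm)
p = 2
lp = np.sum(np.abs(x) ** p) ** (1 / p)

# l-infinity norm: largest absolute entry
linf = np.max(np.abs(x))
```

For p = 2 this agrees with `np.linalg.norm(x)`; the other two match `np.linalg.norm(x, ord=1)` and `np.linalg.norm(x, ord=np.inf)`.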
Inner product spaces

Inner product
An inner product on a real vector space V is a function
⟨·,·⟩ : V × V → R satisfying:

1. Symmetry: ⟨x, y⟩ = ⟨y, x⟩,
2. Linearity: ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩,
3. Positive definiteness: ⟨x, x⟩ ≥ 0, with ⟨x, x⟩ = 0 ⟺ x = 0.

Example
Inner product in Rⁿ:

    ⟨x, y⟩ = ∑_{i=1}^n xi yi = xᵀy.
Inner product spaces

Remark
Any inner product on V induces a norm on V: ‖x‖ = √⟨x, x⟩.

Orthogonality
Two vectors x, y ∈ V are orthogonal, written x ⊥ y, if ⟨x, y⟩ = 0.

Pythagorean Theorem
If x ⊥ y, then
    ‖x + y‖² = ‖x‖² + ‖y‖².

Cauchy–Schwarz Inequality
For all x, y ∈ V,
    |⟨x, y⟩| ≤ ‖x‖ ‖y‖.
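Both the Pythagorean theorem and Cauchy–Schwarz are easy to sanity-check numerically; a small NumPy sketch (the vectors are made-up examples):

```python
import numpy as np

x = np.array([1.0, 0.0, 2.0])
y = np.array([0.0, 3.0, 0.0])   # <x, y> = 0, so x is orthogonal to y

def norm(v):
    # Norm induced by the standard inner product
    return np.sqrt(v @ v)

# Pythagorean theorem: ||x + y||^2 = ||x||^2 + ||y||^2 for orthogonal x, y
pythagoras_gap = norm(x + y) ** 2 - (norm(x) ** 2 + norm(y) ** 2)

# Cauchy-Schwarz: |<x, z>| <= ||x|| ||z|| holds for any pair
z = np.array([1.0, -2.0, 4.0])
cauchy_schwarz_holds = abs(x @ z) <= norm(x) * norm(z)
```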
Singular value decomposition (SVD)

Any matrix A ∈ R^{m×n} can be factored as

    A = UΣVᵀ,

where U ∈ R^{m×m} and V ∈ R^{n×n} are orthogonal and Σ ∈ R^{m×n} is
diagonal with nonnegative entries σ1 ≥ σ2 ≥ ... ≥ 0, the singular values.
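A minimal sketch of computing the standard factorization A = UΣVᵀ with NumPy's built-in SVD (the matrix is a made-up example):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

# np.linalg.svd returns U, the singular values s, and V transposed
U, s, Vt = np.linalg.svd(A)

# A is recovered as U @ diag(s) @ Vt
reconstruction_error = np.max(np.abs(U @ np.diag(s) @ Vt - A))
```

The singular values come back sorted in decreasing order, and U and Vt are orthogonal.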
Calculus and Optimization
Gradient

For a multivariate function f : Rᵈ → R, the gradient of f is the vector
of partial derivatives

    ∇f = (∂f/∂x1, ..., ∂f/∂xd)ᵀ,   i.e.   [∇f]i = ∂f/∂xi   ∀ i ∈ {1, 2, ..., d}.
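Gradients computed by hand can be checked against central finite differences; a sketch in NumPy (the function f is a made-up example):

```python
import numpy as np

def f(x):
    # f(x) = x1^2 + 3 x1 x2, with gradient (2 x1 + 3 x2, 3 x1)
    return x[0] ** 2 + 3 * x[0] * x[1]

def numerical_gradient(f, x, h=1e-6):
    # [grad f]_i ~ (f(x + h e_i) - f(x - h e_i)) / (2h)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x0 = np.array([1.0, 2.0])
g = numerical_gradient(f, x0)   # exact gradient at x0 is (8, 3)
```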
Jacobian

The Jacobian of a vector field f : Rⁿ → Rᵐ is the m × n matrix

    Jf = [ ∂f1/∂x1  ...  ∂f1/∂xn
             ...           ...
           ∂fm/∂x1  ...  ∂fm/∂xn ],      [Jf]ij = ∂fi/∂xj.
Hessian

The Hessian of a function f : Rⁿ → R is the n × n matrix of second
partial derivatives

    Hf = [ ∂²f/∂x1²    ...  ∂²f/∂x1∂xn
             ...              ...
           ∂²f/∂xn∂x1  ...  ∂²f/∂xn²  ],      [Hf]ij = ∂²f/∂xi∂xj.
Hessian

Clairaut's Theorem
If the second-order partial derivatives of f : Rᵈ → R are continuous at
a point x, then

    ∂²f/∂xi∂xj (x) = ∂²f/∂xj∂xi (x)   ∀ i, j ∈ {1, ..., d};

in this case the Hessian is symmetric: [Hf]ij(x) = [Hf]ji(x).
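Clairaut's symmetry can be observed numerically by approximating second partials with finite differences; a sketch (NumPy; the smooth function f is a made-up example):

```python
import numpy as np

def f(x):
    # Smooth function of two variables: f(x) = x1^2 x2 + sin(x2)
    return x[0] ** 2 * x[1] + np.sin(x[1])

def numerical_hessian(f, x, h=1e-4):
    # [Hf]_ij ~ (f(x + h e_i + h e_j) - f(x + h e_i) - f(x + h e_j) + f(x)) / h^2
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei) - f(x + ej) + f(x)) / h ** 2
    return H

H = numerical_hessian(f, np.array([1.0, 2.0]))
# Analytic Hessian at (1, 2): [[2 x2, 2 x1], [2 x1, -sin(x2)]] = [[4, 2], [2, -sin(2)]]
```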
Matrix Calculus

    ∇x(aᵀx) = a

    ∇x(xᵀAx) = (A + Aᵀ)x,   which equals 2Ax if A is symmetric.
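Both identities can be verified against finite differences; a sketch (NumPy; a, A, and x are random made-up data):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(3)
A = rng.standard_normal((3, 3))   # generic, not symmetric
x = rng.standard_normal(3)

def numerical_gradient(f, x, h=1e-6):
    # Central finite differences in each coordinate
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# grad_x(a^T x) = a
g_linear = numerical_gradient(lambda v: a @ v, x)

# grad_x(x^T A x) = (A + A^T) x
g_quadratic = numerical_gradient(lambda v: v @ A @ v, x)
```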
Chain rule

For g : Rⁿ → Rᵐ and f : Rᵐ → Rᵏ, the Jacobian of the composition
f ∘ g : Rⁿ → Rᵏ is

    J_{f∘g}(x) = Jf(g(x)) Jg(x).

If k = 1, we have f ∘ g : Rⁿ → R and

    ∇(f ∘ g)(x) = Jg(x)ᵀ ∇f(g(x)).
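A numerical check of the k = 1 case, ∇(f∘g)(x) = Jg(x)ᵀ ∇f(g(x)); the functions below are made-up examples (NumPy assumed):

```python
import numpy as np

def g(x):
    # g: R^2 -> R^2
    return np.array([x[0] * x[1], x[0] + x[1]])

def Jg(x):
    # Jacobian of g
    return np.array([[x[1], x[0]],
                     [1.0, 1.0]])

def f(u):
    # f: R^2 -> R
    return u[0] ** 2 + 3 * u[1]

def grad_f(u):
    return np.array([2 * u[0], 3.0])

x0 = np.array([1.0, 2.0])

# Chain rule prediction for grad(f o g)(x0)
chain = Jg(x0).T @ grad_f(g(x0))

# Central-difference gradient of the composition, for comparison
h = 1e-6
numeric = np.array([(f(g(x0 + h * e)) - f(g(x0 - h * e))) / (2 * h)
                    for e in np.eye(2)])
```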
Convexity

A function f : Rᵈ → R is convex if, for all x, y ∈ Rᵈ and λ ∈ [0, 1],

    f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).
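A brute-force check of the convexity inequality f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) for f(x) = x², a convex function (a sketch, not from the slides):

```python
import numpy as np

# f(t) = t^2 is convex: verify the inequality over a grid of
# point pairs and mixing weights lambda in [0, 1].
f = lambda t: t ** 2

holds = True
for x in np.linspace(-2.0, 2.0, 9):
    for y in np.linspace(-2.0, 2.0, 9):
        for lam in np.linspace(0.0, 1.0, 11):
            lhs = f(lam * x + (1 - lam) * y)
            rhs = lam * f(x) + (1 - lam) * f(y)
            holds = holds and (lhs <= rhs + 1e-12)
```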
Probability
Setup
Random variables can take on many values, and we are often interested
in the distribution over the values of a random variable, e.g., P(Y = 0)
Distribution function

The (cumulative) distribution function of a random variable X is

    F_X(x) := P(X ≤ x);

for a continuous X with density f_X, f_X(x) = dF_X(x)/dx.
Example of distributions
Expectation

Expected Values
• Discrete random variable X:  E[g(X)] = ∑_{x∈X} g(x) f(x);
• Continuous random variable X:  E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx.
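The continuous case can be approximated by averaging g over samples of X; a Monte Carlo sketch in NumPy (not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# X ~ Uniform(0, 1), g(x) = x^2:
# E[g(X)] = integral over [0, 1] of x^2 dx = 1/3
samples = rng.uniform(0.0, 1.0, size=1_000_000)
mc_estimate = np.mean(samples ** 2)
```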
Multivariate Distributions

Definition:

    F_{X,Y}(x, y) := P(X ≤ x, Y ≤ y),

and

    f_{X,Y}(x, y) := ∂²F_{X,Y}(x, y) / ∂x∂y.
Conditional Probability and Bayes Rule

    f_{X|Y}(x|y) = P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y) = f_{X,Y}(x, y) / f_Y(y).

Bayes Rule:

    P(X|Y) = P(Y|X) P(X) / P(Y).
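A worked instance of Bayes rule, with X the event "condition present" and Y the event "test positive" (all the numbers below are illustrative, not from the slides):

```python
# All probabilities are made-up illustrative values.
p_x = 0.01              # P(X): prior probability of the condition
p_y_given_x = 0.95      # P(Y|X): test sensitivity
p_y_given_not_x = 0.05  # P(Y|not X): false-positive rate

# Law of total probability: P(Y) = P(Y|X)P(X) + P(Y|not X)P(not X)
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)

# Bayes rule: P(X|Y) = P(Y|X) P(X) / P(Y)
p_x_given_y = p_y_given_x * p_x / p_y
```

Despite the accurate test, P(X|Y) comes out to only about 0.16, because the condition is rare.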
Independence

Two random variables X and Y are independent if

    f_{X,Y}(x, y) = f_X(x) f_Y(y)   for all x, y.
Review on Statistics

Statistics
Sample variance:

    S²_{N−1} = 1/(N−1) ∑_{i=1}^N (Xi − X̄)².

If the Xi are i.i.d. with mean μ and variance σ²:

    E[X̄] = E[Xi] = μ,
    Var(X̄) = σ²/N,
    E[S²_{N−1}] = σ².
Point Estimation

The bias of an estimator θ̂_N of a parameter θ is

    bias(θ̂_N) = E_θ[θ̂_N] − θ.
Example

Let x1, x2, ..., xN be i.i.d. samples of a random variable X. Then:

• The sample mean X̄ = (1/N) ∑_n xn is an unbiased estimator of X's mean.
• The sample variance S²_{N−1} = 1/(N−1) ∑_n (xn − X̄)² is an unbiased
  estimator of X's variance.
• The sample variance S²_N = (1/N) ∑_n (xn − X̄)² is not an unbiased
  estimator of X's variance.
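The bias of S²_N, and the unbiasedness of S²_{N−1}, show up clearly in simulation; a NumPy sketch (the distribution and sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0                    # true variance of X
N, trials = 5, 200_000          # small samples, many repetitions

# trials x N matrix of i.i.d. normal draws
X = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
xbar = X.mean(axis=1, keepdims=True)
ss = ((X - xbar) ** 2).sum(axis=1)

mean_unbiased = (ss / (N - 1)).mean()  # averages to sigma2 = 4.0
mean_biased = (ss / N).mean()          # averages to sigma2 * (N-1)/N = 3.2
```

Dividing by N systematically underestimates the variance by the factor (N−1)/N, which matters most for small N.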