Dimensionality Reduction

[Figure: taxonomy of machine learning. Machine Learning branches into Supervised Learning (classification, regression), Unsupervised Learning, and Reinforcement Learning; dimensionality reduction sits under unsupervised learning.]
Some examples of high-dimensional data:

[Figure: an ImageNet cat photo of 256 × 256 pixels, which flattened across three colour channels becomes a 196608-dimensional vector; a small 28 × 28 px image; a gene represented as a long sequence of nucleotides (A, C, G, T), one dimension per position.]
What is the problem with high-dimensional things?

- Hard to visualise.
- Algorithms tend to get slow.

Example: the Nearest Neighbour Classifier relies on the Euclidean distance, which in 3D is

d = √((x₂ − x₁)² + (y₂ − y₁)² + (z₂ − z₁)²)

and gains one term per extra dimension:

d = √((x₂ − x₁)² + (y₂ − y₁)² + (z₂ − z₁)² + … + (n₂ − n₁)²)
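A minimal numpy sketch (my illustration, not from the slides) of the distance above; the 196608-dimensional case is the flattened 256 × 256 RGB image from earlier:

    import numpy as np

    # One squared difference per dimension: cost grows linearly with n.
    def euclidean(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.sqrt(np.sum((a - b) ** 2)))

    rng = np.random.default_rng(0)
    for n in (3, 1_000, 196_608):  # 196608 = flattened 256x256 RGB image
        a, b = rng.normal(size=n), rng.normal(size=n)
        print(n, round(euclidean(a, b), 2))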
The Curse of Dimensionality
What is the curse of dimensionality?

[Figure: ten points spread along a single X axis with values 1 to 5. A single split at X > 2 separates them into 6 points (False) and 4 points (True).]

[Figure: the same points in 2D, split at 2 and 4 on both axes. The class proportions in the resulting cells (100%/0%, 50%/50%, 75%/25%, …) become highly unbalanced, and in some cells nothing is going on at all: they contain no points.]

In 2D, on average 55.5% of cells will be either empty or singletons. Add a z axis and, in 3D, on average 92.5% of cells will be either empty or singletons.

In order to keep high-dimensional space reasonably covered you need a lot more data.
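A hedged simulation sketch (my illustration, not from the slides) of this effect: scatter ten points into a grid split at 2 and 4 on each axis and count cells that end up empty or with a single point. The exact percentages depend on the data and the grid; the point is only that the fraction climbs with dimension:

    import numpy as np

    rng = np.random.default_rng(42)

    def empty_or_singleton_fraction(n_dims, n_points=10, cells_per_axis=3,
                                    n_trials=10_000):
        fractions = []
        for _ in range(n_trials):
            # Assign each point to a random grid cell.
            cells = rng.integers(0, cells_per_axis, size=(n_points, n_dims))
            _, counts = np.unique(cells, axis=0, return_counts=True)
            n_cells = cells_per_axis ** n_dims
            multi = int(np.sum(counts > 1))        # cells holding 2+ points
            fractions.append(1 - multi / n_cells)  # empty or singleton share
        return float(np.mean(fractions))

    print("2D:", empty_or_singleton_fraction(2))
    print("3D:", empty_or_singleton_fraction(3))  # noticeably higher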
What is the curse of dimensionality? (part II)

- Hard to visualise.
- Algorithms tend to get slow.

There is a tension: a lot of things are high-dimensional (images, gene sequences, …), yet high-dimensional things are hard to work with, because algorithms with costs like O(n²), O(n³), O(2ⁿ) or O(n!) blow up quickly as dimensions are added.
Is there a way to break the curse?
Feature extraction vs feature elimination

[Figure: Circumference plotted against Diameter. Because circumference = π × diameter, the two features are perfectly correlated: we can drop one of them (feature elimination) or combine them into a single derived feature (feature extraction) without losing much.]
Principal Component Analysis

[Figure: the same scatter with a new axis drawn along the direction of greatest spread, labelled Principal Component #1 (PC1).]
2-Dimensional data

 x | y
---+---
 1 | 2
 2 | 4
 3 | 5
 4 | 4
 5 | 5

[Figure: the five points plotted in the X-y plane.]

3-Dimensional data

 x | y | z
---+---+-----
 1 | 2 | 2
 2 | 4 | 0.5
 3 | 5 | 1
 4 | 4 | 1
 5 | 5 | 0.5

[Figure: the same points plotted in X-y-z space.]
200-Dimensional data?

Are all of these dimensions equally useful?
2-D example revisited

[Figure: the five points in the X-y plane. The main variation is from left to right; not so much from top to bottom.]

Since the main variation is from left to right, we can keep only one dimension: project every point onto the X axis and drop y.
2-D example revisited

[Figure: a different, tilted point cloud. At first glance the data seem to be spread more equally along the X and y axes.]

Looking closer, the data is mostly spread along one diagonal line, and a little bit along the line perpendicular to it.

How about we make new axes from these lines?

[Figure: the point cloud redrawn with the two diagonal lines as a new, rotated coordinate system.]
2-D example revisited

These new axes are called principal components.

PC #1 is a new vector which spans along most of the variation in the data. PC #2 is perpendicular to PC #1 and captures the little variation that remains.

[Figure: the point cloud with PC #1 drawn along the direction of greatest spread and PC #2 perpendicular to it.]
X
Principle components are not additional
axes/dimensions
y
PC #2
How many PCs will be
in 3D space? PC #1
#3
PC
X
As many as there were
original dimensions,
hence 3 PCs
[Figure: the tilted 2D point cloud projected onto PC #1 alone: from 2D to 1D without losing much information.]
How many PCs will be formed in 200D space? Again, as many as there were original dimensions: 200.
Let's run PCA by hand on the 2D data from before.

 x | y
---+---
 1 | 2
 2 | 4
 3 | 5
 4 | 4
 5 | 5

Compute the means, x̄ = 3 and ȳ = 4, and subtract them from every point:

 x - x̄ | y - ȳ
-------+-------
  -2   |  -2
  -1   |   0
   0   |   1
   1   |   0
   2   |   1

[Figure: the centered points Z, now scattered around the origin.]
Transpose the matrix of coordinates Z:

Z⊤ = -2 -1  0  1  2
     -2  0  1  0  1

What are the dimensions of the transposed matrix? Z is 5 × 2, so Z⊤ is 2 × 5.

Next compute

S = (Z⊤ × Z) / (n − 1)

We divide by n − 1 because we compute the empirical covariance matrix (i.e. from data).
Z⊤ × Z = 10  6
          6  6

S = (Z⊤ × Z) / (n − 1) = 1/4 × 10  6  =  2.5  1.5
                                6  6     1.5  1.5

This is the covariance matrix.
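A short numpy sketch (my illustration, not from the slides) reproducing this: center the data, form Z⊤Z / (n − 1), and cross-check against np.cov:

    import numpy as np

    # The five 2D points from the worked example.
    X = np.array([[1, 2], [2, 4], [3, 5], [4, 4], [5, 5]], dtype=float)

    Z = X - X.mean(axis=0)      # subtract the column means (x̄ = 3, ȳ = 4)
    S = Z.T @ Z / (len(Z) - 1)  # empirical covariance matrix
    print(S)                    # [[2.5 1.5], [1.5 1.5]]

    print(np.cov(X.T))          # np.cov wants variables in rows; same result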
How to interpret values in covariance matrix?

S = 2.5  1.5
    1.5  1.5

Start with the top-left entry. The centered x coordinates are [-2, -1, 0, 1, 2], with mean x̄ = 0.

Variance: σ² = ∑(xᵢ − x̄)² / (n − 1)

With x̄ = 0 this simplifies to σ² = ∑ xᵢ² / 4 = (4 + 1 + 0 + 1 + 4) / 4 = 2.5

Variance is the expected value of the squared deviation from the mean.
So the diagonal holds the variances: 2.5 is the variance along the first axis, and 1.5 is the variance along the second axis:

σ² = ∑(yᵢ − ȳ)² / (n − 1) = (4 + 0 + 1 + 0 + 1) / 4 = 1.5
What about the off-diagonal entries? Those are the covariances.

Covariance indicates how two variables are related. A positive covariance means the variables are positively related, while a negative covariance means the variables are inversely related.

[Figure: two scatter plots of a second variable against a first variable, one sloping upwards (positive covariance), one sloping downwards (inverse covariance).]
For our data, x = [-2, -1, 0, 1, 2] and y = [-2, 0, 1, 0, 1], with x̄ = 0 and ȳ = 0:

cov(x, y) = ∑(xᵢ − x̄)(yᵢ − ȳ) / (n − 1)
          = ((-2)(-2) + (-1)(0) + (0)(1) + (1)(0) + (2)(1)) / 4
          = 6 / 4 = 1.5
Another example: keep x = [-2, -1, 0, 1, 2] but take y = [0, 0, 1, -2, 1]. What are the means? Again x̄ = 0 and ȳ = 0, so

cov(x, y) = ∑ xᵢ yᵢ / 4
          = ((-2)(0) + (-1)(0) + (0)(1) + (1)(-2) + (2)(1)) / 4
          = 0 / 4 = 0

A covariance of 0 means there is no linear relationship between the two variables.
To summarise:

S = 2.5  1.5      variances on the diagonal,
    1.5  1.5      covariances off the diagonal.

Aren't covariances supposed to be between [-1, 1]? No: that bound applies to correlations, which are covariances normalised by the standard deviations. A raw covariance can be arbitrarily large.
Recall where S came from:

S = (Z⊤ × Z) / 4 = 2.5  1.5
                   1.5  1.5
Now eigendecompose the covariance matrix: S = P D P⊤

  S              P                D              P⊤
2.5 1.5   =   -0.81  0.58   ×   3.58 0      ×   -0.81 -0.58
1.5 1.5       -0.58 -0.81       0    0.41        0.58 -0.81
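The same decomposition in numpy (a sketch of mine; note that eigenvectors are only defined up to sign, so the columns may come out flipped relative to the slides):

    import numpy as np

    S = np.array([[2.5, 1.5],
                  [1.5, 1.5]])

    # eigh is for symmetric matrices; eigenvalues arrive in ascending order.
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]   # largest eigenvalue (PC1) first

    D = np.diag(vals[order])         # ~[[3.58, 0], [0, 0.42]]
    P = vecs[:, order]               # columns are the eigenvectors
    print(np.round(P @ D @ P.T, 2))  # reconstructs S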
The diagonal of D holds the eigenvalues: 3.58 and 0.41. The columns of P are the eigenvectors:

eigenvector #1 = (-0.81, -0.58)
eigenvector #2 = ( 0.58, -0.81)

[Figure: both eigenvectors drawn from the origin (0,0) over the centered point cloud. Eigenvector #1 points along the direction of greatest spread; eigenvector #2 is perpendicular to it.]
To move from the old coordinate system to the eigenvector coordinate system, multiply P⊤ by Z⊤. Point by point:

(-2, -2):  -2*(-0.81) + (-2)*(-0.58) = 2.78
           -2*(0.58)  + (-2)*(-0.81) = 0.46

(-1, 0):   -1*(-0.81) + (0)*(-0.58) = 0.81
           -1*(0.58)  + (0)*(-0.81) = -0.58

(0, 1):     0*(-0.81) + (1)*(-0.58) = -0.58
            0*(0.58)  + (1)*(-0.81) = -0.81

And so on for the remaining points. Altogether:

-0.81 -0.58   ×   -2 -1  0  1  2   =   2.78  0.81 -0.58 -0.81 -2.20
 0.58 -0.81       -2  0  1  0  1       0.46 -0.58 -0.81  0.58  0.35

[Figure: left, the centered points with both eigenvectors drawn in (old coordinate system); right, the same points re-plotted with the eigenvectors as the axes. For example, (-2, -2) lands at (2.78, 0.46).]

These eigenvectors are called Principal Components: eigenvector #1 becomes PC1 and eigenvector #2 becomes PC2.
The whole recipe in one place:

1. Compute the covariance matrix:

   S = (Z⊤ × Z) / (n − 1) = 2.5  1.5
                            1.5  1.5

2. Perform eigendecomposition:

   S = P D P⊤ = -0.81  0.58  ×  3.58 0     ×  -0.81 -0.58
                -0.58 -0.81     0    0.41      0.58 -0.81

3. Compute the new coordinates:

   P⊤ × Z⊤ = 2.78  0.81 -0.58 -0.81 -2.20
             0.46 -0.58 -0.81  0.58  0.35
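All three steps as one numpy sketch (my illustration; each PC row matches the numbers above up to a possible sign flip):

    import numpy as np

    X = np.array([[1, 2], [2, 4], [3, 5], [4, 4], [5, 5]], dtype=float)

    Z = X - X.mean(axis=0)              # center
    S = Z.T @ Z / (len(Z) - 1)          # 1. covariance matrix

    vals, vecs = np.linalg.eigh(S)      # 2. eigendecomposition
    order = np.argsort(vals)[::-1]
    P = vecs[:, order]

    new_coords = P.T @ Z.T              # 3. project onto the PCs
    print(np.round(new_coords, 2))      # slide numbers, up to a sign per row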
Do you still remember what this was all about?
[Figure: the points shown side by side in the old coordinate system (X, y) and in the new one (PC1, PC2). The eigenvalues (3.58 and 0.42) measure the spread along PC1 and PC2.]
Both old and new axes explain 4 units of variance in total:

Old coordinate system (S): the X axis explains 2.5/4 = 62.5% and the y axis explains 1.5/4 = 37.5%.

New coordinate system (D): PC1 explains 3.58/4 ≈ 89.5%, while PC2 explains only 0.42/4 ≈ 10.5%.

How much information does the second PC contain? Very little, so we can ignore it: keeping only the PC1 coordinate takes us from 2D to 1D while retaining almost 90% of the variance.
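scikit-learn's PCA reports the same split; a quick check (my sketch):

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.array([[1, 2], [2, 4], [3, 5], [4, 4], [5, 5]], dtype=float)

    pca = PCA(n_components=2).fit(X)
    print(pca.explained_variance_)        # ~[3.58 0.42], the eigenvalues
    print(pca.explained_variance_ratio_)  # ~[0.895 0.105]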
Where does PCA fit into the machine learning workflow?

1. Train/test split; the test set goes to a safe place.
2. Preprocessing (subtract mean), fitted on the training data only.
3. PCA, also fitted on the training data only.
4. Find the best model using CV.
5. Evaluate the final model on the test set.
6. Profit.
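A sketch of that workflow in scikit-learn (my example on the digits dataset, not from the slides). Putting the scaler and PCA inside the pipeline ensures they are re-fitted on each CV training fold, so the test set stays in its safe place:

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_digits(return_X_y=True)  # 64-dimensional images of digits

    # 1. Train/test split: the test set goes to a safe place.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    # 2-4. Preprocessing, PCA and the model, chained in one pipeline.
    model = make_pipeline(StandardScaler(),
                          PCA(n_components=20),
                          LogisticRegression(max_iter=1000))
    print(cross_val_score(model, X_train, y_train, cv=5).mean())

    # 5. Evaluate the final model on the untouched test set.
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))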
PCA can also be reversed.

[Figure: the 2D point cloud projected down to 1D along PC1, then mapped back into 2D. The reconstructed points lie on the PC1 line; the variation that PC2 carried is gone.]
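In scikit-learn the reverse step is inverse_transform; a minimal sketch on the slide data:

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.array([[1, 2], [2, 4], [3, 5], [4, 4], [5, 5]], dtype=float)

    pca = PCA(n_components=1)
    X_1d = pca.fit_transform(X)           # 2D -> 1D (PC1 coordinate only)
    X_back = pca.inverse_transform(X_1d)  # 1D -> 2D reconstruction
    print(np.round(X_back, 2))            # points now lie on the PC1 line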
t-SNE iteratively tries to make distances in low-dimensional space similar to distances in high-dimensional space.

[Figure: a 2D point cloud and its 1D embedding; points that are close in 2D stay close in 1D.]

UMAP ultimately tries to achieve similar things, using slightly different mechanisms.

UMAP explained and compared to t-SNE: https://pair-code.github.io/understanding-umap/
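For completeness, a minimal t-SNE sketch with scikit-learn (my example; parameters such as perplexity matter a lot in practice):

    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE

    X, y = load_digits(return_X_y=True)   # 64-dimensional points

    # Embed into 2D; perplexity balances local vs global structure.
    embedding = TSNE(n_components=2, perplexity=30,
                     random_state=0).fit_transform(X)
    print(embedding.shape)                # (1797, 2)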
TensorFlow Projector: https://projector.tensorflow.org/
Recap

- Machine learning splits into supervised, unsupervised and reinforcement learning; dimensionality reduction is an unsupervised technique.
- The curse of dimensionality: in 2D on average 55.5% of cells are empty or singletons, in 3D 92.5%; keeping high-dimensional space covered needs a lot more data.
- 200-dimensional data? Not all dimensions are equally useful.
- Principal components are not additional axes/dimensions; there are as many PCs as original dimensions.
- PCA: compute the covariance matrix S = (Z⊤ × Z) / (n − 1), perform the eigendecomposition S = P D P⊤, and obtain the new coordinates as P⊤ × Z⊤.