Dimensionality Reduction

Machine Learning splits into Supervised Learning, Unsupervised Learning and Reinforcement Learning.
Supervised Learning includes Classification and Regression.
What are the examples of high-dimensional objects?

A taxi ride from the NYC data set (pickup, drop-off, fare of $3.50, ...): an object from 7D.
A LiDAR recording: an object from a few million D.
An MNIST digit (28px × 28px): an object from 784D.
An ImageNet cat (256px × 256px × 3 colour channels): an object from 196608D.
Human DNA (a sequence of A, C, G, T bases along the genes): an object from 3.6 billion D.

What is the problem with high-dimensional things?

Hard to visualise. Algorithms tend to get slow.

Nearest Neighbour Classifier: the nearest neighbour is found by calculating distances to all existing examples.

Euclidean distance in 2D: d = √((x₂ − x₁)² + (y₂ − y₁)²)
In 3D: d = √((x₂ − x₁)² + (y₂ − y₁)² + (z₂ − z₁)²)
In nD: d = √((x₂ − x₁)² + (y₂ − y₁)² + (z₂ − z₁)² + … + (n₂ − n₁)²)

Nearest Neighbour search is O(n²), but with the number of dimensions approaching the number of samples it becomes O(n³).
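To see why the brute-force search slows down, here is a minimal sketch (my own illustration, not from the slides) that times a nearest-neighbour query as the number of dimensions grows; the random data and dimension counts are made up.

```python
import time
import numpy as np

rng = np.random.default_rng(0)

def nearest_neighbour(X, query):
    # Euclidean distance from the query to every stored example:
    # d = sqrt((x2-x1)^2 + (y2-y1)^2 + ... + (n2-n1)^2) for each row of X
    distances = np.sqrt(((X - query) ** 2).sum(axis=1))
    return int(np.argmin(distances))

n_samples = 2000
for n_dims in (2, 200, 2000):
    X = rng.normal(size=(n_samples, n_dims))
    query = rng.normal(size=n_dims)
    start = time.perf_counter()
    nearest_neighbour(X, query)
    print(f"{n_dims:>5} dimensions: {time.perf_counter() - start:.4f} s per query")
```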
To summarise the problems with high-dimensional things: they are hard to visualise, algorithms tend to get slow, and methods trained on high-dimensional data suffer from the curse of dimensionality.

The Curse of Dimensionality
What is the curse of dimensionality?

Take points spread along a single axis X (values 1 to 5) and split them with a small decision tree:

X > 2?  False -> Blue (75%), 4 points
        True  -> X > 4?  False -> Blue (50%), 6 points
                         True  -> Red (75%), 4 points

The splits at X = 2 and X = 4 carve the axis into regions that are 75%/25%, 50%/50% and 25%/75% pure.
What is the curse of dimensionality?

Now apply the same splits (at 2 and 4) in 2D, along both the X and y axes. The same points fall into a grid of cells, and the per-cell class proportions become highly unbalanced (100%/0% in some cells, 50%/50% in others), while in some cells nothing is going on at all.

With this amount of data, on average 55.5% of the 2D cells will be either empty or singletons.
In 3D (splitting X, y and z) it is worse: on average 92.5% of cells will be either empty or singletons.

In order to keep a high-dimensional space reasonably covered you need a lot more data.
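A small simulation of the same effect (my own illustration; the slide's 55.5% and 92.5% come from counting cells in its specific example, so exact numbers will differ): drop 10 random points into a 3-bin-per-axis grid and count the cells that end up empty or with a single point.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, bins, trials = 10, 3, 2000   # 3 bins per axis, as with splits at 2 and 4

for n_dims in (1, 2, 3):
    fractions = []
    for _ in range(trials):
        # which cell does each random point land in?
        cells = rng.integers(0, bins, size=(n_points, n_dims))
        _, counts = np.unique(cells, axis=0, return_counts=True)
        n_cells = bins ** n_dims
        empty = n_cells - len(counts)
        singletons = int((counts == 1).sum())
        fractions.append((empty + singletons) / n_cells)
    print(f"{n_dims}D: on average {np.mean(fractions):.0%} of cells are empty or singletons")
```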
What is the curse of dimensionality? (part II)

Distances become similar in high-dimensional space.
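A quick way to see this (an illustrative sketch, not from the slides): sample random points and compare the nearest and farthest pairwise distances as the dimensionality grows; the ratio creeps towards 1.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for n_dims in (2, 10, 100, 1000):
    X = rng.random((200, n_dims))          # 200 random points
    d = pdist(X)                           # all pairwise Euclidean distances
    print(f"{n_dims:>4}D   nearest/farthest distance ratio: {d.min() / d.max():.2f}")
```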


What is the problem with high-dimensional things?

Hard to visualise. Algorithms tend to get slow.
You need more data, and objects become closer (their distances become more similar) in high-dimensional space.

A lot of high-dimensional things (images, gene sequences, DNA, ...)
VS
High-dimensional things are hard to work with: many algorithms cost O(n²), O(n³), O(2ⁿ) or even O(n!).
Is there a way to break the curse?
Feature extraction vs feature elimination

Feature elimination: keep only a few of the original features and remove all the rest.


Feature extraction vs feature elimination

Feature extraction example: plot Circumference against Diameter. The two features are almost perfectly related, so a single combined feature carries nearly all of the information.

Principal Component Analysis

The same plot with new axes: Principal Component #1 (PC1) runs along the main spread of the data, and Principal Component #2 (PC2) is perpendicular to it.


1-Dimensional data: values along a single axis X (1 to 5).

2-Dimensional data:

x: 1  2  3  4  5
y: 2  4  5  4  5

3-Dimensional data:

x: 1  2    3  4  5
y: 2  4    5  4  5
z: 2  0.5  1  1  0.5
200-Dimensional data? Are all of these dimensions equally useful?
2-D example revisited

Main variation is from left to right; not so much from top to bottom.
We can keep only one dimension: projecting onto X, the projected data does not seem to lose much information.
2-D example revisited

Now the data seem to be spread more equally along the X and y axes.
Still, the data is mostly spread along one diagonal line, and only a little bit along the perpendicular line.

How about we make new axes from these lines?

These new axes are called principal components.
PC #1 is a new vector which spans along most of the variation in the data.
PC #2 is another new vector which spans along the direction of the second most variation.

Principal components are not additional axes/dimensions.
They are the old dimensions rearranged: the first axis now spans along the most variation, the second along the second most variation, and so on.
Principal components are not additional axes/dimensions.

How many PCs will there be in 3D space? As many as there were original dimensions, hence 3 PCs.
How many PCs will be formed in 200D space? No exceptions, 200 PCs.

But what is the benefit of having PCs?

Projecting the data onto PC #1 alone takes us from 2D to 1D without losing much information.
In general, the first few PCs are often enough to capture the important information.
Computational example of PCA

x: 1  2  3  4  5      x̄ = 3
y: 2  4  5  4  5      ȳ = 4

Subtract the means from each coordinate:

x − x̄: -2  -1  0  1  2
y − ȳ: -2   0  1  0  1
The centred coordinates form the matrix Z (5 rows, one per point; 2 columns, one per dimension).

Transpose the matrix of coordinates:

Z:               Z⊤:
-2  -2           -2  -1  0  1  2
-1   0           -2   0  1  0  1
 0   1
 1   0
 2   1

What are the dimensions of the transposed matrix? Z is 5 × 2, so Z⊤ is 2 × 5.

S = (Z⊤ × Z) / (n − 1)

We divide by n − 1 because we compute the empirical covariance matrix (i.e. from data).

Matrix multiplication beautifully animated http://matrixmultiplication.xyz/


Z⊤ × Z:

| -2  -1  0  1  2 |   ×   | -2  -2 |   =   | 10   6 |
| -2   0  1  0  1 |       | -1   0 |       |  6   6 |
                          |  0   1 |
                          |  1   0 |
                          |  2   1 |

Divide by n − 1 = 4:

S = | 2.5  1.5 |
    | 1.5  1.5 |

This is the covariance matrix.
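The same computation in a few lines of NumPy (a sketch reproducing the numbers above):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

Z = np.column_stack([x - x.mean(), y - y.mean()])   # centred data, 5 x 2
S = Z.T @ Z / (len(x) - 1)                          # empirical covariance matrix
print(S)              # [[2.5 1.5]
                      #  [1.5 1.5]]
print(np.cov(x, y))   # same result with NumPy's built-in covariance
```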
How to interpret values in covariance matrix?

S = | 2.5  1.5 |
    | 1.5  1.5 |

Collect all the values projected onto the X axis: [-2, -1, 0, 1, 2], with mean x̄ = 0.

Variance: σ² = Σ(xᵢ − x̄)² / (n − 1)

where n is the number of points, xᵢ is the value of each point and x̄ is the mean of all points. Variance is the expected value of the squared deviation from the mean.

σ² = ((−2)² + (−1)² + 0² + 1² + 2²) / 4 = 10 / 4 = 2.5

So the top-left entry, 2.5, is the variance along the first axis.

The same for the second axis: the values projected onto y are [-2, 0, 1, 0, 1], with ȳ = 0, and

σ² = Σ(yᵢ − ȳ)² / (n − 1) = 6 / 4 = 1.5

So the bottom-right entry, 1.5, is the variance along the second axis.
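Checking the diagonal entries with NumPy (illustrative; ddof=1 gives the n − 1 denominator used above):

```python
import numpy as np

x_centred = np.array([-2, -1, 0, 1, 2])
y_centred = np.array([-2, 0, 1, 0, 1])

print(np.var(x_centred, ddof=1))   # 2.5 -> variance along the first axis
print(np.var(y_centred, ddof=1))   # 1.5 -> variance along the second axis
```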
How to interpret values in covariance matrix?

The off-diagonal entries are covariances.

Covariance indicates how two variables are related. A positive covariance means the variables are positively related, while a negative covariance means the variables are inversely related.

cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

For our data, x = [-2, -1, 0, 1, 2] and y = [-2, 0, 1, 0, 1], both with mean 0:

cov(x, y) = ((−2)(−2) + (−1)(0) + (0)(1) + (1)(0) + (2)(1)) / 4 = 6 / 4 = 1.5

So the off-diagonal entries, 1.5, are the covariance between the two axes.

If instead y were [0, 0, 1, -2, 1] (still with mean 0):

cov(x, y) = ((−2)(0) + (−1)(0) + (0)(1) + (1)(−2) + (2)(1)) / 4 = 0 / 4 = 0

Covariance 0 means that there is no relationship between the two variables. Knowing something about the value of one does not say anything about the value of the other.
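The zero-covariance example can be checked the same way (illustrative):

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2])
y = np.array([0, 0, 1, -2, 1])

# off-diagonal entry of the covariance matrix
print(np.cov(x, y)[0, 1])   # 0.0 -> no linear relationship between x and y
```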
How to interpret values in covariance matrix?

S = | 2.5  1.5 |      diagonal: variance along the first and second axes
    | 1.5  1.5 |      off-diagonal: covariances between the axes

Aren't they supposed to be between [-1, 1]? No — that property belongs to correlation, which is covariance normalised by the standard deviations.

Covariance vs Correlation: https://en.wikipedia.org/wiki/Covariance_and_correlation


Recall that Z⊤ × Z = S × 4, i.e. S = (Z⊤ × Z) / 4.

Next, factor the covariance matrix with an eigendecomposition: S = P × D × P⊤

| 2.5  1.5 |   =   | -0.81   0.58 |   ×   | 3.58  0    |   ×   | -0.81  -0.58 |
| 1.5  1.5 |       | -0.58  -0.81 |       | 0     0.42 |       |  0.58  -0.81 |

The columns of P are the eigenvectors and the diagonal entries of D are the eigenvalues.

For a worked example: https://www.scss.tcd.ie/Rozenn.Dahyot/CS1BA1/SolutionEigen.pdf
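These eigenvalues and eigenvectors can be checked with NumPy (illustrative; note that eigh returns eigenvalues in ascending order, and eigenvector signs may be flipped relative to the slides):

```python
import numpy as np

S = np.array([[2.5, 1.5],
              [1.5, 1.5]])

eigenvalues, eigenvectors = np.linalg.eigh(S)   # S is symmetric
order = np.argsort(eigenvalues)[::-1]           # largest eigenvalue first
print(eigenvalues[order])                       # ~[3.58 0.42]
print(eigenvectors[:, order])                   # columns ~(0.81, 0.58) and (0.58, -0.81), up to sign
```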


Eigenvectors

The columns of P are the eigenvectors:

| -0.81   0.58 |
| -0.58  -0.81 |

Drawn from the origin (0, 0), eigenvector #1 points to (-0.81, -0.58) and eigenvector #2 points to (0.58, -0.81). They are perpendicular and together define a new coordinate system: eigenvector #1 runs along the main spread of the data, eigenvector #2 across it.

To move the data from the old coordinate system into this new one we need the transpose of the eigenvector matrix:

Eigenvectors⊤ = | -0.81  -0.58 |
                |  0.58  -0.81 |


Project the data into the new coordinate system: multiply Eigenvectors⊤ by Z⊤.

| -0.81  -0.58 |   ×   | -2  -1  0  1  2 |
|  0.58  -0.81 |       | -2   0  1  0  1 |

Point (-2, -2):  -2·(-0.81) + (-2)·(-0.58) = 2.78   and   -2·(0.58) + (-2)·(-0.81) = 0.46
Point (-1,  0):  -1·(-0.81) +  (0)·(-0.58) = 0.81   and   -1·(0.58) +  (0)·(-0.81) = -0.58
Point ( 0,  1):   0·(-0.81) +  (1)·(-0.58) = -0.58  and    0·(0.58) +  (1)·(-0.81) = -0.81
Point ( 1,  0):   1·(-0.81) +  (0)·(-0.58) = -0.81  and    1·(0.58) +  (0)·(-0.81) = 0.58
Point ( 2,  1):   2·(-0.81) +  (1)·(-0.58) = -2.2   and    2·(0.58) +  (1)·(-0.81) = 0.35

New coordinates:

| 2.78   0.81  -0.58  -0.81  -2.2  |
| 0.46  -0.58  -0.81   0.58   0.35 |

These eigenvectors are called Principal Components: eigenvector #1 is called PC1 and eigenvector #2 is called PC2. In the new coordinate system the axes are PC1 and PC2.


The whole recipe:

1. Centre the data and compute the covariance matrix: S = (Z⊤ × Z) / (n − 1)
2. Perform the eigendecomposition: S = P × D × P⊤
3. Obtain the new coordinates: P⊤ × Z⊤
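Putting the three steps together for the toy data set (a sketch; sign flips relative to the slide numbers are possible because eigenvector direction is arbitrary):

```python
import numpy as np

X = np.array([[1, 2], [2, 4], [3, 5], [4, 4], [5, 5]], dtype=float)

# 1. centre the data
Z = X - X.mean(axis=0)

# 2. covariance matrix and its eigendecomposition
S = Z.T @ Z / (len(Z) - 1)
eigenvalues, P = np.linalg.eigh(S)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, P = eigenvalues[order], P[:, order]

# 3. new coordinates: project the centred data onto the eigenvectors
new_coords = Z @ P          # row i = point i, column 0 = PC1 score, column 1 = PC2 score
print(np.round(new_coords, 2))
```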
Do you still remember what it was all about? We want to reduce the dimensionality!


We can ignore the second eigenvector because it does not contain much information.
How much information does the second eigenvector contain?


S = | 2.5  1.5 |          D = | 3.58  0    |
    | 1.5  1.5 |              | 0     0.42 |

Covariance matrix              New covariance matrix (the eigenvalues)

In S, the diagonal entries are the variances along the X and Y axes and the off-diagonal entries are covariances.
In D, the diagonal entries (the eigenvalues) are the variances along PC1 and PC2, and the covariances are 0.

Both the old and the new axes explain 4 units of variance in total:
2.5 + 1.5 = 4 and 3.58 + 0.42 = 4

Out of these 4, the X axis explains 2.5/4 = 62.5% and the Y axis explains 1.5/4 = 37.5%.
In the new coordinate system, PC1 explains 3.58/4 = 89.5% and PC2 explains 0.42/4 = 10.5%.
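The "variance explained" numbers above are just the eigenvalues divided by their sum (illustrative):

```python
import numpy as np

eigenvalues = np.array([3.58, 0.42])
print(eigenvalues / eigenvalues.sum())   # [0.895 0.105] -> PC1 explains 89.5%, PC2 explains 10.5%
```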
We can ignore the second PC because it explains only 10.5% of the variation.


How many PCs will be formed in 200D space? No exceptions, 200 PCs.

How many PCs should we keep?

Variance explained is a good criterion for choosing the number of PCs to keep:
keep as many PCs as it takes to explain 90% of the total variance.
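scikit-learn's PCA can apply this rule directly: passing a float between 0 and 1 as n_components keeps just enough components to reach that fraction of variance (a sketch with made-up random data):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(500, 200))   # made-up 200-D data

pca = PCA(n_components=0.90)        # keep just enough PCs to explain 90% of the variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape[1], "PCs kept")
print(f"{pca.explained_variance_ratio_.sum():.1%} of variance explained")
```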



Principal Component Analysis (PCA)

PCA

Can be used as part of a supervised learning pipeline.

Supervised Learning pipeline:
1. Acquire data (e.g. 200D raw data)
2. Preprocessing: normalisation (subtract the mean)
3. Train/test split — the test set goes to a safe place
4. PCA: compute 200 PCs, keep the few PCs that explain 90% of the variance
5. Find the best model using cross-validation
6. Evaluate the final model on the test set. Profit!

A code sketch of this pipeline follows below.
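One way to wire PCA into such a pipeline with scikit-learn (an illustrative sketch with synthetic data; the classifier choice and the dataset are assumptions, not prescribed by the slides):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# made-up 200-D data standing in for "200D raw data"
X, y = make_classification(n_samples=1000, n_features=200, random_state=0)

# 3. split first, so the test set stays in a "safe place"
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2 + 4 + 5: normalise, keep the PCs explaining 90% of variance, fit a model with CV
model = make_pipeline(StandardScaler(),
                      PCA(n_components=0.90),
                      LogisticRegression(max_iter=1000))
print("CV accuracy:", cross_val_score(model, X_train, y_train, cv=5).mean())

# 6. evaluate the final model on the held-out test set
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```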

PCA has an "undo" button: the transformation can be reversed, so you can recover the original features back!
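In scikit-learn the "undo" button is inverse_transform, which maps the kept PCs back to the original feature space (a lossy reconstruction when some PCs were dropped; illustrative sketch):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[1, 2], [2, 4], [3, 5], [4, 4], [5, 5]], dtype=float)

pca = PCA(n_components=1)               # keep only PC1
X_1d = pca.fit_transform(X)             # 2D -> 1D
X_back = pca.inverse_transform(X_1d)    # 1D -> 2D: approximate reconstruction of the originals
print(np.round(X_back, 2))
```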


t-distributed Stochastic Neighbour Embedding (t-SNE)
&
Uniform Manifold Approximation and Projection (UMAP)

t-SNE iteratively tries to make distances in low-dimensional space (e.g. 1D) similar to distances in high-dimensional space (e.g. 2D).

A bit more about t-SNE: https://distill.pub/2016/misread-tsne/

UMAP ultimately tries to achieve a similar thing, using slightly different mechanisms.

Both t-SNE and UMAP cannot "undo" their transformations.
Both t-SNE and UMAP are slower than PCA.
UMAP explained and compared to t-SNE: https://pair-code.github.io/understanding-umap/
TensorFlow Embedding Projector: https://projector.tensorflow.org/
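A minimal usage sketch (assuming scikit-learn for t-SNE and the third-party umap-learn package for UMAP; the parameters and random data are illustrative, not recommendations from the slides):

```python
import numpy as np
from sklearn.manifold import TSNE
import umap   # provided by the third-party "umap-learn" package

X = np.random.default_rng(0).normal(size=(300, 50))   # made-up high-dimensional data

X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)
X_umap = umap.UMAP(n_components=2, random_state=0).fit_transform(X)
print(X_tsne.shape, X_umap.shape)   # (300, 2) (300, 2)
```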
Recap
High-dimensional data is everywhere: images, DNA, taxi rides, sensor recordings.
Algorithms tend to get slow, and methods trained on high-dimensional data suffer from the curse of dimensionality: to keep a high-dimensional space reasonably covered you need a lot more data (on average 55.5% of 2D cells and 92.5% of 3D cells were empty or singletons in our example), and distances become similar.
Not all dimensions are equally useful. PCA forms as many principal components as there were original dimensions, but the first few PCs are often enough to capture the important information.
The main steps to compute PCA: centre the data, compute the covariance matrix S = (Z⊤ × Z) / (n − 1), perform the eigendecomposition S = P × D × P⊤, and obtain the new coordinates as P⊤ × Z⊤.
