UNSUPERVISED LEARNING AND DIMENSIONALITY REDUCTION
1 Unsupervised Learning (Clustering Problem)
[Figure: unlabeled data plotted on axes $x_1$, $x_2$; after clustering, the points are grouped into Cluster 1 and Cluster 2]
2 K-means clustering algorithm
❖ K-means algorithm
• Inputs:
─ K (number of clusters)
─ Training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$
• Algorithm flow (a MATLAB sketch follows below):
Randomly initialize K cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$
Repeat {
    for i = 1 to m
        $c^{(i)}$ := index (from 1 to K) of the cluster centroid closest to $x^{(i)}$
    for k = 1 to K
        $\mu_k$ := average (mean) of points assigned to cluster k
}
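To make the two steps concrete, here is a minimal MATLAB sketch of the loop above. It assumes a data matrix X (m-by-n, one example per row) and MATLAB R2016b+ for implicit expansion; the function and variable names are illustrative, not from the slides.

% Minimal K-means sketch: alternates the cluster-assignment and
% centroid-update steps described above (illustrative names).
function [c, mu] = kmeans_sketch(X, K, max_iters)
    m = size(X, 1);
    mu = X(randperm(m, K), :);       % initialize centroids at K random examples
    for t = 1:max_iters
        % Step 1: assign each point to the closest centroid
        D = zeros(m, K);
        for k = 1:K
            D(:, k) = sum((X - mu(k, :)).^2, 2);  % squared distances to centroid k
        end
        [~, c] = min(D, [], 2);                   % c(i) = index of closest centroid
        % Step 2: move each centroid to the mean of its assigned points
        for k = 1:K
            if any(c == k)
                mu(k, :) = mean(X(c == k, :), 1);
            end
        end
    end
end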
• Execution of algorithm
[Figure sequence: successive K-means iterations — random centroid initialization, then alternating cluster-assignment and centroid-update steps until the centroids stop moving]
❖ Optimization objective
$$J(c^{(1)},\ldots,c^{(m)},\mu_1,\ldots,\mu_K) = \frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}-\mu_{c^{(i)}}\right\|^2$$
$$\min_{c^{(1)},\ldots,c^{(m)},\,\mu_1,\ldots,\mu_K} J(c^{(1)},\ldots,c^{(m)},\mu_1,\ldots,\mu_K)$$
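Given an assignment c and centroids mu (for example from the kmeans_sketch above), the distortion J can be evaluated in one line; a sketch under the same illustrative naming:

% Distortion J = (1/m) * sum_i || x(i) - mu_c(i) ||^2.
% mu(c, :) picks out each point's assigned centroid row by row.
J = mean(sum((X - mu(c, :)).^2, 2));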
Strengths of k-means
• Strengths:
– Simple: easy to understand and to implement
– Efficient: Time complexity: O(tkn),
where n is the number of data points,
k is the number of clusters, and
t is the number of iterations.
– Since both k and t are typically small, k-means is considered a linear algorithm.
• K-means is the most popular clustering algorithm.
• Note that k-means terminates at a local optimum when SSE (sum of squared errors) is used as the objective; the global optimum is hard to find due to the complexity of the problem.
Weaknesses of k-means
• The algorithm is only applicable if the mean
is defined.
– For categorical data, use k-modes, where the centroid is represented by the most frequent values.
• The user needs to specify k.
• The algorithm is sensitive to outliers
– Outliers are data points that are very far away
from other data points.
– Outliers could be errors in the data recording or
some special data points with very different
values.
Weaknesses of k-means: Problems with outliers
Weaknesses of k-means: To deal with outliers
Weaknesses of k-means (cont …)
• The algorithm is sensitive to initial seeds.
Common ways to represent clusters
3 Centroid initialization and choosing the number of clusters K
For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means; get $c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K$.
    Compute the cost function $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$.
}
Pick the clustering that gave the lowest cost J (see the sketch below).
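A sketch of this random-restart loop in MATLAB, reusing the illustrative kmeans_sketch function and cost expression from the earlier sketches:

% Run K-means from 100 random initializations; keep the lowest-cost result.
best_J = inf;
for trial = 1:100
    [c, mu] = kmeans_sketch(X, K, 50);       % 50 iterations per run (arbitrary choice)
    J = mean(sum((X - mu(c, :)).^2, 2));     % distortion of this run
    if J < best_J
        best_J = J;  best_c = c;  best_mu = mu;   % remember the best clustering so far
    end
end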
• Application example:
[Figure: clusters of Feature 1 vs. Feature 2 for rubbing prognosis — no rubbing, slight rubbing, and intensive rubbing, with rubbing increasing across the feature space; Feature 1 and Feature 2 are combined to construct a Health Index]
4 Dimensionality Reduction
• Data compression: reduce data from 2-D to 1-D, $x^{(i)} \in \mathbb{R}^2 \rightarrow z^{(i)} \in \mathbb{R}$
[Figure: points such as $x^{(2)}$ in the $x_1$-$x_2$ plane projected onto a line, giving 1-D coordinates $z^{(1)}, z^{(2)}, \ldots$]
• Reduce data from 3-D to 2-D: $x^{(i)} \in \mathbb{R}^3 \rightarrow z^{(i)} \in \mathbb{R}^2$
[Figure: 3-D points on axes $x_1, x_2, x_3$ projected onto a plane with coordinates $z_1, z_2$]
5 Principal Component Analysis
[Figure: 2-D data with the direction onto which the data are projected and the corresponding projection errors]
• Reduce from n dimensions to k dimensions: find k vectors $u^{(1)}, u^{(2)}, \ldots, u^{(k)}$ onto which to project the data, so as to minimize the projection error.
❖ Principal Component Analysis (PCA) algorithm
Compute the covariance matrix:
$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} (x^{(i)})(x^{(i)})^T$$
and obtain its eigenvectors via the singular value decomposition:
[U,S,V] = svd(Sigma);
Here, we obtain:
$$U = \begin{bmatrix} | & | & & | \\ u^{(1)} & u^{(2)} & \cdots & u^{(n)} \\ | & | & & | \end{bmatrix} \in \mathbb{R}^{n \times n}$$
Taking the first k columns of U as $U_{reduce} \in \mathbb{R}^{n \times k}$, we can transform the original data from its own feature dimension into the new feature space based on the principal components ($x \in \mathbb{R}^n \rightarrow z \in \mathbb{R}^k$):
$$\underbrace{z^{(i)}}_{k \times 1} = \underbrace{U_{reduce}^T}_{k \times n}\,\underbrace{x^{(i)}}_{n \times 1} = \begin{bmatrix} -\,(u^{(1)})^T\,- \\ -\,(u^{(2)})^T\,- \\ \vdots \\ -\,(u^{(k)})^T\,- \end{bmatrix} x^{(i)}$$
• MATLAB implementation
After mean normalization (ensure every feature has zero mean) and optionally
feature scaling:
$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} (x^{(i)})(x^{(i)})^T$$
[U,S,V] = svd(Sigma);
Ureduce = U(:,1:k);
Z = Ureduce'*x;
*Notice:
MATLAB also provides built-in functions that obtain the principal components of your data directly, without the steps above. Have a look at pca (apply PCA directly to the data) and pcacov (extract principal components from a covariance matrix). The results should be similar.
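Putting the steps together, a minimal end-to-end sketch (assuming X is an m-by-n data matrix with one example per row; because examples are rows here, the projection is written as X_norm * Ureduce rather than Ureduce'*x):

% End-to-end PCA sketch; variable names are illustrative.
m = size(X, 1);
mu_x = mean(X, 1);                    % per-feature means
X_norm = X - mu_x;                    % mean normalization: zero-mean features
Sigma = (1/m) * (X_norm' * X_norm);   % n-by-n covariance matrix
[U, S, V] = svd(Sigma);
Ureduce = U(:, 1:k);                  % first k principal directions
Z = X_norm * Ureduce;                 % m-by-k compressed representation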
• Reconstruction of data from its compressed representation
[Figure: 2-D points $x^{(1)}, x^{(2)}$ on axes $x_1, x_2$; their 1-D projections $z^{(1)}, z^{(2)}$ along $z_1$; and the reconstructed points $x^{(1)}_{approx}, x^{(2)}_{approx}$]
$$z = U_{reduce}^T\,x, \qquad z \in \mathbb{R} \rightarrow x \in \mathbb{R}^2$$
$$\underbrace{x^{(i)}_{approx}}_{n \times 1} = \underbrace{U_{reduce}}_{n \times k}\;\underbrace{z^{(i)}}_{k \times 1}$$
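Continuing the sketch above, the reconstruction is a single matrix product, adding the feature means back since the data were mean-normalized first:

% Reconstruct an approximation of the original data from Z.
X_approx = Z * Ureduce' + mu_x;       % m-by-n approximation of X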
• Choosing k (the number of principal components):
Average squared projection error: $\frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}-x^{(i)}_{approx}\right\|^2$
Total variation in the data: $\frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}\right\|^2$
Choose k so that
$$\frac{\frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}-x^{(i)}_{approx}\right\|^2}{\frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}\right\|^2} \le 0.01,$$
i.e., so that 99% of the variance is retained.
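In practice, k can be chosen cheaply from the singular values returned by svd(Sigma): the ratio above equals one minus the fraction of the total of diag(S) captured by its first k entries, so no reconstructions need to be computed. A sketch, continuing the naming above:

% Choose the smallest k that retains at least 99% of the variance.
s = diag(S);                      % eigenvalues of the covariance matrix
retained = cumsum(s) / sum(s);    % variance retained by the first k components
k = find(retained >= 0.99, 1);    % smallest k with projection-error ratio <= 0.01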
• Visualisation
• Practical advice: before implementing PCA, first try running whatever you want to do with the original/raw data. Only if that does not do what you want should you implement PCA and consider a different feature-space dimension.
How to choose a clustering algorithm
Choose a clustering algorithm (cont …)
Cluster Evaluation: hard problem
Cluster evaluation: ground truth
Evaluation measures: Entropy
Evaluation measures: purity
A remark about ground truth evaluation
Indirect evaluation
Summary