Session 18-Cluster Analysis
Note: Content used in this PPT has been copied from various sources.
Cluster Analysis-Introduction
Finding similarities between data according to the characteristics found in the data, and grouping similar data objects into clusters
Unsupervised learning: no predefined classes (i.e., learning by observations vs. learning by examples, which is supervised)
[Figure: plot with axes Variable 1 and Variable 2]
A Classification of Clustering Procedures
[Figure: classification tree of clustering procedures]
[Figure: k-means illustration with K=2, repeated until no change]
K-Means Cluster: Example
Subject A B
1 1.0 1.0
2 1.5 2.0
3 3.0 4.0
4 5.0 7.0
5 3.5 5.0
6 4.5 5.0
7 3.5 4.5
K-Means Cluster: Example…
Initial Step: Group the data set into two clusters
• Find a sensible initial partition. Let the A and B values of the two individuals furthest apart (using the Euclidean distance measure) define the initial cluster means. Here these are individuals 1 and 4, giving initial means of (1.0, 1.0) for Cluster 1 and (5.0, 7.0) for Cluster 2.
1 (1.0, 1.0), 2 (1.5, 2.0), 3 (3.0, 4.0), 4 (5.0, 7.0), 5 (3.5, 5.0), 6 (4.5, 5.0), 7 (3.5, 4.5)
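The choice of the two starting individuals can be checked with a short R sketch. It assumes the seven (A, B) pairs from the table are typed into a matrix called `x` (a name chosen here for illustration) and relies on base R's `dist()`, which computes Euclidean distances by default.

```r
# The seven individuals from the example; the name `x` is illustrative.
x <- matrix(c(1.0, 1.0,
              1.5, 2.0,
              3.0, 4.0,
              5.0, 7.0,
              3.5, 5.0,
              4.5, 5.0,
              3.5, 4.5),
            ncol = 2, byrow = TRUE,
            dimnames = list(as.character(1:7), c("A", "B")))

# Pairwise Euclidean distances as a full symmetric 7 x 7 matrix.
d <- as.matrix(dist(x))

# Row/column indices of the largest distance. The matrix is symmetric, so the
# pair appears twice; only the first match is kept.
far <- which(d == max(d), arr.ind = TRUE)[1, ]
seeds <- sort(unname(far))

print(seeds)       # indices of the two individuals furthest apart
print(x[seeds, ])  # their (A, B) values, used as the initial cluster means
```

Using the full distance matrix is wasteful for large data sets, but for seven points it keeps the sketch close to the wording of the slide.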
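K-Means Cluster: Example…
Now the initial partition has changed, and the two clusters at this stage have the following characteristics:
Cluster 1: individuals 1, 2, 3; mean vector (1.8, 2.3)
Cluster 2: individuals 4, 5, 6, 7; mean vector (4.1, 5.4)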
But we cannot yet be sure that each individual has been assigned to the right cluster. So, we compare
each individual’s distance to its own cluster mean and to that of the opposite cluster.
K-Means Cluster: Example…
Calculating each individual's distance to the means of Cluster 1 and Cluster 2, we find:
Individual   Distance to mean (centroid) of Cluster 1   Distance to mean (centroid) of Cluster 2
1   1.5   5.4
2   0.4   4.3
3   2.1   1.8
4   5.7   1.8
5   3.2   0.7
6   3.8   0.6
7   2.8   1.1
Only individual 3 is nearer to the mean of the opposite cluster (Cluster 2) than to the mean of its own (Cluster 1).
In other words, each individual's distance to its own cluster mean should be smaller than its
distance to the other cluster's mean, which is not the case for individual 3 …
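The distance check can be reproduced with the R sketch below. It rests on two assumptions inferred from the table rather than stated in the slides: that the partition at this stage is individuals 1 to 3 in Cluster 1 and 4 to 7 in Cluster 2, and that the cluster means are rounded to one decimal before the distances are taken. The variable names are illustrative.

```r
# The seven individuals from the example; the name `x` is illustrative.
x <- matrix(c(1.0, 1.0,  1.5, 2.0,  3.0, 4.0,  5.0, 7.0,
              3.5, 5.0,  4.5, 5.0,  3.5, 4.5),
            ncol = 2, byrow = TRUE,
            dimnames = list(as.character(1:7), c("A", "B")))

# Assumed partition at this stage: 1-3 in Cluster 1, 4-7 in Cluster 2.
cluster <- c(1, 1, 1, 2, 2, 2, 2)

# Cluster means, rounded to one decimal (assumption, see the lead-in).
centers <- round(rbind(colMeans(x[cluster == 1, ]),
                       colMeans(x[cluster == 2, ])), 1)

# Euclidean distance from every individual to each cluster mean (7 x 2).
d <- sapply(1:2, function(k) sqrt(rowSums(sweep(x, 2, centers[k, ])^2)))

# Individuals whose nearest mean is not the mean of their own cluster.
nearest <- apply(d, 1, which.min)
misplaced <- which(nearest != cluster)

print(round(d, 1))  # distances to the Cluster 1 and Cluster 2 means
print(misplaced)    # individuals that should be relocated
```

Without the rounding of the means, a few distances differ in the first decimal, but the individual flagged for relocation stays the same.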
K-Means Cluster: Example…
…Thus, individual 3 is relocated to Cluster 2, resulting in the new partition:
Cluster 1: individuals 1, 2; mean vector (1.25, 1.5)
Cluster 2: individuals 3, 4, 5, 6, 7; mean vector (3.9, 5.1)
The iterative relocation would now continue from this new partition until no
more relocations occur. However, in this example each individual is now
nearer to its own cluster mean than to that of the other cluster, so the iteration
stops and the latest partition is taken as the final cluster solution.
Note: the k-means algorithm may fail to settle on a final solution. In that case, consider
stopping the algorithm after a pre-chosen maximum number of iterations.
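The whole example can also be run through R's built-in `kmeans()`. This is a sketch under stated assumptions, not the slides' exact procedure: `algorithm = "Lloyd"` performs batch assign-then-update iterations rather than relocating one individual at a time, the starting means are taken to be individuals 1 and 4, and `iter.max` plays the role of the pre-chosen maximum number of iterations mentioned in the note.

```r
# The seven individuals from the example; the name `x` is illustrative.
x <- matrix(c(1.0, 1.0,  1.5, 2.0,  3.0, 4.0,  5.0, 7.0,
              3.5, 5.0,  4.5, 5.0,  3.5, 4.5),
            ncol = 2, byrow = TRUE,
            dimnames = list(as.character(1:7), c("A", "B")))

# Start from individuals 1 and 4 as the initial cluster means and cap the
# number of iterations at 10.
km <- kmeans(x, centers = x[c(1, 4), ], algorithm = "Lloyd", iter.max = 10)

print(km$cluster)  # cluster membership of each individual
print(km$centers)  # final cluster means
print(km$iter)     # number of iterations actually used
```

Passing explicit starting centers removes the randomness of the default initialization, so the cluster numbering matches the example (Cluster 1 grows from individual 1, Cluster 2 from individual 4).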
K-Means Clustering
Example
• Iris Data: IrisData.xlsx
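K-Means Clustering: R Code
library(readxl)
# Read the iris data from the Excel workbook in the working directory
df <- read_excel("IrisData.xlsx")
print(df)
# k-means with 3 clusters on columns 2 to 5
# (starting centers are random, so cluster numbers can differ between runs)
km <- kmeans(df[, 2:5], centers = 3)
print(km)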
Output
Clustering vector:
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[68] 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 1 1 1 1 3 1 1 1 1 1 1 3 3 1 1 1 1 3 1 3 1 3 1 1 3 3 1 1 1 1 1 3
[135] 1 1 1 1 3 1 1 1 3 1 1 1 3 1 1 3
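A clustering vector is easier to read when it is tabulated against known labels. The sketch below is an illustration under assumptions: because the layout of IrisData.xlsx is not shown, it uses R's built-in `iris` data frame as a stand-in, and the seed and `nstart = 25` are arbitrary choices made here for reproducibility and stability.

```r
# Built-in iris data: columns 1-4 are the measurements, column 5 is Species.
# This stands in for IrisData.xlsx, whose column layout is not shown.
set.seed(42)  # arbitrary seed so the run is repeatable

# k-means with 3 clusters; nstart = 25 tries 25 random starts and keeps the
# solution with the smallest total within-cluster sum of squares.
km <- kmeans(iris[, 1:4], centers = 3, nstart = 25)

# Cross-tabulate cluster membership against the species labels.
tab <- table(cluster = km$cluster, species = iris$Species)

print(km$size)  # number of observations in each cluster
print(tab)      # how the clusters line up with the species
```

The species column is not used in the clustering itself; it only serves to interpret the clusters afterwards, which is the practical difference between clustering and classification.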
Classification Vs. Clustering