Live 2 - AI - K Means Clustering


Artificial Intelligence

k - means Clustering

Pham Viet Cuong


Dept. Control Engineering & Automation, FEEE
Ho Chi Minh City University of Technology
k - means Clustering
ü Basic idea: group together similar instances
v High intra-cluster similarity
v Low inter-cluster similarity

Pham Viet Cuong - Dept. Control Eng. & Automation, FEEE, HCMUT 2
ü Example:
v Document clustering
§ Web search engines often return thousands of pages, which is difficult for users to go through
§ Clustering can be used to group retrieved documents into categories
v Customer segmentation
v Recommendation engines
v Image compression

ü Supervised or unsupervised? Unsupervised
ü Requires data, but no labels
ü Useful when we don't know what we're looking for

ü Requirements
v An integer k
v A set of training data (without labels)
v A metric to measure similarity
ü Algorithm
v Pick k random points as cluster centers
v Repeat until convergence
§ Assign data points to closest cluster center
§ Update each cluster center to be the mean of its assigned points
v Convergence: no point's assignment changes
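The algorithm steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the `seed` and `max_iter` parameters, the toy data, and the choice of Euclidean distance are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Pick k random data points as the initial cluster centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest center
        # (Euclidean distance is an assumption of this sketch).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Convergence: no point changed its assignment.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Hypothetical toy data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

On this toy data the first three points end up in one cluster and the last three in the other; which cluster gets index 0 depends on the random initial centers.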
ü Example 1

v Pick k random points as cluster centers
v Repeat until convergence
§ Assign data points to closest cluster center
§ Update each cluster center to be the mean of its assigned points
ü Example 2
ü Example 3

ü Example: Image segmentation
v Segmentation: partition an image into regions, each of which has a reasonably homogeneous visual appearance

ü Example: Geyser eruptions
v Eruption time (mins)
v Waiting time to next eruption (mins)

ü Example: Image compression
v Original image: 396*396*24 = 3,763,584 bits
v Compressed image: 30*24 + 396*396*4 = 627,984 bits
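The bit counts above follow from storing a palette of k colors (24 bits each) plus one palette index per pixel. The sketch below reproduces that arithmetic; the function names are hypothetical, and deriving the index width from k is an assumption of this example. With that rule, k = 16 gives the 4 bits per pixel used above, while k = 30 would need 5 bits per pixel.

```python
def original_bits(width, height, bits_per_pixel=24):
    # Every pixel stores a full 24-bit color.
    return width * height * bits_per_pixel


def index_bits(k):
    # Smallest number of bits that can distinguish k palette entries (k >= 2).
    return (k - 1).bit_length()


def compressed_bits(width, height, k, bits_per_color=24):
    palette = k * bits_per_color               # the k cluster centers (colors)
    indices = width * height * index_bits(k)   # one palette index per pixel
    return palette + indices


print(original_bits(396, 396))        # 3763584
print(compressed_bits(396, 396, 16))  # 627648
print(30 * 24 + 396 * 396 * 4)        # 627984, the figure quoted above
```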

ü Properties
v Guaranteed to converge in a finite number of iterations
v Running time per iteration
§ Assign data points to closest cluster center: O(kN)
§ Update the cluster center to be the mean of its assigned points: O(N)

ü How to measure similarity?
v Similarity is subjective
v Depends on data, cases, users, etc.
v Not always straightforward which metrics work well
v “Trial and error” can be used
v Examples of similarity measures: Euclidean distance, Manhattan distance, cosine distance
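The three measures named above can be written out as follows. This is a small sketch using only the standard library; the function names are chosen for this example, and cosine distance is taken here as 1 minus cosine similarity, which is one common convention.

```python
import math


def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    # 1 - cos(angle between a and b); undefined for zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(euclidean((0, 0), (3, 4)))        # 5.0
print(manhattan((0, 0), (3, 4)))        # 7
print(cosine_distance((1, 0), (0, 1)))  # 1.0
```

Note that cosine distance ignores vector length: (1, 2) and (2, 4) point in the same direction, so their cosine distance is approximately 0 even though their Euclidean distance is not.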

ü How to choose k?
v Elbow method

Percentage of variance explained is the ratio of the between-group variance to the total variance
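The ratio just described can be computed directly from a set of points and their cluster labels. The sketch below is an illustration under stated assumptions: the function names and the 1-D toy data are hypothetical, and sums of squares are used in place of variances because the common 1/N factor cancels in the ratio.

```python
def mean(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def explained_variance(points, labels):
    """Between-group sum of squares divided by total sum of squares."""
    overall = mean(points)
    total = sum(sq_dist(p, overall) for p in points)
    between = 0.0
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        # Each cluster contributes its size times the squared distance
        # from its mean to the overall mean.
        between += len(members) * sq_dist(mean(members), overall)
    return between / total


# Hypothetical 1-D data with two obvious groups.
points = [(0.0,), (2.0,), (10.0,), (12.0,)]
one = explained_variance(points, [0, 0, 0, 0])   # k = 1
two = explained_variance(points, [0, 0, 1, 1])   # k = 2
four = explained_variance(points, [0, 1, 2, 3])  # k = 4
print(one, two, four)
```

Here a single cluster explains none of the variance, two clusters explain 100/104 (about 96%), and four clusters explain all of it. The gain flattens sharply after k = 2, which is the "elbow" the method looks for.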
ü k-means clustering is a heuristic
v Requires initial means
v The choice of initial means does matter: different starting points can lead to different final clusters
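A small experiment makes the dependence on initial means concrete. The sketch below runs the assign/update loop from two hand-picked starting points on four corners of a 4 x 1 rectangle; the function name `lloyd`, the data, and both sets of initial means are hypothetical choices for this example.

```python
import math


def lloyd(points, centers, max_iter=100):
    """Run the assign/update loop from explicit initial centers.

    Returns the final labels and the sum of squared distances (SSE).
    """
    centers = list(centers)
    k = len(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, sse


# Four corners of a wide rectangle: the natural split is left vs right.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good_labels, good_sse = lloyd(points, [(0.0, 0.5), (4.0, 0.5)])
bad_labels, bad_sse = lloyd(points, [(2.0, 0.0), (2.0, 1.0)])

print(good_labels, good_sse)  # [0, 0, 1, 1] 1.0
print(bad_labels, bad_sse)    # [0, 1, 0, 1] 16.0
```

Both runs converge, but the second one is stuck in a worse solution that groups bottom against top. A common remedy is to run the algorithm from several random initializations and keep the result with the lowest SSE.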

ü Drawbacks


ü Sources:
v http://people.csail.mit.edu/dsontag/courses/ml12/slides/lecture14.pdf
v https://www.slideshare.net/annafensel/kmeans-clustering-122651195
v https://en.wikipedia.org/wiki/Elbow_method_(clustering)
v https://www2.stat.duke.edu/courses/Fall02/sta290/datasets/geyser
