Quiz Week 8 - Unsupervised Learning Clustering

Unsupervised Learning
Quiz, 5 questions

1. For which of the following tasks might K-means clustering be a suitable algorithm? Select all that apply. (1 point)

Given a database of information about your users, automatically group them into
different market segments.

Given sales data from a large number of products in a supermarket, figure out
which products tend to form coherent groups (say, are frequently purchased
together) and thus should be put on the same shelf.

Given historical weather records, predict the amount of rainfall tomorrow (this
would be a real-valued output).

Given sales data from a large number of products in a supermarket, estimate future
sales for each of these products.

2. Suppose we have three cluster centroids μ1 = [ … ; 2 ], μ2 = [ … ; 0 ] and μ3 = [ … ; 2 ]. Furthermore, we have a training example x(i) = [ … ]. After a cluster assignment step, what will c(i) be? (1 point)

c(i) = 3

c(i) = 1

c(i) = 2

c(i) is not assigned
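The cluster assignment step sets c(i) to the index k of the centroid closest to x(i), i.e. the k that minimizes ||x(i) − μk||². Every example is assigned to some centroid on every assignment step. A minimal Python sketch, assuming centroids are given as a list of tuples; the centroid and example values below are hypothetical illustrations, not the question's own numbers.

```python
def squared_distance(x, mu):
    """Squared Euclidean distance ||x - mu||^2 between two points given as tuples."""
    return sum((xj - mj) ** 2 for xj, mj in zip(x, mu))


def assign_cluster(x, centroids):
    """Cluster assignment: return the 1-based index k minimizing ||x - mu_k||^2, i.e. c(i)."""
    distances = [squared_distance(x, mu) for mu in centroids]
    return distances.index(min(distances)) + 1


# Hypothetical values for illustration only.
centroids = [(1.0, 1.0), (-2.0, 0.0), (3.0, 3.0)]
x = (-1.0, 0.5)
print(assign_cluster(x, centroids))  # prints 2
```

Ties are broken toward the lowest index because list.index returns the first minimum; that tie-breaking rule is a choice of this sketch.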

3. K-means is an iterative algorithm, and two of the following steps are repeatedly carried out in its inner loop. Which two? (1 point)

Test on the cross-validation set.

Move the cluster centroids, where the centroids μk are updated.

The cluster assignment step, where the parameters c(i) are updated.

Randomly initialize the cluster centroids.
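The two inner-loop steps can be sketched in plain Python as below. Random initialization runs once before the loop, and the loop then alternates the cluster assignment step and the move-centroid step until the centroids stop changing. The function names, the 0-based cluster indices, the max_iters cap and the rule that an empty cluster keeps its previous centroid are assumptions of this sketch, not part of the quiz.

```python
import random


def squared_distance(x, mu):
    """Squared Euclidean distance ||x - mu||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def assign_clusters(X, centroids):
    """Cluster assignment step: c[i] = index (0-based) of the closest centroid."""
    return [min(range(len(centroids)), key=lambda k: squared_distance(x, centroids[k]))
            for x in X]


def move_centroids(X, c, centroids):
    """Move-centroid step: mu_k becomes the mean of the examples assigned to cluster k.
    An empty cluster keeps its previous centroid (an assumption of this sketch)."""
    new_centroids = []
    for k, mu in enumerate(centroids):
        members = [x for x, ck in zip(X, c) if ck == k]
        if members:
            new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centroids.append(mu)
    return new_centroids


def kmeans(X, K, max_iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(X, K)  # random initialization: once, outside the inner loop
    for _ in range(max_iters):
        c = assign_clusters(X, centroids)                # inner-loop step 1
        new_centroids = move_centroids(X, c, centroids)  # inner-loop step 2
        if new_centroids == centroids:                   # converged
            break
        centroids = new_centroids
    return assign_clusters(X, centroids), centroids


X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
c, mu = kmeans(X, 2)
print(c, mu)
```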

4. Suppose you have an unlabeled dataset {x(1), …, x(m)}. You run K-means with 50 different random initializations, and obtain 50 different clusterings of the data. What is the recommended way for choosing which one of these 50 clusterings to use? (1 point)

Always pick the final (50th) clustering found, since by that time it is more likely to
have converged to a good solution.

For each of the clusterings, compute $\frac{1}{m}\sum_{i=1}^{m} \lVert x^{(i)} - \mu_{c^{(i)}} \rVert^2$, and pick the one that minimizes this.

The only way to do so is if we also have labels y(i) for our data.

The answer is ambiguous, and there is no good way of choosing.
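A sketch of choosing among many random initializations: run K-means from each initialization, compute the distortion J = (1/m) Σ ||x(i) − μc(i)||² of each resulting clustering, and keep the clustering with the lowest J. The helper names, the n_inits and seed parameters and the compact plain-Python K-means inside are illustrative assumptions, not a reference implementation.

```python
import random


def sq_dist(x, mu):
    """Squared Euclidean distance ||x - mu||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def distortion(X, c, centroids):
    """J = (1/m) * sum_i ||x(i) - mu_{c(i)}||^2."""
    return sum(sq_dist(x, centroids[k]) for x, k in zip(X, c)) / len(X)


def run_kmeans(X, K, rng, max_iters=100):
    """One K-means run from a random initialization drawn with rng."""
    centroids = rng.sample(X, K)
    for _ in range(max_iters):
        c = [min(range(K), key=lambda k: sq_dist(x, centroids[k])) for x in X]
        new = []
        for k in range(K):
            members = [x for x, ck in zip(X, c) if ck == k]
            # Mean of the members; an empty cluster keeps its old centroid (assumption).
            new.append(tuple(sum(col) / len(members) for col in zip(*members))
                       if members else centroids[k])
        if new == centroids:
            break
        centroids = new
    c = [min(range(K), key=lambda k: sq_dist(x, centroids[k])) for x in X]
    return c, centroids


def best_of_n_inits(X, K, n_inits=50, seed=0):
    """Return (J, c, centroids) for the run with the smallest distortion J."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_inits):
        c, centroids = run_kmeans(X, K, rng)
        J = distortion(X, c, centroids)
        if best is None or J < best[0]:
            best = (J, c, centroids)
    return best


X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
J, c, centroids = best_of_n_inits(X, 2)
print(J, c)
```

Comparing runs by J needs no labels: the distortion is computed from the data and the centroids alone.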

5. Which of the following statements are true? Select all that apply. (1 point)

Once an example has been assigned to a particular centroid, it will never be
reassigned to a different centroid.

K-means will always give the same results regardless of the initialization of the centroids.
A good way to initialize K-means is to select K (distinct) examples from the training
set and set the cluster centroids equal to these selected examples.

On every iteration of K-means, the cost function $J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K)$ (the
distortion function) should either stay the same or decrease; in particular, it should
not increase.
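The initialization and cost statements above can be checked empirically with the sketch below. It initializes the centroids to K distinct training examples chosen at random, then records J after every assignment step and after every move-centroid step. It assumes the training examples are themselves distinct points, so that distinct indices give distinct centroids; the function names and the history list are hypothetical.

```python
import random


def sq_dist(x, mu):
    """Squared Euclidean distance ||x - mu||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def distortion(X, c, centroids):
    """J = (1/m) * sum_i ||x(i) - mu_{c(i)}||^2."""
    return sum(sq_dist(x, centroids[k]) for x, k in zip(X, c)) / len(X)


def kmeans_with_history(X, K, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Initialize with K distinct training examples (distinct indices; assumes no duplicate points).
    centroids = [X[i] for i in rng.sample(range(len(X)), K)]
    history = []
    for _ in range(max_iters):
        # Cluster assignment step: minimizes J over c with the centroids held fixed.
        c = [min(range(K), key=lambda k: sq_dist(x, centroids[k])) for x in X]
        history.append(distortion(X, c, centroids))
        # Move-centroid step: minimizes J over the centroids with c held fixed.
        new_centroids = []
        for k in range(K):
            members = [x for x, ck in zip(X, c) if ck == k]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # empty cluster keeps its centroid (assumption)
        history.append(distortion(X, c, new_centroids))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, history


X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (5.0, 5.0)]
centroids, history = kmeans_with_history(X, 3, seed=1)
print(history)
```

Each step minimizes J over one group of variables while holding the other fixed, so neither step can raise J. The mean of a cluster minimizes the sum of squared distances to its members, which is why the move-centroid step qualifies. Running with different seeds can still end at different final values of J, which is why the result depends on the initialization.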

