80823v00 Machine Learning Section3 Ebook v05 PDF

Applying Unsupervised Learning
When to Consider Unsupervised Learning
Unsupervised learning is useful when you want to explore your data but
don't yet have a specific goal or are not sure what information the data
contains. It's also a good way to reduce the dimensions of your data.
Unsupervised Learning Techniques
As we saw in section 1, most unsupervised learning techniques are
a form of cluster analysis.
In cluster analysis, data is partitioned into groups based on some
measure of similarity or shared characteristic. Clusters are formed
so that objects in the same cluster are very similar and objects in
different clusters are very distinct.
Clustering algorithms fall into two broad groups:
Hard clustering, where each data point belongs to only one cluster
Soft clustering, where each data point can belong to more than one cluster
[Figure: Gaussian mixture model used to separate data into two clusters.]
You can use hard or soft clustering techniques if you already know
the possible data groupings.
If you don't yet know how the data might be grouped:
Use self-organizing feature maps or hierarchical clustering to look for possible structures in the data.
Use cluster evaluation to look for the best number of groups for a given clustering algorithm.

Common Hard Clustering Algorithms

k-Means
How It Works
Partitions data into k number of mutually exclusive clusters.
How well a point fits into a cluster is determined by the
distance from that point to the cluster's center.
Best Used...
When the number of clusters is known
For fast clustering of large data sets
Result: Cluster centers

k-Medoids
How It Works
Similar to k-means, but with the requirement that the cluster
centers coincide with points in the data.
Best Used...
When the number of clusters is known
For fast clustering of categorical data
To scale to large data sets
Result: Cluster centers that coincide with data points

Hierarchical Clustering
How It Works
Produces nested sets of clusters by analyzing similarities
between pairs of points and grouping objects into a binary,
hierarchical tree.
Best Used...
When you don't know in advance how many clusters are in your data
When you want visualization to guide your selection
Result: Dendrogram showing the hierarchical relationship between clusters

Self-Organizing Map
How It Works
Neural-network based clustering that transforms a dataset
into a topology-preserving 2D map.
Best Used...
To visualize high-dimensional data in 2D or 3D
To deduce the dimensionality of data by preserving its topology (shape)
Result: Lower-dimensional (typically 2D) representation
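
The tree-building idea behind hierarchical clustering can be sketched in a few lines of Python. This is a minimal illustration only, assuming single linkage (distance between the closest pair of members) and Euclidean distance; the source does not say which linkage or distance is used, and the function name and sample points are hypothetical.

```python
import math

def single_linkage(points):
    """Agglomerative clustering; returns the merge history of the tree."""
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Record which two groups were joined and at what height.
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

# Hypothetical sample: two tight pairs and one distant point.
sample = [(0, 0), (1, 0), (5, 0), (6, 0), (20, 0)]
merges = single_linkage(sample)
heights = [height for _, _, height in merges]
```

Each entry of the merge history corresponds to one branch point of a dendrogram. A large jump between consecutive merge heights is one common visual cue for where to cut the tree into clusters.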

Example: Using k-Means Clustering to Site Cell Phone Towers
A cell phone company wants to know the number and placement
of cell phone towers that will provide the most reliable service. For
optimal signal reception, the towers must be located within
clusters of people.
The workflow begins with an initial guess at the number of clusters
that will be needed. To evaluate this guess, the engineers compare
service with three towers and four towers to see how well they're
able to cluster for each scenario (in other words, how well the
towers provide service).
A phone can only talk to one tower at a time, so this is a hard
clustering problem. The team uses k-means clustering because
k-means treats each observation in the data as an object having
a location in space. It finds a partition in which objects within
each cluster are as close to each other as possible and as far from
objects in other clusters as possible.
After running the algorithm, the team can accurately determine the
results of partitioning the data into three and four clusters.
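
The comparison of three and four towers can be sketched with a small k-means implementation. This is an illustrative sketch, not the team's actual workflow: the subscriber coordinates are made up, the farthest-first starting centers are an assumption (the source does not specify an initialization), and all function names are hypothetical.

```python
import math

def farthest_first(points, k):
    """Starting centers: the first point, then repeatedly the point
    farthest from every center chosen so far (an assumed heuristic)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def assign(points, centers):
    """Hard assignment: index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def kmeans(points, k, max_iter=100):
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Move the center to the mean of its members; an empty
            # cluster keeps its previous center.
            new_centers.append(tuple(sum(x) / len(members) for x in zip(*members))
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

def total_distance(points, centers, labels):
    """Sum of distances from each subscriber to the assigned tower."""
    return sum(math.dist(p, centers[lab]) for p, lab in zip(points, labels))

# Hypothetical subscriber locations forming three groups.
subscribers = [(0, 0), (1, 0), (0, 1), (1, 1),
               (10, 0), (11, 0), (10, 1), (11, 1),
               (5, 8), (6, 8), (5, 9), (6, 9)]
centers3, labels3 = kmeans(subscribers, 3)
centers4, labels4 = kmeans(subscribers, 4)
total3 = total_distance(subscribers, centers3, labels3)
total4 = total_distance(subscribers, centers4, labels4)
```

Here the cluster centers play the role of candidate tower sites. Adding a tower generally lowers the total distance, so the totals for three and four clusters are best read as one input to the decision rather than a verdict on their own.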

Common Soft Clustering Algorithms

Fuzzy c-Means
How It Works
Partition-based clustering when data points may belong to
more than one cluster.
Best Used...
When the number of clusters is known
For pattern recognition
When clusters overlap
Result: Cluster centers (similar to k-means) but with fuzziness so that
points may belong to more than one cluster

Gaussian Mixture Model
How It Works
Partition-based clustering where data points come from
different multivariate normal distributions with certain
probabilities.
Best Used...
When a data point might belong to more than one cluster
When clusters have different sizes and correlation structures within them
Result: A model of Gaussian distributions that give probabilities of
a point being in a cluster
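
To make "probabilities of a point being in a cluster" concrete, the sketch below computes those probabilities for a Gaussian mixture whose parameters are assumed to be already known. It is a simplification: it uses one-dimensional components instead of multivariate ones, the weights, means, and standard deviations are invented, and it shows only the probability calculation, not how a mixture model is fitted.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def cluster_probabilities(x, weights, means, stds):
    """Posterior probability that x came from each mixture component."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-component mixture (parameters assumed, not fitted).
weights, means, stds = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
near_first = cluster_probabilities(0.0, weights, means, stds)
in_between = cluster_probabilities(2.0, weights, means, stds)
```

A point close to one component's mean is assigned almost entirely to that component, while a point halfway between two equally weighted, equally wide components is shared evenly. That graded assignment is what distinguishes soft clustering from hard clustering.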

Example: Using Fuzzy c-Means Clustering to Analyze Gene Expression Data
A team of biologists is analyzing gene expression data from
microarrays to better understand the genes involved in normal and
abnormal cell division. (A gene is said to be "expressed" if it is
actively involved in a cellular function such as protein production.)
The microarray contains expression data from two tissue samples.
The researchers want to compare the samples to determine whether
certain patterns of gene expression are implicated in
cancer proliferation.
After preprocessing the data to remove noise, they cluster the data.
Because the same genes can be involved in several biological
processes, no single gene is likely to belong to one cluster only.
The researchers apply a fuzzy c-means algorithm to the data. They
then visualize the clusters to identify groups of genes that behave in
a similar way.
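
The idea that a gene can belong partly to several clusters can be sketched with the standard fuzzy c-means update rules. This is a minimal sketch under stated assumptions: the fuzziness exponent m = 2, the starting centers, and the two-value "expression profiles" are all invented for illustration and are not taken from the researchers' data.

```python
import math

def memberships(points, centers, m=2.0):
    """Degree of membership of each point in each cluster; rows sum to 1."""
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # A point lying exactly on a center belongs fully to it.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
            continue
        inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
        rows.append([v / sum(inv) for v in inv])
    return rows

def update_centers(points, u, m=2.0):
    """Each center is the mean of all points, weighted by membership**m."""
    dims = len(points[0])
    centers = []
    for i in range(len(u[0])):
        w = [row[i] ** m for row in u]
        centers.append(tuple(sum(wj * p[d] for wj, p in zip(w, points)) / sum(w)
                             for d in range(dims)))
    return centers

def fuzzy_c_means(points, centers, m=2.0, iterations=100):
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)

# Hypothetical profiles: two groups plus one gene in between (index 6).
genes = [(0, 0), (0, 1), (1, 0), (4, 4), (4, 3), (3, 4), (2, 2)]
centers, u = fuzzy_c_means(genes, [(1.0, 1.0), (3.0, 3.0)])
```

In this toy data the gene at index 6 sits midway between the two groups, so its membership is split roughly evenly, while the other genes belong mostly to one cluster.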

Improving Models with Dimensionality Reduction
Machine learning is an effective method for finding patterns in
big datasets. But bigger data brings added complexity.
As datasets get bigger, you frequently need to reduce the
number of features, or dimensionality.
Example: EEG Data Reduction
Suppose you have electroencephalogram (EEG) data that captures
electrical activity of the brain, and you want to use this data to
predict a future seizure. The data was captured using dozens of
leads, each corresponding to a variable in your original dataset.
Each of these variables contains noise. To make your prediction
algorithm more robust, you use dimensionality reduction techniques
to derive a smaller number of features. Because these features are
calculated from multiple sensors, they will be less susceptible to
noise in an individual sensor than would be the case if you used
the raw data directly.

Common Dimensionality Reduction Techniques
The three most commonly used dimensionality reduction
techniques are:
Principal component analysis (PCA): performs a linear
transformation on the data so that most of the variance or
information in your high-dimensional dataset is captured by the
first few principal components. The first principal component
will capture the most variance, followed by the second principal
component, and so on.
Factor analysis: identifies underlying correlations between
variables in your dataset to provide a representation in terms of a
smaller number of unobserved latent, or common, factors.
Nonnegative matrix factorization: used when model terms must
represent nonnegative quantities, such as physical quantities.

Using Principal Component Analysis
In datasets with many variables, groups of variables often move
together. PCA takes advantage of this redundancy of information
by generating new variables via linear combinations of the original
variables so that a small number of new variables captures most of
the information.
Each principal component is a linear combination of the original
variables. Because all the principal components are orthogonal to
each other, there is no redundant information.
Example: Engine Health Monitoring
You have a dataset that includes measurements for different
sensors on an engine (temperatures, pressures, emissions, and so
on). While much of the data comes from a healthy engine, the
sensors have also captured data from the engine when it needs
maintenance.
You cannot see any obvious abnormalities by looking at any
individual sensor. However, by applying PCA, you can transform
this data so that most variations in the sensor measurements
are captured by a small number of principal components. It is
easier to distinguish between a healthy and unhealthy engine by
inspecting these principal components than by looking at the raw
sensor data.
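
The way PCA concentrates variance into a few components can be sketched for the first principal component alone. This is an illustrative sketch, not an engine-monitoring implementation: the two-sensor readings are invented, and power iteration is used here only to keep the example free of external libraries (a numerical library would normally compute all components at once).

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, share_of_variance, scores) for the first component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to the eigenvector with the largest eigenvalue, assuming
    # the start vector is not orthogonal to it.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    total_variance = sum(cov[a][a] for a in range(d))
    # Scores: each observation expressed as a single new variable.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, eigenvalue / total_variance, scores

# Hypothetical readings from two sensors that mostly move together.
readings = [(2, 2), (-2, -2), (1, 1), (-1, -1), (0.5, -0.5), (-0.5, 0.5)]
direction, share, scores = first_principal_component(readings)
```

For these made-up readings, the first component weights both sensors equally and accounts for most of the total variance, so the single score per observation is a compact stand-in for the two raw measurements.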

Using Factor Analysis
Your dataset might contain measured variables that overlap,
meaning that they are dependent on one another. Factor
analysis lets you fit a model to multivariate data to estimate
this sort of interdependence.
In a factor analysis model, the measured variables depend on
a smaller number of unobserved (latent) factors. Because each
factor might affect several variables, it is known as a common
factor. Each variable is assumed to be dependent on a linear
combination of the common factors.
Example: Tracking Stock Price Variation
Over the course of 100 weeks, the percent change in stock prices
has been recorded for ten companies. Of these ten, four are
technology companies, three are financial, and a further three
are retail. It seems reasonable to assume that the stock prices
for companies in the same sector will vary together as economic
conditions change. Factor analysis can provide quantitative
evidence to support this premise.
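
The assumption behind the factor model can be illustrated by simulating data from it. Note the limits of this sketch: it generates synthetic weekly changes from three hypothetical sector factors and then measures correlations, which shows the kind of structure factor analysis looks for; it does not fit a factor analysis model. The noise level, the random seed, and the unit loadings are all assumptions.

```python
import math
import random

rng = random.Random(0)
# Ten companies: four technology, three financial, three retail.
sectors = [0] * 4 + [1] * 3 + [2] * 3
weeks = 100

changes = []
for _ in range(weeks):
    # One unobserved common factor per sector for this week.
    factors = [rng.gauss(0, 1) for _ in range(3)]
    # Each stock = its sector's factor plus company-specific noise.
    changes.append([factors[s] + rng.gauss(0, 0.5) for s in sectors])

def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) *
                           sum((y - my) ** 2 for y in ys))

columns = list(zip(*changes))  # one series per company
same, different = [], []
for i in range(len(sectors)):
    for j in range(i + 1, len(sectors)):
        r = correlation(columns[i], columns[j])
        (same if sectors[i] == sectors[j] else different).append(r)

avg_same = sum(same) / len(same)
avg_different = sum(different) / len(different)
```

In data generated this way, companies in the same sector are strongly correlated and companies in different sectors are not. Factor analysis works in the opposite direction: starting from observed correlations like these, it estimates how strongly each variable depends on each common factor.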

Using Nonnegative Matrix Factorization
This dimension reduction technique is based on a low-rank
approximation of the feature space. In addition to reducing
the number of features, it guarantees that the features are
nonnegative, producing models that respect features such as
the nonnegativity of physical quantities.
Example: Text Mining
Suppose you want to explore variations in vocabulary and style
among several web pages. You create a matrix where each
row corresponds to an individual web page and each column
corresponds to a word ("the," "a," "we," and so on). The data will be
the number of times a particular word occurs on a particular page.
Since there are more than a million words in the English language,
you apply nonnegative matrix factorization to create an arbitrary
number of features that represent higher-level concepts rather than
individual words. These concepts make it easier to distinguish
between, say, news, educational content, and online retail content.
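
A page-by-word count matrix can be factored into nonnegative parts with the well-known multiplicative update rules. The sketch below is one possible approach under stated assumptions, not necessarily the algorithm any particular toolbox uses: the tiny count matrix, the choice of two concepts, the random starting values, and the iteration count are all invented for illustration.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Approximate v (pages x words) as w (pages x rank) times h (rank x words)."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in v]
    h = [[rng.random() + 0.1 for _ in v[0]] for _ in range(rank)]
    for _ in range(iterations):
        # Multiplicative updates keep every entry nonnegative because
        # they only multiply and divide nonnegative numbers.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(len(w))]
    return w, h

# Hypothetical counts: rows are web pages, columns are words.
counts = [[3, 2, 0, 0],
          [6, 4, 0, 0],
          [0, 0, 1, 5],
          [0, 0, 2, 10]]
w, h = nmf(counts, rank=2)
approx = matmul(w, h)
error = sum((a - b) ** 2 for ra, rb in zip(counts, approx) for a, b in zip(ra, rb))
```

Each row of w describes how strongly a page expresses each concept, and each row of h describes which words make up a concept. Because all entries stay nonnegative, a concept can only add word counts to a page, never subtract them.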

Next Steps
In this section we took a closer look at hard and soft clustering
algorithms for unsupervised learning, offered some tips on selecting
the right algorithm for your data, and showed how reducing the
number of features in your dataset can improve model performance.
As for your next steps:
Unsupervised learning might be your end goal. For example,
if you are doing market research and want to segment
consumer groups to target based on web site behavior, a
clustering algorithm will almost certainly give you the results
you're looking for.
On the other hand, you might want to use unsupervised
learning as a preprocessing step for supervised learning.
For example, apply clustering techniques to derive a smaller
number of features, and then use those features as inputs for
training a classifier.
In section 4 we'll explore supervised learning algorithms and
techniques, and see how to improve models with feature selection,
feature reduction, and parameter tuning.
[Figure: workflow diagram with the labels Lots of Data, Unsupervised Learning, Data Clusters, Lower-Dimensional Data, Results, Feature Selection, Supervised Learning, Model.]

Learn More
Ready for a deeper dive? Explore these unsupervised learning resources.
Clustering Algorithms and Techniques
k-Means
Use K-Means and Hierarchical Clustering to Find Natural Patterns in Data
Cluster Genes Using K-Means and Self-Organizing Maps
Color-Based Segmentation Using K-Means Clustering
Hierarchical Clustering
Connectivity-Based Clustering
Iris Clustering
Self-Organizing Maps
Cluster Data with a Self-Organizing Map

Fuzzy C-Means
Cluster Quasi-Random Data Using Fuzzy C-Means Clustering
Gaussian Mixture Models
Gaussian Process Regression Models
Cluster Data from Mixture of Gaussian Distributions
Cluster Gaussian Mixture Data Using Soft Clustering
Tune Gaussian Mixture Models
Image Processing Example: Detecting Cars with Gaussian Mixture Models

Dimensionality Reduction
Analyze Quality of Life in U.S. Cities Using PCA
Analyze Stock Prices Using Factor Analysis
Nonnegative Factorization
Perform Nonnegative Matrix Factorization
Model Suburban Commuting Using Subtractive Clustering
© 2016 The MathWorks, Inc. MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See mathworks.com/trademarks for a list of additional trademarks.
Other product or brand names may be trademarks or registered trademarks of their respective holders.
80823v00
