80823v00 Machine Learning Section3 Ebook v05 PDF
80823v00 Machine Learning Section3 Ebook v05 PDF
80823v00 Machine Learning Section3 Ebook v05 PDF
Unsupervised Learning
When to Consider
Unsupervised Learning
Unsupervised learning is useful when you want to explore your data but
dont yet have a specific goal or are not sure what information the data
contains. Its also a good way to reduce the dimensions of your data.
Unsupervised Learning Techniques
k-Means k-Medoids
How it Works How It Works
Partitions data into k number of mutually exclusive clusters. Similar to k-means, but with the requirement that the cluster
How well a point fits into a cluster is determined by the centers coincide with points in the data.
distance from that point to the clusters center.
Best Used...
Best Used...
When the number of clusters is known
When the number of clusters is known For fast clustering of categorical data
For fast clustering of large data sets To scale to large data sets
Result:
Result: Dendrogram showing Lower-dimensional
the hierarchical relationship (typically 2D)
between clusters representation
After running the algorithm, the team can accurately determine the
results of partitioning the data into three and four clusters.
After preprocessing the data to remove noise, they cluster the data.
Because the same genes can be involved in several biological
processes, no single gene is likely to belong to one cluster only.
The researchers apply a fuzzy c-means algorithm to the data. They
then visualize the clusters to identify groups of genes that behave in
a similar way.
Machine learning is an effective method for finding patterns in As datasets get bigger, you frequently need to reduce the
big datasets. But bigger data brings added complexity. number of features, or dimensionality.
In datasets with many variables, groups of variables often move Each principal component is a linear combination of the original
together. PCA takes advantage of this redundancy of information variables. Because all the principal components are orthogonal to
by generating new variables via linear combinations of the original each other, there is no redundant information.
variables so that a small number of new variables captures most of
the information.
Your dataset might contain measured variables that overlap, In a factor analysis model, the measured variables depend on
meaning that they are dependent on one another. Factor a smaller number of unobserved (latent) factors. Because each
analysis lets you fit a model to multivariate data to estimate factor might affect several variables, it is known as a common
this sort of interdependence. factor. Each variable is assumed to be dependent on a linear
combination of the common factors.
Over the course of 100 weeks, the percent change in stock prices
has been recorded for ten companies. Of these ten, four are
technology companies, three are financial, and a further three
are retail. It seems reasonable to assume that the stock prices
for companies in the same sector will vary together as economic
conditions change. Factor analysis can provide quantitative
evidence to support this premise.
This dimension reduction technique is based on a low-rank nonnegative, producing models that respect features such as
approximation of the feature space. In addition to reducing the nonnegativity of physical quantities.
the number of features, it guarantees that the features are
Iris Clustering
Self-Organizing Maps
Cluster Data with a
Self-Organizing Map
2016 The MathWorks, Inc. MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See mathworks.com/trademarks for a list of additional trademarks.
Other product or brand names may be trademarks or registered trademarks of their respective holders.
80823v00