Which Machine Learning Algorithm Should I Use - The SAS Data Science Blog

5/10/22, 10:06 AM Which machine learning algorithm should I use?
- The SAS Data Science Blog
Which machine learning algorithm should I use?
By Hui Li on The SAS Data Science Blog | December 9, 2020
Topics | Advanced Analytics Machine Learning
This resource is designed primarily for beginner to intermediate data scientists or analysts who are int
applying machine learning algorithms to address the problems of their interest.
A typical question asked by a beginner, when facing a wide variety of machine learning algorithms, is
The answer to the question varies depending on many factors, including:
The size, quality, and nature of data.
The available computational time.

The urgency of the task.
What you want to do with the data.
Even an experienced data scientist cannot tell which algorithm will perform the best before trying diffe
advocating a one-and-done approach, but we do hope to provide some guidance on which algorithms
clear factors.
Editor's note: This post was originally published in 2017. We are republishing it with an u
this topic. You can watch How to Choose a Machine Learning Algorithm below. Or keep r
https://blogs.sas.com/content/subconsciousmusings/2020/12/09/machine-learning-algorithm-use/
sheet that helps you find the right algorithm for your project.

The machine learning algorithm cheat sheet

The machine learning algorithm cheat sheet helps you to choose from a variety of machine learnin
appropriate algorithm for your specific problems. This article walks you through the process of how to
Since the cheat sheet is designed for beginner data scientists and analysts, we will make some simpl
about the algorithms.
The algorithms recommended here result from compiled feedback and tips from several data scientist
and developers. There are several issues on which we have not reached an agreement and for these
commonality and reconcile the difference.
Additional algorithms will be added later as our library grows to encompass a more complete set of
How to use the cheat sheet

Read the path and algorithm labels on the chart as "If <path label> then use <algorithm>." For examp
If you want to perform dimension reduction then use principal component analysis.
If you need a numeric prediction quickly, use decision trees or linear regression.
If you need a hierarchical result, use hierarchical clustering.
Sometimes more than one branch will apply, and other times none of them will be a perfect match. It’s
paths are intended to be rule-of-thumb recommendations, so some of the recommendations are not e
talked with said that the only sure way to find the very best algorithm is to try all of them.
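As a minimal sketch, the "If <path label> then use <algorithm>" reading can be written as a lookup table. The table below holds only the three example paths quoted above. The names `CHEAT_SHEET_RULES` and `recommend` and the exact label strings are hypothetical, and the table does not reproduce the full chart.

```python
# Hypothetical rule table: only the three example paths quoted in the text.
CHEAT_SHEET_RULES = {
    "dimension reduction": ["principal component analysis"],
    "numeric prediction, speed matters": ["decision trees", "linear regression"],
    "hierarchical result": ["hierarchical clustering"],
}


def recommend(path_label):
    """Return the rule-of-thumb algorithms for a path label, or [] if no branch matches."""
    return CHEAT_SHEET_RULES.get(path_label, [])


for label in CHEAT_SHEET_RULES:
    print(f"If {label} then use {' or '.join(recommend(label))}.")
```

An empty result corresponds to the case where none of the branches is a perfect match.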
Types of machine learning algorithms

This section provides an overview of the most popular types of machine learning. If you’re familiar wit
move on to discussing specific algorithms, you can skip this section and go to “When to use specific a
Supervised learning
Supervised learning algorithms make predictions based on a set of examples. For example, historical
future prices. With supervised learning, you have an input variable that consists of labeled training dat
variable. You use an algorithm to analyze the training data to learn the function that maps the input to
function maps new, unknown examples by generalizing from the training data to anticipate results in u
Classification: When the data are being used to predict a categorical variable, supervised learn
This is the case when assigning a label or indicator, either dog or cat to an image. When there a
binary classification. When there are more than two categories, the problems are called multi-cla
Regression: When predicting continuous values, the problem becomes a regression problem.
Forecasting: This is the process of making predictions about the future based on past and pres
used to analyze trends. A common example might be an estimate of next year's sales based
year and previous years.
Semi-supervised learning
The challenge with supervised learning is that labeling data can be expensive and time-consuming. If
unlabeled examples to enhance supervised learning. Because the machine is not fully supervised in t
is semi-supervised. With semi-supervised learning, you use unlabeled examples with a small amount
learning accuracy.
Unsupervised learning
When performing unsupervised learning, the machine is presented with totally unlabeled data. It is as
patterns that underlie the data, such as a clustering structure, a low-dimensional manifold, or a sparse
Clustering: Grouping a set of data examples so that examples in one group (or one cluster) are
some criteria) than those in other groups. This is often used to segment the whole dataset into s
performed in each group to help users to find intrinsic patterns.
Dimension reduction: Reducing the number of variables under consideration. In many applica
high dimensional features and some features are redundant or irrelevant to the task. Reducing t
the true, latent relationship.
Reinforcement learning
Reinforcement learning is another branch of machine learning which is mainly utilized for sequential d
this type of machine learning, unlike supervised and unsupervised learning, we do not need to have a
the learning agent interacts with an environment and learns the optimal policy on the fly based on the
environment. Specifically, in each time step, an agent observes the environment’s state, chooses an a
feedback it receives from the environment. The feedback from an agent’s action has many important c
the resulting state of the environment after the agent has acted on it. Another component is the rewar
agent receives from performing that particular action in that particular state. The reward is carefully ch
for which we are training the agent. Using the state and reward, the agent updates its decision-making
term reward. With the recent advancements of deep learning, reinforcement learning gained significan
demonstrated striking performances in a wide range of applications such as games, robotics, and con
learning models such as Deep-Q and Fitted-Q networks in action, check out this article.
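The observe, act, receive-reward, update loop described above can be sketched with tabular Q-learning, which is swapped in here as a much simpler stand-in for the deep models mentioned. The five-state corridor environment, the reward of 1 at the goal, and all hyperparameter values are assumptions made up for illustration.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Environment: return (next_state, reward, done) for one action."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy choice; ties are broken at random."""
    if rng.random() < epsilon or q_row[0] == q_row[1]:
        return rng.randrange(2)
    return 0 if q_row[0] > q_row[1] else 1


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = choose_action(q[state], epsilon, rng)
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-terminal states.
policy = [ACTIONS[0 if row[0] > row[1] else 1] for row in q_table[:-1]]
print(policy)
```

The agent is never told the correct action. It only sees states and rewards, and the learned policy is read off the table afterwards.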
Considerations when choosing an algorithm

When choosing an algorithm, always take these aspects into account: accuracy, training time and eas
accuracy first, while beginners tend to focus on algorithms they know best.
When presented with a dataset, the first thing to consider is how to obtain results, no matter what thos
Beginners tend to choose algorithms that are easy to implement and can obtain results quickly. This w
the first step in the process. Once you obtain some results and become familiar with the data, you ma
sophisticated algorithms to strengthen your understanding of the data, hence further improving the res
Even in this stage, the best algorithms might not be the methods that have achieved the highest repor
usually requires careful tuning and extensive training to obtain its best achievable performance.
When to use specific algorithms

Looking more closely at individual algorithms can help you understand what they provide and how the
provide more details and give additional tips for when to use specific algorithms, in alignment with the
Linear regression and Logistic regression
Linear regression Logistic regression
Linear regression is an approach for modeling the relationship between a continuous dependent varia
predictors $X$. The relationship between $y$ and $X$ can be linearly modeled as $y = \beta^T X + \epsilon$. Given the training examples $\{x_i, y_i\}_{i=1}^{N}$, the parameter vector $\beta$ can be learnt.
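One common way to learn the parameters is ordinary least squares, which is an assumption here since the text does not name a fitting method. The sketch below handles a single predictor plus an intercept, so the parameter vector has two entries. The function name `fit_line` and the toy data are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data generated from y = 1 + 2x with no noise term.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```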
If the dependent variable is not continuous but categorical, linear regression can be transformed to log
link function. Logistic regression is a simple, fast yet powerful classification algorithm. Here we discus
dependent variable $y$ only takes binary values $\{y_i \in \{-1, 1\}\}_{i=1}^{N}$ (which can be easily extended to multi-class classification problems).
In logistic regression we use a different hypothesis class to try to predict the probability that a given ex
versus the probability that it belongs to the "-1" class. Specifically, we will try to learn a function of the
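$p(y_i = 1 \mid x_i) = \sigma(\beta^T x_i)$ and $p(y_i = -1 \mid x_i) = 1 - \sigma(\beta^T x_i)$. Here $\sigma(x) = \frac{1}{1 + \exp(-x)}$ is a sigmoid function. Given the training examples $\{x_i, y_i\}_{i=1}^{N}$, the parameter vector $\beta$ can be learnt by maximizing the log-likelihood of $\beta$ given the data.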
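A minimal sketch of that maximization, assuming plain gradient ascent (the text does not say which optimizer is used). With labels in {-1, 1}, the log-likelihood is the sum of log σ(y_i β^T x_i), and its gradient is the sum of (1 − σ(y_i β^T x_i)) y_i x_i. The learning rate, step count, and toy data are illustrative choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Gradient ascent on the log-likelihood; labels must be -1 or +1."""
    beta = [0.0] * len(xs[0])
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x, y in zip(xs, ys):
            margin = y * sum(b * v for b, v in zip(beta, x))
            weight = 1.0 - sigmoid(margin)  # large when the example is poorly fit
            for j, v in enumerate(x):
                grad[j] += weight * y * v
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta


def prob_positive(beta, x):
    """p(y = 1 | x) under the fitted model."""
    return sigmoid(sum(b * v for b, v in zip(beta, x)))


# Toy data: the first feature is a constant 1 (intercept). The classes overlap
# slightly, so the maximum-likelihood solution is finite.
xs = [(1.0, -2.0), (1.0, -1.0), (1.0, -0.5), (1.0, 0.5), (1.0, 1.0), (1.0, 2.0)]
ys = [-1, -1, 1, -1, 1, 1]

beta = fit_logistic(xs, ys)
print(prob_positive(beta, (1.0, 2.0)), prob_positive(beta, (1.0, -2.0)))
```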
Group By Linear Regression Logistic Regression in S
Linear SVM and kernel SVM

A support vector machine (SVM) training algorithm finds the classifier represented by the normal vector $w$ and bias $b$ of the hyperplane. This hyperplane (boundary) separates different classes by as wide a margin as possible. The problem can be converted into a constrained optimization problem:

$$\min_{w} \; \lVert w \rVert \quad \text{subject to} \quad y_i (w^T X_i - b) \ge 1, \quad i = 1, \dots, n.$$

Kernel tricks are used to map non-linearly separable functions into a higher-dimension linearly separable function.
When the classes are not linearly separable, a kernel trick can be used to map a non-linearly separab
dimension linearly separable space.
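The idea of mapping into a higher dimension can be illustrated with an explicit feature map, which is a simplification: a real kernel SVM never builds the mapped features and works with a kernel function instead. In this sketch, four one-dimensional points cannot be split by any threshold, but after mapping x to (x, x²) a hand-picked hyperplane satisfies the margin constraint y_i (w^T x_i − b) ≥ 1. The data, the map, and the values of w and b are assumptions chosen for illustration, and nothing here trains an SVM.

```python
def feature_map(x):
    """Explicit map from 1-D to 2-D: x -> (x, x^2)."""
    return (x, x * x)


def satisfies_margin(w, b, points, labels):
    """Check y_i * (w . x_i - b) >= 1 for every training point."""
    return all(
        y * (sum(wi * pi for wi, pi in zip(w, p)) - b) >= 1
        for p, y in zip(points, labels)
    )


xs = [-2.0, -1.0, 1.0, 2.0]
ys = [1, -1, -1, 1]  # the outer points are one class, the inner points the other

# In one dimension, no (w, b) on a coarse grid satisfies the constraints.
grid = [k / 2 for k in range(-10, 11)]
points_1d = [(x,) for x in xs]
separable_1d = any(
    satisfies_margin((w,), b, points_1d, ys) for w in grid for b in grid
)

# After the map, the hyperplane x^2 = 2.5 separates the classes with margin.
mapped = [feature_map(x) for x in xs]
separable_2d = satisfies_margin((0.0, 1.0), 2.5, mapped, ys)

print(separable_1d, separable_2d)
```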
When most independent variables are numeric, logistic regression and SVM should be the first try for c
easy to implement, their parameters easy to tune, and the performances are also pretty good. So thes
beginners.
Trees and ensemble trees
A decision tree for prediction model
Decision trees, random forest and gradient boosting are all algorithms based on decision trees. There
trees, but they all do the same thing – subdivide the feature space into regions with mostly the same l
to understand and implement. However, they tend to over-fit data when we exhaust the branches and
Random forest and gradient boosting are two popular ways to use tree algorithms to achieve good a
the over-fitting problem.
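The simplest instance of subdividing the feature space is a depth-one tree (a decision stump) on a single feature. The sketch below tries each midpoint between adjacent feature values as a threshold and keeps the split with the fewest misclassifications. The function name, the error criterion, and the toy data are assumptions for illustration; practical tree learners typically use impurity measures and recurse on each region.

```python
def best_stump(xs, ys):
    """Return (threshold, left_label, right_label, errors) with the fewest errors."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        for left, right in ((0, 1), (1, 0)):
            errors = sum(
                (left if x <= threshold else right) != y for x, y in zip(xs, ys)
            )
            if best is None or errors < best[3]:
                best = (threshold, left, right, errors)
    return best


xs = [1, 2, 3, 6, 7, 8]
ys = [0, 0, 0, 1, 1, 1]

stump = best_stump(xs, ys)
print(stump)
```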
Neural networks and deep learning
A convolution neural network architecture (image source: wikipedia creative commons)
Neural networks flourished in the mid-1980s due to their parallel and distributed processing ability. Bu
impeded by the ineffectiveness of the back-propagation training algorithm that is widely used to optim
networks. Support vector machines (SVM) and other simpler models, which can be easily trained by s
problems, gradually replaced neural networks in machine learning.
In recent years, new and improved training techniques such as unsupervised pre-training and layer-w
a resurgence of interest in neural networks. Increasingly powerful computational capabilities, such as
(GPU) and massively parallel processing (MPP), have also spurred the revived adoption of neural net
in neural networks has given rise to the invention of models with thousands of layers.
In other words, shallow neural networks have evolved into deep learning neural networks. Deep neural networks have been very successful for supervised learning. When used for speech and image recognition, deep learning performs as well as, or even better than, humans. Applied to unsupervised learning tasks, such as feature extraction, deep learning also extracts features from raw images or speech with much less human intervention.
A neural network consists of three parts: input layer, hidden layers and
output layer. The training samples define the input and output layers. A neural network i
When the output layer is a categorical variable, then the neural
network is a way to address classification problems. When the output
layer is a continuous variable, then the network can be used to do regression. When the output layer
the network can be used to extract intrinsic features. The number of hidden layers defines the model c
capacity.
Deep Learning: What it is and why it matters

k-means/k-modes, GMM (Gaussian mixture model) clustering
K Means Clustering Gaussian Mixtu
K-means/k-modes and GMM clustering aim to partition n observations into k clusters. K-means defines ha
to be and only to be associated to one cluster. GMM, however, defines a soft assignment for each sam
probability to be associated with each cluster. Both algorithms are simple and fast enough for clusterin
k is given.
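A minimal sketch of k-means with a given k, written as the usual alternation between an assignment step and a center-update step (Lloyd's algorithm). Each point ends up in exactly one cluster. Initializing the centers from the first k points, the fixed iteration count, and the toy data are assumptions for illustration; a GMM would replace the assignment with per-cluster probabilities and is not shown.

```python
def kmeans(points, k, iterations=20):
    """K-means on 2-D points; returns (labels, centers)."""
    centers = [points[i] for i in range(k)]  # assumed init: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
print(labels)
```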
DBSCAN

A DBSCAN illustration (image source: Wikipedia)
When the number of clusters k is not given, DBSCAN (density-based spatial clustering) can be used b
density diffusion.
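The following is a sketch of DBSCAN as it is commonly described, not necessarily the exact variant the chart has in mind. A point with at least `min_pts` neighbors within distance `eps` (counting itself) is a core point, clusters grow outward from core points, and points reachable from no core point are marked as noise. The one-dimensional data and the parameter values are assumptions for illustration.

```python
NOISE = -1


def dbscan(points, eps, min_pts):
    """Density-based clustering of 1-D points; returns one label per point."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        return [j for j, q in enumerate(points) if abs(points[i] - q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep spreading
                queue.extend(reachable)
    return labels


points = [1.0, 1.2, 1.4, 5.0, 5.1, 5.3, 9.0]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)
```

The number of clusters is an output of the procedure rather than an input.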
Hierarchical clustering
Hierarchical partitions can be visualized using a tree structure (a dendrogram). It does not need the n
and the partitions can be viewed at different levels of granularities (i.e., can refine/coarsen clusters) us
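A sketch of bottom-up (agglomerative) hierarchical clustering, assuming single linkage, where the distance between two clusters is the distance between their closest members. Stopping the merging at different distance thresholds gives coarser or finer partitions of the same data. The function name `agglomerate`, the `max_distance` parameter, and the one-dimensional data are made up for illustration.

```python
def agglomerate(points, max_distance):
    """Merge the closest clusters until the next merge would exceed max_distance."""
    clusters = [[p] for p in sorted(points)]
    while len(clusters) > 1:
        # Closest pair of clusters under single linkage.
        distance, i, j = min(
            (min(abs(a - b) for a in clusters[i] for b in clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if distance > max_distance:
            break
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


points = [1.0, 1.5, 5.0, 5.2, 9.0]

fine = agglomerate(points, max_distance=0.1)    # every point on its own
medium = agglomerate(points, max_distance=1.0)  # nearby points merged
coarse = agglomerate(points, max_distance=3.6)  # only the far point stays apart

print(len(fine), len(medium), len(coarse))
```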
PCA, SVD and LDA

We generally do not want to feed a large number of features directly into a machine learning algorithm
irrelevant or the “intrinsic” dimensionality may be smaller than the number of features. Principal compo
value decomposition (SVD), and latent Dirichlet allocation (LDA) all can be used to perform dimension
PCA is an unsupervised dimension reduction method that maps the original data space into a lower-dimensiona
much information as possible. PCA basically finds a subspace that best preserves the data varian
by the dominant eigenvectors of the data’s covariance matrix.
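A minimal sketch of finding the dominant eigenvector of the covariance matrix for two-dimensional data. It uses power iteration, which is an illustrative choice rather than what a library PCA would do, and it assumes the starting vector is not orthogonal to the dominant eigenvector. The data and function name are made up.

```python
import math


def first_principal_component(points, iterations=100):
    """Return (unit direction of greatest variance, centered points)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Power iteration: repeatedly apply the matrix and renormalize.
    v = (1.0, 0.0)
    for _ in range(iterations):
        vx = sxx * v[0] + sxy * v[1]
        vy = sxy * v[0] + syy * v[1]
        norm = math.hypot(vx, vy)
        v = (vx / norm, vy / norm)
    return v, centered


points = [(-2.0, -2.1), (-1.0, -0.9), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]
direction, centered = first_principal_component(points)

# Reduce each 2-D point to a single coordinate along the component.
projected = [x * direction[0] + y * direction[1] for x, y in centered]
print(direction)
```

Because the points lie close to the line y = x, the recovered direction is close to the diagonal, and the one-dimensional projection keeps almost all of the variance.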
The SVD is related to PCA in the sense that the SVD of the centered data matrix (features versus sam
left singular vectors that define the same subspace as found by PCA. However, SVD is a more versat
things that PCA may not do. For example, the SVD of a user-versus-movie matrix is able to extract th
profiles that can be used in a recommendation system. In addition, SVD is also widely used as a topic
semantic analysis, in natural language processing (NLP).
A related technique in NLP is latent Dirichlet allocation (LDA). LDA is a probabilistic topic model and it
topics in a similar way as a Gaussian mixture model (GMM) decomposes continuous data into Gauss
the GMM, LDA models discrete data (words in documents) and it constrains that the topics are a p
Dirichlet distribution.
Conclusions
This workflow is easy to follow. The takeaway messages when trying to solve a new prob
Define the problem. What problems do you want to solve?

Start simple. Be familiar with the data and the baseline results.
Then try something more complicated.
SAS Visual Data Mining and Machine Learning provides a good platform for beginners to learn mach
learning methods to their problems. Sign up for a free trial today!
Tags machine learning algorithms machine learning data science basics data science
regression
ABOUT AUTHOR
Hui Li
Principal Staff Scientist, Data Science
Until her passing in March 2019, Dr. Hui Li was a Principal Staff Scientist of Data Science
most memorable contribution on this blog is her guide to machine language algorithms, wh
by millions of data science enthusiasts around the world. Dr. Li's work focused on deep lea
SAS recommendation systems in SAS Viya. She received her PhD degree and Master’s d
Computer Engineering from Duke University. Before joining SAS, she worked at Duke Univ
and at Signal Innovation Group, Inc. as a research engineer. Her research interests includ
heterogeneous data, collaborative filtering recommendations, Bayesian statistical modeling
9 COMMENTS
Daymond Ling on April 12, 2017 7:58 pm
Thank you for the cheat-sheet, it provides a nice taxonomy for people to understand the relation
use it in my machine learning class to help students round out their world view.
Hui Li on April 17, 2017 9:54 am
Thank Daymond.
Let us know if you have any questions when teaching the students using the information.
Hector Alvaro Rojas on April 21, 2017 11:12 am
This is a great cheat-sheet to understand and remember the relationship between the most usu
I have not seen something similar like this published online yet.
I think it could be nice to incorporate the "cost" variable, the principal’s reasons why each selec
examples of applications for each one. I know that this suggestion means a lot of work and sca
Anyway, it could be a nice new project to be done, don’t you think so?
Congratulations for the work already done anyway!
Hui Li on April 24, 2017 11:29 am
Thanks, Hector. Incorporating the "cost" variable is a pretty wider area in machine learnin
considered as a subfield of reinforcement learning -- based on the cost (reward), the age
he/she wants to take. I considered this problem for a while and haven't found a good exa
I have a time, I will write a blog specifically for the reinforcement learning.
charles on April 24, 2017 9:54 am
An excellent blog. Thank you
Hui Li on April 24, 2017 11:30 am
Thank you.
Don Maclean on April 25, 2017 11:34 am
Excellent summary but I think the target audience is a few steps beyond "beginner". I showed th
study machine learning, and they were overwhelmed.
Anastassia Dr Lauterbach on April 26, 2017 6:31 am
Great blog, thank you. I will use it when talking to non tech companies about starting doing ML
Anmar Abdul-Rahman on December 10, 2020 11:07 pm
Very helpful and concise, I was saddened to read that the author passed away in March 2019.
