Which Machine Learning Algorithm Should I Use - The SAS Data Science Blog
This resource is designed primarily for beginner to intermediate data scientists or analysts who are int
applying machine learning algorithms to address the problems of their interest.
A typical question asked by a beginner, when facing a wide variety of machine learning algorithms, is
The answer to the question varies depending on many factors, including:
Even an experienced data scientist cannot tell which algorithm will perform the best before trying diffe
advocating a one-and-done approach, but we do hope to provide some guidance on which algorithms
clear factors.
Editor's note: This post was originally published in 2017. We are republishing it with an u
this topic. You can watch How to Choose a Machine Learning Algorithm below. Or keep r
https://blogs.sas.com/content/subconsciousmusings/2020/12/09/machine-learning-algorithm-use/ 1/15
5/10/22, 10:06 AM Which machine learning algorithm should I use? - The SAS Data Science Blog
sheet that helps you find the right algorithm for your project.
Since the cheat sheet is designed for beginner data scientists and analysts, we will make some simpl
about the algorithms.
The algorithms recommended here result from compiled feedback and tips from several data scientist
and developers. There are several issues on which we have not reached an agreement and for these
commonality and reconcile the difference.
Additional algorithms will be added in later as our library grows to encompass a more complete set of
If you want to perform dimension reduction then use principal component analysis.
If you need a numeric prediction quickly, use decision trees or linear regression.
Sometimes more than one branch will apply, and other times none of them will be a perfect match. It’s
paths are intended to be rule-of-thumb recommendations, so some of the recommendations are not e
talked with said that the only sure way to find the very best algorithm is to try all of them.
Supervised learning
Supervised learning algorithms make predictions based on a set of examples. For example, historical
future prices. With supervised learning, you have an input variable that consists of labeled training dat
variable. You use an algorithm to analyze the training data to learn the function that maps the input to
function maps new, unknown examples by generalizing from the training data to anticipate results in u
Classification: When the data are being used to predict a categorical variable, supervised learn
This is the case when assigning a label or indicator, either dog or cat to an image. When there a
binary classification. When there are more than two categories, the problems are called multi-cla
Regression: When predicting continuous values, the problem becomes a regression problem.
Forecasting: This is the process of making predictions about the future based on past and pres
used to analyze trends. A common example might be an estimation of the next year sales based
year and previous years.
Semi-supervised learning
The challenge with supervised learning is that labeling data can be expensive and time-consuming. If
unlabeled examples to enhance supervised learning. Because the machine is not fully supervised in t
is semi-supervised. With semi-supervised learning, you use unlabeled examples with a small amount
learning accuracy.
Unsupervised learning
When performing unsupervised learning, the machine is presented with totally unlabeled data. It is as
patterns that underlie the data, such as a clustering structure, a low-dimensional manifold, or a sparse
Clustering: Grouping a set of data examples so that examples in one group (or one cluster) are
some criteria) than those in other groups. This is often used to segment the whole dataset into s
performed in each group to help users to find intrinsic patterns.
Dimension reduction: Reducing the number of variables under consideration. In many applica
high dimensional features and some features are redundant or irrelevant to the task. Reducing t
the true, latent relationship.
Reinforcement learning
Reinforcement learning is another branch of machine learning which is mainly utilized for sequential d
this type of machine learning, unlike supervised and unsupervised learning, we do not need to have a
the learning agent interacts with an environment and learns the optimal policy on the fly based on the
environment. Specifically, in each time step, an agent observes the environment’s state, chooses an a
feedback it receives from the environment. The feedback from an agent’s action has many important c
the resulting state of the environment after the agent has acted on it. Another component is the rewar
agent receives from performing that particular action in that particular state. The reward is carefully ch
for which we are training the agent. Using the state and reward, the agent updates its decision-making
term reward. With the recent advancements of deep learning, reinforcement learning gained significan
demonstrated striking performances in a wide range of applications such as games, robotics, and con
learning models such as Deep-Q and Fitted-Q networks in action, check out this article.
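The observe, act, receive-reward, update loop described above can be sketched with tabular Q-learning, a classic reinforcement learning method that is swapped in here for illustration (the post itself only names Deep-Q and Fitted-Q networks). The four-state corridor environment, the reward of 1 at the goal, and all hyperparameters are hypothetical choices, not taken from the original article.

```python
import random

N_STATES, GOAL = 4, 3          # hypothetical corridor: states 0..3, goal at 3
ACTIONS = (-1, +1)             # action 0 moves left, action 1 moves right

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), GOAL)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy choice: mostly exploit, sometimes explore;
            # ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

q = train()
policy = ["left" if row[0] > row[1] else "right" for row in q[:GOAL]]
print(policy)
```

In this toy setting the learned policy should prefer moving right in every non-goal state, because that is the only way to collect the reward.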
When presented with a dataset, the first thing to consider is how to obtain results, no matter what thos
Beginners tend to choose algorithms that are easy to implement and can obtain results quickly. This w
the first step in the process. Once you obtain some results and become familiar with the data, you ma
sophisticated algorithms to strengthen your understanding of the data, hence further improving the results.
Even in this stage, the best algorithms might not be the methods that have achieved the highest repor
usually requires careful tuning and extensive training to obtain its best achievable performance.
Linear regression is an approach for modeling the relationship between a continuous dependent varia
predictors $X$. The relationship between $y$ and $X$ can be linearly modeled as $y = \beta^T X + \epsilon$. Given the training examples $\{x_i, y_i\}_{i=1}^{N}$, the parameter vector $\beta$ can be learnt.
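As a minimal sketch of how the parameters can be learnt, the following fits a single-predictor model by ordinary least squares. Least squares is one common way to estimate the parameter vector and is an assumption here, as are the function name `fit_linear` and the noise-free toy data; the intercept plays the role of a coefficient on a constant feature.

```python
def fit_linear(xs, ys):
    """Least-squares fit of y = b0 + b1 * x for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Toy data generated exactly from y = 1 + 2x (no noise term).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_linear(xs, ys)
print(b0, b1)  # 1.0 2.0
```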
If the dependent variable is not continuous but categorical, linear regression can be transformed to log
link function. Logistic regression is a simple, fast yet powerful classification algorithm. Here we discus
dependent variable $y$ only takes binary values, $\{y_i \in \{-1, 1\}\}_{i=1}^{N}$ (which can be easily extended to multi-class classification problems).
In logistic regression we use a different hypothesis class to try to predict the probability that a given example belongs to the "1" class versus the probability that it belongs to the "-1" class. Specifically, we will try to learn a function of the form $p(y_i = 1 \mid x_i) = \sigma(\beta^T x_i)$ and $p(y_i = -1 \mid x_i) = 1 - \sigma(\beta^T x_i)$. Here $\sigma(x) = \frac{1}{1 + \exp(-x)}$ is a sigmoid function. Given the training examples $\{x_i, y_i\}_{i=1}^{N}$, the parameter vector $\beta$ can be learnt by maximizing the log-likelihood of $\beta$ given the data.
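To make the maximum-likelihood step concrete, here is a small sketch that maximizes the log-likelihood $\sum_i \log \sigma(y_i \beta^T x_i)$ by plain gradient ascent, using labels in $\{-1, 1\}$ as in the text. Gradient ascent, the learning rate, the step count, and the toy data are all assumptions made for illustration; production solvers typically use more sophisticated optimizers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Gradient ascent on sum_i log sigmoid(y_i * beta^T x_i), y_i in {-1, +1}."""
    beta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            margin = yi * sum(b * v for b, v in zip(beta, xi))
            # d/d(beta) of log sigmoid(margin) is (1 - sigmoid(margin)) * y_i * x_i
            coeff = (1.0 - sigmoid(margin)) * yi
            for j, v in enumerate(xi):
                grad[j] += coeff * v
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta

def predict_proba(beta, xi):
    """Estimated p(y = 1 | x) = sigmoid(beta^T x)."""
    return sigmoid(sum(b * v for b, v in zip(beta, xi)))

# Toy data: first feature is a constant 1 (intercept), second is the predictor.
# The classes overlap slightly, so the maximum-likelihood estimate is finite.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, -0.5], [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]]
y = [-1, -1, 1, -1, 1, 1]
beta = fit_logistic(X, y)
print(predict_proba(beta, [1.0, 2.0]), predict_proba(beta, [1.0, -2.0]))
```

Larger values of the predictor should receive a probability above 0.5 for the "1" class, and smaller values a probability below 0.5.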
A support vector machine (SVM) training algorithm finds the classifier represented by the normal vector $w$ and bias $b$ of the hyperplane. This hyperplane (boundary) separates different classes by as wide a margin as possible. The problem can be converted into a constrained optimization problem:

$$\underset{w}{\text{minimize}} \; \|w\| \quad \text{subject to} \quad y_i (w^T X_i - b) \ge 1, \quad i = 1, \ldots, n.$$
[Figure: Kernel tricks are used to map non-linearly separable functions into a higher-dimensional, linearly separable function.]
When the classes are not linearly separable, a kernel trick can be used to map a non-linearly separable space into a higher-dimensional, linearly separable space.
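The idea of mapping into a higher-dimensional space can be illustrated with an explicit feature map. This is only a sketch: a real kernel method works with inner products and never builds the mapped features explicitly, and the map `phi`, the toy data, and the hand-picked hyperplane below are hypothetical. The hyperplane is feasible for the margin constraint $y_i (w^T \phi(x_i) - b) \ge 1$ but is not claimed to be the optimal one.

```python
def phi(x):
    """Hypothetical explicit feature map: 1-D input -> (x, x^2)."""
    return (x, x * x)

# Toy 1-D data: the +1 class lies on both sides of the -1 class,
# so no single threshold on x separates the two classes.
xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
ys = [1, 1, -1, -1, -1, 1, 1]

def threshold_separates(xs, ys, t, sign):
    """True if sign * (x - t) has the same sign as y for every example."""
    return all(sign * (x - t) * y > 0 for x, y in zip(xs, ys))

# One candidate threshold in every gap between sorted points, both orientations.
candidates = [-4.0, -2.5, -1.0, -0.25, 0.25, 1.0, 2.5, 4.0]
separable_1d = any(
    threshold_separates(xs, ys, t, s) for t in candidates for s in (1, -1)
)

# In the mapped space, the hand-picked hyperplane w = (0, 1), b = 2
# satisfies y_i * (w^T phi(x_i) - b) >= 1 for every example.
w, b = (0.0, 1.0), 2.0
margins = [
    y * (sum(wj * fj for wj, fj in zip(w, phi(x))) - b) for x, y in zip(xs, ys)
]
separable_mapped = all(m >= 1 for m in margins)

print(separable_1d, separable_mapped)  # False True
```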
When most dependent variables are numeric, logistic regression and SVM should be the first try for c
easy to implement, their parameters easy to tune, and the performances are also pretty good. So thes
beginners.
Decision trees, random forest and gradient boosting are all algorithms based on decision trees. There
trees, but they all do the same thing – subdivide the feature space into regions with mostly the same l
to understand and implement. However, they tend to over-fit data when we exhaust the branches and
Random forest and gradient boosting are two popular ways to use tree algorithms to achieve good a
the over-fitting problem.
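A single split of the kind a decision tree makes can be sketched as follows: try each candidate threshold on one feature and keep the one whose two regions have the purest labels. Gini impurity is used here as the purity measure, which is a common choice but an assumption, since the post does not name a criterion; the function names and toy data are likewise hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure region)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Return (threshold, weighted_impurity) for the best split x <= threshold."""
    values = sorted(set(xs))
    best = (None, float("inf"))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_split(xs, ys)
print(threshold, impurity)  # 4.5 0.0
```

A full tree applies this search recursively inside each region; stopping early or pruning is what keeps the recursion from exhausting the branches.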
Neural networks flourished in the mid-1980s due to their parallel and distributed processing ability. Bu
impeded by the ineffectiveness of the back-propagation training algorithm that is widely used to optim
networks. Support vector machines (SVM) and other simpler models, which can be easily trained by s
problems, gradually replaced neural networks in machine learning.
In recent years, new and improved training techniques such as unsupervised pre-training and layer-w
a resurgence of interest in neural networks. Increasingly powerful computational capabilities, such as
(GPU) and massively parallel processing (MPP), have also spurred the revived adoption of neural net
in neural networks has given rise to the invention of models with thousands of layers.
A neural network consists of three parts: input layer, hidden layers and
output layer. The training samples define the input and output layers. A neural network i
When the output layer is a categorical variable, then the neural
network is a way to address classification problems. When the output
layer is a continuous variable, then the network can be used to do regression. When the output layer
the network can be used to extract intrinsic features. The number of hidden layers defines the model c
capacity.
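The three-part structure (input layer, hidden layer, output layer) can be sketched as a forward pass in a few lines. The weights below are set by hand rather than trained, and the ReLU activation and the XOR task are illustrative assumptions; the point is only that a hidden layer lets the network represent a function that a single linear layer cannot.

```python
def relu(z):
    return max(0.0, z)

def forward(x, W1, b1, W2, b2):
    """Input layer -> one hidden layer (ReLU) -> single linear output."""
    hidden = [
        relu(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(W1, b1)
    ]
    return sum(w * h for w, h in zip(W2, hidden)) + b2

# Hand-set (not trained) weights that make the network compute XOR.
W1 = [[1.0, 1.0], [1.0, 1.0]]   # hidden-layer weights, one row per hidden unit
b1 = [0.0, -1.0]                # hidden-layer biases
W2 = [1.0, -2.0]                # output-layer weights
b2 = 0.0                        # output-layer bias

outputs = {x: forward(x, W1, b1, W2, b2) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
print(outputs)
```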
K-means/k-modes and GMM clustering aim to partition n observations into k clusters. K-means defines ha
to be and only to be associated to one cluster. GMM, however, defines a soft assignment for each sam
probability to be associated with each cluster. Both algorithms are simple and fast enough for clusterin
k is given.
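The hard assignment used by k-means can be sketched with Lloyd's algorithm in one dimension. The function name `kmeans_1d`, the fixed initial centers, and the toy points are assumptions for illustration; practical implementations use smarter initialization and a convergence check instead of a fixed iteration count.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's algorithm in one dimension with hard assignments."""
    centers = list(centers)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to exactly one (the nearest) center.
        assignment = [
            min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return centers, assignment

points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, assignment = kmeans_1d(points, centers=[0.0, 5.0])
print(centers, assignment)  # [1.5, 9.5] [0, 0, 0, 1, 1, 1]
```

A GMM would replace the assignment step with a probability for each cluster rather than a single index.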
DBSCAN
When the number of clusters k is not given, DBSCAN (density-based spatial clustering) can be used b
density diffusion.
Hierarchical clustering
Hierarchical partitions can be visualized using a tree structure (a dendrogram). It does not need the n
and the partitions can be viewed at different levels of granularities (i.e., can refine/coarsen clusters) us
PCA is an unsupervised dimension reduction method that maps the original data space into a lower-dimensional space while preserving as much information as possible. PCA basically finds a subspace that best preserves the data variance, with the subspace defined by the dominant eigenvectors of the data’s covariance matrix.
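As a rough sketch of the eigenvector view of PCA, the following computes the covariance matrix of some two-dimensional points and finds its dominant eigenvector by power iteration, then projects the centered data onto that direction. Power iteration, the function names, and the toy data are assumptions chosen to keep the example dependency-free; numerical libraries normally use an eigendecomposition or SVD routine instead.

```python
import math

def covariance_matrix(data):
    """Sample covariance matrix of 2-D points (rows are samples)."""
    n = len(data)
    mean = [sum(row[j] for row in data) / n for j in range(2)]
    centered = [[row[j] - mean[j] for j in range(2)] for row in data]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(2)]
        for i in range(2)
    ]
    return cov, centered

def dominant_eigenvector(matrix, iterations=100):
    """Power iteration: repeatedly apply the matrix and renormalize."""
    v = [1.0, 0.0]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(2)) for i in range(2)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

# Toy data lying close to the line y = x.
data = [[-2.0, -2.1], [-1.0, -0.9], [0.0, 0.1], [1.0, 0.9], [2.0, 2.0]]
cov, centered = covariance_matrix(data)
pc1 = dominant_eigenvector(cov)

# Reduce from two dimensions to one by projecting onto the first component.
projected = [sum(r[j] * pc1[j] for j in range(2)) for r in centered]
print(pc1, projected)
```

Because the points lie near the diagonal, the first principal direction should be close to (0.707, 0.707).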
The SVD is related to PCA in the sense that the SVD of the centered data matrix (features versus sam
left singular vectors that define the same subspace as found by PCA. However, SVD is a more versat
things that PCA may not do. For example, the SVD of a user-versus-movie matrix is able to extract th
profiles that can be used in a recommendation system. In addition, SVD is also widely used as a topic
semantic analysis, in natural language processing (NLP).
A related technique in NLP is latent Dirichlet allocation (LDA). LDA is a probabilistic topic model and it
topics in a similar way as a Gaussian mixture model (GMM) decomposes continuous data into Gauss
the GMM, an LDA models discrete data (words in documents) and it constrains that the topics are a p
Dirichlet distribution.
Conclusions
This is the work flow which is easy to follow. The takeaway messages when trying to solve a new prob
SAS Visual Data Mining and Machine Learning provides a good platform for beginners to learn mach
learning methods to their problems. Sign up for a free trial today!
ABOUT AUTHOR
Hui Li
Until her passing in March 2019, Dr. Hui Li was a Principal Staff Scientist of Data Science
most memorable contribution on this blog is her guide to machine language algorithms, wh
by millions of data science enthusiasts around the world. Dr. Li's work focused on deep lea
SAS recommendation systems in SAS Viya. She received her PhD degree and Master’s d
Computer Engineering from Duke University. Before joining SAS, she worked at Duke Univ
and at Signal Innovation Group, Inc. as a research engineer. Her research interests includ
heterogeneous data, collaborative filtering recommendations, Bayesian statistical modeling
9 COMMENTS
Thank you for the cheat-sheet, it provides a nice taxonomy for people to understand the relation
use it in my machine learning class to help students round out their world view.
Thanks, Daymond.
Let us know if you have any questions when teaching the students using the information.
This is a great cheat-sheet to understand and remember the relationship between the most usu
I have not seen something similar like this published online yet.
I think it could be nice to incorporate the "cost" variable, the principal’s reasons why each selec
examples of applications for each one. I know that this suggestion means a lot of work and sca
Anyway, it could be a nice new project to be done, don’t you think so?
Thanks, Hector. Incorporating the "cost" variable is a pretty wider area in machine learnin
considered as a subfield of reinforcement learning -- based on the cost (reward), the age
he/she wants to take. I considered this problem for a while and haven't found a good exa
I have a time, I will write a blog specifically for the reinforcement learning.
Thank you.
Excellent summary but I think the target audience is a few steps beyond "beginner". I showed th
study machine learning, and they were overwhelmed.
Great blog, thank you. I will use it when talking to non tech companies about starting doing ML
Very helpful and concise, I was saddened to read that the author passed away in March 2019.