Slides - Machine Learning and Advanced Analytics Using Python
Machine Learning and Advanced
Analytics using Python
What will you learn in this course?
A recap of Python, supervised and unsupervised machine learning algorithms, and model performance evaluation using Python.
1/8/2021
Module – 1
1. Recap of Python
2. Introduction to Machine Learning
3. Supervised Machine Learning
4. Unsupervised Machine Learning
5. K‐Means Clustering
What are the common
packages you used in Python
before?
Python Libraries
Data is everywhere today!!
Facebook: 10 million photos uploaded every hour.
Google: processes 24 petabytes of data per day.
YouTube: 1 hour of video uploaded every second.
Twitter: 400 million tweets per day.
Astronomy: satellite data is in hundreds of petabytes.
“By 2020 the digital universe will reach 44 zettabytes...”
Let’s discuss
What is machine learning?
What is Learning?
1. “Learning denotes changes in a system that ... enable a system to do the same task ... more efficiently the next time.” ‐ Herbert Simon
2. “Learning is constructing or modifying representations of what is being experienced.” ‐ Ryszard Michalski
3. “Machine learning refers to a system capable of the autonomous acquisition and integration of knowledge.”
Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
Learning System Model
Input Samples → Learning Method → System (used in both training and testing)
Why is machine learning required?
• Lack of human experts
• Black‐box human expertise
• Rapidly changing phenomena
• Need for customization and personalization
A classic example of a task that requires machine learning: it is very hard to say what makes a handwritten digit a 2.
Some examples that machine learning solves
Recognizing patterns:
• Facial identities or facial expressions
• Handwritten or spoken words
• Medical images
Generating patterns:
• Generating images or motion sequences
Recognizing anomalies:
• Unusual credit card transactions
• Unusual patterns of sensor readings in a nuclear power plant
Prediction:
• Future stock prices or currency exchange rates
3 vital things to define
Task: recognizing hand‐written words
Performance metric: percentage of words correctly classified
Experience: a database of human‐labeled images of handwritten words
Types of Learning
• Supervised (inductive) learning – given: training data + desired outputs (labels)
• Unsupervised learning – given: training data (without desired outputs)
• Semi‐supervised learning – given: training data + a few desired outputs
• Reinforcement learning – rewards from a sequence of actions
Examples of supervised, unsupervised, and semi‐supervised learning (figure)
Training and
Test Sets
Unsupervised Learning
• The data has no target attribute.
• We want to explore the data to find some intrinsic structure in it.
Clustering
What is Clustering?
• Clustering is a technique for finding similarity groups in data, called clusters.
• It groups data instances that are similar to (near) each other into one cluster, and data instances that are very different (far away) from each other into different clusters.
Think of it like this – in layman’s terms, with figures
K‐Means
Algorithm
A Clustering Technique
• K‐means is a partitional clustering algorithm.
• The k‐means algorithm partitions the given data into k clusters.
• Each cluster has a cluster center, called the centroid.
• k is specified by the user.
How does it work?
Given k, the k‐means algorithm works as follows:
1) Randomly choose k data points (seeds) to be the initial centroids (cluster centers).
2) Assign each data point to the closest centroid.
3) Re‐compute the centroids using the current cluster memberships.
4) If a convergence criterion is not met, go to 2).
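The steps above can be sketched in a few lines of NumPy. This is a from‐scratch illustration for the slides, not the library implementation used in the hands‐on; the `kmeans` helper and the toy data are invented here for demonstration.

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Minimal k-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # 1) Randomly choose k data points (seeds) as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # 2) Assign each data point to the closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3) Re-compute the centroids using the current cluster memberships.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4) Stop when the centroids no longer move (convergence criterion).
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated blobs: k-means should split them cleanly.
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
centroids, labels = kmeans(X, k=2)
```

Note that step 1 seeds the centroids from the data itself, so no cluster starts empty; production code would also guard against empty clusters during updates.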
Strengths & Weaknesses
Strengths:
• Simple: easy to understand and to implement
• Efficient
• K‐means is the most popular clustering algorithm.
Weaknesses:
• The algorithm is only applicable if the mean is defined.
• The user needs to specify k.
• The algorithm is sensitive to outliers.
Let’s dive straight to the Hands‐On
• Using Google Colab ‐ https://colab.research.google.com
Module – 2
1. Self Organizing Maps (SOM)
2. Clustering in Real Use Cases
3. Supervised Machine Learning
4. Feature Engineering & Feature Sets
5. Support Vector Machines (SVM)
Self Organizing Maps (SOM) is an unsupervised neural network technique that approximates an unlimited number of input data items by a finite set of models arranged in a grid, where neighboring nodes correspond to more similar models.
The models are produced by a learning algorithm that automatically orders them on the two‐dimensional grid according to their mutual similarity.
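As a minimal from‐scratch sketch of that learning algorithm in NumPy (real projects would typically use a dedicated SOM library; the `train_som` helper and its parameters are invented here for illustration):

```python
import numpy as np

def train_som(data, grid_w, grid_h, n_iters=500, lr0=0.5, sigma0=1.0, seed=0):
    """Tiny SOM trainer: returns model weights of shape (grid_w, grid_h, n_features)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((grid_w, grid_h, data.shape[1]))
    # Grid coordinates of every node, used by the neighborhood function.
    coords = np.stack(np.meshgrid(np.arange(grid_w), np.arange(grid_h),
                                  indexing="ij"), axis=-1).astype(float)
    for t in range(n_iters):
        lr = lr0 * np.exp(-t / n_iters)        # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iters)  # shrinking neighborhood radius
        x = data[rng.integers(len(data))]      # pick a random input sample
        # Best-matching unit (BMU): the grid node whose model is closest to x.
        bmu = np.unravel_index(np.argmin(((weights - x) ** 2).sum(axis=2)),
                               (grid_w, grid_h))
        # Pull the BMU and its grid neighbors toward the input; closer grid
        # neighbors move more, which is what orders similar models together.
        d2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
        h = np.exp(-d2 / (2 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights

# Approximate 100 random 2-D points by a 3x3 grid of models.
data = np.random.default_rng(1).random((100, 2))
weights = train_som(data, 3, 3)
```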
Clustering ‐ Real life Examples
Example 1: group people of similar sizes together to make “small”, “medium” and “large” T‐shirts.
• One‐size‐fits‐all: does not fit all.
• Tailor‐made for each person: too expensive.
Example 2: in marketing, segment customers according to their similarities, to do targeted marketing.
Supervised Learning
• The data includes both the input and the desired results.
Think of the following examples.
• An emergency room in a hospital measures 17 variables (e.g., blood pressure,
age, etc) of newly admitted patients.
• A decision is needed: whether to put a new patient in an intensive‐care unit.
• Due to the high cost of ICU, those patients who may survive less than a month
are given higher priority.
• Problem: to predict high‐risk patients and discriminate them from low‐risk
patients.
Another example..
• A credit card company receives lots of applications for new cards. Each application
contains information about the applicant for the card,
• age
• marital status
• annual salary
• location
• outstanding debts
• credit rating
• family information, etc.
• Problem: to decide whether an application should be approved or not approved.
Our focus: to learn a target function that can be used to predict the values of a discrete class attribute, e.g., approved or not approved, and high‐risk or low‐risk.
Feature engineering
• The first thing we need to do when creating a machine learning model is to decide what to use as features.
• Features are the inputs to the model, like a person’s name or favorite color: the pieces of information that we take from the data and give to the algorithm so it can work its magic.
• For example, if we were doing classification on health, some features could be a person’s height, weight, gender, and so on. We would exclude things that may be known but aren’t useful.
Benefits of Feature Engineering
& Feature Sets
• Reduces overfitting: less redundant data means less opportunity to make decisions based on noise.
• Improves accuracy: less misleading data means modeling accuracy improves.
• Reduces training time: fewer data points reduce algorithm complexity and algorithms train faster.
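One common way to realize these benefits is univariate feature selection. A sketch assuming scikit‐learn (the library the Colab hands‐on would typically use) on the iris dataset, keeping only the two most informative of its four features:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)          # 150 samples, 4 features
# Score each feature against the target with an ANOVA F-test
# and keep only the k highest-scoring features.
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)   # 150 samples, 2 features
```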
Let’s dive straight to the Hands‐On
• Using Google Colab ‐ https://colab.research.google.com
Support Vector Machines
What is it and what is the algorithm?
What are SVMs?
• SVMs are linear or non‐linear classifiers that find a hyperplane to separate two classes of data, positive and negative.
• SVM not only has a rigorous theoretical foundation, but also performs classification more accurately than most other methods in applications, especially for high‐dimensional data.
What is a hyperplane?
• The hyperplane that separates positive and negative training data is 〈w ⋅ x〉 + b = 0.
• It is also called the decision boundary (surface).
How to choose the best hyperplane?
• SVM looks for the separating hyperplane with the largest margin.
• Machine learning theory says this hyperplane minimizes the error bound.
Pros:
• Accuracy
• Works well on smaller, cleaner datasets
• Can be more efficient because it uses only a subset of training points
Cons:
• Not suited to larger datasets, as the training time with SVMs can be high
• Less effective on noisier datasets with overlapping classes
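A minimal sketch of a linear SVM, assuming scikit‐learn; the fitted `coef_` and `intercept_` are the w and b of the decision boundary 〈w ⋅ x〉 + b = 0 discussed above:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs of points: a linear SVM finds the
# maximum-margin hyperplane between them.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)
# w and b define the decision boundary <w, x> + b = 0
w, b = clf.coef_[0], clf.intercept_[0]
```

Only the support vectors (a subset of the training points) determine w and b, which is where the efficiency noted under "Pros" comes from.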
Confusion Matrix
• It is a performance measurement for machine learning classification problems where the output can be two or more classes. It is a table with 4 different combinations of predicted and actual values.
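A sketch of building that table with scikit‐learn (an assumption here, as the tooling for the hands‐on) from hand‐made true and predicted labels:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
# Rows are actual classes, columns are predicted classes:
#   cm[0][0] = true negatives,  cm[0][1] = false positives
#   cm[1][0] = false negatives, cm[1][1] = true positives
```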
Let’s dive straight to the Hands‐On
• Using Google Colab ‐ https://colab.research.google.com
Machine Learning and Advanced
Analytics using Python
Module ‐ 3
1. Decision Trees
2. Bayesian Networks
3. ML in real use cases
Recap what we discussed yesterday..
• An emergency room in a hospital measures 17 variables (e.g., blood pressure,
age, etc) of newly admitted patients.
• A decision is needed: whether to put a new patient in an intensive‐care unit.
• Due to the high cost of ICU, those patients who may survive less than a month
are given higher priority.
• Problem: to predict high‐risk patients and discriminate them from low‐risk
patients.
Another example..
• A credit card company receives lots of applications for new cards. Each application
contains information about the applicant for the card,
• age
• marital status
• annual salary
• location
• outstanding debts
• credit rating
• family information, etc.
• Problem: to decide whether an application should be approved or not approved.
Our focus: to learn a target function that can be used to predict the values of a discrete class attribute, e.g., approved or not approved, and high‐risk or low‐risk.
Decision Trees
What are they and what is the algorithm?
Introduction
• Decision tree learning is one of the most widely used techniques for classification.
• The classification model is a tree, called a decision tree.
A decision tree can be converted to a set of rules
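Assuming scikit‐learn, `export_text` shows exactly this conversion on the iris data; the shorthand feature names here are our own labels, not library defaults:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# export_text renders the fitted tree as nested if/else rules.
rules = export_text(
    tree, feature_names=["sepal_len", "sepal_wid", "petal_len", "petal_wid"]
)
```

Each root‐to‐leaf path in the printed output reads as one classification rule.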
Let’s
understand how
it works.
Let’s dive straight to the Hands‐On
• Using Google Colab ‐ https://colab.research.google.com
How do you evaluate classifiers?
Predictive Accuracy
Other factors
• Efficiency: time to construct the model and time to use the model
• Robustness
• Scalability
• Interpretability
Naïve Bayes
What is it and what is the algorithm?
What is it?
In simple terms, a Naive Bayes classifier assumes that the presence of a feature in a class is unrelated to the presence of any other feature.
For example, a fruit may be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or upon the existence of the other features, all of these properties independently contribute to the probability that this fruit is an apple, and that is why it is known as ‘Naive’.
• A Naive Bayes model is easy to build and particularly useful for very large data sets. Along with simplicity, Naive Bayes is known to outperform even highly sophisticated classification methods.
Naïve Bayes Example
Pros and Cons of Naïve Bayes
Advantages:
• Easy to implement
• Very efficient
• Good results obtained in many applications
Disadvantages:
• Assumes class conditional independence, therefore loses accuracy when the assumption is seriously violated (highly correlated data sets)
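A sketch of the classifier in practice, assuming scikit‐learn's Gaussian variant (for continuous features) on the iris data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
# Hold out 30% of the data for an honest accuracy estimate.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
nb = GaussianNB().fit(X_tr, y_tr)   # fit per-class feature means/variances
acc = nb.score(X_te, y_te)          # accuracy on the held-out data
```

GaussianNB models each feature independently per class, which is the "naive" independence assumption from the slide made concrete.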
Let’s dive straight to the Hands‐On
• Using Google Colab ‐ https://colab.research.google.com
What are the industry
use cases of Machine
Learning?
Source: bigdata4analytics
Module – 4
1. Logistic Regression
2. Model Evaluation
3. Performance Metrics
4. Optimizing ML Models
Logistic Regression
What is it and what is the algorithm?
What is the difference between
Linear Regression & Logistic
Regression?
What is linear regression?
• Linear regression quantifies the relationship
between one or more predictor variables and
one outcome variable.
• For example, linear regression can be used to
quantify the relative impacts of age, gender, and
diet (the predictor variables) on height (the
outcome variable).
Example
Sales = 168 + 23 × Advertising
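A sketch of fitting such a line with scikit‐learn. The data below is invented to follow the slide's equation exactly, so the fitted coefficients recover 168 and 23:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data generated from the slide's model:
# Sales = 168 + 23 * Advertising
advertising = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
sales = 168 + 23 * advertising.ravel()

model = LinearRegression().fit(advertising, sales)
# model.intercept_ ~ 168, model.coef_[0] ~ 23
```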
What is logistic regression?
• Logistic regression is the appropriate regression
analysis to conduct when the dependent
variable is binary.
• Like all regression analyses, the logistic
regression is a predictive analysis.
• Logistic regression is used to describe data and
to explain the relationship between one
dependent binary variable and one or more
nominal, ordinal, interval or ratio‐level
independent variables.
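A sketch with a binary dependent variable, assuming scikit‐learn; the pass/fail data for hours studied is invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary outcome: did the student pass, given hours studied?
hours = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(hours, passed)
# Unlike linear regression, the model outputs class probabilities.
probs = clf.predict_proba([[4.0]])   # [P(fail), P(pass)] for 4 hours
```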
• Model Evaluation is an integral part of
the model development process.
• It helps to find the best model that
represents our data and how well the
chosen model will work in the future.
• Evaluating model performance with the
Model Evaluation data used for training is not acceptable
in data science because it can easily
generate overoptimistic and overfitted
models.
• There are two main types
• Hold‐out
• Cross‐Validation
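Both evaluation types can be sketched in a few lines, assuming scikit‐learn; a k‐nearest‐neighbors classifier on iris stands in for any model:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold-out: reserve 30% of the data purely for evaluation,
# so the model is never scored on data it trained on.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
holdout_acc = KNeighborsClassifier().fit(X_tr, y_tr).score(X_te, y_te)

# Cross-validation: 5 different train/test splits, one score each.
cv_scores = cross_val_score(KNeighborsClassifier(), X, y, cv=5)
```

Cross‐validation is the more robust of the two, since every sample is used for testing exactly once across the folds.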
Performance Metrics (Regression)
• Mean Absolute Error ‐ the average of the absolute differences between predictions and actual values.
• Mean Squared Error ‐ the average of the squares of the errors, i.e., the average squared difference between the estimated values and the actual values.
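Both metrics are one‐liners in plain Python; the numbers below are made up for illustration:

```python
y_true = [3.0, 5.0, 2.0, 7.0]   # actual values
y_pred = [2.5, 5.0, 4.0, 8.0]   # model predictions

# Mean Absolute Error: average of |actual - predicted|
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Mean Squared Error: average of (actual - predicted)^2,
# which penalizes large errors much more heavily than MAE.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```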
Performance Metrics (Classification)
• Confusion Matrix
• Accuracy
• Precision
• Recall or Sensitivity
• F1 Score
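All four scores derive from the confusion matrix counts; a plain‐Python sketch with invented counts:

```python
# Counts read off a binary confusion matrix (invented for illustration).
tp, fp, fn, tn = 40, 10, 5, 45

accuracy = (tp + tn) / (tp + fp + fn + tn)  # fraction of all predictions correct
precision = tp / (tp + fp)                  # of predicted positives, how many are real
recall = tp / (tp + fn)                     # of real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
```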
Optimizing ML Models
Optimizing is necessary to tune the hyperparameters to ensure that the model performs up to par.
There are two basic methods of hyperparameter optimization:
• Grid Search
• Random Search
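A sketch of grid search, assuming scikit‐learn; an SVM on iris stands in for any model and parameter grid (scikit‐learn's `RandomizedSearchCV` is the random‐search counterpart with the same interface):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Grid search: try every combination of the listed hyperparameter
# values, scoring each with 5-fold cross-validation.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

best = search.best_params_   # the winning hyperparameter combination
```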
Let’s dive straight to the Hands‐On
• Using Google Colab ‐ https://colab.research.google.com
Thank you
Quick Recap