Module 2 - PART 1


PRESIDENCY UNIVERSITY

Bengaluru

Module 2
Supervised machine learning algorithms
Agenda
• Introduction to the Machine Learning (ML) Framework
• Types of ML,
• Types of variables/features used in ML algorithms,
• One-hot encoding,
• Simple Linear Regression,
• Multiple Linear Regression,
• Evaluation metrics for regression model

Agenda
• Classification models
• Decision Tree algorithms using Entropy and Gini
Index as measures of node impurity,
• Model evaluation metrics for classification
algorithms,
• Cohen's Kappa Statistic,
• Multi-class classification
• Class Imbalance problem.
• Naïve Bayes Classifiers
• Naive Bayes model for sentiment classification

What is Machine Learning
 Definition
 A Machine Learning system learns from historical data, builds prediction models, and, whenever it receives new data, predicts the output for it.
 The accuracy of the predicted output depends on the amount of data: a large amount of data helps build a better model that predicts the output more accurately.
 Importance of machine learning
 Finding hidden patterns and extracting useful information from data
 Solving complex problems and decision making in many fields (applications)
 Feature of ML: Data-Driven Technology
 Similar to data mining, as it also deals with huge amounts of data.
 Uses data to detect various patterns in a given dataset.
 Learns from past data and improves automatically.

Applications of machine learning

Lifecycle of machine learning
Gathering Data
 Identify the different data sources, since data can be collected from various sources such as files, databases, or the internet.
 The quantity and quality of the collected data determine the accuracy of the prediction and the efficiency of the output.
 This step includes the tasks below:
 Identify various data sources
 Collect data
 Integrate the data obtained from different sources – this coherent set of data is called a dataset

Lifecycle of machine learning
Data preparation: This step can be further divided into two
processes:
 Data exploration: To understand the characteristics, format, and quality of the data and to find correlations, general trends, and outliers for an effective outcome.
 Data pre-processing: Cleaning of data is required to address quality issues such as missing values, duplicate data, invalid data, and noise, which can be solved using filtering techniques.
Data Wrangling
 Reorganizing, mapping and transforming raw, unstructured data into a usable format.
 This step involves data aggregation and data visualization.

Lifecycle of machine learning
Data Analysis
 The aim of this step is to build a machine learning model to analyze
the data and review the outcome.
Train Model
 Datasets are used to train the model using various machine learning
algorithms – to understand various patterns, rules, and, features.
Test Model
 Tests the accuracy of the model against the requirements of the project or problem.
Deployment
 The model's performance is checked with the available data and, if satisfactory, the model is deployed; this step is similar to preparing the final report for a project.

Difference between AI & ML
• Artificial intelligence is a technology that enables a machine to simulate human behavior. Machine learning is a subset of AI that allows a machine to learn automatically from past data without being explicitly programmed.
• AI works to create an intelligent system that can perform various complex tasks. Machine learning works to create machines that can perform only the specific tasks for which they are trained.
• The main applications of AI are Siri, expert systems, online game playing, intelligent humanoid robots, etc. The main applications of machine learning are online recommender systems and Google search algorithms.
• AI includes learning, reasoning, and self-correction. Machine learning includes learning and self-correction when introduced to new data.

Machine learning - dataset
 A dataset is a collection of data in which data is arranged in some
order. A dataset can contain any data from a series of an array to a
database table.
 Types of data in datasets
o Numerical data: Such as house price, temperature, etc.
o Categorical data: Such as Yes/No, True/False, Blue/green, etc.
o Ordinal data: These data are similar to categorical data but
can be measured on the basis of comparison.
 Types of Datasets
o Training Dataset: This data set is used to train the model, i.e., these datasets are used to update the weights of the model.

Contd…
o Validation Dataset
 It is used to verify that an increase in accuracy on the training dataset actually carries over to data that was not used in training.
 If accuracy on the training dataset increases while accuracy on the validation dataset decreases, this is a case of high variance, i.e., overfitting.
o Test Dataset
 Most of the time, when we make changes to the model based on the output of the validation set, we unintentionally let the model peek into the validation set, and as a result the model might overfit the validation set as well.
 To overcome this issue, we keep a test dataset that is used only to test the final output of the model in order to confirm its accuracy.

Contd…
How to get the datasets / Popular sources for ML dataset
 Kaggle Dataset
 UCI Machine Learning Repository
 Datasets via AWS
 Google's Dataset Search Engine
 Microsoft Dataset
 Awesome Public Dataset Collection
 Government Datasets
 Computer Vision Datasets
 Scikit-learn dataset

Machine learning-
data preprocessing
 Definition: Data pre-processing is a process of preparing the raw data
and making it suitable for a machine learning model.
 Significance
 Real-world data contains noise and missing values, and may be in an unusable format that cannot be used directly in machine learning models.
 Data pre-processing is the required task of cleaning the data and making it suitable for a machine learning model.
 Steps
Getting the dataset
 The data is usually put in a CSV ("Comma-Separated Values") file, a format that stores tabular data such as spreadsheets. It is useful for huge datasets, and these datasets can be used in programs.

CONTD…
Importing libraries
o Numpy: used for including any type of mathematical operation in the code.
o Matplotlib: used to plot any type of chart in Python.
o Pandas: used for importing and managing the datasets. It is an open-source data manipulation and analysis library.
Importing datasets
read_csv() function: used to read a CSV file.
To separate the matrix of features (independent variables) and the dependent variable from the dataset, the iloc[] method is used to extract the required rows and columns.
To extract the dependent variable, we again use the Pandas iloc[] method.
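As a rough illustration of these calls (a minimal sketch; the file name Data.csv and the assumption that the last column is the dependent variable are illustrative, not part of the slides):

import pandas as pd

dataset = pd.read_csv("Data.csv")     # load the CSV file into a DataFrame
X = dataset.iloc[:, :-1].values       # matrix of features: all rows, every column except the last
y = dataset.iloc[:, -1].values        # dependent variable: all rows, last column only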

Contd…

 Handling Missing Data
o By deleting the particular row: delete the specific row or column that contains null values. This way is not very efficient, because removing data may lead to loss of information and a less accurate output.
o By calculating the mean: calculate the mean of the column or row that contains the missing value and put it in place of the missing value. This strategy is useful for features with numeric data such as age, salary, year, etc. Here, we will use this approach.
o To handle missing values, we will use the Scikit-learn library in our code, which contains various tools for building machine learning models. Here we will use the Imputer class of the sklearn.preprocessing library (renamed SimpleImputer in the sklearn.impute module in recent scikit-learn versions).
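A minimal sketch of mean imputation (recent scikit-learn versions expose this as SimpleImputer in sklearn.impute; the toy array below is an illustrative assumption):

import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[25.0, 50000.0],
              [np.nan, 60000.0],
              [35.0, np.nan]])                      # toy feature matrix with missing values

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_imputed = imputer.fit_transform(X)                # each missing entry replaced by its column mean
print(X_imputed)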

Contd…
 Encoding Categorical Data
o LabelEncoder() class from the preprocessing library is used for encoding the variables into digits.
o Categorical variables usually have strings for their values. Many machine learning algorithms do not support string values for the input variables. Therefore, we need to replace these string values with numbers. This process is called categorical variable encoding.
o Types of encoding:
 One-hot encoding
 Dummy encoding
 One-hot encoding
• In one-hot encoding, we create a new set of dummy (binary) variables that is equal to the number of categories (k) in the variable.

Contd…
 For example, let’s say we have a categorical variable Color with three categories called “Red”, “Green” and “Blue”. We need three dummy variables to encode this variable using one-hot encoding. A dummy (binary) variable just takes the value 0 or 1 to indicate the exclusion or inclusion of a category.

• In one-hot encoding,

“Red” color is encoded as [1 0 0] vector of size 3.

“Green” color is encoded as [0 1 0] vector of size 3.

“Blue” color is encoded as [0 0 1] vector of size 3.

Contd…
 Dummy encoding
 Dummy encoding also uses dummy (binary) variables. Instead of creating
a number of dummy variables that is equal to the number of categories
(k) in the variable, dummy encoding uses k-1 dummy variables.
 To encode the same Color variable with three categories using the
dummy encoding, we need to use only two dummy variables

 In dummy encoding,
“Red” color is encoded as [1 0] vector of size 2.
“Green” color is encoded as [0 1] vector of size 2.
“Blue” color is encoded as [0 0] vector of size 2.
 Dummy encoding removes a duplicate category present in the one-hot
encoding.
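A minimal sketch of both encodings for the Color example using pandas (pd.get_dummies is an illustrative choice; the slides do not prescribe a specific library call):

import pandas as pd

df = pd.DataFrame({"Color": ["Red", "Green", "Blue", "Green"]})

# One-hot encoding: k dummy columns for k categories (Blue, Green, Red)
one_hot = pd.get_dummies(df["Color"])

# Dummy encoding: k-1 dummy columns; the first category (Blue) is dropped
dummy = pd.get_dummies(df["Color"], drop_first=True)

print(one_hot)
print(dummy)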

Contd…
 Splitting dataset into training, validation and test sets
 Feature scaling - working with outliers
 Feature scaling is the final step of data pre-processing in machine learning.
 It is a technique to standardize the independent variables of the dataset to a specific range.
 In feature scaling, we put our variables in the same range and on the same scale so that no variable dominates another.
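A minimal sketch of splitting and scaling with scikit-learn (the 80/20 split and the random toy data are illustrative assumptions):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 3)      # toy feature matrix
y = np.random.rand(100)         # toy target

# Hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the scaler on the training data only, then apply the same scaling to the test data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)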

Feature selection techniques in
Machine Learning

Feature selection
• A feature is an attribute that has an impact on a problem
or is useful for the problem, and choosing the important
features for the model is known as feature selection.
• Definition: Feature selection is a way of selecting the
subset of the most relevant features from the original
features set by removing the redundant, irrelevant, or noisy
features.
• Significance of Feature Selection:
 It helps in avoiding the curse of dimensionality.
 It helps in the simplification of the model so that it can
be easily interpreted by the researchers.
 It reduces the training time.
 It reduces overfitting and hence enhances generalization.

Feature selection techniques

Supervised feature selection
techniques
• Wrapper Methods
 In wrapper methodology, selection of features is done by
considering it as a search problem, in which different
combinations are made, evaluated, and compared with
other combinations.
 It trains the algorithm by using the subset of features
iteratively.
 On the basis of the output of the model, features are added or removed, and the model is trained again with this feature set (a sketch using recursive feature elimination is shown below).
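A minimal sketch of a wrapper method using recursive feature elimination (the estimator, the synthetic dataset, and the number of features to keep are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)

# Repeatedly fit the estimator and drop the weakest feature until 4 remain
selector = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=4)
selector.fit(X, y)
print(selector.support_)        # boolean mask of the selected features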

Contd…
• Filter Methods
 In Filter Method, features are selected on the basis of
statistics measures. This method does not depend on the
learning algorithm and chooses the features as a pre-
processing step.
 The filter method filters out the irrelevant feature and
redundant columns from the model by using different
metrics through ranking.
 The advantage of filter methods is that they need little computational time and do not overfit the data (see the sketch below).
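A minimal sketch of a filter method using a univariate statistical test (SelectKBest with the ANOVA F-value; the synthetic dataset and k are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)

# Rank every feature with a statistical score and keep the top 4
selector = SelectKBest(score_func=f_classif, k=4)
X_selected = selector.fit_transform(X, y)
print(selector.scores_)         # per-feature scores used for the ranking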

Contd…
• Embedded Methods
 Embedded methods combine the advantages of both filter and wrapper methods by considering the interaction of features along with low computational cost. They are fast processing methods similar to the filter method, but more accurate than the filter method.
 These methods are also iterative: they evaluate each training iteration and optimally pick out the features that contribute most to that iteration (see the sketch below).
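A minimal sketch of an embedded method using L1 (Lasso) regularisation, where selection happens during model training (the synthetic dataset and alpha are illustrative assumptions):

from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, n_informative=4, noise=5.0, random_state=0)

# L1 regularisation drives the coefficients of unhelpful features to zero,
# so feature selection falls out of fitting the model itself
selector = SelectFromModel(Lasso(alpha=1.0))
selector.fit(X, y)
print(selector.get_support())   # boolean mask of the features the model kept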

Feature engineering for machine
learning
• Feature engineering is the pre-processing step of
machine learning, which extracts features from raw data.
• Feature engineering in ML contains mainly four processes:
 Feature Creation: finding the most useful variables to be used in a predictive model.
 Transformations: This step of feature engineering
involves adjusting the predictor variable to improve the
accuracy and performance of the model.
 Feature Extraction: Is an automated feature engineering
process that generates new variables by extracting them
from the raw data
 Feature Selection: Is a way of selecting the subset of the
most relevant features from the original features set by
removing the redundant, irrelevant, or noisy features

Feature engineering techniques
for ML
• Imputation: Imputation is responsible for handling irregularities
within the dataset.
• Handling Outliers: Standard deviation can be used to identify
the outliers. Z-score can also be used to detect outliers.
• Log Transform: helps in handling the skewed data, and it makes
the distribution more approximate to normal after
transformation.
• Binning: used to normalize the noisy data.
• Feature Split: the process of splitting a feature into two or more parts to create new features.
• One hot encoding: It is a technique that converts the categorical
data in a form so that they can be easily understood by machine
learning algorithms and hence can make a good prediction.
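A minimal sketch of two of these techniques, a log transform and z-score outlier detection (the toy series and the threshold of 3 are illustrative assumptions):

import numpy as np
import pandas as pd

s = pd.Series([12, 13, 14, 15, 13, 12, 14, 15, 13, 14, 12, 400], dtype=float)  # toy skewed data

# Log transform: compresses large values so the distribution is closer to normal
log_s = np.log1p(s)                         # log(1 + x) avoids problems at zero

# Z-score outlier detection: flag points more than 3 standard deviations from the mean
z_scores = (s - s.mean()) / s.std()
outliers = s[np.abs(z_scores) > 3]          # flags the value 400 in this toy series
print(log_s)
print(outliers)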

Machine learning - types

Supervised learning
• Supervised learning is the type of machine learning in which machines are trained using well "labelled" training data, and on the basis of that data, machines predict the output. Labelled data means input data that is already tagged with the correct output.

Types of Supervised learning
Regression Algorithms
• Are used if there is a relationship between the input variable and the output variable and the value to be predicted is continuous/real. Example: Weather forecasting, Market Trends, etc.
• Regression algorithms under supervised learning: Linear Regression, Non-Linear Regression, Polynomial Regression, Ridge Regression and Lasso Regression.

Classification Algorithms
• Classification algorithms are used when the output variable is categorical, i.e., the output falls into classes such as Yes-No, Male-Female, True-False, etc. Example: Spam Filtering.
• Classification algorithms under supervised learning: Random Forest, Decision Trees, Logistic Regression, Support Vector Machines.

Important Terminologies
 Dependent Variable: The main factor in Regression analysis which
we want to predict or understand is called the dependent variable. It
is also called target variable.
 Independent Variable: The factors which affect the dependent
variables or which are used to predict the values of the dependent
variables are called independent variable, also called as a predictor.
 Outliers: Outlier is an observation which contains either very low
value or very high value in comparison to other observed values. An
outlier may hamper the result, so it should be avoided.
 Multicollinearity: If the independent variables are highly correlated with each other, this condition is called Multicollinearity.
 Overfitting: If our algorithm works well with the training dataset
but not well with test dataset, then such problem is
called Overfitting.
 Underfitting: If our algorithm does not perform well even with
training dataset, then such problem is called underfitting.

Unsupervised Machine Learning
 Unsupervised learning is a type of machine learning in which
models are trained using unlabeled dataset and are allowed to act on
that data without any supervision.

 The goal of unsupervised learning is to find the underlying structure


of dataset, group that data according to similarities, and represent
that dataset in a compressed format.

Unsupervised Machine Learning -
Types
 Types:
 Clustering: Clustering is a method of grouping the objects into clusters
such that objects with most similarities remains into a group and has
less or no similarities with the objects of another group.
 Association: An association rule is an unsupervised learning method
which is used for finding the relationships between variables in the
large database. It determines the set of items that occurs together in
the dataset.
 Unsupervised learning algorithms: K-means clustering, Hierarchical clustering, Anomaly detection, Neural Networks, Principal Component Analysis, Apriori algorithm
 Advantage of Unsupervised Learning: “Preferable” as it is easy to get
unlabeled data in comparison to labeled data.
 Disadvantages of Unsupervised Learning: The result might be less
accurate as input data is not labeled, and algorithms do not know the exact
output in advance.

Semi-Supervised Learning
• Semi-Supervised learning is a type of Machine Learning
algorithm that lies between Supervised and Unsupervised
machine learning.
• The main aim of semi-supervised learning is to effectively use
all the available data, rather than only labelled data like in
supervised learning.
• Advantages: It is highly efficient and is used to solve
drawbacks of Supervised and Unsupervised Learning
algorithms.
• Disadvantages
 Iteration results may not be stable.
 We cannot apply these algorithms to network-level data.
 Accuracy is low.

Reinforcement Learning
• Reinforcement learning works on a feedback-based process, in which an AI agent (a software component) automatically explores its surroundings by hit and trial: taking actions, learning from experience, and improving its performance.
• The agent gets rewarded for each good action and punished for each bad action; hence the goal of the reinforcement learning agent is to maximize the rewards.
• In reinforcement learning, there is no labelled data like supervised
learning, and agents learn from their experiences only.
• Due to its way of working, reinforcement learning is employed in
different fields such as Game theory, Operation Research,
Information theory, multi-agent systems.
• A reinforcement learning problem can be formalized using Markov
Decision Process(MDP).

Reinforcement Learning
• Categories of Reinforcement Learning
 Positive Reinforcement Learning: Specifies increasing the tendency
that the required behavior would occur again by adding something.
 Negative Reinforcement Learning: It increases the tendency that the
specific behavior would occur again by avoiding the negative condition.
• Applications: Robotics, Text Mining, Resource Management, Video Games.
• Advantages
 The learning model of RL is similar to the learning of human beings;
hence most accurate results can be found.
 Helps in achieving long term results.
• Disadvantages
 RL algorithms require huge data and computations.
 Too much reinforcement learning can lead to an overload of states which
can weaken the results.

Linear Regression analysis
• It is a statistical method that is used for predictive analysis.
• Linear regression makes predictions for continuous/real or
numeric variables such as sales, salary, age, product price, etc.
• Linear regression shows a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression.
• Mathematically, we can represent a linear regression as:
y = a0 + a1x + ε

Linear Regression analysis
• Types of Linear Regression
 Simple Linear Regression: If a single independent variable is used
to predict the value of a numerical dependent variable, then such a
Linear Regression algorithm is called Simple Linear Regression.
 Multiple Linear regression: If more than one independent variable
is used to predict the value of a numerical dependent variable, then
such a Linear Regression algorithm is called Multiple Linear
Regression.
• Model Performance: R-squared method:
 R-squared is a statistical method that determines the goodness of fit.
 A high value of R-squared indicates a small difference between the predicted and actual values and hence represents a good model.
 It can be calculated as:
R² = 1 − (Sum of squared residuals / Total sum of squares)

Simple Linear regression
• Models the relationship between a dependent variable and a single independent
variable. The relationship shown by a Simple Linear Regression model is linear
or a sloped straight line.
• Simple Linear regression algorithm has mainly two objectives:
 Model the relationship between the two variables. Eg: Income and expenditure, experience and
Salary, etc.
 Forecasting new observations. Such as Weather forecasting according to
temperature, Revenue of a company according to the investments in a year,
etc.
• The Simple Linear Regression model can be represented using the below
equation:
y = a0 + a1x + ε
a0 = the intercept of the regression line (can be obtained by putting x = 0)
a1 = the slope of the regression line, which is either increasing or decreasing
ε = the error term (for a good model it will be negligible)
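A minimal sketch of fitting a simple linear regression with scikit-learn (the experience/salary numbers are made up for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # single independent variable (years of experience)
y = np.array([30000, 35000, 40000, 45000, 50000])   # dependent variable (salary)

model = LinearRegression()
model.fit(X, y)

print("intercept a0:", model.intercept_)            # value of y when x = 0
print("slope a1:", model.coef_[0])                  # change in y per unit change in x
print("prediction for x = 6:", model.predict([[6.0]])[0])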

Multiple linear regression
• Multiple Linear Regression is one of the important regression
algorithms which models the linear relationship between a
single dependent continuous variable and more than one
independent variable.
• For MLR, the dependent or target variable(Y) must be the
continuous/real, but the predictor or independent variable may
be of continuous or categorical form.
• Each feature variable must model the linear relationship with
the dependent variable.
• MLR tries to fit a regression line through a multidimensional
space of data-points.
• Example: Prediction of CO2 emission based on engine size and
number of cylinders in a car.

Multiple linear regression
• MLR equation:
o In Multiple Linear Regression, the target variable (Y) is a linear combination of multiple predictor variables x1, x2, x3, ..., xn:
Y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn
where Y = output/response variable; b0, b1, b2, b3, ..., bn = coefficients of the model; x1, x2, x3, ..., xn = independent/feature variables

• Assumptions for Multiple Linear Regression:
 A linear relationship should exist between the target and predictor variables.
 The regression residuals must be normally distributed.
 MLR assumes little or no multicollinearity (correlation between the independent variables) in the data.
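A minimal sketch of a multiple linear regression on the CO2-emission example mentioned earlier (the numbers are made up for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Predictors: engine size and number of cylinders; target: CO2 emission
X = np.array([[1.6, 4], [2.0, 4], [2.4, 4], [3.5, 6], [5.0, 8]])
y = np.array([150, 170, 185, 230, 300])

model = LinearRegression()
model.fit(X, y)

print("b0:", model.intercept_)                # intercept
print("b1, b2:", model.coef_)                 # one coefficient per predictor
print("prediction:", model.predict([[3.0, 6]])[0])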

Evaluation metrics for regression
model
• In regression problems, the prediction error is used to define the
model performance. The prediction error is also referred to as
residuals and it is defined as the difference between the actual and
predicted values.
• Residuals are important when determining the quality of a model.
• Residual = actual value − predicted value
error(e) = y − ŷ
• We can technically inspect all residuals to judge the model’s
accuracy, but this does not scale if we have thousands or millions of
data points. That’s why we have summary measurements that take
our collection of residuals and condense them into a single value
representing our model's predictive ability.

Evaluation metrics for regression
model - Mean Absolute Error (MAE)
• It is the average of the absolute differences between the actual values and the model’s predicted values:
MAE = (1/N) Σ |Yi − Ŷi|
where N = total number of data points, Yi = actual value, Ŷi = predicted value.
• A small MAE suggests the model is great at prediction, while a
large MAE suggests that your model may have trouble in certain
areas. MAE of 0 means that your model is a perfect predictor of the
outputs.
• Advantages of MAE: It is the most robust to outliers.
• Disadvantages of MAE: The graph of MAE is not differentiable (at zero), so gradient-based optimizers require workarounds such as sub-gradients.
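A minimal sketch of computing MAE (the toy values are illustrative):

import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Average of |Yi - Ŷi| over all data points
mae = mean_absolute_error(y_true, y_pred)
print(mae)                                  # (0.5 + 0.0 + 1.5 + 1.0) / 4 = 0.75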

Evaluation metrics for regression
model – Mean Squared Error
• It is the average of the squared differences between the actual and the predicted values. The lower the value, the better the regression model:
MSE = (1/n) Σ (yi − ŷi)²
where n = total number of data points, yi = actual value, ŷi = predicted value.
• If there are outliers in the dataset, MSE penalizes them the most and the calculated MSE becomes larger.
• Advantages of MSE: The graph of MSE is differentiable, so it can easily be used as a loss function.
• Disadvantages of MSE: It is not robust to outliers (which was an advantage of MAE), since outliers inflate the calculated MSE.

Evaluation metrics for regression
model – Root Mean Squared Error
• It is the square root of the average of the squared differences between the actual and the predicted values:
RMSE = sqrt( (1/n) Σ (yj − ŷj)² )
where n = total number of data points, yj = actual value, ŷj = predicted value.
• The lower the RMSE value, the better the model's predictions.
• A higher RMSE indicates large deviations between the predicted and actual values.
• Advantages of RMSE: The output value is in the same unit as the
required output variable which makes interpretation of loss easy.
• Disadvantages of RMSE: It is not that robust to outliers as
compared to MAE.
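A minimal sketch of computing MSE and RMSE for the same toy values used in the MAE example:

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mse = mean_squared_error(y_true, y_pred)    # average of squared residuals: 0.875
rmse = np.sqrt(mse)                         # square root puts the error back in the target's units
print(mse, rmse)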

Evaluation metrics for regression
model – R Squared
• The R2 score is a metric that describes the performance of your model in relative terms, rather than reporting the loss in absolute units.
• With the help of R-squared, we have a baseline model (the mean line) to compare against, which none of the other metrics provides.
• This is similar to the fixed 0.5 threshold we use in classification problems. Basically, R2 calculates how much better the regression line is than the mean line.

Evaluation metrics for regression
model – R Squared
• How do we interpret the R2 score? If the R2 score is zero, the error of the regression line divided by the error of the mean line equals 1, and 1 − 1 is zero.
• In this case the two lines overlap, the model performance is at its worst, and the model is not able to take any advantage of the output column.
• The second case is an R2 score of 1. This happens when the error ratio term is zero, i.e., the regression line makes no mistake at all; in the real world this is not possible.
• So we can conclude that as our regression line moves towards perfection, the R2 score moves towards one and the model performance improves.
• The normal case is an R2 score between zero and one, for example 0.8, which means the model is able to explain 80 per cent of the variance in the data.
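A minimal sketch of computing the R2 score for the same toy values (about 0.72 here):

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# 1 - (sum of squared residuals) / (total sum of squares around the mean)
r2 = r2_score(y_true, y_pred)
print(r2)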

Evaluation metrics for regression
model – Adjusted R Squared
• The disadvantage of the R2 score is that when new features are added to the data, the R2 score increases or remains constant but never decreases, because it assumes that adding more features can only help explain the variance in the data.
• The problem is that when we add an irrelevant feature to the dataset, R2 can still increase, which is misleading.
• Hence, to control this situation, Adjusted R-squared came into existence.

Evaluation metrics for regression
model – Adjusted R Squared
• Adjusted R-squared is calculated as:
Adjusted R² = 1 − (1 − R²)(n − 1) / (n − k − 1)
where n = number of data points and k = number of independent features.
• As k increases through added features, the denominator (n − k − 1) decreases while (n − 1) remains constant. If the R2 score remains constant or increases only slightly, the complete term (1 − R²)(n − 1)/(n − k − 1) increases, and subtracting it from one makes the resulting score decrease.
• This is the case when we add an irrelevant feature to the dataset.
• If we add a relevant feature instead, the R2 score increases and (1 − R²) decreases heavily enough to outweigh the shrinking denominator, so the complete term decreases, and on subtracting from one the score increases.
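A minimal sketch of computing Adjusted R-squared from the ordinary R2 score (the synthetic data and the feature count k are illustrative assumptions):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                   # n = 100 samples, k = 3 features
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=100)

model = LinearRegression().fit(X, y)
r2 = r2_score(y, model.predict(X))

n, k = X.shape
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # penalises every extra feature
print(r2, adj_r2)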

Thank you

