Classification
The classification algorithm is a supervised learning technique used to identify the category of new observations on the basis of training data. In classification, a program learns from the given dataset or observations and then assigns each new observation to one of a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes are also called targets, labels, or categories.
Unlike regression, the output variable of classification is a category rather than a numeric value, such as "Green or Blue" or "fruit or animal". Since classification is a supervised learning technique, it takes labeled input data, meaning each input comes with its corresponding output.
In a classification algorithm, the input variable x is mapped to a discrete output y:
y = f(x), where y is the categorical output
The algorithm that implements classification on a dataset is known as a classifier. There are two types of classification:
•Binary Classifier: If the classification problem has only two possible outcomes, it is called a binary classifier. Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
•Multi-class Classifier: If a classification problem has more than two outcomes, it is called a multi-class classifier. Examples: classifying types of crops, classifying genres of music.
Classification algorithms can be further divided into two main categories:
•Linear models: Logistic Regression, Support Vector Machines
•Non-linear models: K-Nearest Neighbours, Kernel SVM, Naïve Bayes, Decision Tree Classification, Random Forest Classification
LOGISTIC REGRESSION
•Logistic regression is one of the most popular machine learning algorithms and comes under the supervised learning technique. It is used for predicting a categorical dependent variable from a given set of independent variables.
•Logistic regression predicts the output of a categorical dependent variable, so the outcome must be a categorical or discrete value: Yes or No, 0 or 1, True or False, etc. However, instead of giving an exact value of 0 or 1, it gives probabilistic values that lie between 0 and 1.
•Logistic regression is very similar to linear regression except in how it is used: linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.
•In logistic regression, instead of fitting a straight regression line, we fit an "S"-shaped logistic (sigmoid) function, whose output is bounded by the two extreme values 0 and 1.
•The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
•Logistic regression is a significant machine learning algorithm because it can both provide probabilities and classify new data, using continuous as well as discrete datasets.
•Logistic regression can classify observations using different types of data and can easily determine the most effective variables for the classification.
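The underlying sigmoid is σ(z) = 1 / (1 + e^(-z)). As an illustrative sketch, here is logistic regression with scikit-learn on a tiny invented dataset (hours studied vs. pass/fail), showing the probabilistic output between 0 and 1:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: hours studied (x) vs. passed the exam (1) or not (0)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# The fitted "S"-shaped curve yields probabilities between 0 and 1 ...
print(model.predict_proba([[4.5]]))  # roughly 50/50 near the decision boundary
# ... which are thresholded (at 0.5 by default) to a hard 0/1 class label
print(model.predict([[8.0]]))        # -> [1]
```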
K-Nearest Neighbor (KNN)
•The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
•The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be assigned to a well-suited category using the K-NN algorithm.
•At the training phase, the KNN algorithm just stores the dataset; when it gets new data, it classifies that data into the category most similar to it.
•Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog. For this identification we can use the KNN algorithm, since it works on a similarity measure. The KNN model will compare the features of the new image with those of the cat and dog images and, based on the most similar features, place it in either the cat or the dog category.
How does K-NN work?
The working of K-NN can be explained on the basis of the below algorithm (a minimal code sketch follows the list):
•Step-1: Select the number K of neighbors.
•Step-2: Calculate the Euclidean distance from the new data point to every point in the training data.
•Step-3: Take the K nearest neighbors according to the calculated Euclidean distances.
•Step-4: Among these K neighbors, count the number of data points in each category.
•Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
•Step-6: Our model is ready.
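A minimal sketch of these steps in plain Python; the toy points, labels, and the choice K=3 are invented for illustration:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Step-2: Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step-3: indices of the K nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Steps 4-5: majority vote among the K neighbors' labels
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D toy data: class 0 clustered near the origin, class 1 near (5, 5)
X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.5, 5.2])))  # -> 1
```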
Naïve Bayes Classifier
•The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes' theorem and used for solving classification problems.
•It is mainly used in text classification, which involves high-dimensional training datasets.
•The Naïve Bayes classifier is one of the simplest and most effective classification algorithms, and it helps in building fast machine learning models that can make quick predictions.
•It is a probabilistic classifier, which means it predicts on the basis of the probability that an object belongs to each class.
•Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
Bayes' theorem is a fundamental concept in probability theory and statistics. It describes the probability of an event based on prior knowledge of conditions that might be related to the event. For a class y and observed features x, it states:
P(y|x) = P(x|y) · P(y) / P(x)
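As an illustrative sketch, a tiny spam-filtering example using scikit-learn's multinomial Naïve Bayes; the messages and labels below are invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus: 1 = spam, 0 = not spam
messages = ["win free money now", "meeting at noon tomorrow",
            "free prize claim now", "lunch tomorrow with the team"]
labels = [1, 0, 1, 0]

# Bag-of-words features (high-dimensional, as is typical for text)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

clf = MultinomialNB()
clf.fit(X, labels)

test = vectorizer.transform(["claim your free money"])
print(clf.predict(test))        # -> [1] (spam)
print(clf.predict_proba(test))  # per-class probabilities from Bayes' theorem
```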
DECISION TREE CLASSIFICATION
In classification, decision trees are utilized to classify instances into different classes or categories based on input features.
ENTROPY
A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). The ID3 algorithm uses entropy to calculate the homogeneity of a sample: if the sample is completely homogeneous the entropy is zero, and if the sample is equally divided between two classes the entropy is one. A minimal code sketch of this computation follows below.
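The formula is Entropy(S) = -Σᵢ pᵢ log₂ pᵢ, where pᵢ is the proportion of class i in sample S. A minimal sketch in plain Python, with toy label lists invented for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum over classes of p_i * log2(p_i)
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

print(entropy([1, 1, 1, 1]))  # 0 (prints -0.0): completely homogeneous
print(entropy([0, 0, 1, 1]))  # 1.0: equally divided between two classes
```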
GINI INDEX
The Gini index is a measure of impurity or purity used while creating a decision tree. An attribute with a low Gini index should be preferred over one with a high Gini index. The Gini index produces only binary splits, and the CART algorithm uses it to create those splits. A minimal code sketch follows below.
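The formula is Gini(S) = 1 - Σᵢ pᵢ². A minimal sketch on the same kind of invented toy labels as above:

```python
from collections import Counter

def gini(labels):
    # Gini(S) = 1 - sum over classes of p_i^2
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini([1, 1, 1, 1]))  # 0.0: completely pure node
print(gini([0, 0, 1, 1]))  # 0.5: maximally impure for two classes
```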
INFORMATION GAIN
Information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches). A minimal code sketch follows below.
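The formula is Gain(S, A) = Entropy(S) - Σᵥ (|Sᵥ|/|S|) · Entropy(Sᵥ), where the Sᵥ are the subsets produced by splitting S on attribute A. A minimal sketch using an invented split:

```python
import math
from collections import Counter

def entropy(labels):
    # same helper as in the entropy sketch above
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, child_label_groups):
    # Gain = Entropy(parent) - weighted average entropy of the child subsets
    n = len(parent_labels)
    weighted = sum(len(g) / n * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted

# Hypothetical split: a mixed parent separated into two pure children
parent = [0, 0, 1, 1]
children = [[0, 0], [1, 1]]
print(information_gain(parent, children))  # 1.0: a perfect split
```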
SUPPORT VECTOR MACHINE
Support Vector Machines (SVMs) work by finding the optimal hyperplane that best separates the classes in the feature space. Here's a step-by-step overview of how SVM works (a minimal code sketch follows the list):
1.Data Representation: SVM starts with a dataset consisting of labeled
examples, where each example is represented by a set of features and belongs
to one of two classes (for binary classification).
2.Feature Space: SVM maps the input data into a higher-dimensional space
using a technique called the kernel trick. This is done to transform the data in a
way that makes it easier to separate the classes using a hyperplane.
3.Finding the Hyperplane: SVM then aims to find the hyperplane that best
separates the classes in this higher-dimensional space. The optimal hyperplane
is the one that maximizes the margin, which is the distance between the
hyperplane and the nearest data points (support vectors) from each class. The
margin represents the level of confidence in the classification.
4.Margin Optimization: SVM identifies the support vectors, which are the data
points that lie closest to the hyperplane. It then optimizes the margin by finding
the hyperplane that maximizes the distance between the support vectors and
the hyperplane. This is typically done by solving a constrained optimization
problem.
5.Handling Non-Linear Data: In cases where the data is not linearly separable,
SVM uses kernel functions to map the data into a higher-dimensional space
where it becomes separable. Common kernel functions include linear,
polynomial, radial basis function (RBF), and sigmoid kernels. The choice of
kernel function depends on the nature of the data and the problem at hand.
6.Classification: Once the optimal hyperplane is determined, SVM can classify
new data points by evaluating which side of the hyperplane they fall on. If a
data point lies on one side of the hyperplane, it is classified as belonging to one
class, and if it lies on the other side, it is classified as belonging to the other
class.
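A minimal sketch of these steps with scikit-learn's SVC; the toy ring-shaped data and the choice of an RBF kernel are illustrative assumptions, not prescribed by the steps above:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-D toy data: two classes that are not linearly separable
rng = np.random.default_rng(0)
X_inner = rng.normal(0, 0.5, size=(50, 2))  # class 0: cluster near the origin
angles = rng.uniform(0, 2 * np.pi, size=50)
X_outer = np.column_stack([3 * np.cos(angles), 3 * np.sin(angles)])  # class 1: a ring
X = np.vstack([X_inner, X_outer])
y = np.array([0] * 50 + [1] * 50)

# Step 5: an RBF kernel implicitly maps the data to a space where it is separable
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

# Steps 3-4 happen inside fit(); the support vectors are exposed afterwards
print(len(clf.support_vectors_))  # the points lying closest to the boundary
# Step 6: classify new points by which side of the hyperplane they fall on
print(clf.predict([[0.2, -0.1], [3.0, 0.0]]))  # -> [0 1]
```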