1694600817-Unit2.3 KNN CU 2.0
Unit 2.3
K-Nearest Neighbours
Disclaimer
The content is curated from online/offline resources and is used for educational purposes only.
Learning Objectives
You will learn in this lesson:
• Concept of Neighbour
• Data Similarity Measure
• Modelling
• Estimation of K Neighbours
• Model Inferencing
Introduction
• KNN is a supervised learning algorithm that is easy to implement.
• It assumes that similar cases lie close together: a new case is assigned to the category of the available cases it most resembles.
• It stores all the available data and classifies a new data point based on similarity, so new data can be placed into the best-suited category as soon as it appears.
• K-NN is a non-parametric algorithm, which means it makes no assumption about the distribution of the underlying data.
• It is also called a lazy learner because it does not learn from the training set immediately; instead it stores the dataset and does the computation at classification time.
• In the training phase, KNN simply stores the dataset; when new data arrives, it classifies that data into the category most similar to it.
• It can be used for regression as well as classification, but it is mostly used for classification problems.
• The major challenge is choosing the value of K.
• Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog. We can use the KNN algorithm for this identification, as it works on a similarity measure. The KNN model compares the features of the new image with those of the cat and dog images and, based on the most similar features, puts it in either the cat or the dog category.
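The "lazy learner" behaviour described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a reference implementation: the class name `SimpleKNN`, the two-feature cat/dog data, and the choice of Euclidean distance via `math.dist` are all assumptions made for the example.

```python
import math
from collections import Counter


class SimpleKNN:
    """Toy K-NN classifier: fit() only stores the data (lazy learning)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        # "Training" is just memorising the dataset; no model is built here.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # All the work happens at prediction time: measure the distance from
        # the query to every stored point, then vote among the k closest.
        ranked = sorted(
            zip(self.points, self.labels),
            key=lambda pair: math.dist(pair[0], query),
        )
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors for a handful of cat and dog images.
train_points = [(1.0, 1.2), (1.1, 0.9), (0.9, 1.0),
                (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

model = SimpleKNN(k=3).fit(train_points, train_labels)
prediction = model.predict((1.2, 1.1))
print(prediction)
```

Because `fit` does nothing but store the data, every call to `predict` has to scan the whole training set, which is why KNN is cheap to train and comparatively expensive to query.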
KNN Classifier
Telecom Dataset
Notion of Neighbourhood
Inference
• Now, the question is, “To what extent can we trust our judgment, which is based on the first nearest neighbour?”
• It might be a poor judgment, especially if the first nearest neighbour is a very specific case or an outlier.
• What if we chose the five nearest neighbours and took a majority vote among them?
Decision Resolving
• Does this make more sense? Yes!
• In this case, the value of K in the k-nearest neighbours algorithm is 5.
• This example highlights the intuition behind the k-nearest neighbours algorithm.
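The difference between trusting the single nearest neighbour and taking a majority vote among five can be illustrated with a small sketch. The data points and the class labels "A" and "B" below are made up for illustration and are not taken from the telecom dataset; Euclidean distance is assumed.

```python
import math
from collections import Counter


def k_nearest_labels(train, query, k):
    """Return the labels of the k training points closest to the query."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    return [label for _, label in ranked[:k]]


def majority_vote(labels):
    """Return the most frequent label in the list."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical training data: one "B" outlier sits inside a group of "A" points.
train = [
    ((5.1, 5.0), "B"),  # outlier, and the single nearest point to the query
    ((4.5, 5.0), "A"),
    ((5.0, 5.6), "A"),
    ((5.5, 5.5), "A"),
    ((4.4, 4.4), "A"),
    ((9.0, 9.0), "B"),
    ((9.5, 8.5), "B"),
    ((8.8, 9.2), "B"),
]
query = (5.0, 5.0)

vote_k1 = majority_vote(k_nearest_labels(train, query, k=1))
vote_k5 = majority_vote(k_nearest_labels(train, query, k=5))
print("K=1:", vote_k1)
print("K=5:", vote_k5)
```

With K=1 the outlier alone decides the class, while with K=5 the four surrounding "A" points outvote it.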
Implementation Steps
$$\mathrm{Dis}(x, y) = \sqrt{\sum_{i=0}^{n} (x_i - y_i)^2}$$
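As a minimal sketch, the Euclidean distance between two feature vectors can be computed as follows. The function name `euclidean_distance` and the sample vectors are illustrative assumptions; the standard library's `math.dist` performs the same calculation.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared differences between x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of features")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Differences are (3, 4), so the distance is sqrt(9 + 16).
d_2d = euclidean_distance((0, 0), (3, 4))

# Works for any number of features; differences here are (3, 4, 0).
d_3d = euclidean_distance((1, 2, 3), (4, 6, 3))

print(d_2d, d_3d)
```

In practice the features are usually put on a comparable scale first, since a feature with a large numeric range would otherwise dominate the sum.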
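Value of K?
• A low value of K produces a highly complex model, which might result in over-fitting.
• It means the prediction process is not generalized enough to be used for out-of-sample cases.
• Conversely, a very high value of K makes the model overly general, because distant and less relevant neighbours are included in the vote.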
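One way to see the effect of K on model complexity is to score the model on its own training data. The sketch below uses a made-up dataset with one deliberately mislabelled point; the helper names and the data are assumptions for illustration only.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def training_accuracy(train, k):
    """Fraction of training points whose own label is predicted correctly."""
    hits = sum(knn_predict(train, point, k) == label for point, label in train)
    return hits / len(train)


# Hypothetical data: two clusters plus one noisy "B" point inside cluster A.
train = [
    ((1.0, 1.0), "A"), ((1.0, 2.0), "A"), ((2.0, 1.0), "A"),
    ((2.0, 2.0), "A"), ((1.5, 1.5), "A"), ((1.2, 1.8), "A"),
    ((6.0, 6.0), "B"), ((6.0, 7.0), "B"), ((7.0, 6.0), "B"),
    ((7.0, 7.0), "B"),
    ((1.6, 1.4), "B"),  # noisy label
]

# K=1: every point is its own nearest neighbour, so even the noise is memorised.
acc_k1 = training_accuracy(train, k=1)

# K=len(train): every prediction is simply the overall majority class.
acc_all = training_accuracy(train, k=len(train))

print(acc_k1, acc_all)
```

A perfect training score at K=1 is a symptom of memorisation rather than evidence of a good model, and using every point as a neighbour ignores locality altogether. A useful K lies somewhere between these extremes.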
Optimizing K?
• So, how can we find the best value for K?
• Choose K=1 and calculate the accuracy of the model using all samples in your test set.
• Repeat this process while increasing K, and see which K gives the best accuracy for your model.
• In this example, K=4 gives the best accuracy.
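The search procedure above can be sketched as a simple loop over candidate values of K. The synthetic two-class data, the 90/30 train/test split, and the range of K from 1 to 10 are all assumptions chosen for illustration, so the best K found here has no connection to the K=4 result on the telecom dataset.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k nearest training points.

    With an even k, a tie goes to the label that appears first among
    the nearest neighbours.
    """
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test samples classified correctly for a given k."""
    hits = sum(knn_predict(train, point, k) == label for point, label in test)
    return hits / len(test)


rng = random.Random(42)  # fixed seed so the sketch is repeatable


def make_blob(center, label, n):
    """Generate n noisy 2-D points around a centre (synthetic data)."""
    return [((rng.gauss(center[0], 1.5), rng.gauss(center[1], 1.5)), label)
            for _ in range(n)]


data = make_blob((0, 0), "A", 60) + make_blob((3, 3), "B", 60)
rng.shuffle(data)
train, test = data[:90], data[90:]

# Evaluate each candidate K on the held-out test set and keep the best one.
scores = {k: accuracy(train, test, k) for k in range(1, 11)}
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"K={k}: accuracy={score:.3f}")
print("Best K:", best_k)
```

When several values of K tie for the best accuracy, `max` returns the smallest of them, because the dictionary is filled in increasing order of K.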
Summary
• KNN is a supervised machine learning algorithm.
• We used Euclidean distance to measure the similarity between data points.
• We saw how KNN works on the telecom customer dataset.
• Choosing the correct (optimum) value of K is important.
Quiz
1) Which of the following determines the number of nearby neighbours used to classify a new record?
a. KNN
b. Validation data
c. Euclidean Distance
d. All the above
Answer: a. KNN
Quiz
a) Manhattan
b) Minkowski
c) Tanimoto
d) Euclidean
Answer: d) Euclidean
Quiz
4) The K-NN algorithm does more computation at test time than at training time.
a) TRUE
b) FALSE
Answer: a) TRUE