Lecture 07: K-Nearest Neighbor (KNN)


K-NEAREST NEIGHBOR

LECTURER:
Humera Farooq, Ph.D.
Computer Sciences Department,
Bahria University (Karachi Campus)
Outline

• Introduction
• Voronoi Diagram
• K-NN Working
• Distance Metric
• K for Neighbors
• Curse-of-Dimensionality
K-Nearest Neighbor
• Two classes, two features.
• Label a new point the same as the closest known point.
• In the figure, the closest known point is red, so the new point is labeled red.
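A minimal sketch of this 1-NN rule in Python, assuming two numeric features and string class labels; the function name, sample points, and labels are illustrative, not from the lecture.

```python
import math

def nearest_neighbor_label(new_point, training_points, training_labels):
    """Return the label of the single closest training point (1-NN)."""
    best_dist, best_label = float("inf"), None
    for point, label in zip(training_points, training_labels):
        dist = math.dist(new_point, point)  # Euclidean distance between the two points
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label

# Two classes ("red" / "brown"), two features per point (illustrative data).
points = [(1.0, 1.2), (0.9, 1.5), (3.1, 2.8), (3.3, 3.0)]
labels = ["red", "red", "brown", "brown"]
print(nearest_neighbor_label((1.1, 1.3), points, labels))  # -> "red"
```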
What is KNN

• Similar labels for similar features.
• Classify a new test point using similar training data points.
• Given a new example x for which we need to predict its class y:
  – Find the most similar training examples.
  – Classify x "like" these most similar examples.

kNN
• How do we determine similarity?
• How many similar training examples should we consider?
• How do we resolve inconsistencies among the training examples?

• A powerful classification algorithm used in pattern recognition.
• k-NN stores all available examples and classifies a new example based on a similarity measure (e.g., a distance function).
• A non-parametric, lazy learning algorithm (an instance-based learning method).

k-Nearest Neighbor
• Generalizes 1-NN to smooth away noise in the labels.
• A new point is now assigned the most frequent label among its k nearest neighbors.
• In the figure, the label is red when k = 3 but brown when k = 7: the choice of k can change the prediction.

KNN Example

  #  Food      Chat  Fast  Price   Bar  BigTip
     (3)       (2)   (2)   (3)     (2)
  1  great     yes   yes   normal  no   yes
  2  great     no    yes   normal  no   yes
  3  mediocre  yes   no    high    no   no
  4  great     yes   yes   normal  yes  yes

Similarity metric: number of matching attributes (k = 2)

New examples:

• Example 1: (great, no, no, normal, no) → Yes
  – Most similar: number 2 (1 mismatch, 4 matches) → yes
  – Second most similar: number 1 (2 mismatches, 3 matches) → yes

• Example 2: (mediocre, yes, no, normal, no) → Yes/No
  – Most similar: number 3 (1 mismatch, 4 matches) → no
  – Second most similar: number 1 (2 mismatches, 3 matches) → yes
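A sketch of the similarity measure used in this example (number of matching attributes), applied to the table above; the tuples mirror the table rows, and the helper names are illustrative.

```python
def matches(example_a, example_b):
    """Similarity = number of attributes with identical values."""
    return sum(a == b for a, b in zip(example_a, example_b))

# Table rows: (Food, Chat, Fast, Price, Bar) -> BigTip
training = [
    (("great", "yes", "yes", "normal", "no"), "yes"),   # 1
    (("great", "no", "yes", "normal", "no"), "yes"),    # 2
    (("mediocre", "yes", "no", "high", "no"), "no"),    # 3
    (("great", "yes", "yes", "normal", "yes"), "yes"),  # 4
]

new_example = ("great", "no", "no", "normal", "no")    # Example 1 from the slide
ranked = sorted(training, key=lambda row: matches(new_example, row[0]), reverse=True)
print([(matches(new_example, x), y) for x, y in ranked[:2]])  # k = 2 -> [(4, 'yes'), (3, 'yes')]
```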
K-Nearest Neighbor
• An arbitrary instance x is represented by its feature vector (a1(x), a2(x), ..., an(x)), where ar(x) denotes the r-th feature of x.
• Euclidean distance between two instances:
  d(xi, xj) = sqrt( Σ r=1..n ( ar(xi) − ar(xj) )² )
• For a continuous-valued target function, predict the mean value of the k nearest training examples.
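The distance above written out as code, together with the regression variant mentioned on the slide (predict the mean target of the k nearest examples). A sketch with illustrative names.

```python
import math

def euclidean(x_i, x_j):
    """d(x_i, x_j) = sqrt( sum over r of (a_r(x_i) - a_r(x_j))^2 )."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))

def knn_regress(new_x, train_X, train_y, k=3):
    """Continuous target: predict the mean value of the k nearest training examples."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: euclidean(new_x, pair[0]))[:k]
    return sum(y for _, y in nearest) / k
```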
KNN
• K-nearest neighbours uses the local neighborhood to obtain a prediction.
• The K memorized examples most similar to the one being classified are retrieved.
• A distance function is needed to compare the examples' similarity.
• This means that if we change the distance function, we change how examples are classified.
Voronoi Diagram
• KNN defines a region known as the decision boundary; the decision surface is formed by the training examples.
• This division of the feature space is known as Voronoi partitioning.
K-NN Working
• No assumptions about the distribution of the data.
• Non-parametric algorithm (no learned parameters).
• Hyper-parameters:
  – k (number of neighbors)
  – Distance metric (to quantify similarity)
• Complexity (both time and storage) of prediction increases with the size of the training data.
• Can also be used for regression (average, or inverse-distance weighted average, of the neighbors' targets).
• If the ranges of the features differ, features with bigger values will dominate the decision.
• In general, feature values are normalized prior to the distance calculation (see the sketch below).
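A sketch of that normalization step: min-max scaling each feature to [0, 1] so that features with larger raw ranges do not dominate the distance. The data values are illustrative.

```python
def min_max_normalize(rows):
    """Rescale each feature column to the [0, 1] range before computing distances."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

# Example: without normalization, the income feature would swamp the age feature.
rows = [(25, 40_000.0), (32, 95_000.0), (47, 61_000.0)]
print(min_max_normalize(rows))
```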
Distance Metric
• Euclidean distance: for continuous data.
• Hamming distance: for categorical data.
• Manhattan distance / city block: sum of absolute differences, for continuous data.
• Minkowski distance: for continuous data; with exponent q = 2 it is Euclidean, with q = 1 it is Manhattan.
• Cosine: measures the angular distance between vectors via the dot product; it compares orientation rather than magnitude, so it does not give the same kind of decision boundary as the metrics above.
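Sketches of the distance measures listed above; Minkowski reduces to Euclidean for q = 2 and to Manhattan for q = 1, and cosine compares the angle between vectors using the dot product.

```python
import math

def manhattan(x, y):
    """City-block distance: sum of absolute differences (continuous data)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, q=2):
    """General form: q = 2 gives Euclidean distance, q = 1 gives Manhattan."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)

def hamming(x, y):
    """Number of positions where categorical values differ."""
    return sum(a != b for a, b in zip(x, y))

def cosine_distance(x, y):
    """1 - cos(angle between x and y), computed from the dot product."""
    dot = sum(a * b for a, b in zip(x, y))
    return 1 - dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))
```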
Selecting the K for Neighbors
• Increase k:
  – Makes KNN less sensitive to noise; a larger k smooths the prediction and reduces variance.
• Decrease k:
  – Allows capturing the finer structure of the space.
• Pick k not too large, but not too small (depends on the data).
• For a binary classification problem, use an odd value of k.
  – Reason: avoids ties.
• Missing data: use the average value of the feature.
• Use cross-validation to choose k (see the sketch below).
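One common way to choose k, as the last bullet suggests, is cross-validation. A sketch assuming scikit-learn is available; the bundled Iris dataset is used purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try odd values of k and keep the one with the best cross-validated accuracy.
scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in range(1, 16, 2)
}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```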


Curse-of-Dimensionality
• Prediction accuracy can quickly degrade when the number of attributes grows.
  – Irrelevant attributes easily "swamp" information from relevant attributes.
  – With many irrelevant attributes, the similarity/distance measure becomes less reliable.
• Remedy:
  – Try to remove irrelevant attributes in a pre-processing step.
  – Weight attributes differently (see the sketch below).
  – Increase k (but not too much).
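A sketch of the "weight attributes differently" remedy: scale each attribute's contribution to the distance so that suspected-irrelevant attributes matter less. The weights here are hypothetical, not learned.

```python
import math

def weighted_euclidean(x, y, weights):
    """Euclidean distance where each squared difference is scaled by an attribute weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

# Give the third attribute, suspected to be irrelevant, a small weight (hypothetical values).
print(weighted_euclidean((1.0, 5.0, 100.0), (1.2, 5.5, 40.0), weights=(1.0, 1.0, 0.01)))
```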
Advantages
1. No training period: KNN is called a lazy learner (instance-based learning). It does not learn anything during a training period and does not derive any discriminative function from the training data. It stores the training dataset and uses it only at the time of making real-time predictions. This makes KNN much faster to set up than algorithms that require training, e.g., linear regression.
2. Since the KNN algorithm requires no training before making predictions, new data can be added seamlessly without retraining and without impacting the accuracy of the algorithm.
3. KNN is very easy to implement. Only two parameters are required: the value of K and the distance function (e.g., Euclidean or Manhattan).
Disadvantages
1. Does not work well with large datasets: in large datasets, the cost of calculating the distance between the new point and every existing point is huge, which degrades the performance of the algorithm.
2. Does not work well with high dimensions: with a large number of dimensions, distances become less meaningful and more expensive to compute, so the algorithm struggles.
3. Needs feature scaling: we need to do feature scaling (standardization or normalization) before applying the KNN algorithm to any dataset. If we don't, KNN may generate wrong predictions.
4. Sensitive to noisy data, missing values, and outliers: KNN is sensitive to noise in the dataset. We need to manually impute missing values and remove outliers.
FAST kNN
• kNN computational complexity: high.
• How to make it faster?
  – Dimensionality reduction:
    • Feature selection
    • PCA
  – Use an efficient method to find nearest neighbors:
    • KD-Tree
KD Tree
• k-dimensional tree.
• An extended version of the binary search tree for higher dimensions.
• Pick the splitting dimension:
  – Randomly, or
  – The dimension with the largest variance.
• Pick the middle (median) value of the feature along the selected dimension after sorting along that dimension.
• Use this value as the root node, construct a binary tree, and keep going recursively.
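A sketch of nearest-neighbor search with a k-d tree, assuming SciPy is available: scipy.spatial.KDTree builds the tree once, and query returns the k nearest stored points.

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
train = rng.random((1000, 2))        # 1000 training points with 2 features (synthetic)
tree = KDTree(train)                 # build the k-d tree once

query_point = [0.5, 0.5]
distances, indices = tree.query(query_point, k=3)  # 3 nearest neighbors of the query
print(distances, indices)
```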
Example
• This dataset was taken from the UCI Repository. It has three attributes: sepal length, sepal width, and species. Species is the target attribute, with three values (Setosa, Virginica, and Versicolor). Our goal is to find which species a new flower belongs to using k-Nearest Neighbors.

Target: a new flower has been found and needs to be classified; it is currently "unlabeled".
Features of the new unlabeled flower:

  Sepal Length  Sepal Width  Species
  5.2           3.1          ??
Example
Solution:
The first step is to find the Euclidean distance between the observed and actual sepal length and sepal width. For the first instance of the dataset:
  X = observed sepal length = 5.2
  Y = observed sepal width = 3.1
The actual values given in the dataset are:
  A = actual sepal length = 5.3
  B = actual sepal width = 3.7

Distance formula: distance = sqrt( (X − A)² + (Y − B)² )
Distance(sepal length, sepal width) = sqrt( (5.2 − 5.3)² + (3.1 − 3.7)² ) = sqrt(0.37) ≈ 0.608

  Sepal Length  Sepal Width  Species  Distance
  5.3           3.7          Setosa   0.608
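The same computation in code: the Euclidean distance between the new flower (5.2, 3.1) and the first training instance (5.3, 3.7) comes out to roughly 0.608.

```python
import math

observed = (5.2, 3.1)   # new, unlabeled flower: (sepal length, sepal width)
actual = (5.3, 3.7)     # first training instance (Setosa)

distance = math.sqrt((observed[0] - actual[0]) ** 2 + (observed[1] - actual[1]) ** 2)
print(round(distance, 3))  # 0.608
```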
Example
Arrange the instances in ascending order according to the distance, and assign a rank based on the minimum distance (rank 1 = smallest distance). The same computation is repeated for all remaining rows of the dataset.
Example
New example (choice of K):

• For K = 1, take rank 1 (its species is Setosa, so the prediction is Setosa).
• For K = 2, take ranks 1 and 2.
• For K = 5, take ranks 1 through 5 and predict the majority species among them.

  Sepal Length  Sepal Width  Species
  5.2           3.1          ??
REFERENCES
Adapted from "Instance-Based Learning" lecture slides by Andrew Moore, CMU.
https://medium.com/machine-learning-researcher/k-nearest-neighbors-in-machine-learning-e794014abd2a
