ML Unit V
Instance-based Learning
Introduction
Instance-based learning methods simply store the training examples instead of learning an explicit description of
the target function.
– Generalization beyond these examples is postponed until a new instance must be classified.
– When a new instance is encountered, its relationship to the stored examples is examined in order to assign a
target function value for the new instance.
• Instance-based learning includes nearest neighbor, locally weighted regression, and case-based reasoning
methods.
• Instance-based methods are sometimes referred to as lazy learning methods because they delay processing
until a new instance must be classified.
• A key advantage of lazy learning is that instead of estimating the target function once for the entire instance
space, these methods can estimate it locally and differently for each new instance to be classified.
o K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised
Learning technique.
o The K-NN algorithm measures the similarity between the new case/data and the available cases and puts the new
case into the category that is most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on similarity.
This means that when new data appears, it can be easily classified into a well-suited category using the
K-NN algorithm.
o The K-NN algorithm can be used for Regression as well as for Classification, but it is mostly used for
Classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any assumptions about the underlying
data.
o It is also called a lazy learner algorithm because it does not learn from the training set immediately;
instead, it stores the dataset and, at the time of classification, performs an action on the dataset.
o At the training phase, the KNN algorithm just stores the dataset; when it gets new data, it classifies
that data into the category most similar to the new data.
o Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to
know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, since it works on a
similarity measure. Our KNN model will compare the features of the new image with the cat and
dog images and, based on the most similar features, place it in either the cat or the dog category.
k-Nearest Neighbor Learning (working of K-NN):
Suppose we have a new data point and we need to put it in the required category. The algorithm proceeds as
follows:
o Firstly, we will choose the number of neighbors; here we choose k = 5.
o Next, we will calculate the Euclidean distance between the new point and each stored data point. The
Euclidean distance is the straight-line distance between two points, which we have already studied in
geometry. For two points (x1, y1) and (x2, y2) it is calculated as:
d = √((x2 − x1)² + (y2 − y1)²)
o By calculating the Euclidean distance, we find the nearest neighbors: three nearest neighbors in
category A and two nearest neighbors in category B.
o Since the majority of the five nearest neighbors (three of five) are from category A, the new data point is
assigned to category A.
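To make the procedure above concrete, here is a minimal Python sketch of K-NN classification. The sample points, the labels 'A' and 'B', and the query point are made-up assumptions for illustration, not data from the text:

```python
import math
from collections import Counter

def euclidean(p, q):
    # Straight-line distance between two points of any dimension.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(training_data, query, k=5):
    # training_data is a list of (point, category) pairs.
    # Sort the stored examples by distance to the query and keep the k nearest.
    neighbors = sorted(training_data, key=lambda pc: euclidean(pc[0], query))[:k]
    # Majority vote among the k nearest neighbors decides the category.
    votes = Counter(category for _, category in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points labeled 'A' or 'B'.
data = [((1.0, 2.0), 'A'), ((1.5, 1.8), 'A'), ((2.0, 2.2), 'A'),
        ((5.0, 5.5), 'B'), ((6.0, 5.0), 'B'), ((5.5, 6.1), 'B')]
print(knn_classify(data, (2.0, 2.0), k=5))  # -> 'A' (three of the five neighbors are A)
```

Note that nothing happens at "training" time: the whole dataset is simply stored, and all distance computation is deferred to the query, which is exactly the lazy-learning behavior described above.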
Advantages of the KNN algorithm:
o It is simple to implement.
o It is robust to noisy training data.
o It can be more effective if the training data is large.
Disadvantages of the KNN algorithm:
o The value of K always needs to be determined, which can sometimes be complex.
o The computation cost is high because the distance to every training sample must be calculated for each
new data point.
Locally Weighted Regression (LWR):
Locally weighted regression is a supervised learning algorithm. It is a non-parametric algorithm; that is, the
model does not learn a fixed set of parameters as is done in ordinary linear regression. Rather, the parameters θ are
computed individually for each query point x. While computing θ, a higher “preference” is given to the points
in the training set lying in the vicinity of x than to the points lying far away from x.
The modified cost function is:
J(θ) = ∑ᵢ₌₁ᵐ w(i) (θᵀx(i) − y(i))²
where the weight w(i) of each training point is a Gaussian function of its distance from the query point x:
w(i) = exp(−(x(i) − x)² / (2τ²))
and τ is a bandwidth parameter controlling how fast the weights decay. For example, with τ = 0.5 and a query
point x = −3.0, a distant training point x(2) = −5.0 gets the weight
w(2) = exp(−(−3.0 − (−5.0))² / (2(0.5)²)) = exp(−8) ≈ 0.000335,
while a nearby training point x(1) = −2.9 gets the weight exp(−0.02) ≈ 0.9802.
So, J(θ) = 0.9802·(θᵀx(1) − y(1))² + 0.000335·(θᵀx(2) − y(2))²
Thus, the weights fall exponentially as the distance between x and x(i) increases, and so does the contribution of
the prediction error for x(i) to the cost. Consequently, while computing θ, we focus more on reducing
(θᵀx(i) − y(i))² for the points lying closer to the query point (those having a larger value of w(i)).
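As an illustration of the cost function above, the following is a minimal NumPy sketch of locally weighted regression. The dataset, the function name lwr_predict, and the bandwidth value τ = 0.5 are assumptions made for the example; a real implementation would also guard against a singular weighted normal-equation matrix:

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    # Gaussian weights: w(i) = exp(-(x(i) - x)^2 / (2 tau^2)), so training
    # points near the query dominate the local fit.
    diff = X - x_query
    w = np.exp(-np.sum(diff ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # Minimize J(theta) = sum_i w(i) (theta^T x(i) - y(i))^2 by solving the
    # weighted normal equations (X^T W X) theta = X^T W y.
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return x_query @ theta

# Hypothetical 1-D dataset; the column of ones provides an intercept term.
x = np.linspace(-5.0, 5.0, 50)
X = np.c_[np.ones_like(x), x]
y = np.sin(x)
print(lwr_predict(X, y, np.array([1.0, -3.0]), tau=0.5))  # local fit near x = -3
```

A fresh θ is computed for every query point, which is what makes the method non-parametric (and more expensive at prediction time than ordinary linear regression).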
Radial basis functions (RBF):
Radial Basis Functions (RBFs) are real-valued functions whose value depends only on the distance between the
input and a certain fixed point; in supervised machine learning (ML) they are used as non-linear classifiers.
Radial basis functions provide ways to approximate multivariable functions by using linear combinations of terms
that are based on a single univariate function. The Gaussian function is the most commonly used radial basis
function.
The radial basis function for a neuron consists of a centre and a radius (also called the spread). The radius may
vary between different neurons. In DTREG-generated RBF networks, each dimension's radius can differ.
A radial basis function is defined in terms of the distance of its input from a fixed origin, i.e. in terms of a
norm (absolute value) of the input. The absolute value is the value without its associated sign (positive
or negative); for example, the absolute value of −4 is 4. Accordingly, a radial basis function is a function
whose values satisfy:
φ(x) = φ(||x||)
The Gaussian variation of the radial basis function, often applied in Radial Basis Function Networks, is a
popular choice. For a one-dimensional input, the Gaussian takes the form
φ(x) = e^(−βx²)
and can be plotted for various values of β: larger values of β produce a narrower bell curve.
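As a small illustration, here is a sketch that evaluates the one-dimensional Gaussian RBF for a few values of β; the centre, the inputs, and the β values are made up for the example:

```python
import numpy as np

def gaussian_rbf(x, center=0.0, beta=1.0):
    # phi(x) = exp(-beta * |x - center|^2): the value depends only on the
    # distance between the input and the fixed centre.
    return np.exp(-beta * np.abs(x - center) ** 2)

inputs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
# Larger beta -> narrower bell: the function responds only near its centre.
for beta in (0.5, 1.0, 5.0):
    print(beta, gaussian_rbf(inputs, center=0.0, beta=beta))
```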
Case-based Reasoning (CBR):
Case-based reasoning solves a new problem by recalling and adapting the solutions of similar, previously
solved problems. Everyday examples include:
• An auto mechanic who fixes an engine by recalling another car that exhibited similar symptoms.
• A lawyer who advocates a particular outcome in a trial based on legal precedents, or a judge who
creates case law.
• An engineer copying working elements of nature (practicing biomimicry), treating nature as a
database of solutions to problems.
Applications of CBR:
• Customer support and help-desk systems: diagnosis and therapy of customers’ problems, medical
diagnosis.
• Product recommendation and configuration: e-commerce.
• Textual CBR: text classification, judicial applications (in particular in countries where common
law, not civil law, is applied) [like the USA, UK, India, Australia, and many others].
• CBR is also applicable in ill-structured and poorly understood application domains.
There are three main types of CBR that differ significantly from one another concerning case representation
and reasoning:
• Structural CBR: a case is represented according to a predefined structured vocabulary, e.g. as
attribute-value pairs.
• Textual CBR: a case is represented as free text.
• Conversational CBR: a case is represented through a list of questions that varies from one case to another;
knowledge is contained in customer/agent conversations.
Architecture of CBR / CBR Cycle:
Despite the many different appearances of CBR systems, the essentials of CBR are captured in a surprisingly
simple and uniform process model.
• The CBR cycle was proposed by Aamodt and Plaza.
• The CBR cycle consists of 4 sequential steps around the knowledge of the CBR system.
• RETRIEVE
• REUSE
• REVISE
• RETAIN
RETRIEVE:
• One or several cases from the case base are selected, based on the modeled similarity.
• The retrieval task is defined as finding a small number of cases from the case-base with the
highest similarity to the query.
• This is a k-nearest-neighbor retrieval task considering a specific similarity function.
• When the case base grows, the efficiency of retrieval decreases, so methods that improve retrieval
efficiency are used, e.g. specific index structures such as kd-trees, case-retrieval nets, or
discrimination networks.
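As a sketch of this k-nearest-neighbor retrieval step, the toy help-desk case base and the attribute-matching similarity function below are made-up assumptions; a real system would use the index structures mentioned above instead of a linear scan:

```python
def retrieve(case_base, query, similarity, k=3):
    # RETRIEVE: rank stored cases by similarity to the query and keep the top k.
    ranked = sorted(case_base, key=lambda c: similarity(c["problem"], query),
                    reverse=True)
    return ranked[:k]

def attribute_similarity(problem, query):
    # Fraction of the query's attributes on which the stored case agrees.
    shared = [a for a in query if a in problem]
    if not shared:
        return 0.0
    return sum(problem[a] == query[a] for a in shared) / len(shared)

# Hypothetical help-desk case base.
case_base = [
    {"problem": {"symptom": "no-start", "battery": "dead"}, "solution": "recharge battery"},
    {"problem": {"symptom": "no-start", "battery": "ok"}, "solution": "check starter motor"},
    {"problem": {"symptom": "overheating", "battery": "ok"}, "solution": "check coolant"},
]
query = {"symptom": "no-start", "battery": "ok"}
print(retrieve(case_base, query, attribute_similarity, k=1)[0]["solution"])
# -> 'check starter motor'
```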
REUSE:
• Reusing a retrieved solution can be quite simple if the solution is returned unchanged as the
proposed solution for the new problem.
• Adaptation (if required, e.g. for synthetic tasks).
• Several techniques for adaptation in CBR
- Transformational adaptation
- Generative adaptation
• Most practical CBR applications today try to avoid extensive adaptation for pragmatic reasons.
REVISE:
• In this phase, feedback related to the solution constructed so far is obtained.
• This feedback can be given in the form of a correctness rating of the result or in the form of a
manually corrected revised case.
• The revised case or any other form of feedback enters the CBR system for its use in the subsequent
retain phase.
RETAIN:
• The retain phase is the learning phase of a CBR system (adding a revised case to the case base).
• Explicit competence models have been developed that enable the selective retention of cases
(because of the continuous increase of the case-base).
• The revised case or any other form of feedback enters the CBR system for its use in the subsequent
retain phase.
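Putting the four phases together, here is a minimal sketch of one pass through the CBR cycle. The helpers similarity, adapt, and evaluate are hypothetical placeholders for the domain-specific policies discussed above:

```python
def cbr_cycle(case_base, query, similarity, adapt, evaluate):
    # RETRIEVE: select the stored case most similar to the query.
    best = max(case_base, key=lambda c: similarity(c["problem"], query))
    # REUSE: adapt the retrieved solution (often returned unchanged in practice).
    proposed = adapt(best["solution"], query)
    # REVISE: obtain feedback and correct the proposed solution if needed.
    revised = evaluate(proposed, query)
    # RETAIN: add the revised case to the case base for future problems.
    case_base.append({"problem": query, "solution": revised})
    return revised

# Trivial placeholder policies: reuse unchanged, accept without correction.
identity_adapt = lambda solution, query: solution
accept = lambda solution, query: solution

cases = [{"problem": {"symptom": "no-start"}, "solution": "check battery"}]
sim = lambda p, q: float(p.get("symptom") == q.get("symptom"))
print(cbr_cycle(cases, {"symptom": "no-start"}, sim, identity_adapt, accept))
print(len(cases))  # 2 -> the new case was retained
```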
Eager Learning:
The main advantage gained in employing an eager learning method, such as an artificial neural
network, is that the target function will be approximated globally during training, thus requiring
much less space than using a lazy learning system.
Eager learning systems also deal much better with noise in the training data.
Eager learning is an example of offline learning, in which post-training queries to the system have
no effect on the system itself, and thus the same query to the system will always produce the same
result.
Lazy Learning:
Instance-based learning includes nearest neighbour and locally weighted regression methods that
assume instances can be represented as points in a Euclidean space. It also includes case-based
reasoning methods that use more complex, symbolic representations for instances.
Instance-based methods are also referred to as "lazy" learning methods because they delay
processing until a new instance must be classified. The lazy learner can create many local
approximations.
A key advantage of this kind of delayed, or lazy, learning is that instead of estimating the target
function once for the entire instance space, these methods can estimate it locally and differently for
each new instance to be classified.