ML Unit V


UNIT-V

Instance-based Learning
Introduction
Instance-based learning methods simply store the training examples instead of learning an explicit description of
the target function.
– Generalizing the examples is postponed until a new instance must be classified.
– When a new instance is encountered, its relationship to the stored examples is examined in order to assign a
target function value for the new instance.
• Instance-based learning includes nearest neighbor, locally weighted regression, and case-based reasoning
methods.
• Instance-based methods are sometimes referred to as lazy learning methods because they delay processing
until a new instance must be classified.
• A key advantage of lazy learning is that instead of estimating the target function once for the entire instance
space, these methods can estimate it locally and differently for each new instance to be classified.

k-Nearest Neighbor Learning

o K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised
Learning technique.
o The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new
case into the category that is most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on similarity.
This means that when new data appears, it can be easily classified into a well-suited category using the
K-NN algorithm.
o The K-NN algorithm can be used for Regression as well as for Classification, but it is mostly used for
Classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying
data.
o It is also called a lazy learner algorithm because it does not learn from the training set immediately;
instead it stores the dataset and, at the time of classification, performs an action on the dataset.
o The KNN algorithm just stores the dataset during the training phase, and when it gets new data, it classifies
that data into the category it is most similar to.
o Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to
know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, since it works on a
similarity measure. Our KNN model will find the features of the new image that are most similar to the cat and
dog images and, based on the most similar features, place it in either the cat or the dog category.

k-Nearest Neighbor Algorithm

o Step-1: Select the number K of neighbors.
o Step-2: Calculate the Euclidean distance from the new data point to each training point.
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step-4: Among these K neighbors, count the number of data points in each category.
o Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
o Step-6: Finally, the model is ready.
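
A minimal sketch of these steps in Python, assuming NumPy; the function name knn_classify and the data layout are illustrative, not part of the original notes:

```python
import numpy as np
from collections import Counter

def knn_classify(x_new, X_train, y_train, k=5):
    """Classify x_new by majority vote among its k nearest training points."""
    # Step-2: Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step-3: indices of the K nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Step-4 and Step-5: count categories among the neighbors, take the majority
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```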

Suppose we have a new data point and we need to put it in the required category.

o Firstly, we choose the number of neighbors: k = 5.
o Next, we calculate the Euclidean distance between the data points. The Euclidean distance is the
distance between two points, familiar from geometry; for points (x_1, y_1) and (x_2, y_2) it is
d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}.
o By calculating the Euclidean distance, we get the nearest neighbors: three nearest neighbors in
category A and two nearest neighbors in category B.
o Since the majority of the 5 nearest neighbors (3 of 5) are from category A, the new data point is
assigned to category A.
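
Continuing the sketch above, toy data arranged so that three of the five nearest neighbors fall in category A and two in category B reproduces this walkthrough (the coordinates are made up for illustration):

```python
X_train = np.array([[1.0, 1.2], [1.1, 0.9], [0.9, 1.0],   # category A
                    [3.0, 3.1], [3.2, 2.9]])              # category B
y_train = ["A", "A", "A", "B", "B"]

# With k = 5 every stored point votes: 3 votes for A, 2 for B -> "A"
print(knn_classify(np.array([2.0, 2.0]), X_train, y_train, k=5))
```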

Advantages of KNN Algorithm:

o It is simple to implement.
o It is robust to noisy training data.
o It can be more effective when the training data is large.

Disadvantages of KNN Algorithm:

o The value of K always needs to be determined, which may be complex at times.
o The computation cost is high because the distance between the new data point and all the
training samples must be calculated.
Locally Weighted Regression (LWR):
Locally weighted regression is a supervised learning algorithm. It is a non-parametric algorithm; that is, the
model does not learn a fixed set of parameters as is done in ordinary linear regression. Rather, the parameters θ are
computed individually for each query point x. While computing θ, a higher "preference" is given to the points
in the training set lying in the vicinity of x than to the points lying far away from x.
The modified cost function is:

J(\theta) = \sum_{i=1}^{m} w^{(i)} \left( \theta^T x^{(i)} - y^{(i)} \right)^2

where w^{(i)} is a non-negative "weight" associated with training point x^{(i)}.


For x^{(i)} lying closer to the query point x, the value of w^{(i)} is large, while for x^{(i)} lying far away from x the
value of w^{(i)} is small.
A typical choice of w^{(i)} is:

w^{(i)} = \exp\left( -\frac{(x^{(i)} - x)^2}{2\tau^2} \right)

where \tau is called the bandwidth parameter and controls the rate at which w^{(i)} falls with distance from x.
Clearly, if |x^{(i)} - x| is small, w^{(i)} is close to 1, and if |x^{(i)} - x| is large, w^{(i)} is close to 0.
Thus, the training-set points lying closer to the query point x contribute more to the cost J(θ) than the points
lying far away from x.
For example:
Consider a query point x = 5.0 and let x1 and x2 be two points in the training set such that x1 = 4.9 and x2 = 3.0.
Using the formula w^{(i)} = \exp\left( -\frac{(x^{(i)} - x)^2}{2\tau^2} \right) with \tau = 0.5:

w^{(1)} = \exp\left( -\frac{(4.9 - 5.0)^2}{2(0.5)^2} \right) = 0.9802

w^{(2)} = \exp\left( -\frac{(3.0 - 5.0)^2}{2(0.5)^2} \right) = 0.000335

So, J(θ) = 0.9802 (θ^T x_1 - y_1)^2 + 0.000335 (θ^T x_2 - y_2)^2.
Thus, the weights fall exponentially as the distance between x and x^{(i)} increases, and so does the contribution of
the error in prediction for x^{(i)} to the cost. Consequently, while computing θ, we focus more on reducing (θ^T x^{(i)} -
y^{(i)})^2 for the points lying closer to the query point (having a larger value of w^{(i)}).
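
A minimal sketch of locally weighted regression in Python/NumPy, fitting θ for a single query point by solving the weighted normal equations; the function name, the bias column, and the toy data are illustrative assumptions:

```python
import numpy as np

def lwr_predict(x_query, X, y, tau=0.5):
    """Predict y at x_query by fitting theta with Gaussian weights around x_query."""
    # w_i = exp(-(x_i - x_query)^2 / (2 * tau^2)), the weight from the text
    w = np.exp(-(X - x_query) ** 2 / (2 * tau ** 2))
    # Design matrix with a bias column: rows are [1, x_i]
    A = np.column_stack([np.ones_like(X), X])
    W = np.diag(w)
    # Weighted least squares: theta = (A^T W A)^{-1} A^T W y
    theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return np.array([1.0, x_query]) @ theta

# The weights from the worked example: x = 5.0, tau = 0.5
print(np.exp(-(4.9 - 5.0) ** 2 / (2 * 0.5 ** 2)))  # ~0.9802
print(np.exp(-(3.0 - 5.0) ** 2 / (2 * 0.5 ** 2)))  # ~0.000335

X = np.array([4.9, 3.0, 5.2, 4.5])
y = np.array([2.1, 1.0, 2.3, 1.9])
print(lwr_predict(5.0, X, y, tau=0.5))  # prediction dominated by nearby points
```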
Radial basis functions (RBF):
Radial Basis Functions (RBF) are real-valued functions used in supervised machine learning (ML) as
non-linear classifiers. The value of an RBF depends only on the distance between the input and a certain fixed point. Radial
basis functions provide ways to approximate multivariable functions by using linear combinations of terms
that are based on a single univariate function. The Gaussian function is the most commonly used radial basis
function.
The radial basis function for a neuron consists of a centre and a radius (also called the spread). The radius may
vary between different neurons. In DTREG-generated RBF networks, each dimension's radius can differ.
A radial basis function is defined by the distance of its input from the origin (or from its centre). This is done by
taking the norm of the input: just as the absolute value of a number is its value without the associated sign (for
example, the absolute value of -4 is 4), the norm ||x|| measures how far x is from the origin regardless of
direction. Accordingly, a radial basis function is a function whose values are defined as:

\varphi(x) = \varphi(\|x\|)

The Gaussian variation of the Radial Basis Function, often applied in Radial Basis Function Networks, is a
popular choice. The formula for a Gaussian with a one-dimensional input is:

\varphi(x) = e^{-\beta x^2}
The Gaussian function can be plotted with various values of β.
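
Since the plot is not reproduced here, a short snippet evaluating the one-dimensional Gaussian RBF for several values of β can stand in for it; larger β makes the function fall off more sharply around the centre (the sample points are arbitrary):

```python
import numpy as np

def gaussian_rbf(x, beta):
    """One-dimensional Gaussian RBF: phi(x) = exp(-beta * x**2)."""
    return np.exp(-beta * x ** 2)

xs = np.linspace(-2.0, 2.0, 5)
for beta in (0.5, 1.0, 2.0):
    print(beta, np.round(gaussian_rbf(xs, beta), 4))
```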

Radial Basis Function Example


Let us consider a fully trained RBF network as an example.
A dataset has two-dimensional data points belonging to two separate classes. An RBF network has been
trained with 20 RBF neurons on this data set. We can mark the selected prototypes and view the
category-one score over the input space. For viewing, we can draw a 3-D mesh or a contour plot.
The areas of highest and lowest category-one score should be marked separately.
In the case of the category-one output node:
 All the weights for category 2 RBF neurons will be negative.
 All the weights for category 1 RBF neurons will be positive.
Finally, an approximation of the decision boundary can be plotted by computing the scores over a finite grid.
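
A minimal sketch of how such a network scores an input and how scores over a grid approximate the decision boundary; the prototypes, weights, and β below are random placeholders, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
prototypes = rng.uniform(0.0, 1.0, size=(20, 2))  # 20 RBF centres in 2-D
weights = rng.normal(size=20)                     # output weights for category 1
beta = 10.0

def category_one_score(x):
    """Weighted sum of Gaussian activations over all 20 RBF neurons."""
    sq_dists = np.sum((prototypes - x) ** 2, axis=1)
    return weights @ np.exp(-beta * sq_dists)

# Approximate the decision boundary by scoring a finite grid of inputs
xs = ys = np.linspace(0.0, 1.0, 50)
scores = np.array([[category_one_score(np.array([x, y])) for x in xs] for y in ys])
boundary = np.abs(scores) < 0.05  # grid cells where the score is close to zero
```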

Case-Based Reasoning (CBR):


Case-based reasoning (CBR) is a problem-solving paradigm that is different from other major A.I.
approaches. A CBR system can be used in risk monitoring, financial markets, defense, and marketing, just
to name a few areas. CBR learns from past experiences to solve new problems. Rather than relying on a domain
expert to write rules or make associations along generalized relationships between problem descriptors
and conclusions, a CBR system learns from previous experience in the same way a physician learns from
patients.
 Case-Based Reasoning (CBR), broadly construed, is the process of solving new problems based on
the solutions of similar past problems.
 It is an approach to modelling the way humans think in order to build intelligent systems.
 Case-based reasoning is a prominent kind of analogy making.
 CBR uses a database of problem solutions to solve new problems.
 It stores symbolic descriptions (tuples or cases), not points in a Euclidean space.
 Applications: customer service (product-related diagnosis), legal ruling.
 Case-Based Reasoning is a well-established research field that involves the investigation of
theoretical foundations, system development and the building of practical applications for experience-based
problem solving, whose baseline is remembering past experience.
 It can be classified as a sub-discipline of Artificial Intelligence.
 Its learning process is based on analogy, not on deduction or induction.
 It is best classified as supervised learning (recall the distinction between supervised, unsupervised
and reinforcement learning methods typically made in Machine Learning).
 Learning happens in a memory-based manner.
 Case: a previously made and stored experience item.
 Case-Base: the core of every case-based problem solver; a collection of cases.
Everyday examples of CBR:

 An auto mechanic who fixes an engine by recalling another car that exhibited similar symptoms
 A lawyer who advocates a particular outcome in a trial based on legal precedents or a judge who
creates case law.
 An engineer copying working elements of nature (practicing biomimicry) is treating nature as a
database of solutions to problems.

CBR is among the few commercially/industrially successful AI methods. Applications include:

 Customer support, help-desk systems: diagnosis and therapy of customers' problems, medical
diagnosis
 Product recommendation and configuration: e-commerce
 Textual CBR: text classification, judicial applications (in particular in countries where common
law, not civil law, is applied, such as the USA, UK, India, Australia, and many others)
 Applicability also in ill-structured and poorly understood application domains.
There are three main types of CBR that differ significantly from one another concerning case representation
and reasoning:

1. Structural (a common structured vocabulary, i.e. an ontology)


2. Textual (cases are represented as free text, i.e. strings)
3. Conversational (a case is represented through a list of questions that varies from one case to
another; knowledge is contained in customer/agent conversations)
Architecture of CBR/ CBR Cycle:

CBR Cycle:
Despite the many different appearances of CBR systems, the essentials of CBR are captured in a surprisingly
simple and uniform process model.
• The CBR cycle was proposed by Aamodt and Plaza.
• The CBR cycle consists of 4 sequential steps built around the knowledge of the CBR system:
• RETRIEVE
• REUSE
• REVISE
• RETAIN
RETRIEVE:
• One or several cases from the case base are selected, based on the modeled similarity.
• The retrieval task is defined as finding a small number of cases from the case-base with the
highest similarity to the query.
• This is a k-nearest-neighbor retrieval task considering a specific similarity function.
• When the case base grows, the efficiency of retrieval decreases; this motivates methods that improve
retrieval efficiency, e.g. specific index structures such as kd-trees, case-retrieval nets, or
discrimination networks.
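
A minimal sketch of the retrieval step as a k-nearest-neighbor search under a similarity function; the case structure (a dict with a "problem" part) and the attribute-overlap similarity are illustrative assumptions:

```python
def similarity(query, case):
    """Fraction of shared attributes on which the query and the case agree."""
    keys = query.keys() & case["problem"].keys()
    return sum(query[k] == case["problem"][k] for k in keys) / max(len(keys), 1)

def retrieve(query, case_base, k=3):
    """RETRIEVE: the k cases from the case base most similar to the query."""
    return sorted(case_base, key=lambda c: similarity(query, c), reverse=True)[:k]
```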
REUSE:
• Reusing a retrieved solution can be quite simple if the solution is returned unchanged as the
proposed solution for the new problem.
• Adaptation (if required, e.g. for synthetic tasks).
• Several techniques for adaptation in CBR
- Transformational adaptation
- Generative adaptation
• Most practical CBR applications today try to avoid extensive adaptation for pragmatic reasons.
REVISE:
• In this phase, feedback related to the solution constructed so far is obtained.
• This feedback can be given in the form of a correctness rating of the result or in the form of a
manually corrected revised case.
• The revised case or any other form of feedback enters the CBR system for its use in the subsequent
retain phase.
RETAIN:
• The retain phase is the learning phase of a CBR system (adding a revised case to the case base).
• Explicit competence models have been developed that enable the selective retention of cases
(because of the continuous increase of the case-base).
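
Putting the four phases together, a minimal sketch of one pass through the cycle over a list-based case base; the case structure, the similarity function, and the feedback callback are illustrative assumptions:

```python
def cbr_solve(query, case_base, similarity, get_feedback):
    """One pass through the CBR cycle: RETRIEVE, REUSE, REVISE, RETAIN."""
    # RETRIEVE: the stored case most similar to the query
    best = max(case_base, key=lambda case: similarity(query, case))
    # REUSE: propose the retrieved solution unchanged (no adaptation)
    proposed = best["solution"]
    # REVISE: obtain feedback, e.g. a manually corrected solution
    revised = get_feedback(query, proposed)
    # RETAIN: add the revised case to the case base for future problems
    case_base.append({"problem": query, "solution": revised})
    return revised
```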

Eager Learning:

 In artificial intelligence, eager learning is a learning method in which the system tries to construct a
general, explicit, input-independent description of the target function during training of the system, as
opposed to lazy learning, where generalization beyond the training data is delayed until a query is made
to the system.

 The main advantage gained in employing an eager learning method, such as an artificial neural
network, is that the target function will be approximated globally during training, thus requiring
much less space than using a lazy learning system.
 Eager learning systems also deal much better with noise in the training data.

 Eager learning is an example of offline learning, in which post-training queries to the system have
no effect on the system itself, and thus the same query to the system will always produce the same
result.

Lazy Learning:

 Lazy learning is a learning method in which generalization of the training data is delayed until a query
is made to the system, as opposed to eager learning, where the system tries to generalize the training data
before receiving queries.

 Lazy learning methods simply store the data; generalizing beyond these data is postponed until an
explicit request is made.

 Instance-based learning methods such as nearest neighbour and locally weighted regression are
conceptually straightforward approaches to approximating real-valued or discrete-valued target functions.

 Instance based learning includes nearest neighbour and locally weighted regression methods that
assume instances can be represented as points in a Euclidean space. It also includes case-based
reasoning methods that use more complex, symbolic representations for instances.

 Instance-based methods are also referred to as "lazy" learning methods because they delay
processing until a new instance must be classified. The lazy learner can create many local
approximations.

 A key advantage of this kind of delayed, or lazy, learning is that instead of estimating the target
function once for the entire instance space, these methods can estimate it locally and differently for
each new instance to be classified.

 One disadvantage of instance-based approaches is that the cost of classifying new instances can be
high. This is because nearly all computation takes place at classification time.

 A second disadvantage of many instance-based approaches, especially nearest-neighbor approaches, is
that they typically consider all attributes of the instances when attempting to retrieve similar training
examples from memory. If the target concept depends on only a few of the many available attributes, then
the instances that are truly most "similar" may well be a large distance apart.
Eager vs. Lazy Learning:

Eager Learning:

 Eager learning methods construct a general, explicit description of the target function based on the
provided training examples.
 Eager learning methods use the same approximation to the target function, which must be learned from
training examples before input queries are observed.

Lazy Learning:

 Lazy learning methods simply store the data; generalizing beyond these data is postponed until an
explicit request is made.
 Lazy learning methods can construct a different approximation to the target function for each
encountered query instance.
 Lazy learning is very suitable for complex and incomplete problem domains, where a complex target
function can be represented by a collection of less complex local approximations.
