Unit - 3

Classification Logic

• In Supervised Learning, the model learns by example. Along with our input variable, we also give our
model the corresponding correct labels. While training, the model gets to look at which label
corresponds to our data and hence can find patterns between our data and those labels.

Some examples of Supervised Learning include:

• Spam detection: teaching a model which mail is spam and which is not.
• Speech recognition where you teach a machine to recognize your voice.
• Object Recognition by showing a machine what an object looks like and having it pick that object
from among other objects.

We can further divide Supervised Learning into the following:

Figure 1: Supervised Learning Subdivisions


Classification In Machine Learning

• The Classification algorithm is a Supervised Learning technique that is used to identify the category of new
observations on the basis of training data. In Classification, a program learns from the given dataset or
observations and then classifies new observations into a number of classes or groups, such as Yes or No, 0
or 1, Spam or Not Spam, cat or dog, etc. Classes can be called targets/labels or categories.

• Classification predictive modeling is the task of approximating the mapping function from input
variables to discrete output variables. The main goal is to identify which class/category the new data will
fall into.
• The best example of an ML classification algorithm is Email Spam Detector.
• Unlike regression, the output variable of Classification is a category, not a value, such as "Green or Blue",
"fruit or animal", etc. Since the Classification algorithm is a Supervised Learning technique, it takes
labeled input data, which means each input comes with the corresponding output.

• In a classification algorithm, a discrete output function y is mapped to the input variable x:

y = f(x), where y is the categorical output

• The main goal of the Classification algorithm is to identify the category of a given dataset, and these
algorithms are mainly used to predict the output for the categorical data.

• Classification algorithms can be better understood using the below diagram. In the below diagram, there are
two classes, class A and Class B. These classes have features that are similar to each other and dissimilar to
other classes.
The algorithm which implements the classification on a dataset is known as a classifier. There are two types of
Classifications:

• Binary Classifier: If the classification problem has only two possible outcomes, then it is called a Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.

• Multi-class Classifier: If a classification problem has more than two outcomes, then it is called a Multi-class Classifier.
Examples: Classification of types of crops, classification of types of music.


Learners in Classification Problems:
In the classification problems, there are two types of learners:

Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset.
In the lazy learner case, classification is done on the basis of the most related data stored in the training
dataset. It takes less time in training but more time for predictions.

Example: K-NN algorithm, Case-based reasoning

Eager Learners: Eager learners develop a classification model based on a training dataset before
receiving a test dataset. In contrast to lazy learners, an eager learner takes more time in learning and less
time in prediction.

Example: Decision Trees, Naïve Bayes, ANN.


Types of ML Classification Algorithms:
Classification Algorithms can be further divided into mainly two categories:

• Linear Models
➢ Logistic Regression
➢ Support Vector Machines

• Non-linear Models
➢ K-Nearest Neighbours
➢ Kernel SVM
➢ Naïve Bayes
➢ Decision Tree Classification
➢ Random Forest Classification
Classification Algorithms

• In machine learning, classification is a supervised learning concept which basically categorizes a set of data into
classes. The most common classification problems are – speech recognition, face detection, handwriting
recognition, document classification, etc.
• It can be either a binary classification problem or a multi-class problem too. There are a bunch of machine
learning algorithms for classification in machine learning. Let us take a look at those classification algorithms in
machine learning.

Logistic Regression

• It is a classification algorithm in machine learning that uses one or more independent variables to determine an
outcome. The outcome is measured with a dichotomous variable, meaning it will have only two possible
outcomes.

• The goal of logistic regression is to find a best-fitting relationship between the dependent variable and a set of
independent variables. It is better than other binary classification algorithms like nearest neighbour since it
quantitatively explains the factors leading to classification.
Advantages and Disadvantages

• Logistic regression is specifically meant for classification; it is useful in understanding how a set of
independent variables affects the outcome of the dependent variable.

• The main disadvantage of the logistic regression algorithm is that it only works when the predicted
variable is binary; it assumes that the data is free of missing values and that the predictors
are independent of each other.

Use Cases

1. Identifying risk factors for diseases

2. Word classification

3. Weather Prediction

4. Voting Applications
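
As an added illustration (not from the original notes), the following minimal Python sketch fits a logistic regression classifier with scikit-learn; the dataset and parameter choices are assumptions made purely for demonstration:

# Minimal logistic regression sketch (assumes scikit-learn is installed).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# A built-in dataset with a dichotomous (binary) outcome.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# max_iter is raised so the solver converges on this unscaled data.
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))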
Naive Bayes:

• Naive Bayes is a classification algorithm that assumes that predictors in a dataset are independent. This
means that it assumes the features are unrelated to each other.

For example: if given a banana, the classifier will see that the fruit is of yellow color, oblong-shaped and long
and tapered. All of these features will contribute independently to the probability of it being a banana and are
not dependent on each other. Naive Bayes is based on Bayes’ theorem, which is given as:

P(A | B) = P(B | A) · P(A) / P(B)

Where:

P(A | B) = how often A happens given that B happens

P(A) = how likely A will happen

P(B) = how likely B will happen

P(B | A) = how often B happens given that A happens


Advantages and Disadvantages

• The Naive Bayes classifier requires a small amount of training data to estimate the necessary parameters
to get the results. They are extremely fast in nature compared to other classifiers.

• The only disadvantage is that Naive Bayes classifiers are known to be bad probability estimators.

Use Cases

1. Disease Predictions

2. Document Classification

3. Spam Filters

4. Sentiment Analysis
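
As a hedged sketch (an addition, not from the original notes), Gaussian Naive Bayes in scikit-learn; the iris dataset is an illustrative choice:

# Gaussian Naive Bayes sketch; assumes features are conditionally independent.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GaussianNB()
clf.fit(X_train, y_train)      # training is fast: just per-class means/variances
print("Test accuracy:", clf.score(X_test, y_test))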
Decision Trees:

• A Decision Tree is an algorithm that is used to visually represent decision-making.


• A Decision Tree can be made by asking a yes/no question and splitting the answer to lead to another decision.
The question sits at a node, and the resulting decisions are placed below it at the leaves. The classic example is
a tree used to decide whether we can play tennis.

Advantages and Disadvantages

• A decision tree has the advantage of being simple to understand and visualize; it also requires very little data
preparation.

• The disadvantage that follows with the decision tree is that it can create complex trees that may not categorize
efficiently. They can be quite unstable, because even a small change in the data can alter the whole
structure of the decision tree.
Use Cases

1. Data exploration

2. Pattern Recognition

3. Option pricing in finances

4. Identifying disease and risk threats
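
An added Python sketch (illustrative dataset and depth limit) that fits a decision tree and prints its yes/no splits in text form:

# Decision tree sketch; max_depth=3 is an assumed limit to curb overfitting.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

# Print the learned question-at-each-node structure described above.
print(export_text(tree, feature_names=iris.feature_names))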


K-Nearest Neighbour

• It is a lazy learning algorithm that stores all instances corresponding to training data in n-dimensional space.
• It is a lazy learning algorithm as it does not focus on constructing a general internal model, instead, it works
on storing instances of training data.
Advantages And Disadvantages

1. This algorithm is quite simple in its implementation and is robust to noisy training data. Even if
the training data is large, it is quite efficient.

2. The disadvantages of the KNN algorithm are that the value of K must be chosen by the user
and that the computation cost at prediction time is pretty high compared to other algorithms.

Use Cases

1. Industrial applications to look for similar tasks in comparison to others

2. Handwriting detection applications

3. Image recognition

4. Video recognition

5. Stock analysis
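
A small added scikit-learn sketch (dataset and K are illustrative assumptions) showing the lazy-learning behaviour described above:

# K-Nearest Neighbours sketch; "training" just stores the instances.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_neighbors (K) must be chosen by the user; 5 here is an assumption.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("Test accuracy:", knn.score(X_test, y_test))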
Random Forest

• Random decision trees, or random forests, are an ensemble learning method for classification, regression, etc. It
operates by constructing a multitude of decision trees at training time and outputs the class that is the mode
of the classes (classification) or the mean prediction (regression) of the individual trees.

• A random forest is a meta-estimator that fits a number of trees on various subsamples of the data set and then
averages them to improve the model's predictive accuracy.
• The sub-sample size is always the same as that of the original input, but the samples are drawn with
replacement.
Advantages and Disadvantages

• The advantage of the random forest is that it is more accurate than the decision trees due to the
reduction in the over-fitting.
• The only disadvantage of random forest classifiers is that they are quite complex to implement
and get pretty slow in real-time prediction.

Use Cases

1. Industrial applications such as finding if a loan applicant is high-risk or low-risk

2. For Predicting the failure of mechanical parts in automobile engines

3. Predicting social media share scores

4. Performance scores
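
As an added sketch (dataset and number of trees are illustrative assumptions), a random forest of 100 bootstrap-sampled trees whose votes are combined by majority:

# Random forest sketch, scored with 5-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# 100 trees, each fit on a bootstrap subsample drawn with replacement.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
print("CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())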
Support Vector Machine

• The support vector machine is a classifier that represents the training data as points in space, separated
into categories by a gap as wide as possible. New points are then mapped into the same space and assigned
to a category based on which side of the gap they fall.
Advantages and Disadvantages

• It uses a subset of training points in the decision function which makes it memory efficient and is
highly effective in high dimensional spaces.
• The only disadvantage with the support vector machine is that the algorithm does not directly provide
probability estimates.

Use cases
1. Business applications for comparing the performance of a stock over a period of time

2. Investment suggestions

3. Classification of applications requiring accuracy and efficiency
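
An added sketch (dataset, scaling step, and kernel choice are assumptions) of a maximum-margin linear SVM:

# Linear SVM sketch; feature scaling matters for SVMs, hence the pipeline.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear kernel finds the widest-gap separator described above.
svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))
svm.fit(X_train, y_train)
print("Test accuracy:", svm.score(X_test, y_test))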


Evaluating a Classification model:

Once our model is completed, it is necessary to evaluate its performance, whether it is a Classification or a
Regression model. For evaluating a Classification model, we have the following ways:

Confusion Matrix:

• The confusion matrix provides us a matrix/table as output and describes the performance of the model.

• It is also known as the error matrix.

• The matrix summarizes the prediction results, giving the total number of correct predictions
and incorrect predictions. It looks like the table below:

                     Actual Positive    Actual Negative
Predicted Positive   True Positive      False Positive
Predicted Negative   False Negative     True Negative
Accuracy

• Accuracy is the ratio of correctly predicted observations to the total observations:

Accuracy = (TP + TN) / (TP + FP + FN + TN)

True Positive (TP): the number of correct predictions that the occurrence is positive.

True Negative (TN): the number of correct predictions that the occurrence is negative.

False Positive (FP): the number of incorrect predictions that the occurrence is positive.

False Negative (FN): the number of incorrect predictions that the occurrence is negative.

F1- Score

• It is the harmonic mean of precision and recall: F1 = 2 · (Precision · Recall) / (Precision + Recall).

Precision And Recall

• Precision is the fraction of relevant instances among the retrieved instances, while recall is the fraction of relevant
instances that have been retrieved out of the total number of relevant instances.
• They are basically used as the measure of relevance.
ROC Curve

• The receiver operating characteristic (ROC) curve is used for visual comparison of classification models; it
shows the relationship between the true positive rate and the false positive rate.
• The area under the ROC curve is the measure of the accuracy of the model.
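
All of the metrics above can be computed with scikit-learn, as in this added sketch (the label vectors are made-up illustrative values):

# Evaluate a set of (illustrative) predictions against the true labels.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# scikit-learn orders the matrix [[TN, FP], [FN, TP]] (rows = actual class).
print(confusion_matrix(y_true, y_pred))
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("ROC AUC  :", roc_auc_score(y_true, y_pred))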
Concept Learning
• Inducing general functions from specific training examples is a
main issue of machine learning.
• Concept Learning: Acquiring the definition of a general category
from given sample positive and negative training examples of the
category.
• Concept Learning can be seen as a problem of searching through
a predefined space of potential hypotheses for the hypothesis
that best fits the training examples.
• The hypothesis space has a general-to-specific ordering of
hypotheses, and the search can be efficiently organized by taking
advantage of a naturally occurring structure over the hypothesis
space.

Concept Learning
• A Formal Definition for Concept Learning: inferring a boolean-valued function from training examples of
its input and output.

• An example of concept learning is learning the bird-concept from the given examples of birds
(positive examples) and non-birds (negative examples).

• We are trying to learn the definition of a concept from given examples.

A Concept Learning Task – Enjoy Sport
Training Examples

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      YES
2        Sunny  Warm     High      Strong  Warm   Same      YES
3        Rainy  Cold     High      Strong  Warm   Change    NO
4        Sunny  Warm     High      Strong  Warm   Change    YES

(The first six columns are the attributes; the EnjoySport column is the target concept.)

• A set of example days, each described by six attributes.

• The task is to learn to predict the value of EnjoySport for an arbitrary day,
based on the values of its attributes.
EnjoySport – Hypothesis Representation
• Each hypothesis consists of a conjunction of constraints on the
instance attributes.
• Each hypothesis will be a vector of six constraints, specifying the values
of the six attributes
  (Sky, AirTemp, Humidity, Wind, Water, and Forecast).
• Each attribute constraint will be one of:
  ? - indicating any value is acceptable for the attribute (don't care)
  single value - specifying a single required value (e.g. Warm) (specific)
  0 - indicating no value is acceptable for the attribute (no value)

Hypothesis Representation
• A hypothesis:
Sky AirTemp Humidity Wind Water Forecast
< Sunny, ? , ? , Strong , ? , Same >
• The most general hypothesis – that every day is a positive example
<?, ?, ?, ?, ?, ?>
• The most specific hypothesis – that no day is a positive example
<0, 0, 0, 0, 0, 0>
• EnjoySport concept learning task requires learning the sets of days for
which EnjoySport=yes, describing this set by a conjunction of
constraints over the instance attributes.

EnjoySport Concept Learning Task
Given
– Instances X : set of all possible days, each described by the attributes
• Sky – (values: Sunny, Cloudy, Rainy)
• AirTemp – (values: Warm, Cold)
• Humidity – (values: Normal, High)
• Wind – (values: Strong, Weak)
• Water – (values: Warm, Cold)
• Forecast – (values: Same, Change)
– Target Concept (Function) c : EnjoySport : X → { 0,1}
– Hypotheses H : Each hypothesis is described by a conjunction of constraints on
the attributes.
– Training Examples D : positive and negative examples of the target function
Determine
– A hypothesis h in H such that h(x) = c(x) for all x in D.
The Inductive Learning Hypothesis
• Although the learning task is to determine a hypothesis h identical to the
target concept c over the entire set of instances X, the only information
available about c is its value over the training examples.
– Inductive learning algorithms can at best guarantee that the output hypothesis fits the target
concept over the training data.
– Lacking any further information, our assumption is that the best hypothesis regarding
unseen instances is the hypothesis that best fits the observed training data. This is the
fundamental assumption of inductive learning.

• The Inductive Learning Hypothesis: any hypothesis found to approximate the target function well over a
sufficiently large set of training examples will also approximate the target function well over other
unobserved examples.

Concept Learning As Search
• Concept learning can be viewed as the task of searching through a large
space of hypotheses implicitly defined by the hypothesis representation.
• The goal of this search is to find the hypothesis that best fits the training
examples.
• By selecting a hypothesis representation, the designer of the learning
algorithm implicitly defines the space of all hypotheses that the program
can ever represent and therefore can ever learn.

Enjoy Sport - Hypothesis Space
• Sky has 3 possible values, and the other 5 attributes have 2 possible values.
• There are 96 (= 3·2·2·2·2·2) distinct instances in X.
• There are 5120 (= 5·4·4·4·4·4) syntactically distinct hypotheses in H.
  - Two more values for each attribute: ? and 0
• Every hypothesis containing one or more 0 symbols represents the
empty set of instances; that is, it classifies every instance as negative.
• There are 973 (= 1 + 4·3·3·3·3·3) semantically distinct hypotheses in H.
  - Only one more value for each attribute (?), plus one hypothesis representing the empty set of
instances.
• Although EnjoySport has a small, finite hypothesis space, most learning
tasks have much larger (even infinite) hypothesis spaces.
– We need efficient search algorithms on the hypothesis spaces.
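
These counts can be verified with a few lines of added Python (an illustrative check, not from the original slides):

# Check the instance and hypothesis counts quoted above.
sizes = [3, 2, 2, 2, 2, 2]          # values per attribute (Sky first)

instances = 1
syntactic = 1
semantic = 1
for v in sizes:
    instances *= v                  # 3*2*2*2*2*2 = 96
    syntactic *= v + 2              # each attribute also allows '?' and '0'
    semantic *= v + 1               # '?' only; every hypothesis with a '0'...
semantic += 1                       # ...collapses into one empty-set hypothesis

print(instances, syntactic, semantic)   # 96 5120 973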

General-to-Specific Ordering of Hypotheses
• Many algorithms for concept learning organize the search through the hypothesis
space by relying on a general-to-specific ordering of hypotheses.
• By taking advantage of this naturally occurring structure over the hypothesis space, we
can design learning algorithms that exhaustively search even infinite hypothesis spaces
without explicitly enumerating every hypothesis.

• Consider two hypotheses:
  h1 = (Sunny, ?, ?, Strong, ?, ?)
  h2 = (Sunny, ?, ?, ?, ?, ?)

• Now consider the sets of instances that are classified positive by h1 and by h2.
  - Because h2 imposes fewer constraints on the instance, it classifies more instances as
positive.
  - In fact, any instance classified positive by h1 will also be classified positive by h2.
  - Therefore, we say that h2 is more general than h1.

More-General-Than Relation
• For any instance x in X and hypothesis h in H, we say that x satisfies h
if and only if h(x) = 1.

• More-General-Than-Or-Equal Relation:
Let h1 and h2 be two boolean-valued functions defined over X.
Then h1 is more-general-than-or-equal-to h2 (written h1 ≥ h2)
if and only if any instance that satisfies h2 also satisfies h1.

• h1 is more-general-than h2 (h1 > h2) if and only if h1 ≥ h2 is true and
h2 ≥ h1 is false. We also say h2 is more-specific-than h1.
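
The ordering can be made concrete with a small added Python sketch (helper names are illustrative, not from the text; it assumes conjunctive hypotheses without the 0 constraint):

# '?' matches any value; a hypothesis is a tuple of six constraints.
def satisfies(x, h):
    # True if hypothesis h classifies instance x as positive.
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    # h1 >= h2: every constraint in h1 is '?' or equal to h2's constraint.
    return all(c1 == "?" or c1 == c2 for c1, c2 in zip(h1, h2))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
x  = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")

print(satisfies(x, h1), satisfies(x, h2))   # True True
print(more_general_or_equal(h2, h1))        # True: h2 >= h1
print(more_general_or_equal(h1, h2))        # False, so h2 > h1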

More-General-Relation

• h2 > h1 and h2 > h3

• But there is no more-general relation between h1 and h3
FIND-S Algorithm
• The FIND-S Algorithm starts from the most specific hypothesis and
generalizes it by considering only positive examples.
• FIND-S algorithm ignores negative examples.
– As long as the hypothesis space contains a hypothesis that describes the true target concept,
and the training data contains no errors, ignoring negative examples does not cause any
problem.
• FIND-S algorithm finds the most specific hypothesis within H that is
consistent with the positive training examples.
– The final hypothesis will also be consistent with negative examples if the correct target
concept is in H, and the training examples are correct.

FIND-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
     For each attribute constraint a_i in h
       If the constraint a_i is satisfied by x
       Then do nothing
       Else replace a_i in h by the next more general constraint that is
       satisfied by x
3. Output hypothesis h

FIND-S Algorithm - Example

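An added Python sketch (variable names are illustrative) that traces FIND-S over the four EnjoySport training examples from the table above:

# FIND-S on the EnjoySport data; '0' is the most specific constraint,
# '?' the most general.
data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   "YES"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   "YES"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "NO"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Change"), "YES"),
]

# Step 1: start with the most specific hypothesis.
h = ["0"] * 6

# Step 2: minimally generalize on each positive example; negatives are ignored.
for x, label in data:
    if label != "YES":
        continue
    for i, value in enumerate(x):
        if h[i] == "0":
            h[i] = value      # first positive example fixes the value
        elif h[i] != value:
            h[i] = "?"        # conflicting values generalize to '?'

print(h)  # ['Sunny', 'Warm', '?', 'Strong', 'Warm', '?']

The resulting hypothesis, (Sunny, Warm, ?, Strong, Warm, ?), is the classic FIND-S outcome for this training set.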
Unanswered Questions by FIND-S Algorithm
• Has FIND-S converged to the correct target concept?
– Although FIND-S will find a hypothesis consistent with the training data, it has no way to
determine whether it has found the only hypothesis in H consistent with the data (i.e., the
correct target concept), or whether there are many other consistent hypotheses as well.
– We would prefer a learning algorithm that could determine whether it had converged and,
if not, at least characterize its uncertainty regarding the true identity of the target concept.

• Why prefer the most specific hypothesis?


– In case there are multiple hypotheses consistent with the training examples, FIND-S will
find the most specific.
– It is unclear whether we should prefer this hypothesis over, say, the most general, or some
other hypothesis of intermediate generality.

Unanswered Questions by FIND-S Algorithm
• Are the training examples consistent?
– In most practical learning problems there is some chance that the training examples will
contain at least some errors or noise.
– Such inconsistent sets of training examples can severely mislead FIND-S, given the fact
that it ignores negative examples.
– We would prefer an algorithm that could at least detect when the training data is
inconsistent and, preferably, accommodate such errors.

• What if there are several maximally specific consistent hypotheses?


– In the hypothesis language H for the EnjoySport task, there is always a unique, most
specific hypothesis consistent with any set of positive examples.
– However, for other hypothesis spaces there can be several maximally specific hypotheses
consistent with the data.
– In this case, FIND-S must be extended to allow it to backtrack on its choices of how to
generalize the hypothesis, to accommodate the possibility that the target concept lies along
a different branch of the partial ordering than the branch it has selected.

Common Sense Based Learning
The Goals of Artificial Intelligence

• The need to reconsider the goals of AI

• Mental Amplification

• Thanks to engineering, we can travel faster and farther than our muscles can take us, see things we
can't otherwise see, and talk louder than our lungs can shout.
Expert Systems
• Our system: a diagnosis expert system like MYCIN

• The Patient: An old rusted car in the back yard

• Question & Answer session with the expert system

➢ Are there spots on the body? YES
➢ Are there more spots on the trunk than anywhere else? YES
➢ What color are the spots? REDDISH BROWN

• Diagnosis: The car has measles

• Degree of confidence: HIGH


Example taken from Google Techtalk by Doug Lenat, May 2006
Expert Systems (cont.)

• Our system: An intelligent car loan approval system

• Question & Answer session with the expert system

➢ Date of Birth: 1989

➢ Time spent at current job: 19 YEARS

• Result: Loan approved (the system fails to notice that a person born in 1989 could not have spent 19 years at a job)

Example taken from Google Techtalk by Doug Lenat, May 2006


Expert Systems (cont.)

• So why do the “expert” systems have this problem?

• Because they don’t have common sense

• The expert systems only know equations and variables.
Search
• Is the Eiffel tower taller than the Taj Mahal?

• Cannot combine knowledge it already has access to.

• Why can’t the search engine do the simple math and give us the answer?

• Lack of common sense


Natural Language Processing
• The police watched demonstrators...
…because they feared violence.
…because they advocated violence.

• Mary and Sue are sisters.
  Mary and Sue are mothers.

• George Burns: “My aunt is in the hospital, I went to see her today, and took her flowers.”
  Gracie Allen: “George, that’s terrible! You should have brought her flowers.”

Example taken from Google Techtalk by Doug Lenat, May 2006


ASSUME OUR COMPUTER NOW HAS Common Sense
Search
Query: “someone smiling”

When you are happy, you smile

You become happy when someone you love accomplishes a milestone

Taking one’s first step is a milestone

Parents love their children

Caption: “A mother helping her child take her first step”
Search
Query: “Government buildings damaged in terrorist
events in Beirut between 1990 and 2001.”

Beirut is in Lebanon

Embassies are govt. buildings

1993 is in the 1990’s

If there was a pipe bombing, then it was most likely a terrorist attack and not an accident, etc.

Document: “1993 pipe bombing of France’s embassy in Lebanon”
Example taken from Google Techtalk by Doug Lenat, May 2006
Natural Language Processing (revisited)
• With common sense, the system can now resolve the ambiguities in the earlier examples:
• The police watched demonstrators...
…because they feared violence.
…because they advocated violence.

• Mary and Sue are sisters.
  Mary and Sue are mothers.

• George Burns: “My aunt is in the hospital, I went to see her today, and took her flowers.”
  Gracie Allen: “George, that’s terrible! You should have brought her flowers.”

Example taken from Google Techtalk by Doug Lenat, May 2006


Common Sense
• Ask yourself, how would an automated vehicle know that a
snowman standing at the edge of the street is not going to run
into the road? Humans use their common-sense knowledge to
realize that is not going to happen.

• Why is it so difficult for us to give intelligent agents common-sense knowledge? As illustrated
in the previous example, we use this knowledge intuitively, without thinking about it. Often,
we do not even realize we are doing it.

• Common sense is all the background knowledge we have about the physical and social world
that we have absorbed over our lives.
• It includes such things as our understanding of physics,
(causality, hot and cold), as well as our expectations about
how humans behave.
For example:
A person can figure out how to drive on the left hand side of the road in
England even if they have only driven in countries that drive on the right
hand side. They can infer what is the same and what is different.
So how do we implement Common Sense?
What is this “Knowledge”?
• Millions of facts, rules of thumb, etc.
• Represented as sentences in some language.
• If the language is Logic, then computers can do
deductive reasoning automatically.
• This representation of a set of concepts within a
domain and the relationships between those
concepts is called an Ontology.
• The sentences are expressed in formal logic
notation.
• The words and the logic sentences about them are
called a Formal Ontology.
Hierarchy in Ontology
Predicate Calculus Representation
A predicate calculus representation assumes a universe of individuals, with
relations and functions on those individuals, and sentences formed by
combining relations with the logical connectives and, or, and not.

Parents love their children

This can be represented as

(ForAll ?P (ForAll ?C
    (implies
        (and
            (isa ?P Person)
            (child ?P ?C))
        (loves ?P ?C))))

For all P and for all C: (P is a person AND C is a child of P) implies (P loves C).
Cyc
• Cyc is an AI project that attempts to assemble a
comprehensive ontology and knowledge base of everyday
common sense.

• Its goal is to enable AI applications to perform human-like reasoning.

• The project was started by Cycorp, a Texas-based company.

• All the aforementioned features were incorporated in Cyc.
Cyc
• Cyc has a huge knowledge base which it uses for
reasoning.

• Contains

• 15,000 predicates
• 300,000 concepts
• 3,200,000 assertions

• All these predicates, concepts and assertions are arranged in numerous ontologies.
Cyc: Features
Uncertain Results

• Query: “Who had the motive for the assassination of Rafik Hariri?”

• Since the case is still an unsolved political mystery, there is no way we can ever get the answer.

• In cases like these, Cyc returns the various viewpoints, quoting the sources from which it built
its inferences.
• For the above query, it gives two viewpoints:
• “USA and Israel”, as quoted from an editorial in Al Jazeera
• “Syria” as quoted from a news report from CNN

Example taken from Google Techtalk by Doug Lenat, May 2006


Cyc: Features (cont.)

• It uses Google as the search engine in the background.

• It filters results according to the context of the query.

• For example, if we search for the assassination of Rafik Hariri, then it omits results which have
a time stamp before that of the assassination date.
Cyc: Features (cont.)
Qualitative Queries

Query: “Was Bill Clinton a good President of the United States?”
• In cases like these, Cyc returns the results in a pros-and-cons format and leaves it to the user to
draw a conclusion.

Queries With No Answer

Query: “At this instant of time, is Alice inhaling or exhaling?”
• The Cyc system is intelligent enough to figure out
queries which can never be answered correctly.
Example taken from Google Techtalk by Doug Lenat, May 2006
The Dream
• The ultimate goal is to build enough common sense
into the Cyc system such that it can understand
Natural Language.

• Once it understands Natural Language, all the system has to do is crawl through all the online
material, learn new common sense rules, and evolve.

• This two-step process of building common sense and using machine learning techniques to
learn new things will make the Cyc system an infinite source of knowledge.
Drawbacks
• There is no single Ontology that works in all cases.

• Although Cyc is able to simulate common sense, it cannot distinguish between fact and fiction.

• In Natural Language Processing, there is no way the Cyc system can figure out if a particular
word is used in the normal sense or in a sarcastic sense.

• Adding knowledge is a very tedious process.


Limitation of common sense
• It is important to distinguish between wisdom and “common sense,” which
is considered applicable only to practical matters, not to feelings or abstract
concepts.
• Wisdom increases with experience, but it also includes more subjective
notions and judgments. There is no debate that walking through traffic is
foolish and a threat to your safety; it is objectively dangerous and shows a
lack of “common sense.”
• However, there is no objective proof that leaving a comfortable job and
taking a risk at a bigger opportunity is a “sensible” decision. Wisdom can
help to inform the latter decision, but not “common sense”.
• This natural ability to make good decisions and behave rationally helps you
function within society, obey social norms, protect yourself, assess
situations correctly, and develop relationships.
• This can lead to conflict at any stage of life. Although “common sense” is
regarded as generally available knowledge and practical application, since
each individual’s experience is different, so is their understanding of and
access to “common sense”.
Conclusion
• Intelligent software agents must use common sense in order to
reason.
• Common-sense knowledge is required before intelligent software
agents can anticipate how people and the physical world react.
• Deep learning models do not currently understand what they
produce, and have no common-sense knowledge.
• The Common sense Transformers (COMET) project attempts to
train models with information about the world in ways similar to
how a human would acquire such knowledge.
• The COMET project and other similar efforts are still in the
research phase.

Artificial intelligence researchers have not been successful in giving intelligent agents the
common-sense knowledge they need to reason about the world. Without this knowledge, it is
impossible for intelligent agents to truly interact with the world. Traditionally, there have been
two unsuccessful approaches to getting computers to reason about the world: symbolic logic
and deep learning. A new project, called COMET, tries to bring these two approaches together.
Although it has not yet succeeded, it offers the possibility of progress.
Explanation-Based Learning (EBL)
One definition: learning general problem-solving techniques by observing and analyzing
human solutions to specific problems.
Explanation based learning

• Explanation-based learning has the ability to learn from a single training instance. Instead of
taking many examples, explanation-based learning emphasizes learning from a single, specific
example.
• The usage of EBL can be seen in efficient parsing of constraint-based
grammars and in the compilation of grammar-based language models
for speech recognition.
• For example, consider the Ludo game. In a Ludo game, there are
generally four colours of buttons.
• For a single colour there are four different squares. Suppose the colours
are red, green, blue and yellow, so a maximum of four players is possible
for this game. Two players form one side (suppose green and red) and the
other two form the opposing side (suppose blue and yellow), so every
player has an opponent to play against.
• A small square box (a die) marked with the symbols one to six is circulated
among the four players. The number one is the lowest and the number six
is the highest, and all moves are made according to these numbers.
Explanation based learning
• At any instance of play, anyone from the first side may try to attack a member
of the second side, and vice versa.
• Likewise, the buttons may be attacked and removed one by one, and
finally one side will win the game. Since at any time the players of one side
can attack the players of the other side, a single player's move may affect
the whole game.
• Hence we can say that explanation-based learning is always concentrated
on inputs such as a simple learning program, the idea of the goal state, the
idea of the usable concepts, and a set of rules that describes the
relationships between the objects and the actions.
The EBL Hypothesis

• EBL is based on the hypothesis that an intelligent system can learn a
general concept after observing only a single example.
• By understanding why an example is a member of a concept, one can
learn the essential properties of the concept.
• EBL uses prior knowledge to analyse or explain each training example
in order to infer which properties are relevant to the target function and
which are irrelevant.

Learning by Generalizing Explanations
Given:
• Goal concept (e.g., some predicate calculus statement)
• Training example (facts)
• Domain theory (inference rules)
• Operationality criterion

• Given these four inputs, the task is to determine a generalization of the
training example that is a sufficient concept definition for the goal concept
and that satisfies the operationality criterion.
• The operationality criterion requires that the final concept definition be
described in terms of the predicates used to describe the training example.
Steps in EBL

EBL involves 2 steps:

Explanation: the domain theory is used to eliminate all the unimportant aspects of the
training example while retaining the important ones that best describe the goal concept.

Generalization: the explanation of the goal concept is made as general and widely
applicable as possible. This ensures that all cases are covered, not just certain specific
ones.
EBL Architecture:

• EBL model during training
  During training, the model generalizes the training example in such a way that all
  scenarios lead to the goal concept, not just specific cases. (As shown in Fig: Training EBL Model)

• EBL model after training
  After training, the EBL model tends to reach the hypothesis space involving the goal
  concept directly. (As shown in Fig: Trained EBL Model)
Explanation based generalization (EBG)
• Explanation-based generalization (EBG) is an algorithm for explanation-based
learning, described in Mitchell et al. (1986).
• It has two steps: first, the explain step, and second, the generalize step.
• During the first step, the domain theory is used to prune away all the
unimportant aspects of training examples with respect to the goal
concept.
• The second step is to generalize the explanation as far as possible while
still describing the goal concept.
• Consider the problem of learning the concept bucket. We want to
generalize from a single example of a bucket. First, collect the
following information.
Example:

Goal: Bucket

B is a bucket if B is liftable, stable and an open-vessel.

Given a training example and a functional description, we want to build a general
structural description of a bucket.
Imperfect Theories and EBL
• Incomplete Theory Problem
  - Cannot build explanations of specific problems because of missing knowledge
• Intractable Theory Problem
  - Have enough knowledge, but not enough computer time to build the specific explanation
• Inconsistent Theory Problem
  - Can derive inconsistent results from a theory (e.g., because of default rules)
Some Complications
• Inconsistencies and incompleteness may be due to abstractions and
assumptions that make a theory tractable.

• Inconsistencies may arise from missing knowledge (incompleteness),
e.g., making the closed-world assumption.

Issues with Imperfect Theories
• Detecting imperfections
  - "broken" explanations (missing clause)
  - contradiction detection (proving P and not P)
  - multiple explanations (but expected!)
  - resources exceeded
• Correcting imperfections
  - experimentation, motivated by failure type (explanation-based)
  - making approximations/assumptions: assume something is true
EBL as Operationalization (Speedup Learning)
• Assuming a complete problem solver and
unlimited time, EBL already knows how to
recognize all the concepts it will know.
• What it learns is how to make its
knowledge operational (Mostow).

Knowledge-Level Learning
By Newell, Dietterich
• EBL as Knowledge closure
– all things that can be inferred from a collection of rules and facts
• “Pure” EBL only learns how to solve faster, not how to solve problems
previously insoluble.
• Inductive learners make inductive leaps and hence can solve more
after learning.
• What about considering resource-limits (e.g., time) on problem
solving?

Here, a Horn clause is a logical formula of a particular rule-like form, which gives it useful
properties for use in logic programming, formal specification, and model theory.
