Unit 5
ARTIFICIAL INTELLIGENCE
Chapter-I: Uncertainty: Acting under Uncertainty, Basic Probability Notation, Inference Using
Full Joint Distributions, Independence, Bayes’ Rule and Its Use
Chapter-II: Probabilistic Reasoning: Representing Knowledge in an Uncertain Domain, The
Semantics of Bayesian Networks, Efficient Representation of Conditional Distributions,
Approximate Inference in Bayesian Networks, Relational and First-Order Probability, Other
Approaches to Uncertain Reasoning; Dempster-Shafer theory.
Causes of uncertainty:
The following are some leading causes of uncertainty in the real world:
1. Information from unreliable sources
2. Experimental errors
3. Equipment faults
4. Temperature variation
5. Climate change
Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation in which we apply the concept of probability to indicate the uncertainty in knowledge. In probabilistic reasoning, we combine probability theory with logic to handle uncertainty.
We use probability in probabilistic reasoning because it provides a way to handle the uncertainty that results from laziness and ignorance.
In the real world, there are many scenarios where the certainty of something is not confirmed, such as "It will rain today," "the behaviour of someone in some situation," or "a match between two teams or two players." These are probable sentences for which we can assume an outcome but cannot be sure about it, so here we use probabilistic reasoning.
Need of probabilistic reasoning in AI:
o When there are unpredictable outcomes.
o When the specifications or possibilities of predicates become too large to handle.
o When an unknown error occurs during an experiment.
In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
o Bayes' rule
o Bayesian Statistics
As probabilistic reasoning uses probability and related terms, so before understanding probabilistic
reasoning, let's understand some common terms:
Probability: Probability can be defined as the chance that an uncertain event will occur. It is the numerical measure of the likelihood that an event will occur. The value of a probability always lies between 0 and 1, where 0 and 1 represent the ideal certainties (the impossible and the certain event).
For any event A, the probabilities of A and its complement sum to one:
P(¬A) + P(A) = 1
Conditional probability: If event B is known to have occurred and we need the probability of A given B, it is computed as:
P(A|B) = P(A⋀B) / P(B)
This can be explained using a Venn diagram: when B is the occurred event, the sample space is reduced to set B, and we can calculate event A given that event B has already occurred by dividing the probability of P(A⋀B) by P(B).
Example:
In a class, 70% of the students like English and 40% of the students like both English and mathematics. What percentage of the students who like English also like mathematics?
Solution:
Let A be the event that a student likes Mathematics and B be the event that a student likes English.
P(A|B) = P(A⋀B) / P(B) = 0.4 / 0.7 ≈ 0.57
Hence, about 57% of the students who like English also like Mathematics.
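The arithmetic above can be checked with a few lines of Python, a minimal sketch of the conditional-probability formula:

```python
# P(Math | English) = P(Math AND English) / P(English)
p_english = 0.70     # P(B): student likes English
p_both = 0.40        # P(A AND B): student likes both subjects

p_math_given_english = p_both / p_english
print(f"{p_math_given_english:.2%}")   # prints "57.14%"
```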
Bayes' theorem:
Example: If cancer corresponds to one's age, then by using Bayes' theorem, we can determine the probability of cancer more accurately with the help of age.
Bayes' theorem can be derived using the product rule and the conditional probability of event A with known event B. From the product rule we can write:
P(A⋀B) = P(A|B) P(B)
Similarly, with known event A:
P(A⋀B) = P(B|A) P(A)
Equating the two and dividing by P(B), we get:
P(A|B) = P(B|A) P(A) / P(B) ..........(a)
The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here, P(A|B) is known as the posterior, which we need to calculate; it is read as the probability of hypothesis A given that we have observed evidence B.
P(B|A) is called the likelihood: assuming the hypothesis is true, we calculate the probability of the evidence.
P(A) is called the prior probability: the probability of the hypothesis before considering the evidence.
P(B) is called the marginal probability: the pure probability of the evidence.
In equation (a), in general, we can write P(B) = Σi P(Ai) * P(B|Ai), hence Bayes' rule can be written as:
P(Ai|B) = P(B|Ai) P(Ai) / Σk P(B|Ak) P(Ak)
where A1, A2, A3, ........, An is a set of mutually exclusive and exhaustive events.
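As a sketch of this form of the rule, the snippet below computes the posterior over a partition of three events; the priors and likelihoods are illustrative values, not from the text:

```python
# Bayes' rule over mutually exclusive, exhaustive events A1..An:
# P(Ai | B) = P(B | Ai) P(Ai) / sum_k P(B | Ak) P(Ak)
priors = [0.5, 0.3, 0.2]          # P(A1), P(A2), P(A3); must sum to 1
likelihoods = [0.9, 0.5, 0.1]     # P(B | Ai) for each event

p_b = sum(p * l for p, l in zip(priors, likelihoods))   # marginal P(B)
posteriors = [p * l / p_b for p, l in zip(priors, likelihoods)]

print(round(p_b, 4))                       # 0.62
print([round(p, 4) for p in posteriors])   # [0.7258, 0.2419, 0.0323]
```

Note that the posteriors always sum to 1, since the marginal P(B) is exactly the normalising constant.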
Example-1:
Question: what is the probability that a patient has diseases meningitis with a stiff neck?
Given Data:
A doctor is aware that the disease meningitis causes a patient to have a stiff neck, and this occurs 80% of the time. He is also aware of some more facts, which are given as follows:
o The known probability that a patient has meningitis is 1/30,000.
o The known probability that a patient has a stiff neck is 2%.
Mr. Mohammed Afzal, Asst. Professor in AIML
Mob: +91-8179700193, Email: [email protected]
Let a be the proposition that the patient has a stiff neck and b be the proposition that the patient has meningitis. We can then calculate the following:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
Applying Bayes' rule:
P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 = 1/750 ≈ 0.00133
Hence, we can assume that 1 patient out of 750 patients with a stiff neck has meningitis.
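The same calculation in Python, directly plugging the given quantities into Bayes' rule:

```python
# P(meningitis | stiff neck) = P(stiff neck | meningitis) * P(meningitis) / P(stiff neck)
p_a_given_b = 0.8        # P(a|b): stiff neck given meningitis
p_b = 1 / 30000          # P(b): prior probability of meningitis
p_a = 0.02               # P(a): probability of a stiff neck

p_b_given_a = p_a_given_b * p_b / p_a
print(p_b_given_a)             # ≈ 0.001333, i.e. 1/750
print(round(1 / p_b_given_a))  # 750
```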
Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The probability that the
card is king is 4/52, then calculate posterior probability P(King|Face), which means the drawn face
card is a king card.
Solution:
Every king is a face card, so P(Face|King) = 1. There are 3 face cards per suit (jack, queen, king), so P(Face) = 12/52. Applying Bayes' rule:
P(King|Face) = P(Face|King) P(King) / P(Face) = (1 × 4/52) / (12/52) = 1/3
Bayesian Belief Network:
"A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph."
It is also called a Bayes network, belief network, decision network, or Bayesian model.
Bayesian networks are probabilistic, because these networks are built from a probability
distribution, and also use probability theory for prediction and anomaly detection.
Real world applications are probabilistic in nature, and to represent the relationship between multiple
events, we need a Bayesian network. It can also be used in various tasks including prediction,
anomaly detection, diagnostics, automated insight, reasoning, time series prediction,
and decision making under uncertainty.
A Bayesian network can be used for building models from data and experts' opinions, and it consists of two parts:
o Directed Acyclic Graph
o Table of conditional probabilities.
The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and arcs (directed links), where:
o Each node corresponds to a random variable, and a variable can be continuous or discrete.
o Arcs or directed arrows represent the causal relationships or conditional probabilities between random variables. These directed links connect pairs of nodes in the graph. A link represents that one node directly influences the other node; if there is no directed link, the nodes are independent of each other.
In the above diagram, A, B, C, and D are random variables represented by the nodes of the network graph.
If we consider node B, which is connected with node A by a directed arrow, then node A is called the parent of node B.
Node C is independent of node A.
o The example below uses the standard burglary-alarm network: a burglary (B) or a minor earthquake (E) can set off an alarm (A), and two neighbours (D and S) may call when they hear the alarm.
o The network represents that our assumptions do not directly perceive the burglary and also do not notice the minor earthquake, and the neighbours do not confer before calling.
o The conditional distributions for each node are given as a conditional probabilities table, or CPT.
o Each row in the CPT must sum to 1 because all the entries in the table represent an exhaustive set of cases for the variable.
o In a CPT, a boolean variable with k boolean parents requires 2^k rows of probabilities. Hence, if there are two parents, the CPT will contain 4 probability values for the true case.
We can write the events of the problem statement in the form of probability: P[D, S, A, B, E]. We can rewrite this using the joint probability distribution (applying the product rule repeatedly, and then the conditional independences encoded by the network):
P[D, S, A, B, E] = P[D | S, A, B, E]. P[S, A, B, E]
= P[D | S, A, B, E]. P[S | A, B, E]. P[A, B, E]
= P[D | A]. P[S | A, B, E]. P[A, B, E]
= P[D | A]. P[S | A]. P[A | B, E]. P[B, E]
= P[D | A]. P[S | A]. P[A | B, E]. P[B | E]. P[E]
Let's take the observed probability for the Burglary and earthquake component:
P(B= True) = 0.002, which is the probability of burglary.
P(B= False)= 0.998, which is the probability of no burglary.
P(E= True)= 0.001, which is the probability of a minor earthquake
P(E= False)= 0.999, Which is the probability that an earthquake not occurred.
We can provide the conditional probabilities as per the below tables:
Conditional probability table for Alarm A:
The Conditional probability of Alarm A depends on Burglar and earthquake:
B E P(A= True) P(A= False)
True True 0.94 0.06
True False 0.95 0.05
False True 0.31 0.69
False False 0.001 0.999
From the formula of joint distribution, we can write the problem statement in the form of probability
distribution:
P(S, D, A, ¬B, ¬E) = P(S|A) * P(D|A) * P(A|¬B ⋀ ¬E) * P(¬B) * P(¬E)
= 0.75 * 0.91 * 0.001 * 0.998 * 0.999
= 0.00068045
(P(S|A) = 0.75 and P(D|A) = 0.91 come from the CPTs for S and D, which are not reproduced here.)
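The calculation can be reproduced directly from the network's tables. P(S|A) = 0.75 and P(D|A) = 0.91 are taken from the worked formula above, since the CPTs for S and D themselves are not shown in the text:

```python
# Joint probability via the chain-rule factorisation of the burglary network:
# P(S, D, A, ¬B, ¬E) = P(S|A) P(D|A) P(A|¬B,¬E) P(¬B) P(¬E)
p_burglary = 0.002
p_earthquake = 0.001
p_alarm = {                     # P(Alarm=True | Burglary, Earthquake)
    (True, True): 0.94,
    (True, False): 0.95,
    (False, True): 0.31,
    (False, False): 0.001,
}
p_s_given_a = 0.75              # P(S=True | Alarm=True), from the worked formula
p_d_given_a = 0.91              # P(D=True | Alarm=True), from the worked formula

joint = (p_s_given_a * p_d_given_a * p_alarm[(False, False)]
         * (1 - p_burglary) * (1 - p_earthquake))
print(round(joint, 8))          # 0.00068045
```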
Hence, a Bayesian network can answer any query about the domain by using Joint distributions.
1. Identify a set of random variables that describe the given problem domain
2. Choose an ordering for them: X1, ..., Xn
3. for i=1 to n do
a) Add a new node for Xi to the net
b) Set Parents(Xi) to be the minimal set of already added nodes such that we have conditional
independence of Xi and all other members of {X1, ..., Xi-1} given Parents(Xi)
c) Add a directed arc from each node in Parents(Xi) to Xi
d) If Xi has at least one parent, then define a conditional probability table at Xi: P(Xi=x | possible
assignments to Parents(Xi)). Otherwise, define a prior probability at Xi: P(Xi)
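The construction loop above can be sketched as a small data structure. The conditional-independence judgment in step (b) is made by the modeller; the code only enforces the topological ordering and stores each node's parents and CPT. The structure is that of the "home domain" example later in the chapter, and any probability not used in that example's calculation is a placeholder:

```python
# A minimal sketch of the network-construction loop.
network = {}   # node name -> {"parents": [...], "cpt": {...}}

def add_node(name, parents, cpt):
    """Steps (a)-(d): add a node with its parent set and CPT (a prior if no parents)."""
    # Parents must already be in the net, which enforces a topological order.
    assert all(p in network for p in parents), "add parents before children"
    network[name] = {"parents": list(parents), "cpt": cpt}

# Home-domain structure; values marked "placeholder" are not given in the text.
add_node("O", [], {(): 0.6})                          # prior P(O)
add_node("B", [], {(): 0.3})                          # prior P(B)
add_node("L", ["O"], {(True,): 0.95,                  # placeholder
                      (False,): 0.6})                 # P(L | ~O) from the text
add_node("D", ["O", "B"], {(True, True): 0.97,        # placeholder
                           (True, False): 0.90,       # placeholder
                           (False, True): 0.10,       # P(D | ~O, B) from the text
                           (False, False): 0.30})     # placeholder
add_node("H", ["D"], {(True,): 0.3,                   # P(H | D) from the text
                      (False,): 0.01})                # placeholder
print(list(network))   # topological order: ['O', 'B', 'L', 'D', 'H']
```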
There is not, in general, a unique Bayesian net for a given set of random variables. But all of them represent the same information, in the sense that every entry in the joint probability distribution can be computed from any net constructed.
The "best" net is constructed if in Step 2 the variables are topologically sorted first. That is, each variable
comes before all of its children. So, the first nodes should be the roots, then the nodes they directly
influence, and so on.
The algorithm will not construct a net that is illegal in the sense of violating the rules of probability.
Example:
Consider the problem domain in which when I go home I want to know if someone in my family is home
before I go in. Let's say I know the following information:
1) When my wife leaves the house, she often (but not always) turns on the outside light. (She also sometimes turns the light on when she's expecting a guest.)
2) When nobody is home, the dog is often left outside.
Given this information, define the following five Boolean random variables:
O: Everyone is Out of the house
L: The Light is on
D: The Dog is outside
B: The dog has Bowel troubles
H: I can Hear the dog barking
From this information, the following direct causal influences seem appropriate:
1. H is only directly influenced by D. Hence H is conditionally independent of L, O and B given D.
2. D is only directly influenced by O and B. Hence D is conditionally independent of L given O and B.
3. L is only directly influenced by O. Hence L is conditionally independent of D, H and B given O.
4. O and B are independent.
Based on the above, the following is a Bayesian Net that represents these direct causal relationships (though
it is important to note that these causal connections are not absolute, i.e., they are not implications):
Next, the following quantitative information is added to the net; this information is usually given by an
expert or determined empirically from training data.
o For each root node (i.e., node without any parents), the prior probability of the random variable
associated with the node is determined and stored there
o For each non-root node, the conditional probabilities of the node's variable given all possible
combinations of its immediate parent nodes are determined. This results in a conditional probability
table (CPT) at each non-root node.
Doing this for the above example, we get the following Bayesian Net:
Notice that in this example, a total of 10 probabilities are computed and stored in the net, whereas the full joint probability distribution would require a table containing 2^5 = 32 probabilities. The reduction is due to the conditional independence of many variables.
Two variables that are not directly connected by an arc can still affect each other. For example, B
and H are not (unconditionally) independent, but H does not directly depend on B.
Given a Bayesian Net, we can easily read off the conditional independence relations that are
represented. Specifically, each node, V, is conditionally independent of all nodes that are not
descendants of V, given V's parents. For example, in the above example H is conditionally
independent of B, O, and L given D. So, P(H | B, D, O, L) = P(H | D).
To illustrate how a Bayesian Net can be used to compute an arbitrary value in the joint probability
distribution, consider the Bayesian Net shown above for the "home domain."
Goal: Compute P(B, ~O, D, ~L, H)
P(B, ~O, D, ~L, H) = P(H, ~L, D, ~O, B)
= P(H | ~L, D, ~O, B) * P(~L, D, ~O, B) (by the Product Rule)
= P(H|D) P(~L|~O) P(D, ~O, B) (by conditional independence: H depends only on D, and L depends only on O)
= P(H|D) P(~L|~O) P(D|~O, B) P(~O) P(B) (by the Product Rule and the independence of O and B)
= (.3)(1 - .6)(.1)(1 - .6)(.3)
= 0.00144
where all of the numeric values are available directly in the Bayesian Net (since P(~A|B) = 1 - P(A|B)).
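The same numbers can be plugged in directly; only the entries used in the calculation are needed:

```python
# P(B, ~O, D, ~L, H) = P(H|D) P(~L|~O) P(D|~O,B) P(~O) P(B)
p_o = 0.6                 # P(O), so P(~O) = 0.4
p_b = 0.3                 # P(B)
p_l_given_not_o = 0.6     # P(L | ~O), so P(~L | ~O) = 0.4
p_d_given_not_o_b = 0.1   # P(D | ~O, B)
p_h_given_d = 0.3         # P(H | D)

joint = (p_h_given_d * (1 - p_l_given_not_o)
         * p_d_given_not_o_b * (1 - p_o) * p_b)
print(round(joint, 6))    # 0.00144
```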
Likelihood weighting
Likelihood weighting avoids the inefficiency of rejection sampling by generating only events that are
consistent with the evidence e. It is a particular instance of the general statistical technique of importance
sampling, tailored for inference in Bayesian networks.
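A minimal sketch of the idea on a tiny two-node network (Burglary → Alarm, using the burglary and alarm numbers from earlier in the chapter, with the earthquake omitted for brevity). We estimate P(Burglary | Alarm = true); the exact answer under these numbers is about 0.653:

```python
import random

P_B = 0.002                                 # prior P(Burglary)
P_A_GIVEN_B = {True: 0.94, False: 0.001}    # P(Alarm=True | Burglary)

def weighted_sample():
    """Sample non-evidence variables; weight by the likelihood of the evidence."""
    b = random.random() < P_B     # sample Burglary from its prior
    w = P_A_GIVEN_B[b]            # evidence Alarm=True is fixed, contributes weight
    return b, w

def estimate_p_b_given_alarm(n=200_000):
    num = den = 0.0
    for _ in range(n):
        b, w = weighted_sample()
        den += w
        if b:
            num += w
    return num / den              # weighted fraction of samples with Burglary=True

random.seed(0)
print(round(estimate_p_b_given_alarm(), 3))   # close to 0.653
```

Unlike rejection sampling, no sample is ever discarded: events inconsistent with the evidence simply receive a small weight.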
The Dempster–Shafer theory DEMPSTER–SHAFER is designed to deal with the distinction between
uncertainty and ignorance. Rather than computing the probability of a proposition, it computes the
probability that the evidence supports the proposition. This measure of belief is called a belief function,
written Bel(X).
The mathematical formulation of Dempster–Shafer theory is similar to that of probability theory; the main difference is that, instead of assigning probabilities to possible worlds, the theory assigns masses to sets of possible worlds, that is, to events.
The masses still must add to 1 over all possible events. Bel(A) is defined to be the sum of masses for all events that are subsets of (i.e., that entail) A, including A itself. With this definition, Bel(A) and Bel(¬A) sum to at most 1, and the gap (the interval between Bel(A) and 1 − Bel(¬A)) is often interpreted as bounding the probability of A.
As with default reasoning, there is a problem in connecting beliefs to actions. Whenever there is a gap
in the beliefs, then a decision problem can be defined such that a Dempster–Shafer system is unable to
make a decision.
Bel(A) should be interpreted not as a degree of belief in A but as the probability assigned to all the
possible worlds (now interpreted as logical theories) in which A is provable.
Example:
Let us consider a room where four persons, A, B, C, and D, are present. Suddenly the lights go out, and when they come back on, B has died from a knife stabbed into his back. No one entered the room, no one left the room, and B did not commit suicide. We have to find out who the murderer is. Considering the remaining three persons, the possibilities are:
o Either {A}, {C}, or {D} killed him.
o Two of them together killed him: {A, C}, {C, D}, or {A, D}.
o Or all three of them killed him, i.e., {A, C, D}.
o None of them killed him: {∅} (let us say).
These are the possible hypotheses, among which we can find the murderer by measuring belief and plausibility.
Using the above example, we can define:
Set of possible conclusions (P): {p1, p2, ...., pn}, where P is the set of possible conclusions and must be exhaustive, meaning at least one pi must be true, and all pi must be mutually exclusive. The power set of P contains 2^n elements, where n is the number of elements in P.
For example:
If P = {a, b, c}, then the power set is {∅, {a}, {b}, {c}, {a, b}, {b, c}, {a, c}, {a, b, c}}, i.e., 2^3 = 8 elements.
Mass function m(K): a mass such as m({K or B}) means there is evidence for {K or B} that cannot be divided among the more specific beliefs K and B alone.
Belief in K: The belief in element K of the power set is the sum of the masses of the elements which are subsets of K. For example, let K = {a, b, c}. Then:
Bel(K) = m(a) + m(b) + m(c) + m(a, b) + m(a, c) + m(b, c) + m(a, b, c)
Plausibility of K: It is the sum of the masses of the sets that intersect with K:
Pl(K) = m(a) + m(b) + m(c) + m(a, b) + m(b, c) + m(a, c) + m(a, b, c)
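These two sums can be computed mechanically from a mass assignment. In the sketch below the masses are illustrative, not from the text: a little evidence on each single suspect, with the rest of the mass left on the whole frame to represent ignorance:

```python
# Belief = sum of masses of subsets of K; Plausibility = sum of masses of sets
# that intersect K. Masses are keyed by frozensets and must sum to 1.
masses = {
    frozenset({"A"}): 0.2,
    frozenset({"C"}): 0.1,
    frozenset({"D"}): 0.1,
    frozenset({"A", "C", "D"}): 0.6,   # unassigned mass = ignorance
}

def bel(K):
    return sum(m for s, m in masses.items() if s <= K)   # s is a subset of K

def pl(K):
    return sum(m for s, m in masses.items() if s & K)    # s intersects K

A = frozenset({"A"})
print(round(bel(A), 3))                      # 0.2
print(round(pl(A), 3))                       # 0.8
print(round(bel(frozenset({"C", "D"})), 3))  # 0.2, so Bel(A) + Bel(not A) = 0.4 <= 1
```

The gap between Bel(A) = 0.2 and Pl(A) = 0.8 is exactly the interval discussed above: the evidence bounds the probability that A is the murderer without pinning it down.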