Chapters 14 and 15: Bayesian Networks (2020)
Example: wet paint on a chair.
Nodes: a WET PAINT label node, a CHAIR node (painted or not), and a BACK_CLOTH node (back of a shirt, painted or not).
Highlight: BACK_CLOTH depends on CHAIR.
Set CPT probability for CHAIR
Set CPT probability for BACK_CLOTH
When there is a label WET PAINT on the chair ...
When there is no label on the chair ...
When there is no label but we see paint on the shirt ...
When there is a label WET PAINT on the chair and we see a clean shirt back ...
When there is a label WET PAINT on the chair and we see a painted shirt back ...
Probabilistic Agent
[Diagram: an agent perceives the environment through sensors and acts on it through actuators, under uncertainty]
I believe that the sun
will still exist tomorrow
with probability 0.999999
and that it will be sunny
with probability 0.6
Bayesianism is a
controversial but
increasingly popular
approach to statistics
that offers many
benefits, although not
everyone is persuaded
of its validity
FYI
Bayesian networks are based on a statistical approach
presented by the mathematician Thomas Bayes in 1763;
they were introduced by Pearl (1986).
Resembles human reasoning.
Edges represent causal relationships.
State of Road: Icy / Not icy
  Watson: Crash / No crash        Holmes: Crash / No crash

Watson crashed! Evidence about Watson flows up to State of Road
(the road is probably icy) and back down to Holmes, raising the
probability that Holmes crashes too.

But if we then learn the roads are dry, Watson's crash tells us
nothing about Holmes: the information flow through State of Road
is blocked.
A simple Bayesian Network

  Icy -> Holmes (Crash / No crash)
  Icy -> Watson (Crash / No crash)
  CPTs: P(X_Icy), P(X_Holmes | X_Icy), P(X_Watson | X_Icy)

A second example: Rain (Yes/No) and Sprinkler (On/Off) are parents of
Holmes's grass (Wet/Dry); observing wet grass lets information flow
back to its possible causes.
Bayesian vs. the Classical Approach

Syntax:
  a set of nodes, one per variable;
  a directed, acyclic graph;
  a conditional distribution for each node given its parents.

Bayesian networks support decision making and can be learned from data
(learning Bayes nets).
The current state depends
on only a finite history of
previous states.
First-order Markov process:
P(x_t | x_0:t-1) = P(x_t | x_t-1)
Bayes' rule:

  P(h | e) = P(e | h) P(h) / P(e)

h: the hypothesis, weighed against prior knowledge
e: the evidence, the observed input
P(h) is the prior, P(e | h) the likelihood, and P(h | e) the posterior.
Inference example: lung cancer

Network: Smoking history (S) -> Bronchitis (B), Lung cancer (L);
B, L -> Fatigue (F); L -> X-ray (X).

  P(b1 | s1) = 0.25        P(l1 | s1) = 0.003
  P(b1 | s2) = 0.05        P(l1 | s2) = 0.00005

  P(f1 | b1, l1) = 0.75    P(x1 | l1) = 0.6
  P(f1 | b1, l2) = 0.10    P(x1 | l2) = 0.02
  P(f1 | b2, l1) = 0.5
  P(f1 | b2, l2) = 0.05
P(x | b, s, l, f) = P(x | l): given L, the X-ray result is independent
of B, S, and F.

Consequently the joint probability distribution can now be expressed as
a product of each variable's conditional given its parents. E.g., in
this network (S -> B, L; B, L -> F; L -> X):

  P(s, b, l, f, x) = P(s) P(b | s) P(l | s) P(f | b, l) P(x | l)
Representing the Joint Distribution

In general, for a network with nodes X1, X2, ..., Xn:

  P(x1, x2, ..., xn) = prod_{i=1}^{n} P(xi | pa(xi))

An enormous saving can be made regarding the number of values required
for the joint distribution. To determine the joint distribution directly
for n binary variables, 2^n - 1 values are required. For a BN with n
binary variables in which each node has at most k parents, fewer than
2^k * n values are required.
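The factorization above can be checked numerically. Below is a minimal sketch using the CPT values from the lung-cancer network on the earlier slide; the prior P(s1) is not given there, so a hypothetical value of 0.2 is assumed. The 2^5 joint entries are built from only 10 CPT numbers and must sum to 1.

```python
from itertools import product

# CPT values from the slide; P(S = true) = 0.2 is a hypothetical
# prior assumed for illustration only.
p_s = 0.2
p_b = {True: 0.25, False: 0.05}          # P(b1 | s)
p_l = {True: 0.003, False: 0.00005}      # P(l1 | s)
p_f = {(True, True): 0.75, (True, False): 0.10,
       (False, True): 0.5, (False, False): 0.05}   # P(f1 | b, l)
p_x = {True: 0.6, False: 0.02}           # P(x1 | l)

def bern(p, v):
    """P(V = v) for a binary variable with P(true) = p."""
    return p if v else 1 - p

def joint(s, b, l, f, x):
    """P(s, b, l, f, x) = P(s) P(b|s) P(l|s) P(f|b,l) P(x|l)."""
    return (bern(p_s, s) * bern(p_b[s], b) * bern(p_l[s], l)
            * bern(p_f[(b, l)], f) * bern(p_x[l], x))

# The full joint, reconstructed from local conditionals, sums to 1.
total = sum(joint(*vals) for vals in product([True, False], repeat=5))
```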
Causality and Bayesian Networks

Clearly not every BN describes causal relationships between the
variables. Consider the dependence between Lung Cancer, L, and the
X-ray test, X. By focusing on just these variables we might be tempted
to represent them by the following BN (causal direction):

  L -> X    P(l1) = 0.001    P(x1 | l1) = 0.6    P(x1 | l2) = 0.02

But the reversed network encodes exactly the same joint distribution:

  X -> L    P(x1) = 0.02058    P(l1 | x1) = 0.02915    P(l1 | x2) = 0.00041
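The reversed CPT can be re-derived from the causal one with Bayes' rule; this short sketch just reproduces the slide's numbers.

```python
# Causal parameters from the slide.
p_l1 = 0.001
p_x1_given_l1 = 0.6
p_x1_given_l2 = 0.02

# Marginalize to get P(x1), then invert with Bayes' rule.
p_x1 = p_x1_given_l1 * p_l1 + p_x1_given_l2 * (1 - p_l1)   # 0.02058
p_l1_given_x1 = p_x1_given_l1 * p_l1 / p_x1                # ~0.02915
p_l1_given_x2 = (1 - p_x1_given_l1) * p_l1 / (1 - p_x1)    # ~0.00041
```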
Common causes and common effects

Common cause: Smoking is a common cause of Bronchitis and Lung Cancer
(S -> B and S -> L).
Common effect: Burglary and Earthquake are both causes of the common
effect Alarm (Burglary -> Alarm <- Earthquake).
Example
Topology of the network encodes conditional independence assertions:
Weather is independent of the other variables; Toothache and Catch are
conditionally independent given Cavity.

  Weather        Cavity -> {Toothache, Catch}
A Simple Belief Network

  Burglary      Earthquake      (causes)
        \        /
         Alarm                   directed acyclic graph (DAG)
        /        \
  JohnCalls    MaryCalls        (effects)

Intuitive meaning of an arrow: direct causal influence.
Nodes are random variables.

Assigning Probabilities to Roots
  P(B) = 0.001    P(E) = 0.002
Conditional Probability Tables

  P(B) = 0.001    P(E) = 0.002

  B  E  | P(A | B, E)
  T  T  | 0.95
  T  F  | 0.94
  F  T  | 0.29
  F  F  | 0.001

  A | P(J | A)        A | P(M | A)
  T | 0.90            T | 0.70
  F | 0.05            F | 0.01
What the BN Means

Together with the CPTs above, the network defines the full joint
distribution:

  P(x1, ..., xn) = prod_i P(xi | Parents(Xi))
Calculation of Joint Probability

  P(J and M and A and not-B and not-E)
    = P(J | A) P(M | A) P(A | not-B, not-E) P(not-B) P(not-E)
    = 0.90 * 0.70 * 0.001 * 0.999 * 0.998
    ~ 0.00063
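The chain-rule product for this query is a one-liner; the sketch below multiplies the five CPT entries from the burglary network.

```python
# P(J ^ M ^ A ^ ~B ^ ~E) for the burglary network:
# P(J|A) * P(M|A) * P(A|~B,~E) * P(~B) * P(~E)
p = 0.90 * 0.70 * 0.001 * (1 - 0.001) * (1 - 0.002)
# Roughly 0.00063: a full five-variable joint entry from five
# local CPT values.
```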
What the BN Encodes

Each variable is conditionally independent of its non-descendants given
its parents. For example, John does not observe any burglaries directly:
JohnCalls depends on Burglary only through Alarm.

Query: given the observations J = T, M = T, compute P(B | J, M), the
distribution conditional on the observations made.
Inference Patterns

Evidence moves through a network along linear (chain), diverging, and
converging connections, e.g. in the Burglary/Earthquake network.

Independence Relations in BN

For a linear chain A -> B -> C (e.g. starting from Battery):
  P(b) = sum_a P(b | a) P(a)
  P(c) = sum_b P(c | b) P(b)
BN Inference

Chain: X1 -> X2 -> ... -> Xn

Examples of 3-way Bayesian Networks

  Marginal independence (no edges):   p(A,B,C) = p(A) p(B) p(C)
  Independent causes (A -> C <- B):   p(A,B,C) = p(C|A,B) p(A) p(B)
  Markov dependence (A -> B -> C):    p(A,B,C) = p(C|B) p(B|A) p(A)
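Inference in a chain is just repeated marginalization: p(x_{i+1}) = sum over x_i of p(x_{i+1} | x_i) p(x_i). A minimal sketch for binary variables follows; the prior and transition probabilities are illustrative assumptions, not values from the slides.

```python
# Forward computation of marginals along X1 -> X2 -> ... -> X5.
p = {True: 0.7, False: 0.3}        # assumed p(X1)
trans = {True: 0.9, False: 0.2}    # assumed p(X_{i+1}=T | X_i)

for _ in range(4):                 # compute p(X2) .. p(X5)
    pt = trans[True] * p[True] + trans[False] * p[False]
    p = {True: pt, False: 1 - pt}
```

Each step costs O(1) per value, so the whole chain is linear in its length, rather than exponential as naive joint enumeration would be.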
Inference Ex. 2

Network: Cloudy -> Sprinkler, Cloudy -> Rain; Sprinkler, Rain -> WetGrass.
The algorithm computes not individual probabilities but entire tables.

Two ideas are crucial to avoiding exponential blowup:
  because of the structure of the BN, some subexpressions in the joint
  depend only on a small number of variables;
  by computing them once and caching the results, we can avoid
  generating them exponentially many times.

  P(W) = sum_{R,S,C} P(w | r, s) P(r | c) P(s | c) P(c)
       = sum_{R,S} P(w | r, s) sum_C P(r | c) P(s | c) P(c)
       = sum_{R,S} P(w | r, s) f_C(r, s)
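The caching step can be sketched directly: sum out C once into a factor f_C(r, s), then reuse it. The CPT numbers below are the textbook-style values commonly used for this sprinkler example; they are assumptions here, not given on the slides.

```python
import itertools

# Assumed CPTs for Cloudy -> {Sprinkler, Rain} -> WetGrass.
P_c = {True: 0.5, False: 0.5}
P_s = {True: {True: 0.1, False: 0.9},        # P_s[c][s] = P(S=s | C=c)
       False: {True: 0.5, False: 0.5}}
P_r = {True: {True: 0.8, False: 0.2},        # P_r[c][r] = P(R=r | C=c)
       False: {True: 0.2, False: 0.8}}
P_w = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}   # P(W=T | s, r)

# Naive: sum over all (s, r, c) of P(w|r,s) P(r|c) P(s|c) P(c).
naive = sum(P_w[(s, r)] * P_r[c][r] * P_s[c][s] * P_c[c]
            for s, r, c in itertools.product([True, False], repeat=3))

# Variable elimination: sum out C once into f_C(r, s), then reuse it.
f_C = {(r, s): sum(P_r[c][r] * P_s[c][s] * P_c[c] for c in [True, False])
       for r, s in itertools.product([True, False], repeat=2)}
eliminated = sum(P_w[(s, r)] * f_C[(r, s)]
                 for r, s in itertools.product([True, False], repeat=2))
```

Both computations give the same P(W); the second touches each C-dependent subexpression only once.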
Approaches to inference
Exact inference
  inference in simple chains
  variable elimination
  clustering / join tree algorithms
Approximate inference
  stochastic simulation / sampling methods
  Markov chain Monte Carlo methods
Stochastic simulation - direct
Suppose you are given values for some subset of the
variables, G, and want to infer values for unknown
variables, U
Randomly generate a very large number of
instantiations from the BN
Generate instantiations for all variables, starting at the roots
Direct sampling:
Sample each variable in topological order,
conditioned on values of parents.
I.e., always sample from P(Xi |
parents(Xi))
Example

1. Sample from P(Cloudy); suppose it returns true. Continue sampling
each remaining variable given its parents' sampled values.

The fraction of samples matching an assignment converges to its
probability:

  N_S(x1, ..., xn) / N -> P(x1, ..., xn)

For N samples and n nodes, complexity is O(Nn).

MCMC example. Repeat:
1. Sample Cloudy given the current values of its Markov blanket:
   Sprinkler = true, Rain = false. Suppose the result is false.
   New state: [false, true, false, true]

That is: for all variables Xi, the probability of the value xi of Xi
appearing in a sample is equal to P(xi | e).
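Direct sampling on the sprinkler network can be sketched as follows; each variable is sampled in topological order from P(Xi | parents(Xi)), and the sample frequency of WetGrass estimates P(W). The CPT numbers are the same illustrative assumptions as above, not values from the slides.

```python
import random

random.seed(0)

def sample_once():
    """One topological-order sample from the sprinkler network."""
    c = random.random() < 0.5                       # P(C)
    s = random.random() < (0.1 if c else 0.5)       # P(S | c)
    r = random.random() < (0.8 if c else 0.2)       # P(R | c)
    pw = {(True, True): 0.99, (True, False): 0.90,
          (False, True): 0.90, (False, False): 0.0}[(s, r)]
    w = random.random() < pw                        # P(W | s, r)
    return c, s, r, w

N = 200_000
count = sum(1 for _ in range(N) if sample_once()[3])
estimate = count / N     # N_S(w) / N converges to P(w) as N grows
```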
----------------------------------------------------
Learning Bayesian Networks

Data + prior information -> Inducer -> network structure and CPTs.
Example network: E -> R, E -> A, B -> A, A -> C, with CPT

  E  B  | P(A | E, B)
  e  b  | 0.9   0.1
  e  ~b | 0.7   0.3
  ~e b  | 0.8   0.2
  ~e ~b | 0.99  0.01
Known Structure -- Complete Data

Data over E, B, A:
  <Y,N,N>, <Y,Y,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y>

Given the structure E -> A <- B with unknown CPT entries (all
P(A | E, B) = ?), the Inducer estimates the entries from the data,
e.g. producing

  E  B  | P(A | E, B)
  e  b  | 0.9   0.1
  e  ~b | 0.7   0.3
  ~e b  | 0.8   0.2
  ~e ~b | 0.99  0.01

Goal: estimate the parameters from the M observed cases
E[m], B[m], A[m], C[m].
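With complete data and known structure, the maximum-likelihood CPT is just a ratio of counts: P(A = Y | e, b) = N(A = Y, e, b) / N(e, b). A sketch over the five <E, B, A> cases visible on the slide (the actual dataset presumably has many more):

```python
from collections import Counter

# The five <E, B, A> cases shown on the slide.
data = [("Y", "N", "N"), ("Y", "Y", "Y"), ("N", "N", "Y"),
        ("N", "Y", "Y"), ("N", "Y", "Y")]

# Count parent configurations, and those where A = Y.
counts = Counter((e, b) for e, b, _ in data)
counts_a = Counter((e, b) for e, b, a in data if a == "Y")

# Maximum-likelihood estimate of P(A = Y | e, b) by counting.
cpt = {eb: counts_a[eb] / n for eb, n in counts.items()}
```

With so few cases the estimates are extreme (0 or 1); in practice one smooths the counts or uses a prior.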
Unknown Structure -- Complete Data

Same data over E, B, A, but now the graph structure is unknown as well:
the Inducer must search for a structure and then estimate its CPT
entries (the candidate CPT starts as all ? and is filled in, e.g.
e b: 0.9 / 0.1, e ~b: 0.7 / 0.3, ~e b: 0.8 / 0.2, ~e ~b: 0.99 / 0.01).
Structure
Constraint based
Perform tests of conditional independence
Search for a network that is consistent with the
observed dependencies and independencies
Structure
Score based
Define a score that evaluates how well the
(in)dependencies in a structure match the observations
Search for a structure that maximizes the score
Pros & Cons
Statistically motivated
Can make compromises
Takes the structure of conditional probabilities into
account
Computationally hard
Heuristic Search
Define a search space:
nodes are possible structures
edges denote adjacency of structures
Traverse this space looking for high-scoring
structures
Search techniques:
Greedy hill-climbing
Best first search
Simulated Annealing
...
Typical operations on a candidate structure: add an edge, delete an
edge, or reverse an edge (illustrated on a small network over S, C, E,
D).

Exploiting Decomposability in Local Search

Because the score decomposes over families (a node and its parents),
a single edge change affects only the scores of the families it
touches; the scores of all other families can be reused.
Greedy Hill-Climbing
Simplest heuristic local search
Start with a given network
empty network
best tree
a random network
At each iteration
Evaluate all possible changes
Apply change that leads to best improvement in score
Reiterate
Stop when no modification improves score
Each step requires evaluating approximately
n new changes
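The greedy loop above can be sketched on a toy problem. Note the score here is purely illustrative (distance to a hypothetical "target" edge set) so the search provably climbs to it; a real structure search would use a data-based score such as BIC.

```python
import itertools

NODES = ["A", "B", "C", "D"]
TARGET = {("A", "B"), ("B", "C"), ("B", "D")}   # hypothetical truth

def is_acyclic(edges):
    """Kahn's algorithm: repeatedly remove nodes with no incoming edges."""
    edges, nodes = set(edges), set(NODES)
    while nodes:
        free = [n for n in nodes if not any(v == n for _, v in edges)]
        if not free:
            return False
        nodes -= set(free)
        edges = {(u, v) for u, v in edges if u in nodes and v in nodes}
    return True

def neighbors(edges):
    """All structures one edge addition, deletion, or reversal away."""
    for u, v in itertools.permutations(NODES, 2):
        if (u, v) in edges:
            yield edges - {(u, v)}                  # delete
            yield (edges - {(u, v)}) | {(v, u)}     # reverse
        else:
            yield edges | {(u, v)}                  # add

def score(edges):
    return -len(edges ^ TARGET)   # toy score: closeness to TARGET

def hill_climb(start=frozenset()):
    current = set(start)
    while True:
        best = max((n for n in neighbors(current) if is_acyclic(n)),
                   key=score)
        if score(best) <= score(current):
            return current        # stop: no modification improves score
        current = best            # apply the best single-edge change

learned = hill_climb()            # starts from the empty network
```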
Greedy Hill-Climbing: Possible Pitfalls

Greedy hill-climbing can get stuck in:
Local maxima:
  all one-edge changes reduce the score
Plateaus:
  some one-edge changes leave the score unchanged;
  happens because equivalent networks receive the same score
  and are neighbors in the search space
Both occur during structure search
Standard heuristics can escape both
Random restarts
TABU search
Summary
Belief update
Role of conditional independence
Belief networks
Causality ordering
Inference in BN
Stochastic Simulation
Learning BNs
A Bayesian Network

The ALARM network: 37 variables, 509 parameters (instead of 2^37).
[Figure: network over variables including MINVOLSET, PRESS, MINOVL,
FIO2, VENTALV, BP]
Population-Wide Approach

Anthrax Release is a global node shared across person models.
[Figure: per-person subnetworks with nodes Gender, Age Decile,
Respiratory CC, ED Admit from Anthrax, ED Admit from Other,
Respiratory CC When Admitted, ED Admission]
Person Model (Initial Prototype)

[Figure: the same person model with example instantiations, e.g.
Gender = Female or Male, Age Decile = 20-30 or 50-60,
ED Admit from Anthrax = False, other values Unknown]