Lecture 11


Bayesian Decision Theory

Primary source of reference: Pattern Classification – Duda and Hart
An Example
• “Sorting incoming fish on a conveyor according to species using optical sensing”
• Species: sea bass (Class 1) or salmon (Class 2)
• Problem Analysis
  – Set up a camera and take some sample images to extract features like
    • Length of the fish
    • Lightness (based on the gray level)
    • Width of the fish
This is a linear classifier, like the Perceptron.
Introduction
• The sea bass/salmon example (a two-class problem)
  – For example, suppose we randomly catch 100 fish and 75 of them are sea bass and 25 are salmon.
  – Let the rule in this case be: for any fish, say its class is sea bass.
  – What is the error rate of this rule?
  – This information, which is independent of feature values, is called a priori knowledge.
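
A minimal sketch of the arithmetic behind this question, using the 75/25 counts above (the rule “always say sea bass” is wrong exactly when the fish is a salmon):

# Counts from the example above: 75 sea bass, 25 salmon.
n_sea_bass, n_salmon = 75, 25
n_total = n_sea_bass + n_salmon

# The prior-only rule always predicts "sea bass", so every salmon is misclassified.
error_rate = n_salmon / n_total
print(error_rate)  # 0.25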

• Let the two classes be ω1 and ω2
  – P(ω1) + P(ω2) = 1
  – The state of nature (class) is a random variable
  – If P(ω1) = P(ω2), we say the priors are uniform
    • The catch of salmon and sea bass is equiprobable

• Decision rule with only the prior information
  – Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2
• This is not a good classifier.
• We should take feature values into account!
• If x is the pattern we want to classify, then use the rule:

    If P(ω1 | x) > P(ω2 | x) then assign class ω1
    else assign class ω2

• P(ω1 | x) is called the posterior probability of class ω1 given that the pattern is x.

Bayes rule
• From data it might be possible for us to estimate p(x | ωj), where j = 1 or 2. These are called class-conditional distributions.
• Also, it is easy to find the a priori probabilities P(ω1) and P(ω2). How can this be done?
• Bayes rule combines the a priori probabilities with the class-conditional distributions to find the posterior probabilities.

Bayes Rule

  P(B | A) = P(A, B) / P(A) = P(A | B) · P(B) / P(A)

This is Bayes Rule.

Bayes, Thomas (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370–418.

  P(ωj | x) = p(x | ωj) · P(ωj) / p(x)

  – where, in the case of two categories,

    p(x) = Σ_{j=1}^{2} p(x | ωj) · P(ωj)

  – Posterior = (Likelihood · Prior) / Evidence
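
A minimal sketch of this two-class computation; the likelihood values p(x | ωj) at the observed x are hypothetical numbers, while the priors are the 0.75/0.25 used earlier:

def posteriors(likelihoods, priors):
    """Return P(w_j | x) for each class, given p(x | w_j) and P(w_j)."""
    evidence = sum(l * p for l, p in zip(likelihoods, priors))    # p(x)
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

# Hypothetical class-conditional densities at some observed x.
print(posteriors(likelihoods=[0.6, 0.2], priors=[0.75, 0.25]))    # [0.9, 0.1]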
• Decision given the posterior probabilities

  x is an observation for which:
    if P(ω1 | x) > P(ω2 | x), the true state of nature = ω1
    if P(ω1 | x) < P(ω2 | x), the true state of nature = ω2

  Therefore, whenever we observe a particular x, the probability of error is:
    P(error | x) = P(ω1 | x) if we decide ω2
    P(error | x) = P(ω2 | x) if we decide ω1
• Minimizing the probability of error
• Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2

  Therefore:
    P(error | x) = min [P(ω1 | x), P(ω2 | x)]
    (error of the Bayes decision)
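
A brief sketch of this decision at a single observation x; the posterior values are the hypothetical ones from the sketch above:

# Hypothetical posteriors at one observation x.
post = {"w1": 0.9, "w2": 0.1}

decision = max(post, key=post.get)     # decide the class with the larger posterior
p_error_given_x = min(post.values())   # P(error | x) = min[P(w1|x), P(w2|x)]
print(decision, p_error_given_x)       # w1 0.1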

Average error rate
The average probability of error, P(error), is:

  P(error) = ∫ P(error | x) p(x) dx

This is the expected value of P(error | x) with respect to x, i.e., E_x[P(error | x)].
• Consider a one-dimensional two-class problem. The feature used is the color of the fish. The color can be either white or dark. P(ω1) = 0.75, P(ω2) = 0.25.
• But what is the error if we use only the a priori probabilities?

• Same error? Where is the advantage?!

• But P(error) based on the a priori probabilities only is 0.5.
• The error of the Bayes classifier is a lower bound.
  – Any classifier’s error is greater than or equal to it.
• One can prove this!

• Consider a one-dimensional two-class problem. The feature used is the color of the fish. The color can be either white or dark. P(ω1) = 0.75, P(ω2) = 0.25.

• Can you solve this?
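
A sketch of how one might work this out end to end; the class-conditional probabilities P(color | class) below are hypothetical stand-ins, since the actual values appear only in the lecture’s figures:

# Hypothetical class-conditional probabilities P(color | class); the priors
# P(w1) = 0.75 and P(w2) = 0.25 are the ones given in the problem.
priors = {"sea bass": 0.75, "salmon": 0.25}
likelihood = {
    "sea bass": {"white": 0.8, "dark": 0.2},
    "salmon":   {"white": 0.3, "dark": 0.7},
}

bayes_error = 0.0
for color in ("white", "dark"):
    joint = {c: likelihood[c][color] * priors[c] for c in priors}   # p(color, class)
    p_x = sum(joint.values())                                       # evidence p(color)
    posterior = {c: joint[c] / p_x for c in priors}
    print(color, "->", max(posterior, key=posterior.get), posterior)
    bayes_error += p_x * min(posterior.values())                    # p(color) * P(error | color)

print("Bayes error:", bayes_error)   # 0.225 with these assumed likelihoods

With these assumed numbers the Bayes error comes out to 0.225, below the 0.25 error of always predicting ω1 under these priors, which is the advantage asked about above.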

A priori probabilities play an important role.
This is knowledge about the domain.
Example
• Given the height of a person, we wish to classify whether he/she is from India or Nepal.
• We assume that there are no other classes. (Each and every person belongs either to the class “India” or to the class “Nepal”.)
• For the time being, assume that we have only height (only one feature).
Example: continued …
• Let h be the height and c be the class of a person.
• Let the height be discretized as 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0.
• If the height is 5.6, we round it to 5.5.
• We randomly took 100 people who are all Nepalis. For each height value we counted how many people have that height.
Example: continued
• If we randomly take 100 Nepalis, their heights are as below.
• We found probabilities (these are approximate probability values!).
• These probabilities are called class-conditional probabilities, i.e., P(h | Nepal).
• For example, P(h = 3.5 | class = Nepal) = 0.1.

Height       2    2.5   3     3.5   4     4.5   5     5.5   6     6.5   7    7.5   8
Count        0    1     5     10    10    25    25    10    10    4     0    0     0
Probability  0    0.01  0.05  0.1   0.1   0.25  0.25  0.1   0.1   0.04  0    0     0
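
A minimal sketch of how the probability row is obtained from the count row (relative frequencies over the 100 sampled Nepalis):

# Heights and counts from the table above.
heights = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0]
counts  = [0,   1,   5,   10,  10,  25,  25,  10,  10,  4,   0,   0,   0]

total = sum(counts)                                        # 100 people
p_h_given_nepal = {h: c / total for h, c in zip(heights, counts)}
print(p_h_given_nepal[3.5])                                # 0.1 = P(h = 3.5 | Nepal)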
Class-conditional Distribution
• Class-conditional distribution for Nepalis

[Figure: bar chart of P(h | Nepal) — probability (0 to 0.25) vs. height (2 to 8)]
Example: continued …
• Similarly, we randomly took 100 persons who are Indians and found their respective class-conditional probabilities.

[Figure: class-conditional distribution for the class “India” — probability vs. height (2 to 8)]


Example: continued …
• So you took these probabilities to IIIT Sri City.
• You are asked to classify a student whose height is 4.5.
• You searched the tables and found that P(4.5 | “Nepal”) = 0.25 and P(4.5 | “India”) = 0.1.
• So, you declared that the person is a Nepali.
• … Somewhere … something is wrong …!
Example: continued …
• The security person at the gate who is watching you says in a surprised tone, “Sir, don’t you know that in our college we have only Indians and there are no Nepalis?”
• This is what is called prior knowledge.
• If, when you randomly take 100 people, 50 of them are Indians and 50 of them are Nepalis, then the rule you applied is correct.
  – In IIITS, if you randomly take 100 students, all of them will be Indians… so this rule is incorrect!
Example: continued …
• Actually, you need to find out
  P(Nepal | height = 4.5) and
  P(India | height = 4.5)
  and classify accordingly.
• This is called the posterior probability.
Posterior Probability: Bayes Rule
• P(class = Nepal | height = 4.5)
    = P(height = 4.5 | class = Nepal) · P(Nepal) / P(height = 4.5)

• Here, P(Nepal) is the prior probability.
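
A sketch of this computation; the likelihoods come from the tables above, while the priors for IIIT Sri City are assumed values implied by the security person’s remark (every student is Indian):

# P(height = 4.5 | class) from the two class-conditional tables.
p_h_given = {"Nepal": 0.25, "India": 0.10}
# Assumed priors for IIIT Sri City: there are no Nepali students.
prior = {"Nepal": 0.0, "India": 1.0}

evidence = sum(p_h_given[c] * prior[c] for c in prior)              # P(height = 4.5)
posterior = {c: p_h_given[c] * prior[c] / evidence for c in prior}
print(posterior)   # {'Nepal': 0.0, 'India': 1.0} -> classify as India

With equal priors of 0.5 each, the same computation would favour Nepal (since 0.25 > 0.1), which is exactly the mistake made before the prior was taken into account.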


RELATIONSHIP BETWEEN K-NNC AND THE BAYES CLASSIFIER

