Lecture 11
• Problem Analysis
This is a linear classifier, like the Perceptron.
Introduction
• The sea bass/salmon example (a two-class problem)
• Let the two classes be ω1 and ω2
– P(ω1) + P(ω2) = 1
– The state of nature (the class) is a random variable
– If P(ω1) = P(ω2), we say the priors are uniform
• The catch of salmon and sea bass is equi-probable
• Decision rule with only the prior information
– Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2 (a minimal sketch of this rule follows below)
• This is not a good classifier.
• We should take feature values into account!
• If x is the pattern we want to classify, then use a rule based on its posterior probabilities P(ωj | x), obtained with the Bayes rule below.
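As referenced in the list above, here is a minimal Python sketch of the prior-only rule. The equal priors follow the equi-probable-catch assumption; the class names are just placeholders.

```python
# Prior-only decision rule: ignore the observation x and always pick the
# class with the larger prior probability.
priors = {"omega_1": 0.5, "omega_2": 0.5}   # equi-probable catch (as above)

def decide_from_priors(priors):
    # Returns the same class for every pattern x.
    return max(priors, key=priors.get)

print(decide_from_priors(priors))     # one fixed class, e.g. omega_1
print(1.0 - max(priors.values()))     # probability of error = 0.5 here
```

With equal priors this rule is wrong half the time, which is exactly why feature values are needed.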
Bayes rule
• From the data it might be possible for us to estimate p(x | ωj), where j = 1 or 2. These are called class-conditional distributions.
• It is also easy to find the a priori probabilities P(ω1) and P(ω2). How can this be done?
• Bayes rule combines the a priori probabilities with the class-conditional distributions to find the posterior probabilities.
Bayes Rule
P(ωj | x) = p(x | ωj) P(ωj) / p(x)
– where, in the case of two categories,
p(x) = Σ_{j=1}^{2} p(x | ωj) P(ωj)
– Posterior = (Likelihood × Prior) / Evidence
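A minimal sketch of the rule above for a discrete feature. The priors are the ones used in the fish-color example later (0.75 and 0.25); the class-conditional values p(x | ωj) are assumed purely for illustration, since the slides give them only in figures.

```python
# Bayes rule for a two-class problem with a discrete feature x (fish color).
priors = {"omega_1": 0.75, "omega_2": 0.25}      # P(omega_j), from the slides
likelihoods = {                                   # p(x | omega_j), assumed values
    "omega_1": {"white": 0.8, "dark": 0.2},
    "omega_2": {"white": 0.3, "dark": 0.7},
}

def posteriors(x):
    # Evidence: p(x) = sum_j p(x | omega_j) P(omega_j)
    evidence = sum(likelihoods[c][x] * priors[c] for c in priors)
    # Posterior: P(omega_j | x) = p(x | omega_j) P(omega_j) / p(x)
    return {c: likelihoods[c][x] * priors[c] / evidence for c in priors}

def decide(x):
    # Bayes decision: choose the class with the largest posterior.
    post = posteriors(x)
    return max(post, key=post.get)

print(posteriors("dark"))   # {'omega_1': ~0.46, 'omega_2': ~0.54}
print(decide("dark"))       # omega_2
```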
• Decision given the posterior probabilities
Therefore, whenever we observe a particular x, the probability of error is:
P(error | x) = P(ω1 | x) if we decide ω2
P(error | x) = P(ω2 | x) if we decide ω1
• Minimizing the probability of error
Therefore: P(error | x) = min [P(ω1 | x), P(ω2 | x)]
(the error of the Bayes decision)
Average error rate
The average probability of error is
P(error) = ∫ P(error | x) p(x) dx = ∫ min[P(ω1 | x), P(ω2 | x)] p(x) dx
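For a discrete feature the integral becomes a sum over the feature values. A sketch using the same assumed numbers as in the earlier Bayes-rule snippet (priors from the slides, class-conditionals illustrative):

```python
# Average probability of error of the Bayes classifier, discrete feature:
# P(error) = sum_x min_j [ p(x | omega_j) P(omega_j) ]
priors = {"omega_1": 0.75, "omega_2": 0.25}               # from the fish-color example
likelihoods = {"omega_1": {"white": 0.8, "dark": 0.2},    # assumed values
               "omega_2": {"white": 0.3, "dark": 0.7}}

def bayes_error(priors, likelihoods, feature_values):
    return sum(min(likelihoods[c][x] * priors[c] for c in priors)
               for x in feature_values)

print(bayes_error(priors, likelihoods, ["white", "dark"]))   # 0.225
# The prior-only rule errs with probability 1 - max(priors.values()) = 0.25,
# so with these (assumed) class-conditionals the feature does help.
```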
• Consider a one-dimensional, two-class problem. The feature used is the color of the fish; the color can be either white or dark. P(ω1) = 0.75, P(ω2) = 0.25.
• But what is the error if we use only the a priori probabilities?
• Same error? Where is the advantage?!
• But P(error) based on the a priori probabilities alone is 0.5.
• The error of the Bayes classifier is a lower bound.
– Any classifier's error is greater than or equal to this.
• One can prove this! (A short sketch follows below.)
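The proof claimed above follows from a pointwise argument. A short sketch (standard, not spelled out in the slides):

```latex
% For any decision rule \alpha that assigns x to a single class \alpha(x):
\begin{align*}
P(\mathrm{error}\mid x) &= 1 - P(\alpha(x)\mid x)
  \ \ge\ 1 - \max_j P(\omega_j\mid x)
  \ =\ \min\bigl[P(\omega_1\mid x),\,P(\omega_2\mid x)\bigr] \\
\intertext{Integrating over $x$:}
P(\mathrm{error}) &= \int P(\mathrm{error}\mid x)\,p(x)\,dx
  \ \ge\ \int \min\bigl[P(\omega_1\mid x),\,P(\omega_2\mid x)\bigr]\,p(x)\,dx
\end{align*}
% The right-hand side is exactly the error of the Bayes classifier,
% so no classifier can do better.
```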
A priori probabilities play an important role.
This is knowledge about the domain.
Example
• Given the height of a person, we wish to classify whether he/she is from India or Nepal.
• We assume that there are no other classes. (Each and every person belongs either to the class “India” or to the class “Nepal”.)
• For the time being, assume that we have only the height (only one feature).
Example: continued …
• Let h be the height and c be the class of a person.
• Let the height be discretized as 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0.
• If the height is 5.6, we round it to 5.5.
• We randomly took 100 people, all of whom are Nepalis. For each height value we counted how many people have that height.
Example: continued
• If we randomly take 100 Nepalis, their heights are distributed as below.
• From the counts we found probabilities (these are approximate probability values!).
• These probabilities are called class-conditional probabilities, i.e., P(h | Nepal).
• For example, P(h = 3.5 | class = Nepal) = 0.1
Height       2.0   2.5   3.0   3.5   4.0   4.5   5.0   5.5   6.0   6.5   7.0   7.5   8.0
Count          0     1     5    10    10    25    25    10    10     4     0     0     0
Probability    0   0.01  0.05  0.10  0.10  0.25  0.25  0.10  0.10  0.04     0     0     0
Class-conditional Distribution
• Class-conditional distribution for Nepalis
[Figure: bar chart of the class-conditional probability P(h | Nepal) versus height h (2 to 8); probability axis from 0 to 0.25]
Example: continued …
• Similarly, we randomly took 100 persons who are Indians and found their respective class-conditional probabilities.
[Figure: bar chart of the class-conditional probability P(h | India) versus height h (2 to 8); probability axis from 0 to 0.25]
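A sketch of the whole pipeline for this example: turn the counts into class-conditional probabilities and classify a height with Bayes rule. The Nepal counts are the ones tabulated above; the India counts and the priors P(Nepal), P(India) are assumed for illustration, since the slides show the Indian distribution only as a figure.

```python
# Class-conditional probabilities from counts, then Bayes rule on height.
heights = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0]
counts = {
    "Nepal": [0, 1, 5, 10, 10, 25, 25, 10, 10, 4, 0, 0, 0],   # from the table above
    "India": [0, 0, 0, 2, 5, 10, 20, 30, 20, 10, 3, 0, 0],    # assumed for illustration
}
priors = {"Nepal": 0.5, "India": 0.5}                          # assumed for illustration

# p(h | class) estimated as count / total (each sample has 100 people)
cond = {c: {h: n / sum(counts[c]) for h, n in zip(heights, counts[c])}
        for c in counts}

def classify(h):
    # Decide the class with the larger p(h | class) P(class);
    # the evidence p(h) is common to both classes, so it can be dropped.
    scores = {c: cond[c][h] * priors[c] for c in counts}
    return max(scores, key=scores.get)

print(cond["Nepal"][3.5])   # 0.1, matching P(h = 3.5 | Nepal) above
print(classify(5.5))        # India (under the assumed numbers)
print(classify(3.5))        # Nepal
```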