
Naïve Bayes Classifier

Ke Chen
http://intranet.cs.man.ac.uk/mlo/comp20411/
Extended by Longin Jan Latecki
[email protected]
COMP20411 Machine Learning

Outline
Background
Probability Basics
Probabilistic Classification
Naïve Bayes
Example: Play Tennis
Relevant Issues
Conclusions

Background
There are three ways to establish a classifier:

a) Model a classification rule directly
   Examples: k-NN, decision trees, perceptron, SVM
b) Model the probability of class membership given the input data
   Example: multi-layer perceptron with the cross-entropy cost
c) Build a probabilistic model of the data within each class
   Examples: naïve Bayes, model-based classifiers

a) and b) are examples of discriminative classification, while c) is an example of generative classification; b) and c) are both examples of probabilistic classification.


Probability Basics

Prior, conditional and joint probability:

Prior probability: $P(X)$
Conditional probability: $P(X_1 \mid X_2)$, $P(X_2 \mid X_1)$
Joint probability: $\mathbf{X} = (X_1, X_2)$, $P(\mathbf{X}) = P(X_1, X_2)$
Relationship: $P(X_1, X_2) = P(X_2 \mid X_1)\,P(X_1) = P(X_1 \mid X_2)\,P(X_2)$
Independence: $P(X_2 \mid X_1) = P(X_2)$, $P(X_1 \mid X_2) = P(X_1)$, $P(X_1, X_2) = P(X_1)\,P(X_2)$

Bayesian rule:

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)} \qquad \text{i.e.} \qquad \text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}$$
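As a quick numerical check of the rule, here is a minimal sketch in Python; the prior and likelihood values are made up purely for illustration:

```python
# Bayes' rule: posterior = likelihood * prior / evidence.
# Hypothetical numbers, chosen only to illustrate the formula.
prior = {"C1": 0.4, "C2": 0.6}        # P(C)
likelihood = {"C1": 0.9, "C2": 0.2}   # P(X | C) for one observed X

# Evidence: P(X) = sum over classes of P(X | C) P(C)
evidence = sum(likelihood[c] * prior[c] for c in prior)

posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}
print(posterior)  # {'C1': 0.75, 'C2': 0.25}
```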

[Figure: worked example by Dieter Fox]

Probabilistic Classification

Establishing a probabilistic model for classification:

Discriminative model: $P(C \mid \mathbf{X})$, with $C = c_1, \ldots, c_L$ and $\mathbf{X} = (X_1, \ldots, X_n)$
Generative model: $P(\mathbf{X} \mid C)$, with $C = c_1, \ldots, c_L$ and $\mathbf{X} = (X_1, \ldots, X_n)$

MAP classification rule (MAP: Maximum A Posteriori):

Assign $\mathbf{x}$ to $c^*$ if $P(C = c^* \mid \mathbf{X} = \mathbf{x}) > P(C = c \mid \mathbf{X} = \mathbf{x})$ for all $c \neq c^*$, $c = c_1, \ldots, c_L$

Generative classification with the MAP rule: apply the Bayesian rule to convert
$$P(C \mid \mathbf{X}) = \frac{P(\mathbf{X} \mid C)\,P(C)}{P(\mathbf{X})} \propto P(\mathbf{X} \mid C)\,P(C)$$
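A minimal sketch of generative MAP classification; the class-conditional values below are hypothetical, and the point is that the evidence $P(\mathbf{x})$ is the same for every class and can be dropped from the comparison:

```python
# Generative MAP classification: pick the class maximizing P(x | c) * P(c).
prior = {"c1": 0.5, "c2": 0.5}              # P(C), hypothetical
class_conditional = {"c1": 0.3, "c2": 0.1}  # P(x | C) for the observed x, hypothetical

scores = {c: class_conditional[c] * prior[c] for c in prior}
c_star = max(scores, key=scores.get)        # no need to divide by P(x)
print(c_star)  # 'c1'
```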

Feature Histograms

[Figure: class-conditional feature histograms P(x) for two classes C1 and C2. Slide by Stephen Marsland]

Posterior Probability

[Figure: the corresponding posterior probabilities P(C|x) as a function of x. Slide by Stephen Marsland]

Naïve Bayes

Bayes classification:
$$P(C \mid \mathbf{X}) \propto P(\mathbf{X} \mid C)\,P(C) = P(X_1, \ldots, X_n \mid C)\,P(C)$$
Difficulty: learning the joint probability $P(X_1, \ldots, X_n \mid C)$.

Naïve Bayes classification: make the assumption that all input attributes are conditionally independent given the class:
$$
\begin{aligned}
P(X_1, X_2, \ldots, X_n \mid C) &= P(X_1 \mid X_2, \ldots, X_n; C)\,P(X_2, \ldots, X_n \mid C) \\
&= P(X_1 \mid C)\,P(X_2, \ldots, X_n \mid C) \\
&= P(X_1 \mid C)\,P(X_2 \mid C) \cdots P(X_n \mid C)
\end{aligned}
$$

MAP classification rule: assign $\mathbf{x}$ to $c^*$ if
$$[P(x_1 \mid c^*) \cdots P(x_n \mid c^*)]\,P(c^*) > [P(x_1 \mid c) \cdots P(x_n \mid c)]\,P(c), \quad c \neq c^*,\ c = c_1, \ldots, c_L$$
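The payoff of the factorization is the number of parameters to estimate per class. A small illustration (the attribute cardinalities happen to match the Play-Tennis example used later):

```python
from math import prod

# Number of possible values N_j for n = 4 discrete attributes
N = [3, 3, 2, 2]   # e.g., Outlook, Temperature, Humidity, Wind

print(prod(N))     # 36 entries per class in the full joint P(X1,...,Xn | C)
print(sum(N))      # 10 entries per class after the naive factorization
```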

Naïve Bayes

Naïve Bayes Algorithm (for discrete input attributes)

Learning Phase: given a training set S,
For each target value $c_i$ ($c_i = c_1, \ldots, c_L$):
  $\hat{P}(C = c_i) \leftarrow$ estimate $P(C = c_i)$ with examples in S
For every attribute value $a_{jk}$ of each attribute $x_j$ ($j = 1, \ldots, n$; $k = 1, \ldots, N_j$):
  $\hat{P}(X_j = a_{jk} \mid C = c_i) \leftarrow$ estimate $P(X_j = a_{jk} \mid C = c_i)$ with examples in S
Output: conditional probability tables; for each $x_j$, $N_j \times L$ elements.

Test Phase: given an unknown instance $\mathbf{X}' = (a'_1, \ldots, a'_n)$, look up the tables to assign the label $c^*$ to $\mathbf{X}'$ if
$$[\hat{P}(a'_1 \mid c^*) \cdots \hat{P}(a'_n \mid c^*)]\,\hat{P}(c^*) > [\hat{P}(a'_1 \mid c) \cdots \hat{P}(a'_n \mid c)]\,\hat{P}(c), \quad c \neq c^*,\ c = c_1, \ldots, c_L$$

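Both phases fit in a few lines of Python. The following is a minimal sketch, assuming training data arrives as (attribute-tuple, label) pairs; the names and data representation are illustrative, not from the slides:

```python
from collections import Counter, defaultdict

def learn(S):
    """Learning phase: estimate P(C) and P(X_j = a_jk | C = c_i) by counting."""
    class_counts = Counter(label for _, label in S)
    prior = {c: cnt / len(S) for c, cnt in class_counts.items()}

    counts = defaultdict(int)  # counts[(j, a, c)] = #examples with X_j = a and C = c
    for attrs, label in S:
        for j, a in enumerate(attrs):
            counts[(j, a, label)] += 1
    cpt = {key: cnt / class_counts[key[2]] for key, cnt in counts.items()}
    return prior, cpt

def classify(x, prior, cpt):
    """Test phase: MAP rule with the factorized likelihood."""
    scores = {}
    for c, p_c in prior.items():
        score = p_c
        for j, a in enumerate(x):
            score *= cpt.get((j, a, c), 0.0)  # unseen value -> 0; see smoothing below
        scores[c] = score
    return max(scores, key=scores.get)
```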

Example
Example: Play Tennis


Example

Learning Phase

Outlook     | Play=Yes | Play=No
Sunny       | 2/9      | 3/5
Overcast    | 4/9      | 0/5
Rain        | 3/9      | 2/5

Temperature | Play=Yes | Play=No
Hot         | 2/9      | 2/5
Mild        | 4/9      | 2/5
Cool        | 3/9      | 1/5

Humidity    | Play=Yes | Play=No
High        | 3/9      | 4/5
Normal      | 6/9      | 1/5

Wind        | Play=Yes | Play=No
Strong      | 3/9      | 3/5
Weak        | 6/9      | 2/5

P(Play=Yes) = 9/14
P(Play=No) = 5/14
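These tables translate directly into lookup structures; a minimal sketch holding exactly the fractions above:

```python
# Class priors and conditional probability tables from the learning phase.
prior = {"Yes": 9/14, "No": 5/14}
cpt = {
    "Outlook":     {("Sunny", "Yes"): 2/9,    ("Sunny", "No"): 3/5,
                    ("Overcast", "Yes"): 4/9, ("Overcast", "No"): 0/5,
                    ("Rain", "Yes"): 3/9,     ("Rain", "No"): 2/5},
    "Temperature": {("Hot", "Yes"): 2/9,      ("Hot", "No"): 2/5,
                    ("Mild", "Yes"): 4/9,     ("Mild", "No"): 2/5,
                    ("Cool", "Yes"): 3/9,     ("Cool", "No"): 1/5},
    "Humidity":    {("High", "Yes"): 3/9,     ("High", "No"): 4/5,
                    ("Normal", "Yes"): 6/9,   ("Normal", "No"): 1/5},
    "Wind":        {("Strong", "Yes"): 3/9,   ("Strong", "No"): 3/5,
                    ("Weak", "Yes"): 6/9,     ("Weak", "No"): 2/5},
}
```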


Example

Test Phase

Given a new instance,
x = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)

Look up the tables:
P(Outlook=Sunny|Play=Yes) = 2/9       P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=Yes) = 3/9    P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=Yes) = 3/9       P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=Yes) = 3/9         P(Wind=Strong|Play=No) = 3/5
P(Play=Yes) = 9/14                    P(Play=No) = 5/14

MAP rule:
P(Yes|x) ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
P(No|x)  ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206

Since P(Yes|x) < P(No|x), we label x as No.
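Continuing the sketch from the learning phase, the same two unnormalized posteriors come out of a short loop:

```python
# Score the test instance using the prior and cpt dictionaries defined above.
x = {"Outlook": "Sunny", "Temperature": "Cool", "Humidity": "High", "Wind": "Strong"}

scores = {}
for c in ("Yes", "No"):
    score = prior[c]
    for attribute, value in x.items():
        score *= cpt[attribute][(value, c)]
    scores[c] = score

print(scores)                       # {'Yes': 0.0052..., 'No': 0.0205...}
print(max(scores, key=scores.get))  # 'No'
```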



Relevant Issues

Violation of the independence assumption:
For many real-world tasks, $P(X_1, \ldots, X_n \mid C) \neq P(X_1 \mid C) \cdots P(X_n \mid C)$.
Nevertheless, naïve Bayes works surprisingly well anyway!

Zero conditional probability problem:
If no training example contains the attribute value $X_j = a_{jk}$, then $\hat{P}(X_j = a_{jk} \mid C = c_i) = 0$.
In this circumstance, $\hat{P}(x_1 \mid c_i) \cdots \hat{P}(a_{jk} \mid c_i) \cdots \hat{P}(x_n \mid c_i) = 0$ during test.
For a remedy, conditional probabilities are estimated with the m-estimate:
$$\hat{P}(X_j = a_{jk} \mid C = c_i) = \frac{n_c + m\,p}{n + m}$$
where
$n_c$: number of training examples for which $X_j = a_{jk}$ and $C = c_i$
$n$: number of training examples for which $C = c_i$
$p$: prior estimate (usually $p = 1/t$ for $t$ possible values of $X_j$)
$m$: weight given to the prior (number of "virtual" examples, $m \geq 1$)
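As a function, the remedy is one line; a sketch following the definitions above, applied to the zero entry from the Play-Tennis tables:

```python
def m_estimate(n_c, n, t, m=1):
    """m-estimate of P(X_j = a_jk | C = c_i), with uniform prior estimate p = 1/t."""
    p = 1 / t
    return (n_c + m * p) / (n + m)

# Outlook=Overcast never occurs with Play=No (0/5 in the table).
# With t = 3 Outlook values, the smoothed estimate is nonzero:
print(m_estimate(n_c=0, n=5, t=3))  # 0.0555... instead of 0
```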



Relevant Issues

Continuous-valued input attributes:
An attribute may take infinitely many values, so a table of value counts is impossible.
Conditional probability is instead modeled with the normal distribution:
$$\hat{P}(X_j \mid C = c_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{ji}} \exp\!\left(-\frac{(X_j - \mu_{ji})^2}{2\sigma_{ji}^2}\right)$$
$\mu_{ji}$: mean (average) of the attribute values $X_j$ of examples for which $C = c_i$
$\sigma_{ji}$: standard deviation of the attribute values $X_j$ of examples for which $C = c_i$

Learning Phase: for $\mathbf{X} = (X_1, \ldots, X_n)$ and $C = c_1, \ldots, c_L$, estimate $\mu_{ji}$ and $\sigma_{ji}$ for every attribute-class pair, together with $\hat{P}(C = c_i)$, $i = 1, \ldots, L$.
Output: $n \times L$ normal distributions and the class priors.
Test Phase: for an instance $\mathbf{X}' = (X'_1, \ldots, X'_n)$:
  Calculate the conditional probabilities with all the normal distributions
  Apply the MAP rule to make a decision
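A sketch of the test phase with Gaussian class-conditionals, assuming the means, standard deviations, and priors have already been estimated (all names illustrative):

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density, used in place of a conditional probability table entry."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def classify(x, prior, params):
    """MAP rule; params[c] is a list of (mu, sigma) pairs, one per attribute."""
    scores = {}
    for c, pairs in params.items():
        score = prior[c]
        for x_j, (mu, sigma) in zip(x, pairs):
            score *= normal_pdf(x_j, mu, sigma)
        scores[c] = score
    return max(scores, key=scores.get)
```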

Conclusions

Naïve Bayes is based on the independence assumption
  Training is very easy and fast: it only requires considering each attribute in each class separately
  Testing is straightforward: just look up tables or calculate conditional probabilities with normal distributions

Naïve Bayes is a popular generative model
  Its performance is competitive with most state-of-the-art classifiers, even when the independence assumption is violated
  It has many successful applications, e.g., spam mail filtering
  Apart from classification, naïve Bayes can do more

