Ramos Intro HMM


LECTURE 1: INTRODUCTION TO MARKOV MODELS
OUTLINE
• Markov model
• Hidden Markov model (HMM)
• Example: dice & coins
• Example: recognizing eating activities

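MOTIVATION
[Figure: MARKOV CHAIN over the words of a sentence (e.g. "What", "is", "the", "next", "word", "this", "end"), where each arrow is labeled with a transition probability P; candidate next words include "sentence", "line", "paragraph" and "message".]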
MARKOV CHAIN: WEATHER EXAMPLE
• Design a Markov chain that predicts tomorrow's weather using information from the past days.
• Our model has only 3 states, $S = \{S_1, S_2, S_3\}$, where $S_1 = \text{Sunny}$, $S_2 = \text{Rainy}$ and $S_3 = \text{Cloudy}$.

• To establish the transition probabilities between states, we need to collect data.
• Assume the data produce the following transition probabilities, where $P(X \mid Y)$ is the probability that tomorrow is $X$ given that today is $Y$:

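$$
\begin{aligned}
&P(\text{Sunny} \mid \text{Sunny}) = 0.8, && P(\text{Rainy} \mid \text{Sunny}) = 0.05, && P(\text{Cloudy} \mid \text{Sunny}) = 0.15 \\
&P(\text{Sunny} \mid \text{Rainy}) = 0.2, && P(\text{Rainy} \mid \text{Rainy}) = 0.6, && P(\text{Cloudy} \mid \text{Rainy}) = 0.2 \\
&P(\text{Sunny} \mid \text{Cloudy}) = 0.2, && P(\text{Rainy} \mid \text{Cloudy}) = 0.3, && P(\text{Cloudy} \mid \text{Cloudy}) = 0.5
\end{aligned}
$$

Each row sums to 1: from any given state, the probabilities of tomorrow's three possible states form a distribution.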
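As a minimal sketch (plain Python, not part of the original lecture), the transition table above can be stored as a nested dictionary, checked so that every row sums to 1, and used to sample a weather sequence. The names `STATES`, `A` and `simulate` are illustrative choices.

```python
import random

# Weather states S1, S2, S3 from the slide.
STATES = ["Sunny", "Rainy", "Cloudy"]

# A[today][tomorrow] = P(tomorrow | today), copied from the table above.
A = {
    "Sunny": {"Sunny": 0.8, "Rainy": 0.05, "Cloudy": 0.15},
    "Rainy": {"Sunny": 0.2, "Rainy": 0.6, "Cloudy": 0.2},
    "Cloudy": {"Sunny": 0.2, "Rainy": 0.3, "Cloudy": 0.5},
}

# Every row must be a probability distribution over tomorrow's weather.
for today, row in A.items():
    if abs(sum(row.values()) - 1.0) > 1e-9:
        raise ValueError(f"row {today!r} does not sum to 1")


def simulate(start, days, rng):
    """Sample `days` consecutive weather states, beginning with `start`."""
    seq = [start]
    for _ in range(days - 1):
        row = A[seq[-1]]  # only today's state matters (Markov property)
        seq.append(rng.choices(STATES, weights=[row[s] for s in STATES])[0])
    return seq


print(simulate("Sunny", 8, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps runs reproducible without touching the global generator.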
• Suppose we observe the sequence Sunny, Rainy, Cloudy, Cloudy, Sunny, Sunny, Sunny, Rainy, …; on any given day we can be in any of the three states.

• We use the state-sequence notation $q_1, q_2, q_3, q_4, q_5, \ldots$, where $q_i \in \{\text{Sunny}, \text{Rainy}, \text{Cloudy}\}$.

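• To compute the probability of tomorrow's weather (or of a whole sequence of days), we use the Markov property: the state on day $i$ depends only on the state on day $i-1$, i.e. $P(q_i \mid q_1, \ldots, q_{i-1}) = P(q_i \mid q_{i-1})$. The joint probability then factorizes as

$$P(q_1, \ldots, q_n) = \prod_{i=1}^{n} P(q_i \mid q_{i-1}),$$

where, by convention, $P(q_1 \mid q_0) \equiv P(q_1)$ is the probability of the initial state.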
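The factorization can be sketched as a short Python function. This is an illustration, not code from the lecture: the slides give no initial distribution $P(q_1)$, so the uniform `PI` below is a hypothetical assumption, and `sequence_probability` is an illustrative name.

```python
# Transition probabilities P(q_i | q_{i-1}) from the weather example.
A = {
    "Sunny": {"Sunny": 0.8, "Rainy": 0.05, "Cloudy": 0.15},
    "Rainy": {"Sunny": 0.2, "Rainy": 0.6, "Cloudy": 0.2},
    "Cloudy": {"Sunny": 0.2, "Rainy": 0.3, "Cloudy": 0.5},
}

# Hypothetical initial distribution P(q_1): the slides give none, so assume uniform.
PI = {"Sunny": 1 / 3, "Rainy": 1 / 3, "Cloudy": 1 / 3}


def sequence_probability(seq, A, pi):
    """P(q_1, ..., q_n) = P(q_1) * product over i >= 2 of P(q_i | q_{i-1})."""
    p = pi[seq[0]]  # the i = 1 term, using the convention P(q_1 | q_0) = P(q_1)
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev][cur]
    return p


print(sequence_probability(["Sunny", "Rainy", "Cloudy", "Cloudy"], A, PI))
```

Because each row of `A` and the vector `PI` sum to 1, the probabilities of all $3^n$ sequences of length $n$ also sum to 1.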
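• Exercise 1: Given that today is Sunny, what is the probability that tomorrow is Sunny and the day after is Rainy?

$$
\begin{aligned}
P(q_2, q_3 \mid q_1) &= P(q_2 \mid q_1)\,P(q_3 \mid q_1, q_2) \\
&= P(q_2 \mid q_1)\,P(q_3 \mid q_2) \qquad \text{(Markov property)} \\
&= P(\text{Sunny} \mid \text{Sunny})\,P(\text{Rainy} \mid \text{Sunny}) \\
&= 0.8 \times 0.05 \\
&= 0.04
\end{aligned}
$$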
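Because we condition on today's weather, no initial distribution is needed: the answer is just a product of one-step transitions. A small sketch (the helper name `future_path_probability` is hypothetical) reproduces Exercise 1:

```python
A = {
    "Sunny": {"Sunny": 0.8, "Rainy": 0.05, "Cloudy": 0.15},
    "Rainy": {"Sunny": 0.2, "Rainy": 0.6, "Cloudy": 0.2},
    "Cloudy": {"Sunny": 0.2, "Rainy": 0.3, "Cloudy": 0.5},
}


def future_path_probability(today, future, A):
    """P(q_2, ..., q_k | q_1 = today), chaining one-step transitions."""
    p, prev = 1.0, today
    for state in future:
        p *= A[prev][state]  # P(q_i | q_{i-1}) by the Markov property
        prev = state
    return p


# Exercise 1: today Sunny -> tomorrow Sunny -> day after Rainy.
p = future_path_probability("Sunny", ["Sunny", "Rainy"], A)
print(f"{p:.2f}")  # 0.04
```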
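• Exercise 2: If yesterday was Rainy and today is Cloudy, what is the probability that tomorrow will be Sunny? By the Markov property, yesterday's weather no longer matters once today's is known:

$$P(q_3 \mid q_1, q_2) = P(q_3 \mid q_2) = P(\text{Sunny} \mid \text{Cloudy}) = 0.2$$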
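One way to see the Markov property at work is a Monte Carlo check, sketched below under the assumption that the chain is simulated with the transition table above (function names are illustrative). We start from a Rainy day, keep only runs whose second day is Cloudy, and count how often the third day is Sunny; the estimate should approach $P(\text{Sunny} \mid \text{Cloudy}) = 0.2$.

```python
import random

STATES = ["Sunny", "Rainy", "Cloudy"]
A = {
    "Sunny": {"Sunny": 0.8, "Rainy": 0.05, "Cloudy": 0.15},
    "Rainy": {"Sunny": 0.2, "Rainy": 0.6, "Cloudy": 0.2},
    "Cloudy": {"Sunny": 0.2, "Rainy": 0.3, "Cloudy": 0.5},
}


def step(today, rng):
    """Draw tomorrow's weather given only today's weather."""
    row = A[today]
    return rng.choices(STATES, weights=[row[s] for s in STATES])[0]


def estimate_exercise2(trials, seed=0):
    """Monte Carlo estimate of P(q3 = Sunny | q1 = Rainy, q2 = Cloudy)."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(trials):
        q2 = step("Rainy", rng)  # q1 = Rainy (yesterday)
        if q2 != "Cloudy":
            continue  # keep only runs where today is Cloudy
        kept += 1
        hits += step(q2, rng) == "Sunny"
    return hits / kept


print(estimate_exercise2(100_000))  # close to P(Sunny | Cloudy) = 0.2
```

Only about 20% of runs survive the conditioning step, so the number of trials is kept fairly large.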
WHAT IS A MARKOV MODEL?
• A Markov model is a stochastic model of temporal or sequential data, i.e., data that are ordered.
• It models how current information (e.g., today's weather) depends on previous information.
• It is composed of states, a transition scheme between states, and the emission of outputs (discrete or continuous).
• Markov models can be used to:
  • learn statistics of sequential data;
  • make predictions or estimates;
  • recognize patterns.
WHAT IS A HIDDEN MARKOV MODEL (HMM)?
• A hidden Markov model is a stochastic model whose states are hidden, i.e., not directly observed. Each state emits an output, and only these outputs are observed.
• Imagine you were locked in a room for several days and asked about the weather outside. The only evidence you have is whether the person who brings your daily meal is carrying an umbrella.
• What is hidden? The weather: Sunny, Rainy or Cloudy.
• What can you observe? Umbrella or Not Umbrella.
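MARKOV CHAIN VS. HMM
[Figure: the weather Markov chain, whose states are observed directly, next to the weather HMM, in which each hidden state emits U = Umbrella or NU = Not Umbrella.]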
• Suppose $t$ days have passed. We then have an observation sequence $O = \{o_1, \ldots, o_t\}$, where $o_i \in \{\text{Umbrella}, \text{Not Umbrella}\}$.
• Each observation comes from an unknown state, so we also have an unknown state sequence $Q = \{q_1, \ldots, q_t\}$, where $q_i \in \{\text{Sunny}, \text{Rainy}, \text{Cloudy}\}$.
• We would like to know $P(q_1, \ldots, q_t \mid o_1, \ldots, o_t)$.


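HMM MATHEMATICAL MODEL
• From Bayes' theorem, the posterior probability of the state on a particular day is

$$P(q_i \mid o_i) = \frac{P(o_i \mid q_i)\,P(q_i)}{P(o_i)}.$$

For a sequence of length $t$:

$$P(q_1, \ldots, q_t \mid o_1, \ldots, o_t) = \frac{P(o_1, \ldots, o_t \mid q_1, \ldots, q_t)\,P(q_1, \ldots, q_t)}{P(o_1, \ldots, o_t)}.$$

• From the Markov property (with $P(q_1 \mid q_0) \equiv P(q_1)$):

$$P(q_1, \ldots, q_t) = \prod_{i=1}^{t} P(q_i \mid q_{i-1}).$$

• Conditionally independent observations assumption (each observation depends only on the state that emitted it):

$$P(o_1, \ldots, o_t \mid q_1, \ldots, q_t) = \prod_{i=1}^{t} P(o_i \mid q_i).$$

• Thus, since the denominator $P(o_1, \ldots, o_t)$ does not depend on the states:

$$P(q_1, \ldots, q_t \mid o_1, \ldots, o_t) \propto \prod_{i=1}^{t} P(o_i \mid q_i) \prod_{i=1}^{t} P(q_i \mid q_{i-1}).$$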
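For short sequences, the proportionality above can be evaluated directly by scoring every possible state sequence and normalizing. The sketch below does this for the umbrella example; the transition table comes from the weather slides, but the uniform `PI` and the umbrella probabilities `P_UMBRELLA` are hypothetical values chosen only for illustration, since the lecture does not specify them.

```python
from itertools import product

STATES = ["Sunny", "Rainy", "Cloudy"]
A = {
    "Sunny": {"Sunny": 0.8, "Rainy": 0.05, "Cloudy": 0.15},
    "Rainy": {"Sunny": 0.2, "Rainy": 0.6, "Cloudy": 0.2},
    "Cloudy": {"Sunny": 0.2, "Rainy": 0.3, "Cloudy": 0.5},
}
# Hypothetical parameters (not given in the lecture): uniform P(q_1) and
# P(Umbrella | weather); P(Not Umbrella | weather) is the complement.
PI = {s: 1 / 3 for s in STATES}
P_UMBRELLA = {"Sunny": 0.1, "Rainy": 0.8, "Cloudy": 0.3}


def emission(obs, state):
    """P(o_i | q_i) for obs in {"U", "NU"}."""
    p = P_UMBRELLA[state]
    return p if obs == "U" else 1.0 - p


def posterior(observations):
    """P(q_1..q_t | o_1..o_t) for every state sequence, by brute-force enumeration."""
    scores = {}
    for seq in product(STATES, repeat=len(observations)):
        score = PI[seq[0]] * emission(observations[0], seq[0])
        for i in range(1, len(seq)):
            score *= A[seq[i - 1]][seq[i]] * emission(observations[i], seq[i])
        scores[seq] = score
    evidence = sum(scores.values())  # P(o_1, ..., o_t), the Bayes denominator
    return {seq: s / evidence for seq, s in scores.items()}


post = posterior(["U", "U", "NU"])
best = max(post, key=post.get)
print(best, round(post[best], 3))
```

Enumeration costs $N^t$ evaluations, which is why dynamic-programming methods such as the Viterbi and forward algorithms are used in practice.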

HMM parameters:
• Transition probabilities $P(q_i \mid q_{i-1})$
• Emission probabilities $P(o_i \mid q_i)$
• Initial state probabilities $P(q_1)$
HMM PARAMETERS
• An HMM is governed by the parameters $\lambda = \{A, B, \pi\}$:
  • the state-transition probability matrix $A$;
  • the emission (observation, or state-conditional output) probabilities $B$;
  • the initial (prior) state probabilities $\pi$.

• Fix the number of states $N$: $S = \{s_1, \ldots, s_N\}$.
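• State-transition probability matrix:

$$
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1N} \\
a_{21} & a_{22} & \cdots & a_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
a_{N1} & a_{N2} & \cdots & a_{NN}
\end{bmatrix},
\qquad a_{ij} = P(q_t = s_j \mid q_{t-1} = s_i), \quad 1 \le i, j \le N,
$$

with $a_{ij} \ge 0$ and $\sum_{j=1}^{N} a_{ij} = 1$ for each row $i$ (the outgoing arrows of each state sum to 1). Here $a_{ij}$ is the transition probability from state $s_i$ to state $s_j$, drawn as an arrow $s_i \xrightarrow{a_{ij}} s_j$ in the state diagram.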
• Emission probabilities: each state generates an observation (output), and we must decide how to model that output, i.e., as discrete or continuous.
• Discrete outputs are modeled using probability mass functions (pmfs).
• Continuous outputs are modeled using probability density functions (pdfs).
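The difference between the two choices can be sketched in a few lines. The values in `B_RAINY` and the Gaussian parameters are hypothetical, and a single 1-D Gaussian is only one possible pdf: a discrete emission returns a probability in $[0, 1]$, while a continuous emission returns a density, which can exceed 1.

```python
import math

# Discrete output: one row of B, a pmf over V = {Umbrella, Not Umbrella}.
# The numbers are hypothetical, chosen only for illustration.
B_RAINY = {"Umbrella": 0.8, "Not Umbrella": 0.2}


def discrete_emission(obs):
    """b_i(v_k): a probability, so it lies in [0, 1]."""
    return B_RAINY[obs]


def gaussian_emission(o, mean, var):
    """b_i(o) for a continuous output modeled by a 1-D Gaussian pdf."""
    return math.exp(-((o - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


print(discrete_emission("Umbrella"))
print(gaussian_emission(0.0, mean=0.0, var=0.01))  # a density: may exceed 1
```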


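• Discrete emission probabilities, with observation set $V = \{v_1, \ldots, v_W\}$:

$$b_i(v_k) = P(o_t = v_k \mid q_t = s_i), \quad 1 \le i \le N, \ 1 \le k \le W,$$

$$
B = \begin{bmatrix}
b_1(v_1) & b_1(v_2) & \cdots & b_1(v_W) \\
b_2(v_1) & b_2(v_2) & \cdots & b_2(v_W) \\
\vdots & \vdots & \ddots & \vdots \\
b_N(v_1) & b_N(v_2) & \cdots & b_N(v_W)
\end{bmatrix}.
$$

Row $i$ is the pmf of the outputs of state $s_i$ (drawn as arrows from $s_i$ to $v_1, \ldots, v_W$ labeled $b_i(v_1), \ldots, b_i(v_W)$), so each row sums to 1.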
• Initial (prior) probabilities: the probabilities of starting the sequence in each state $s_i$:

$$\pi = \begin{bmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{bmatrix}, \qquad \pi_i = P(q_1 = s_i), \quad 1 \le i \le N, \qquad \sum_{i=1}^{N} \pi_i = 1.$$
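The constraints on $\lambda = \{A, B, \pi\}$ listed above (non-negative entries, rows of $A$ and $B$ summing to 1, $\pi$ summing to 1) can be checked mechanically. A minimal sketch with matrices as nested lists; `check_hmm` and the small 2-state model are hypothetical, used only to exercise the checks.

```python
def check_hmm(A, B, pi, tol=1e-9):
    """Raise ValueError if lambda = {A, B, pi} violates the HMM constraints."""
    n = len(pi)
    if len(A) != n or any(len(row) != n for row in A):
        raise ValueError("A must be an N x N matrix")
    if len(B) != n:
        raise ValueError("B must have one row per state")
    for name, rows in (("A", A), ("B", B), ("pi", [pi])):
        for row in rows:
            if any(p < 0 for p in row):
                raise ValueError(f"{name} has a negative probability")
            if abs(sum(row) - 1.0) > tol:
                raise ValueError(f"a row of {name} does not sum to 1")
    return True


# Hypothetical 2-state, 2-symbol model, just to exercise the checks.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
print(check_hmm(A, B, pi))  # True
```

Raising `ValueError` rather than using `assert` keeps the checks active even when Python runs with optimizations enabled.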
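HMM EXAMPLE: COINS & DICE
• State 1: a red die with 6 sides, outputs $\{1, 2, 3, 4, 5, 6\}$, and a red coin with $P(H \mid \text{Red Coin}) = 0.9$ and $P(T \mid \text{Red Coin}) = 0.1$.
• State 2: a green die with 12 sides, outputs $\{1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6\}$ (seven faces show 1), and a green coin with $P(H \mid \text{Green Coin}) = 0.95$ and $P(T \mid \text{Green Coin}) = 0.05$.

http://www.mathworks.com/help/stats/hidden-markov-models-hmm.html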
[Figure: two-state diagram with State 1 = Red Die (6 sides) and State 2 = Green Die (12 sides). After each roll, the coin of the current color is flipped: heads keeps the same die, tails switches to the other die.]

$$A = \begin{bmatrix} 0.9 & 0.1 \\ 0.05 & 0.95 \end{bmatrix}, \qquad \pi = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

The sequence always starts with the red die.
Emission probabilities of each state, for outcomes $1, \ldots, 6$:

$$b_1(o_t) = \left\{\tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}\right\}, \qquad b_2(o_t) = \left\{\tfrac{7}{12}, \tfrac{1}{12}, \tfrac{1}{12}, \tfrac{1}{12}, \tfrac{1}{12}, \tfrac{1}{12}\right\}$$

[Figure: bar plots of probability vs. outcome (1 to 6) for the Red Die and the Green Die.]
The complete parameter set $\lambda = \{A, B, \pi\}$ of the coins & dice HMM is:

$$A = \begin{bmatrix} 0.9 & 0.1 \\ 0.05 & 0.95 \end{bmatrix}, \qquad B = \begin{bmatrix} \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{1}{6} \\ \tfrac{7}{12} & \tfrac{1}{12} & \tfrac{1}{12} & \tfrac{1}{12} & \tfrac{1}{12} & \tfrac{1}{12} \end{bmatrix}, \qquad \pi = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$
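To make the generative story concrete, here is a minimal sampler for the coins & dice HMM, written in plain Python as an illustration (the function name `sample_dice` is hypothetical). It draws the first die from $\pi$, then at each step flips the current coin via $A$ and rolls the current die via $B$. Only `rolls` would be visible to an observer; `states` is the hidden sequence.

```python
import random

FACES = [1, 2, 3, 4, 5, 6]
A = [[0.9, 0.1], [0.05, 0.95]]              # state 0 = red die, state 1 = green die
B = [[1 / 6] * 6, [7 / 12] + [1 / 12] * 5]  # P(face | die) for faces 1..6
PI = [1.0, 0.0]                             # always start with the red die


def sample_dice(T, seed=0):
    """Generate T (hidden die, observed roll) pairs from lambda = {A, B, pi}."""
    rng = random.Random(seed)
    states, rolls = [], []
    q = rng.choices([0, 1], weights=PI)[0]
    for t in range(T):
        if t > 0:
            q = rng.choices([0, 1], weights=A[q])[0]  # flip the current coin
        states.append(q)
        rolls.append(rng.choices(FACES, weights=B[q])[0])  # roll the current die
    return states, rolls


states, rolls = sample_dice(20)
print("die: ", "".join("RG"[q] for q in states))
print("roll:", rolls)
```

Long runs of 1s in the output usually signal that the hidden state has switched to the green die.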
HMM TO CLASSIFY WRIST MOTIONS RELATED TO EATING ACTIVITIES
• 273 participants
• Wrist motions: Rest, Bite, Drink, ?

THE “LANGUAGE”
DATA
Number of occurrences of each “word” (wrist-motion class) for participants p006, p098 and p215:

Word          p006   p098   p215   Total
Rest            24     44     21      87
Utensiling      21     37     16      74
Bite            29     44     18      91
Drink            5     15      4      24
DATA SEQUENCE:
[Figure: training data, showing the state (rest, bite, utensiling, drink) at each position of the sequence (x axis: Sequence, 0 to 250) for participants p006, p098 and p215.]
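A, $\pi$:
[Figure: the trained four-state HMM, with State 1 = Rest, State 2 = Bite, State 3 = Utensiling and State 4 = Drink, each state using a GMM emission model; arrows are labeled with the estimated transition probabilities $A$, and the initial probabilities $\pi$ are shown alongside.]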
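Each state above uses a GMM (Gaussian mixture model) as its continuous emission pdf, $b_i(x) = \sum_m c_m \, \mathcal{N}(x; \mu_m, \sigma_m^2)$ with weights $c_m \ge 0$ summing to 1. The sketch below evaluates such a density in one dimension; the real wrist-motion features and fitted parameters are not given in the slides, so the 1-D setting and the `REST` parameters are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_emission(x, weights, means, variances):
    """b_i(x) = sum_m c_m * N(x; mu_m, sigma_m^2) for a 1-D Gaussian mixture."""
    if abs(sum(weights) - 1.0) > 1e-9 or any(w < 0 for w in weights):
        raise ValueError("mixture weights must be non-negative and sum to 1")
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


# Hypothetical 2-component mixture for one state (say "Rest"); illustrative only.
REST = {"weights": [0.7, 0.3], "means": [0.0, 2.0], "variances": [0.5, 1.0]}
print(gmm_emission(0.5, **REST))
```

A mixture lets a single state cover several typical motion patterns, which one Gaussian could not capture.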
WHAT CAN WE DO NEXT?
• State sequence decoding (Viterbi algorithm): given an HMM, find the single state sequence (path) $Q = q_1, \ldots, q_T$ that best explains a known observation sequence $O = o_1, \ldots, o_T$.
• Observation sequence evaluation (Forward-Backward algorithm): evaluate an observation sequence $O = o_1, \ldots, o_T$ under several alternative HMMs and determine which one best recognizes it (classification).
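Both tasks can be sketched on the coins & dice HMM. For evaluation, only the forward pass of the Forward-Backward procedure is needed to obtain $P(O \mid \lambda)$; for decoding, the Viterbi recursion keeps, for each state, the best-scoring path that ends there. The code below is a minimal illustration (function names are our own, observations are die faces 1 to 6); it omits the scaling that long sequences would need in `forward_likelihood`, while `viterbi` works in log space.

```python
import math

A = [[0.9, 0.1], [0.05, 0.95]]              # state 0 = red die, state 1 = green die
B = [[1 / 6] * 6, [7 / 12] + [1 / 12] * 5]  # P(face | die), faces 1..6
PI = [1.0, 0.0]


def forward_likelihood(obs, A, B, pi):
    """P(O | lambda) with the forward recursion (no scaling: short sequences only)."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0] - 1] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o - 1] for j in range(n)]
    return sum(alpha)


def _log(p):
    return math.log(p) if p > 0 else -math.inf


def viterbi(obs, A, B, pi):
    """Most likely hidden state path for obs, computed in log space."""
    n = len(pi)
    delta = [_log(pi[i]) + _log(B[i][obs[0] - 1]) for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        # For each state j, the predecessor i that maximizes delta[i] + log a_ij.
        best_prev = [max(range(n), key=lambda i: delta[i] + _log(A[i][j])) for j in range(n)]
        delta = [delta[best_prev[j]] + _log(A[best_prev[j]][j]) + _log(B[j][o - 1])
                 for j in range(n)]
        backpointers.append(best_prev)
    path = [max(range(n), key=lambda j: delta[j])]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])  # trace the best path backwards
    return path[::-1]


rolls = [2, 5, 3, 1, 1, 1, 1, 6, 1, 1, 1, 1]
print(forward_likelihood(rolls, A, B, PI))
print(viterbi(rolls, A, B, PI))  # 0 = red die, 1 = green die
```

For classification, one would compute `forward_likelihood` under each candidate model and pick the model with the largest value.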
