Hidden Markov Model in Automatic Speech Recognition
Outline
Discrete Markov process
Hidden Markov Model
Viterbi algorithm
Forward algorithm
Parameter estimation
Types of HMMs
[State diagram: three states S1, S2, S3 connected by transition arcs a21, a13, a23, a22, a32, a31]
States = {S1, S2, S3}, a_ij = P(q_t = S_j | q_{t-1} = S_i)
Sum(j=1..N) a_ij = 1
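A discrete Markov process like the three-state chain above can be sketched with a transition matrix whose rows sum to 1. The numbers below are invented for illustration; only the structure (row-stochastic matrix, state-to-state sampling) comes from the slide.

```python
import random

# Hypothetical transition matrix for a 3-state chain {S1, S2, S3};
# row i holds a_ij = P(q_t = S_j | q_{t-1} = S_i), so each row sums to 1.
A = [
    [0.6, 0.3, 0.1],  # from S1
    [0.2, 0.5, 0.3],  # from S2
    [0.4, 0.1, 0.5],  # from S3
]

def sample_chain(A, start, length, rng=random.Random(0)):
    """Sample a state sequence by repeatedly drawing the next state."""
    states = [start]
    for _ in range(length - 1):
        r, cum, nxt = rng.random(), 0.0, 0
        for j, p in enumerate(A[states[-1]]):
            cum += p
            if r < cum:
                nxt = j
                break
        states.append(nxt)
    return states

seq = sample_chain(A, start=0, length=10)
```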
Teacher-mood-model
Situation:
Your school teacher gave three different types of daily homework assignments:
A: took about 5 minutes to complete
B: took about 1 hour to complete
C: took about 3 hours to complete
Your teacher did not openly reveal his mood to you each day, but you knew
that he was in either a bad, neutral, or good mood for a whole day.
Mood changes occurred only overnight.
Question: How were his moods related to the homework type assigned that
day?
Model parameters:
S - states {good, neutral, bad}
a_kl - probability that state l is
followed by state k
Σ - alphabet {A, B, C}
e_k(x) - probability that state k emits
symbol x
Monday: A
Tuesday: C
Wednesday: B
Thursday: A
Friday: C
Questions:
What did his mood curve most likely look like that week?
(Searching for the most probable path - Viterbi algorithm)
What is the probability that he would assign this order of homework
assignments? (Probability of a sequence - Forward algorithm)
How do we adjust the model parameters (S, a_ij, e_i(x)) to maximize P(O|λ)?
(Create an HMM for a given sequence set)
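The first question can be answered with the Viterbi algorithm. A minimal sketch for the teacher-mood model follows; the slide gives no numbers, so the start, transition, and emission probabilities below are invented for illustration.

```python
import math

# Hypothetical parameters for the teacher-mood HMM: three moods,
# emissions over the homework alphabet {A, B, C}.
states = ["good", "neutral", "bad"]
start = {"good": 1/3, "neutral": 1/3, "bad": 1/3}
trans = {
    "good":    {"good": 0.6, "neutral": 0.3, "bad": 0.1},
    "neutral": {"good": 0.3, "neutral": 0.4, "bad": 0.3},
    "bad":     {"good": 0.1, "neutral": 0.3, "bad": 0.6},
}
emit = {
    "good":    {"A": 0.7, "B": 0.2, "C": 0.1},
    "neutral": {"A": 0.3, "B": 0.4, "C": 0.3},
    "bad":     {"A": 0.1, "B": 0.2, "C": 0.7},
}

def viterbi(obs):
    """Most probable state path; computed in the log domain for stability."""
    V = [{k: math.log(start[k]) + math.log(emit[k][obs[0]]) for k in states}]
    back = []
    for x in obs[1:]:
        col, ptr = {}, {}
        for k in states:
            prev, score = max(
                ((l, V[-1][l] + math.log(trans[l][k])) for l in states),
                key=lambda t: t[1])
            col[k] = score + math.log(emit[k][x])
            ptr[k] = prev
        V.append(col)
        back.append(ptr)
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

mood_curve = viterbi(list("ACBAC"))  # Monday..Friday homework types
```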
Algorithm
Iteratively build up the matrix f_k(i): the probability of the first i
symbols of the sequence, ending in state k.
The probability of the whole symbol sequence is the sum of the entries
in the last column.
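This forward recursion can be sketched in a few lines. The toy model below (two states, alphabet {A, B}) is invented; the point is the column-by-column fill of f_k(i) and the final sum over the last column, as described above.

```python
import itertools

# Invented 2-state model over alphabet {"A", "B"}.
states = [0, 1]
start = [0.5, 0.5]
A = [[0.7, 0.3], [0.4, 0.6]]                       # transitions a_lk
E = [{"A": 0.8, "B": 0.2}, {"A": 0.3, "B": 0.7}]   # emissions e_k(x)

def forward(obs):
    """Fill f_k(i) column by column; P(obs) is the sum of the last column."""
    f = [[start[k] * E[k][obs[0]] for k in states]]
    for x in obs[1:]:
        f.append([E[k][x] * sum(f[-1][l] * A[l][k] for l in states)
                  for k in states])
    return sum(f[-1])

p = forward("AAB")
```

A useful sanity check: summing P(obs) over every possible sequence of a fixed length must give exactly 1.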
Type of HMMs
Until now we have considered HMMs with
discrete output (finite alphabet Σ).
Extensions:
continuous observation probability density functions
mixtures of Gaussian pdfs
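For continuous observations, the emission probability e_k(x) is replaced by a density, typically a mixture of Gaussians. A minimal sketch, with invented weights, means, and variances:

```python
import math

def gauss_pdf(x, mu, var):
    """Univariate Gaussian density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_pdf(x, weights, means, variances):
    """Mixture density: sum_m c_m * N(x; mu_m, var_m), with sum_m c_m = 1."""
    return sum(c * gauss_pdf(x, m, v)
               for c, m, v in zip(weights, means, variances))

# Hypothetical 3-component mixture for one HMM state.
w, mu, var = [0.5, 0.3, 0.2], [0.0, 1.0, -1.0], [1.0, 0.5, 2.0]
density = gmm_pdf(0.5, w, mu, var)
```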
HMMs in ASR
[Block diagram of an ASR system: separation, feature extraction, HMM decoder, understanding, application]
HMMs in ASR
How can an HMM be used to classify feature sequences into known classes?
Build an HMM for each class.
By determining the probability of a sequence under each HMM, we can decide which
HMM most probably generated the sequence.
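The classification step above can be sketched as: score the sequence with the forward probability of every class model and pick the argmax. Both word models below are invented toy HMMs; only the decision rule comes from the slide.

```python
# One (invented) HMM per class: (start probs, transitions, emissions).
models = {
    "word1": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]],
              [{"a": 0.9, "b": 0.1}, {"a": 0.8, "b": 0.2}]),
    "word2": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]],
              [{"a": 0.1, "b": 0.9}, {"a": 0.2, "b": 0.8}]),
}

def forward_prob(obs, start, A, E):
    """P(obs | model) via the forward algorithm."""
    f = [start[k] * E[k][obs[0]] for k in range(len(start))]
    for x in obs[1:]:
        f = [E[k][x] * sum(f[l] * A[l][k] for l in range(len(f)))
             for k in range(len(f))]
    return sum(f)

def classify(obs):
    """Pick the class whose HMM most probably generated the sequence."""
    return max(models, key=lambda name: forward_prob(obs, *models[name]))
```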
HMMs in ASR
Hierarchical system of HMMs
HMM of a triphone
Language model
HMM Limitations
Data intensive
Computationally intensive
50 phones → 50³ = 125,000 possible triphones
3 states per triphone
a 3-Gaussian mixture for each state
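A back-of-envelope count makes the "data intensive" point concrete. The triphone, state, and mixture counts come from the slide; the 39-dimensional feature vector and diagonal covariances are assumptions added for illustration.

```python
phones = 50
triphones = phones ** 3            # 50^3 = 125,000 possible triphones
states = 3 * triphones             # 3 states per triphone
dim = 39                           # assumed feature dimension
params_per_gauss = 2 * dim + 1     # mean + diagonal variance + mixture weight
params = states * 3 * params_per_gauss  # 3 Gaussians per state

print(f"{triphones:,} triphones, {params:,} Gaussian parameters")
```

Even with parameter tying (which real systems rely on), the untied count runs into tens of millions, which is why triphone HMMs need large training corpora.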