Expectation Maximization Notes


Implementation of EM algorithm in HMM training

Jens Lagergren

April 22, 2009


EM Algorithms: An expectation-maximization (EM) algorithm is used in statistics for finding maximum likelihood estimates of parameters in probabilistic models that depend on hidden variables. EM is a two-step iterative method. In the first step (E-step) it computes the expectation of the log likelihood with respect to the current estimate of the distribution of the hidden variables. In the second step (M-step), it computes the parameters that maximize this expected log likelihood. In each cycle it uses the estimates from the previous step, so the solution is progressively approximated [wikipedia.com]. In bioinformatics, EM is commonly used to generate HMMs from selected genetic sequences.
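To make the E-step/M-step alternation concrete before the HMM-specific derivation below, here is a minimal Python sketch of the outer loop; `expected_counts` and `reestimate` are hypothetical placeholders standing in for the quantities derived in the following sections, not part of the original notes.

```python
def em_train(sequences, theta, expected_counts, reestimate, n_iterations=100):
    """Generic EM loop: alternate the E-step and the M-step.

    expected_counts(sequences, theta) -- E-step: expected counts of the hidden events
    reestimate(counts)                -- M-step: parameters maximizing the expected log likelihood
    Both callables are hypothetical placeholders; see the sections below.
    """
    for _ in range(n_iterations):
        counts = expected_counts(sequences, theta)   # E-step
        theta = reestimate(counts)                   # M-step
    return theta
```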

Training Hidden Markov Models using EM


HMM profiles are usually trained with a set of sequences that are known to belong to a single family. Once the HMM has been trained and optimized to identify these known members, it is then used to classify unknown genes. Concretely, this requires us to construct an HMM M that generates a collection of sequences F with maximum probability:
\[
\max_{\text{HMMs } M} \prod_{x \in F} \Pr[x \mid M] \tag{1}
\]

But this can be computationally hard. Therefore we break the problem down into the following sub-problems:
• Finding the right structure, i.e. determining the number of layers. (We will not consider this sub-problem at the moment.)
• Finding optimal parameters for a given structure.

To solve the latter problem we now let

• A_{ππ'} be the expected number of π → π' transitions.
• A_π be the expected number of visits to π.
• G_{π,σ} be the expected number of times σ is generated from state π.
• θ be the set of parameters to be optimized, composed of:
  – e_π(σ), the probability of generating the symbol σ in state π.
  – a_{ππ'}, the transition probability from state π to π'.

We also continue with the notation X^i := x_1, …, x_i and Π^i := π_1, …, π_i from the previous notes.

Expectation Maximization
We will use the EM algorithm to optimize the parameters so as to create the best-fitting HMM for a given set of sequences F. From the initial HMM M and the initial parameter set θ we can generate a new, improved parameter set θ'; by iterating this procedure the approximation will eventually converge to a local maximum. We calculate this new set θ' as follows:
\[
a'_{\pi\pi'} = \frac{\sum_{x \in F} E[A_{\pi\pi'} \mid x, \theta]}{\sum_{x \in F} E[A_{\pi} \mid x, \theta]}
\qquad
e'_{\pi}(\sigma) = \frac{\sum_{x \in F} E[G_{\pi,\sigma} \mid x, \theta]}{\sum_{x \in F} E[A_{\pi} \mid x, \theta]}
\]
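These ratios are just normalized expected counts. Below is a minimal numpy sketch of this M-step, assuming the expected counts summed over all x ∈ F are already available in arrays A (transitions) and G (emissions); the array layout and names are assumptions for illustration, not fixed by the notes.

```python
import numpy as np

def m_step(A, G):
    """Re-estimate parameters from expected counts (M-step).

    A[p, q] -- sum over x in F of E[A_{pq} | x, theta]   (expected p -> q transitions)
    G[p, s] -- sum over x in F of E[G_{p,s} | x, theta]  (expected emissions of symbol s in state p)
    """
    A_pi = A.sum(axis=1, keepdims=True)   # E[A_p] = sum over q of E[A_{pq}], see point 1 below
    a_new = A / A_pi                      # a'_{pq}
    e_new = G / A_pi                      # e'_p(s), normalized by E[A_p] as in the formulas above
    return a_new, e_new
```

The sketch assumes every state has a nonzero expected number of visits, so the divisions are well defined.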

We can compute each numerator and denominator separately:

1. E[A_π | X, θ] (the common denominator)
2. E[G_{π,σ} | X, θ] (numerator for e'_π(σ))
3. E[A_{ππ'} | X, θ] (numerator for a'_{ππ'})

1. Calculating the denominator E[A_π | X, θ]:

Notice that
\[
E[A_{\pi} \mid X, \theta] = \sum_{\pi'} E[A_{\pi\pi'} \mid X, \theta].
\]
Therefore E[A_π | X, θ] is easily computed once E[A_{ππ'} | X, θ] is resolved in point 3.

2. Calculating the numerator E[G_{π,σ} | X, θ]:

\[
\begin{aligned}
E[G_{\pi,\sigma} \mid X, \theta] &= \sum_{i\,:\,x_i = \sigma} \Pr[\pi_i = \pi \mid X, \theta] \\
&= \sum_{i\,:\,x_i = \sigma} \frac{\Pr[\pi_i = \pi, X \mid \theta]}{\Pr[X \mid \theta]} \\
&= \sum_{i\,:\,x_i = \sigma} \sum_{\pi'} \frac{\Pr[\pi_{i-1} = \pi', \pi_i = \pi, X \mid \theta]}{\Pr[X \mid \theta]}
\end{aligned}
\]
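A minimal sketch of how E[G_{π,σ} | X, θ] can be accumulated once the posterior probabilities Pr[π_i = π | X, θ] are available (they fall out of the forward and backward variables introduced in point 3); the array `posterior` with shape (n, |Q|) is an assumed input, not something defined in the notes.

```python
import numpy as np

def expected_emissions(x, posterior, n_states, n_symbols):
    """Accumulate E[G_{pi,sigma} | X, theta] for a single sequence.

    x         -- the observed sequence as a list of symbol indices x_1..x_n
    posterior -- posterior[i, p] = Pr[pi_i = p | X, theta], shape (n, n_states)
    """
    G = np.zeros((n_states, n_symbols))
    for i, symbol in enumerate(x):
        # only positions with x_i = sigma contribute to G[:, sigma]
        G[:, symbol] += posterior[i]
    return G
```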

3. Calculating E[A_{ππ'} | X, θ]:

\[
\begin{aligned}
E[A_{\pi\pi'} \mid X, \theta] &= \sum_i \Pr[\pi_i = \pi, \pi_{i+1} = \pi' \mid X, \theta] \\
&= \frac{\sum_i \Pr[\pi_i = \pi, \pi_{i+1} = \pi', X \mid \theta]}{\Pr[X \mid \theta]}
\end{aligned}
\]
so it is enough to be able to compute Pr[π_i = π, π_{i+1} = π', X | θ]:

\[
\begin{aligned}
\Pr[\pi_i = \pi, \pi_{i+1} = \pi', X \mid \theta]
&= \Pr[\pi_i = \pi, \pi_{i+1} = \pi', X \mid X^i, \pi_i = \pi, \theta]\,\Pr[X^i, \pi_i = \pi \mid \theta] \\
&= \underbrace{\Pr[x_{i+1}, \ldots, x_n, \pi_{i+1} = \pi' \mid \pi_i = \pi, \theta]}_{\text{the Markov property!}}\;\underbrace{\Pr[X^i, \pi_i = \pi \mid \theta]}_{f_\pi(i)} \\
&= \Pr[x_{i+1}, \ldots, x_n \mid \pi_{i+1} = \pi', \pi_i = \pi, \theta]\;\underbrace{\Pr[\pi_{i+1} = \pi' \mid \pi_i = \pi, \theta]}_{a_{\pi\pi'}}\;f_\pi(i) \\
&= \underbrace{\Pr[x_{i+2}, \ldots, x_n \mid \pi_{i+1} = \pi', \theta]}_{b_{\pi'}(i+1)}\;\underbrace{\Pr[x_{i+1} \mid \pi_{i+1} = \pi']}_{e_{\pi'}(x_{i+1})}\;a_{\pi\pi'}\,f_\pi(i) \\
&= b_{\pi'}(i+1)\,e_{\pi'}(x_{i+1})\,a_{\pi\pi'}\,f_\pi(i)
\end{aligned}
\]

... where

1. b_{π'}(i+1) is the "backward" variable, defined as
\[
b_\pi(i) = \Pr[x_{i+1}, \ldots, x_n \mid \pi_i = \pi, \theta],
\]
which can be computed using dynamic programming in a similar way as f_π(i) was computed in lecture 5.
2. f_π(i) is the "forward" variable defined in lecture 5.
3. e_{π'}(x_{i+1}) and a_{ππ'} were already calculated in the last cycle.
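Putting the derivation together, here is a minimal numpy sketch of the forward and backward recursions and of E[A_{ππ'} | X, θ] via the product b_{π'}(i+1) e_{π'}(x_{i+1}) a_{ππ'} f_π(i). It assumes a single start state q_start with outgoing probabilities collected in `start`, no special end state, and states and symbols indexed from 0; these conventions and all names are illustrative assumptions, not fixed by the notes.

```python
import numpy as np

def forward(x, a, e, start):
    """f[i, p] = f_p(i+1) = Pr[x_1..x_{i+1}, pi_{i+1} = p | theta] (0-based array index)."""
    n, Q = len(x), a.shape[0]
    f = np.zeros((n, Q))
    f[0] = start * e[:, x[0]]                      # leave q_start, emit x_1
    for i in range(1, n):
        f[i] = (f[i - 1] @ a) * e[:, x[i]]
    return f

def backward(x, a, e):
    """b[i, p] = b_p(i+1) = Pr[x_{i+2}..x_n | pi_{i+1} = p, theta] (0-based array index)."""
    n, Q = len(x), a.shape[0]
    b = np.zeros((n, Q))
    b[n - 1] = 1.0                                 # empty suffix
    for i in range(n - 2, -1, -1):
        b[i] = a @ (e[:, x[i + 1]] * b[i + 1])
    return b

def expected_transitions(x, a, e, start):
    """E[A_{pi pi'} | X, theta] for one sequence."""
    f, b = forward(x, a, e, start), backward(x, a, e)
    px = f[-1].sum()                               # Pr[X | theta]
    A = np.zeros_like(a)
    for i in range(len(x) - 1):
        # add f_pi(i) * a_{pi pi'} * e_{pi'}(x_{i+1}) * b_{pi'}(i+1) / Pr[X | theta]
        # for this position (array indices shifted by one relative to the notes)
        A += np.outer(f[i], e[:, x[i + 1]] * b[i + 1]) * a / px
    return A
```

The posteriors used in the emission-count sketch of point 2 come from the same quantities: posterior[i] = f[i] * b[i] / px.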

Stopping the iteration


When to stop? Notice that for θ' given by a'_{ππ'} and e'_π(σ), either

• θ' are the locally optimal parameters, in which case you may stop the algorithm,
• or \(\prod_{x \in F} \Pr[x \mid \theta'] > \prod_{x \in F} \Pr[x \mid \theta]\), which means a solution has not been reached yet.
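In practice one monitors the total log likelihood Σ_{x∈F} log Pr[x | θ], which the forward variable provides, and stops when the improvement falls below a tolerance. A minimal sketch, reusing the `forward` routine from the previous snippet; the tolerance value is an arbitrary choice.

```python
import numpy as np

def log_likelihood(sequences, a, e, start):
    """sum over x in F of log Pr[x | theta], using Pr[x | theta] = sum_p f_p(n)."""
    return sum(np.log(forward(x, a, e, start)[-1].sum()) for x in sequences)

def has_converged(ll_old, ll_new, tol=1e-6):
    """EM never decreases the likelihood, so stop once the gain drops below tol."""
    return ll_new - ll_old < tol
```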

Time analysis of EM algorithm in HMM training

Problem
A full time analysis of the complete EM algorithm would need to analyze the number of iterations required for convergence, which may vary from case to case; nevertheless, one may estimate the time required to calculate some of its components, e.g.:

Claim 1. max_{π_1,…,π_n} Pr[x_1, …, x_n, π_0, …, π_n] can be computed in time O(|Q|²n).

Proof: Let
\[
v_\pi(i) = \max_{\pi_1, \ldots, \pi_{i-1}} \Pr[x_1, \ldots, x_i, \pi_0, \ldots, \pi_{i-1}, \pi_i = \pi]
\]
and
\[
v_\pi(0) = \begin{cases} 1 & \text{when } \pi = q_{\text{start}} \\ 0 & \text{otherwise.} \end{cases}
\]
Then
\[
\begin{aligned}
v_\pi(i) &= \max_{\pi_1, \ldots, \pi_{i-1}} \Pr[x_1, \ldots, x_i, \pi_0, \ldots, \pi_{i-1}, \pi_i = \pi] \\
&= \max_{\pi'} \max_{\pi_1, \ldots, \pi_{i-2}} \Pr[X^i, \Pi^{i-2}, \pi_{i-1} = \pi', \pi_i = \pi] \\
&= \max_{\pi'} \max_{\pi_1, \ldots, \pi_{i-2}} \Pr[X^i, \Pi^{i-2}, \pi_{i-1} = \pi', \pi_i = \pi \mid X^{i-1}, \Pi^{i-2}, \pi_0, \pi_{i-1} = \pi']\;\Pr[X^{i-1}, \Pi^{i-2}, \pi_0, \pi_{i-1} = \pi'] \\
&= \max_{\pi'} \max_{\pi_1, \ldots, \pi_{i-2}} \underbrace{\Pr[x_i, \pi_i = \pi \mid \pi_{i-1} = \pi']}_{\text{the Markov property!}}\;\Pr[X^{i-1}, \Pi^{i-2}, \pi_0, \pi_{i-1} = \pi'] \\
&= \max_{\pi'} \max_{\pi_1, \ldots, \pi_{i-2}} \Pr[x_i \mid \pi_i = \pi]\,\Pr[\pi_i = \pi \mid \pi_{i-1} = \pi']\;\Pr[X^{i-1}, \Pi^{i-2}, \pi_0, \pi_{i-1} = \pi'] \\
&= \underbrace{\Pr[x_i \mid \pi_i = \pi]}_{e_\pi(x_i)} \max_{\pi'} \Bigl( \underbrace{\Pr[\pi_i = \pi \mid \pi_{i-1} = \pi']}_{a_{\pi'\pi}} \underbrace{\max_{\pi_1, \ldots, \pi_{i-2}} \Pr[X^{i-1}, \Pi^{i-2}, \pi_0, \pi_{i-1} = \pi']}_{v_{\pi'}(i-1)} \Bigr) \\
&= e_\pi(x_i) \max_{\pi'} a_{\pi'\pi}\, v_{\pi'}(i-1)
\end{aligned}
\]

That is,
\[
v_\pi(i) = \max_{\pi'} v_{\pi'}(i-1)\,a_{\pi'\pi}\,e_\pi(x_i).
\]

Since there are |Q|·n subproblems v_π(i) and each can be computed in time O(|Q|), this gives a recursion that can be computed in time O(|Q|²n).
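The recursion just derived is the Viterbi algorithm. A minimal log-space sketch under the same illustrative conventions as the earlier snippets (states indexed 0..|Q|-1, `start` holding the transition probabilities out of q_start); the names are assumptions, not taken from the notes.

```python
import numpy as np

def viterbi(x, a, e, start):
    """Compute max over state paths of Pr[x_1..x_n, pi_1..pi_n] in O(|Q|^2 n).

    a[p, q]  -- transition probability a_{pq}
    e[p, s]  -- emission probability e_p(s)
    start[p] -- probability of entering state p from q_start
    Returns (best log probability, best state path).
    """
    n, Q = len(x), a.shape[0]
    log_a, log_e = np.log(a), np.log(e)
    v = np.full((n, Q), -np.inf)                    # v[i, p] = log v_p(i+1)
    back = np.zeros((n, Q), dtype=int)              # argmax predecessors
    v[0] = np.log(start) + log_e[:, x[0]]
    for i in range(1, n):
        scores = v[i - 1][:, None] + log_a          # scores[p', p] = log(v_{p'}(i-1) * a_{p'p})
        back[i] = scores.argmax(axis=0)
        v[i] = scores.max(axis=0) + log_e[:, x[i]]  # plus log e_p(x_i)
    path = [int(v[-1].argmax())]                    # trace back the optimal path
    for i in range(n - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return float(v[-1].max()), path[::-1]
```

Each of the |Q|·n cells takes an O(|Q|) maximization, matching the O(|Q|²n) bound in the claim.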
