Hidden Semi Markov Models Theory Algorithms and Applications 1St Edition Yu Full Chapter


Hidden Semi-Markov Models
Theory, Algorithms and Applications

Shun-Zheng Yu

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK


OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Elsevier
Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK
225 Wyman Street, Waltham, MA 02451, USA
Copyright © 2016 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, recording, or any information storage and
retrieval system, without permission in writing from the publisher. Details on how to seek
permission, further information about the Publisher’s permissions policies and our arrangements
with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency,
can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the
Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and
experience broaden our understanding, changes in research methods, professional practices, or
medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in
evaluating and using any information, methods, compounds, or experiments described herein. In
using such information or methods they should be mindful of their own safety and the safety of
others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors,
assume any liability for any injury and/or damage to persons or property as a matter of products
liability, negligence or otherwise, or from any use or operation of any methods, products,
instructions, or ideas contained in the material herein.
ISBN: 978-0-12-802767-7

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

For information on all Elsevier publications


visit our website at http://store.elsevier.com/
PREFACE

A hidden semi-Markov model (HSMM) is a statistical model. In this
model, an observation sequence is assumed to be governed by an
underlying semi-Markov process with unobserved (hidden) states.
Each hidden state has a generally distributed duration, which is
associated with the number of observations produced while in the state,
and a probability distribution over the possible observations.

Based on this model, the model parameters can be estimated or
updated; the predicted, filtered, and smoothed probabilities of a partial
observation sequence can be determined; the goodness of fit of the
observation sequence to the model can be evaluated; and the best state
sequence of the underlying semi-Markov process can be found.
Owing to these capabilities, the HSMM has become one of the most
important models in the area of artificial intelligence/machine learning.
Since the HSMM was initially introduced in 1980 for machine recognition
of speech, it has been applied in more than forty scientific and
engineering areas with thousands of published papers, such as speech
recognition/synthesis, human activity recognition/prediction, network
traffic characterization/anomaly detection, fMRI/EEG/ECG signal
analysis, equipment prognosis/diagnosis, etc.
Since the first HSMM was introduced in 1980, three other basic
HSMMs and several variants of them have been proposed in the literature,
with various definitions of duration distributions and observation
distributions. Those models have different expressions, algorithms,
computational complexities, and applicable areas, without explicitly
interchangeable forms. A unified definition, in-depth treatment and
foundational approach of the HSMMs are in strong demand to
explore the general issues and theories behind them.

However, in contrast to a large number of published papers that
are related to HSMMs, there are only a few review articles/chapters on
HSMMs, and none of them aims at filling the demand. Besides, all
existing review articles/chapters were published several years ago. New
developments and emerging topics that have surfaced in this field need
to be summarized.

Therefore, this book is intended to include the models, theory,
methods, applications, and the latest information on development in
this field. In summary, this book will provide:

• a unified definition, in-depth treatment and foundational approach
of the HSMMs;
• a survey on the latest development and emerging topics in this field;
• examples helpful for the general reader, teachers and students in
computer science, and engineering to understand the topics;
• a brief description of applications in various areas;
• an extensive list of references to the HSMMs.

For these purposes, this book presents nine chapters in three parts.
In the first part, this book defines a unified model of HSMMs, and
discusses the issues related to the general HSMM, which include:

1. the forward–backward algorithms that are the fundamental algorithms
of HSMM, for evaluating the joint probabilities of partial
observation sequence;
2. computation of the predicted/filtered/smoothed probabilities,
expectations, and the likelihood function of observations, which are
necessary for inference in HSMM;
3. the maximum a posteriori estimation of states and the estimation of
best state sequence by Viterbi HSMM algorithm;
4. the maximum-likelihood estimation, training and online update of
model parameters; proof of the re-estimation algorithms by the EM
algorithm;
5. practical issues in the implementation of the forward–backward
algorithms.
By introducing certain assumptions and some constraints on the
state transitions, the general HSMM becomes the conventional
HSMMs, including explicit duration HMM, variable transition HMM,
and residual time HMM. Those conventional models have different
capability in modeling applications, with different computational
complexity and memory requirement involved in the forward–backward
algorithms and the model estimation.

In the second part, this book discusses the state duration distributions
and the observation distributions, which can be nonparametric or
parametric depending on the specific preference of the applications.

Among the parametric distributions, the most popular ones are the
exponential family distributions, such as Poisson, exponential,
Gaussian, and gamma. A mixture of Gaussian distributions is also
widely used to express complex distributions.
Other than the exponential family and the mixed distributions, the
Coxian distribution of state duration can represent any discrete
probability density function, and the underlying series–parallel network also
reveals the structure of different HSMMs.

A multispace probability distribution is applied to express a composition
of different dimensional observation spaces, or a mixture of
continuous and discrete observations. A segmental model of observation
sequence is used to describe parametric trajectories that change over
time. An event sequence model is used to model and handle an
observation sequence with missed observations.
In the third part, this book discusses variants and applications of
HSMMs. Among the variants of HSMMs, a switching HSMM allows
the model parameters to be changed in different time periods. An
adaptive factor HSMM allows the model parameters to be a function
of time. A context-dependent HSMM lets the model parameters be
determined by a given series of contextual factors. A multichannel
HSMM describes multiple interacting processes. A signal model of
HSMM uses an equivalent form to express an HSMM.

There usually exists a class of HSMMs that are specified for the
applications in an area. For example, in the area of speech synthesis,
speech features (observations to be obtained), instead of the model
parameters, are to be determined. In the area of human activity
recognition, unobserved activity (hidden state) is to be estimated. In the
area of network traffic characterization/anomaly detection,
performance/health of the entire network is to be evaluated. In the area of
fMRI/EEG/ECG signal analysis, neural activation is to be detected.
ACKNOWLEDGMENTS

I would like to thank Dr Yi XIE, Bai-Chao LI, and Jian-Zheng LUO,
who collected a lot of papers that are related to HSMMs, and sorted
them based on their relevance to the applicable theories, algorithms,
and applications. Their work was instrumental in allowing me to complete
the book in time. I also want to express my gratitude to the reviewers
who carefully read the draft and provided me with many valuable
comments and suggestions. Dr Yi XIE, Bai-Chao LI, Wei-Tao WU,
Qin-Liang LIN, Xiao-Fan CHEN, Yan LIU, Jian KUANG, and
Guang-Rui WU proofread the chapters. Without their tremendous
effort and help, it would have been extremely difficult for me to finish
this book as it is now.
CHAPTER 1
Introduction
This chapter reviews some topics that are closely related to hidden
semi-Markov models, and introduces their concepts and brief history.

1.1 MARKOV RENEWAL PROCESS AND SEMI-MARKOV PROCESS
In this section, we briefly review the Markov renewal process and
semi-Markov process, as well as generalized semi-Markov process
and discrete-time semi-Markov process.

1.1.1 Markov Renewal Process


A renewal process is a generalization of a Poisson process that allows
arbitrary holding times. Its applications include planning the
replacement of worn-out machinery in a factory. A Markov renewal process
is a generalization of a renewal process in which the holding
times are not independent and identically distributed. Their distributions
depend on the states in a Markov chain. Markov renewal processes
were studied by Pyke (1961a, 1961b) in the 1960s. They are applied in
M/G/1 queuing systems, the machine repair problem, etc. (Cinlar, 1975).
Denote by $S$ a state space and by $X_n \in S$, $n = 0, 1, 2, \ldots$, the states in the
Markov chain. Let $(X_n, T_n)$ be a sequence of random variables, where
$T_n$ are the jump times of the states. The inter-arrival times of the states
are $\tau_n = T_n - T_{n-1}$. If

$$P[\tau_{n+1} \le \tau, X_{n+1} = j \mid (X_0, T_0), (X_1, T_1), \ldots, (X_n = i, T_n)] = P[\tau_{n+1} \le \tau, X_{n+1} = j \mid X_n = i],$$

for any $n \ge 0$, $\tau \ge 0$, $i, j \in S$, then the sequence $(X_n, T_n)$ is called a
Markov renewal process. In other words, in a Markov renewal
process, the next state $X_{n+1} = j$ and the inter-arrival time $\tau_{n+1}$ to the
next state depend on the current state $X_n = i$ and are independent of
the historical states $X_0, X_1, \ldots, X_{n-1}$ and the jump times $T_1, \ldots, T_n$.

Hidden Semi-Markov Models. DOI: http://dx.doi.org/10.1016/B978-0-12-802767-7.00001-2
© 2016 Elsevier Inc. All rights reserved.


Define the state transition probabilities by

$$H_{ij}(\tau) = P[\tau_{n+1} \le \tau, X_{n+1} = j \mid X_n = i]$$

with $H_{ij}(0) = 0$, that is, at any time epoch multiple transitions are not
allowed. Let

$$N(t) = \max\{n : T_n \le t\}$$

count the number of renewals in the interval $[0, t]$. Then $N(t) < \infty$ for
any given $t \ge 0$. $(N(t))_{t \ge 0}$ is called a Markov renewal counting
process.
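
As a rough illustration of these definitions, the following Python sketch simulates a sequence $(X_n, T_n)$ and evaluates the counting process $N(t)$. It covers only a special case, and every concrete choice in it is an assumption made for the example rather than something taken from the text: the two-state transition matrix `P`, exponential holding times whose rate depends on the current state only, and the helper names `simulate_markov_renewal` and `count_renewals`.

```python
import bisect
import random


def simulate_markov_renewal(P, rates, x0, horizon, rng):
    """Return (states, jump_times) with T_0 = 0 and every T_n <= horizon.

    Assumptions for this sketch: P[i][j] is the probability of jumping
    from i to j, and the holding time in state i is exponential with
    rate rates[i] (so it depends on the current state only).
    """
    states, times = [x0], [0.0]
    while True:
        i = states[-1]
        tau = rng.expovariate(rates[i])            # inter-arrival time tau_{n+1}
        if times[-1] + tau > horizon:
            break
        j = rng.choices(range(len(P)), weights=P[i])[0]   # next state X_{n+1}
        states.append(j)
        times.append(times[-1] + tau)
    return states, times


def count_renewals(times, t):
    """N(t) = max{n : T_n <= t}, with times[n] = T_n sorted in increasing order."""
    return bisect.bisect_right(times, t) - 1


rng = random.Random(0)
P = [[0.0, 1.0], [0.5, 0.5]]
rates = [1.0, 2.0]
states, times = simulate_markov_renewal(P, rates, x0=0, horizon=10.0, rng=rng)
print(count_renewals(times, 5.0), count_renewals(times, 10.0))
```

Because only finitely many jumps fit into a finite horizon, `count_renewals` always returns a finite value, in line with $N(t) < \infty$.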

1.1.2 Semi-Markov Process


A semi-Markov process is equivalent to a Markov renewal process in
many aspects, except that a state is defined for every given time
in the semi-Markov process, not just at the jump times. Therefore, the
semi-Markov process is an actual stochastic process that evolves over
time. Semi-Markov processes were introduced by Levy (1954) and
Smith (1955) in the 1950s and are applied in queuing theory and reliability
theory.
For an actual stochastic process that evolves over time, a state must
be defined for every given time. Therefore, the state $S_t$ at time $t$ is defined
by $S_t = X_n$ for $t \in [T_n, T_{n+1})$. The process $(S_t)_{t \ge 0}$ is thus called a
semi-Markov process. In this process, the times $0 = T_0 < T_1 < \cdots < T_n < \cdots$
are the jump times of $(S_t)_{t \ge 0}$, and $\tau_n = T_n - T_{n-1}$ are the sojourn
times in the states. Every transition from a state to the next state is
instantaneously made at the jump times.
For a time-homogeneous semi-Markov process, the transition
density functions are

$$h_{ij}(\tau)\,d\tau \equiv P[\tau \le \tau_{n+1} < \tau + d\tau, X_{n+1} = j \mid X_n = i],$$

where $h_{ij}(\tau)$ is independent of the jumping time $T_n$. It is the probability
density function that after having entered state $i$ at time zero the process
transits to state $j$ in between time $\tau$ and $\tau + d\tau$. They must satisfy

$$\sum_{j \in S} \int_0^{\infty} h_{ij}(\tau)\,d\tau = 1,$$

for all $i \in S$. That is, state $i$ must transit to another state in the
time $[0, \infty)$.

If the number of jumps in the time interval $[0, T]$ is $N(T) = n$,
then the sample path $(s_t, t \in [0, T])$ is equivalent to the sample path
$\left(x_0, \tau_1, x_1, \ldots, \tau_n, x_n, T - \sum_{k=1}^{n} \tau_k\right)$ with probability 1. Then the joint
distribution of the process $(s_t)_{0 \le t \le T}$ is

$$P\left[x_0, \tau'_1 \le \tau_1, x_1, \ldots, \tau'_n \le \tau_n, x_n, T - \sum_{k=1}^{n} \tau'_k \,\middle|\, N(T) = n\right] = P[x_0] \int_0^{\tau_1} \cdots \int_0^{\tau_n} \left(1 - W_{x_n}\left(T - \sum_{k=1}^{n} \tau'_k\right)\right) \cdot \prod_{k=1}^{n} h_{x_{k-1} x_k}(\tau'_k) \cdot d\tau'_k,$$

where $W_i(\tau) = \int_0^{\tau} \sum_{j \in S} h_{ij}(\tau')\,d\tau'$ is the probability that the process
stays in state $i$ for at most time $\tau$ before transiting to another state,
and $1 - W_i(\tau)$ is the probability that the process will not make a transition
from state $i$ to any other state within time $\tau$. The likelihood function
corresponding to the sample path $\left(x_0, \tau_1, x_1, \ldots, \tau_n, x_n, T - \sum_{k=1}^{n} \tau_k\right)$ is thus

$$L\left(x_0, \tau_1, x_1, \ldots, \tau_n, x_n, T - \sum_{k=1}^{n} \tau_k\right) = P[x_0] \cdot \left(1 - W_{x_n}\left(T - \sum_{k=1}^{n} \tau_k\right)\right) \cdot \prod_{k=1}^{n} h_{x_{k-1} x_k}(\tau_k).$$

Suppose the current time is $t$. The time that has elapsed since the
last jump is defined by $R_t = t - T_{N(t)}$. Then the process $(S_t, R_t)$ is a
continuous-time homogeneous Markov process.

The semi-Markov process can be generated by different types of
random mechanisms (Nunn and Desiderio, 1977), for instance:
1. Usually, it is thought of as a stochastic process that, after having
entered state $i$, randomly determines the successor state $j$ based
on the state transition probabilities $a_{ij}$, and then randomly
determines the amount of time $\tau$ staying in state $i$ before going to
state $j$ based on the holding time density function $f_{ij}(\tau)$, where
$a_{ij} \equiv P[X_{n+1} = j \mid X_n = i] = \int_0^{\infty} h_{ij}(\tau)\,d\tau$ is the transition probability
from state $i$ to state $j$, s.t. $\sum_{j \in S} a_{ij} = 1$, and

$$f_{ij}(\tau)\,d\tau \equiv P[\tau \le \tau_{n+1} < \tau + d\tau \mid X_n = i, X_{n+1} = j] = h_{ij}(\tau)\,d\tau / a_{ij}$$

is the probability that the transition to the next state will occur in
the time between $\tau$ and $\tau + d\tau$ given that the current state is $i$ and
the next state is $j$. In this model,

$$h_{ij}(\tau) = a_{ij} f_{ij}(\tau).$$

2. The semi-Markov process can be thought of as a stochastic process that,
after having entered state $i$, randomly determines the waiting time $\tau$
for transition out of state $i$ based on the waiting time density function
$w_i(\tau)$, and then randomly determines the successor state $j$ based on the
state transition probabilities $a_{(i,\tau)j}$, where $w_i(\tau)$ is the density function
of the waiting time for transition out of state $i$ defined by

$$w_i(\tau)\,d\tau = P[\tau \le \tau_{n+1} < \tau + d\tau \mid X_n = i] = \sum_{j \in S} h_{ij}(\tau)\,d\tau,$$

and

$$a_{(i,\tau)j} \equiv P[X_{n+1} = j \mid X_n = i, \tau_{n+1} = \tau]$$

is the probability that the system will make the next transition to
state $j$, given time $\tau$ and current state $i$. In this model,

$$h_{ij}(\tau) = w_i(\tau)\, a_{(i,\tau)j}.$$
3. The semi-Markov process can also be thought of as a process that,
after having entered state $i$, randomly draws the pair $(k, d_{ik})$ for all
$k \in S$, based on $f_{ik}(\tau)$, and then determines the successor state and length
of time in state $i$ from the smallest draw. That is, if $d_{ij} = \min_{k \in S}\{d_{ik}\}$,
then the next transition is to state $j$ and the length of time the process
holds in state $i$ before going to state $j$ is $d_{ij}$. In this model,

$$h_{ij}(\tau) = f_{ij}(\tau) \prod_{k \ne j} \left(1 - F_{ik}(\tau)\right),$$

where $F_{ik}(\tau) = \int_0^{\tau} f_{ik}(\tau')\,d\tau'$, and $\prod_{k \ne j} (1 - F_{ik}(\tau))$ is the probability
that the process will not transit to another state except $j$ by time $\tau$.
This type of semi-Markov process is applied to, for example, reliability
analysis (Veeramany and Pandey, 2011). An example of this type of
semi-Markov process is as follows.

Example 1.1
Suppose a multiple-queue system contains $L$ queues, each with known
inter-arrival time distribution and departure time distribution. Let $q_l$
be the length of the $l$th queue and define the state at time $t$ by
$S_t = (q_1, \ldots, q_L)$. Then every external arrival to a queue will increment the
queue length and every departure from a queue will decrement the queue
length if it is greater than zero. Therefore, each arrival/departure will
result in a transition of the system to a corresponding state. Denote an
external arrival to queue $l$ by $e_l^+ = (0, \ldots, 1, \ldots, 0)$ and a departure from
queue $l$ with $q_l > 0$ by $e_l^- = (0, \ldots, -1, \ldots, 0)$. Then the next state is
$(q_1, \ldots, q_L) + e_l^+$ or $(q_1, \ldots, q_L) + e_l^-$ for $l = 1, \ldots, L$. The time to the next
state transition is determined by which $e_l^+$ or $e_l^-$, for $l = 1, \ldots, L$, occurs
first, based on their inter-arrival/departure time distributions.
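
To make the first and third generating mechanisms of Section 1.1.2 concrete, here is a minimal Python sketch of a single transition step under each. The function names, the `rate` table, and the choice of exponential holding-time densities $f_{ik}$ are hypothetical and serve only as an illustration; the text leaves the distributions general.

```python
import random


def step_successor_first(i, a, sample_holding, rng):
    """Mechanism 1: pick the successor j from a[i], then draw tau from f_ij."""
    j = rng.choices(range(len(a[i])), weights=a[i])[0]
    tau = sample_holding(i, j, rng)
    return j, tau


def step_competing_clocks(i, successors, sample_holding, rng):
    """Mechanism 3: draw d_ik from f_ik for every k; the smallest draw wins."""
    draws = {k: sample_holding(i, k, rng) for k in successors}
    j = min(draws, key=draws.get)
    return j, draws[j]


# Hypothetical model: f_ik is exponential with rate[i][k] (0.0 = no transition).
rate = [[0.0, 1.0, 3.0],
        [2.0, 0.0, 2.0],
        [1.0, 1.0, 0.0]]


def sample_exponential(i, k, rng):
    return rng.expovariate(rate[i][k])


rng = random.Random(1)
n = 20000
wins = sum(step_competing_clocks(0, [1, 2], sample_exponential, rng)[0] == 2
           for _ in range(n))
print(wins / n)  # empirical fraction of races out of state 0 won by state 2
```

For exponential clocks the chance that state 2 wins the race out of state 0 is the rate ratio $3/(1+3)$, so the printed fraction should be close to 0.75. With other holding-time densities, the two mechanisms generally induce different $h_{ij}(\tau)$ for the same $f_{ik}$.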

1.1.3 Generalized Semi-Markov Process


A generalized semi-Markov process extends a semi-Markov process
by letting an event trigger a state transition and the next state be
determined by the current state and the event that just occurred. It is
applied to discrete event systems (Glynn, 1989).
In Example 1.1, if the multiple queues form an open queuing network,
such as in a packet switching network, a data packet sent out of a switch
will go out of the network or randomly select one of its neighbor switches
to enter. Every switch is assumed to have an input queue, as shown in
Figure 1.1. Therefore, a state transition depends on the current state as
well as the next arrival/departure event. The semi-Markov process is thus
extended to the so-called generalized semi-Markov process.
[Figure 1.1: diagram of five switches, each with an input queue, showing external arrivals, departures (Dept.), and packets going out of the network.]

Figure 1.1 Open queuing network.
There are five switches in the open network. Each switch has a first-in-first-out queue. When a switch receives a
packet, regardless of whether it is an external arrival or a departure from another switch, the packet will be input into its
queue. The packet at the “head” of the queue will be sent out of the network or to one of the other switches. The
vector of the lengths of those five queues is treated as the state. Therefore, each external arrival or departure
from a queue will change the state.

Suppose that the current state is $S_t = S_{T_n} = X_n$, and the next event is
$E_{n+1}$, which will cause the transition into the next state $X_{n+1}$ at time epoch
$T_{n+1}$. Denote by $E(i)$ the set of events that can cause the process to leave
state $i$. Then

$$\begin{aligned}
h_{ij}(\tau)\,d\tau &\equiv P[\tau \le \tau_{n+1} < \tau + d\tau, X_{n+1} = j \mid X_n = i] \\
&= \sum_{e \in E(i)} P[\tau_{n+1} \in d\tau, X_{n+1} = j, E_{n+1} = e \mid X_n = i] \\
&= \sum_{e \in E(i)} P[\tau_{n+1} \in d\tau, E_{n+1} = e \mid X_n = i] \cdot P[X_{n+1} = j \mid X_n = i, E_{n+1} = e],
\end{aligned}$$

where $\tau_{n+1} \in d\tau$ represents $\tau \le \tau_{n+1} < \tau + d\tau$. Therefore, the transition
probability is extended to $P[X_{n+1} = j \mid X_n = i, E_{n+1} = e]$, that is, the next
transition depends on both the current state and the next event. The
next transition epoch $T_{n+1}$ is determined by the event $E_{n+1} \in E(i)$ that
occurs first among all events of $E(i)$. That is,

$$P[\tau_{n+1} \in d\tau, E_{n+1} = e \mid X_n = i] = P[\tau_{n+1} \in d\tau \mid X_n = i, E_{n+1} = e] \cdot \prod_{\substack{e' \in E(i) \\ e' \ne e}} P[T_{e',\mathrm{next}} - T_n > \tau \mid X_n = i, E_{n+1} = e'],$$

where $T_{e',\mathrm{next}}$ denotes the time epoch at which event $e'$ will occur. Considering
that each event has an inter-event time distribution, an amount $c_{e,n}$ of the
inter-event time for event $e$ has already passed at time epoch $T_n$ since event $e$
last occurred. Suppose $T_{e,\mathrm{last}}$ and $T_{e,\mathrm{next}}$, for $T_{e,\mathrm{last}} \le T_n < T_{e,\mathrm{next}}$, are the
epochs at which event $e$ last occurred and will next occur. The inter-event
time for event $e$ is thus $y_{e,n} = T_{e,\mathrm{next}} - T_{e,\mathrm{last}}$. Then,

$$P[\tau_{n+1} \in d\tau \mid X_n = i, E_{n+1} = e] = \frac{P[c_{e,n} + \tau \le y_{e,n} < c_{e,n} + \tau + d\tau \mid X_n = i, E_{n+1} = e]}{P[y_{e,n} > c_{e,n} \mid X_n = i, E_{n+1} = e]}$$

and

$$P[T_{e',\mathrm{next}} - T_n > \tau \mid X_n = i, E_{n+1} = e'] = \frac{P[c_{e',n} + \tau < y_{e',n} \mid X_n = i, E_{n+1} = e']}{P[y_{e',n} > c_{e',n} \mid X_n = i, E_{n+1} = e']}.$$

Example 1.1 (continued)
In the open queuing network system, a departure from queue $l$ can occur
only if $q_l > 0$. That is, not all events can occur while the system is in a
given state. Then for state $X_n = x_n$, the event set that can cause the system
to leave state $x_n$ is

$$E(x_n) = \{e_l^+ : l = 1, \ldots, L\} \cup \{e_l^- : x_n(l) > 0,\ l = 1, \ldots, L\},$$

where $x_n(l) = q_l$ is the $l$th element of $x_n$. A data packet departing from
queue $l$ will randomly select queue $k$ to enter with probability $p_{lk}$, and go
out of the system with probability $1 - \sum_{k=1}^{L} p_{lk}$. Therefore, for the given
event $e_l^-$, the former results in the system transiting from state $X_n = x_n$
to state $X_{n+1} = x_n + e_l^- + e_k^+$ with probability $p_{lk}$, and the latter to state
$X_{n+1} = x_n + e_l^-$ with probability $1 - \sum_{k=1}^{L} p_{lk}$. For an external arrival
event $e_l^+$, the probability from state $X_n = x_n$ to state $X_{n+1} = x_n + e_l^+$ is 1.
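
The event set $E(x_n)$ and the event-dependent transition probabilities of this example can be sketched in a few lines of Python. The encoding of events as `("+", l)` and `("-", l)` pairs with zero-based queue indices, the function names, and the routing matrix `p` are assumptions introduced for illustration only.

```python
def enabled_events(x):
    """E(x): arrivals to every queue, departures only from non-empty queues."""
    arrivals = [("+", l) for l in range(len(x))]
    departures = [("-", l) for l in range(len(x)) if x[l] > 0]
    return arrivals + departures


def next_state_distribution(x, event, p):
    """Return {next_state: probability} given state x and the event that fired."""
    kind, l = event
    x = list(x)
    if kind == "+":                      # external arrival to queue l
        x[l] += 1
        return {tuple(x): 1.0}
    x[l] -= 1                            # departure from queue l
    dist = {}
    for k, p_lk in enumerate(p[l]):      # packet is routed to queue k
        if p_lk > 0:
            y = list(x)
            y[k] += 1
            dist[tuple(y)] = dist.get(tuple(y), 0.0) + p_lk
    leave = 1.0 - sum(p[l])              # packet leaves the network
    if leave > 0:
        dist[tuple(x)] = dist.get(tuple(x), 0.0) + leave
    return dist


# Hypothetical routing probabilities p[l][k] for a two-queue network.
p = [[0.0, 0.5],
     [0.25, 0.0]]
print(enabled_events((0, 2)))
print(next_state_distribution((0, 2), ("-", 1), p))
```

In state $(0, 2)$ the first queue is empty, so its departure event is not enabled, while a departure from the second queue either feeds the first queue or leaves the network.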

1.1.4 Discrete-Time Semi-Markov Process


If $t$ is discrete, that is, $t = 0, 1, 2, \ldots$, then $(S_t)_{t \ge 0}$ is called a
discrete-time semi-Markov process. In this case, if $t_1$ is the starting time of state
$j \in S$ and $t_2$ its ending time, then $S_{t_1} = j$, $S_{t_1+1} = j$, $\ldots$, $S_{t_2} = j$ with
$S_{t_1-1} \ne j$ and $S_{t_2+1} \ne j$. The sojourn time of $(S_t)_{t \ge 0}$ in state $j$ is an
integer $\tau = t_2 - t_1 + 1 \ge 1$. In this case, the transition mass functions are

$$h_{ij}(\tau) \equiv P[\tau_{n+1} = \tau, X_{n+1} = j \mid X_n = i].$$

Define

$$h_{ij}^{(m)}(\tau) \equiv P[\tau_{n+m} = \tau, X_{n+m} = j \mid X_n = i]$$

as the probability that, starting from state $i$ at time $T_n$, the
semi-Markov process will do the $m$th jump at time $T_n + \tau$ to state $j$. It must
satisfy $\tau \ge m$. In other words, within a finite time $\tau$, the semi-Markov
process can make at most $\tau$ jumps, that is,

$$\sum_{m=1}^{\infty} h_{ij}^{(m)}(\tau) = \sum_{m=1}^{\tau} h_{ij}^{(m)}(\tau).$$

It is due to this fact that the discrete-time semi-Markov process
is different from the continuous-time semi-Markov process (Barbu
and Limnios, 2008), which can make infinite jumps within time $\tau$.

Similarly define the state transition probabilities of a discrete-time
semi-Markov process by

$$g_{ij}(\tau) = P[S_{t+\tau} = j \mid S_t = i]$$

and the cumulative distribution function of waiting time in state $i$ by

$$W_i(\tau) = \sum_{j \in S} \sum_{\tau'=1}^{\tau} h_{ij}(\tau'),$$

where $1 - W_i(\tau)$ is the probability that the sojourn time of state $i$ is
longer than $\tau$. Then the Markov renewal equation for the transition
probabilities of the discrete-time semi-Markov process is

$$g_{ij}(\tau) = (1 - W_i(\tau)) \cdot I(i = j) + \sum_{k \in S} \sum_{l=1}^{\tau} h_{ik}(l)\, g_{kj}(\tau - l),$$

where $I(i = j)$ is an indicator function which equals 1 if $i = j$ and 0 otherwise.
This recursive formula can be used to compute the transition functions
$g_{ij}(\tau)$ of the discrete-time semi-Markov process.
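
The recursion can be sketched in Python as follows. This is an illustrative implementation rather than an optimized one. It assumes the boundary condition $g_{kj}(0) = I(k = j)$, which the text does not state explicitly but which the term $g_{kj}(\tau - l)$ needs at $l = \tau$. The function name `transition_functions` is hypothetical, and the mass functions are borrowed from Example 1.2, with states 0 and 1 standing for libraries 1 and 2.

```python
def transition_functions(h, n_states, tau_max):
    """Compute g[tau][i][j] = P[S_{t+tau} = j | S_t = i] for tau = 0..tau_max.

    h(i, j, tau) is the transition mass function, defined for tau >= 1.
    """
    S = range(n_states)
    # Assumed boundary condition: g(0) is the identity (no time has elapsed).
    g = [[[1.0 if i == j else 0.0 for j in S] for i in S]]
    W = [0.0] * n_states                       # running value of W_i(tau)
    for tau in range(1, tau_max + 1):
        for i in S:
            W[i] += sum(h(i, j, tau) for j in S)
        g_tau = [[0.0] * n_states for _ in S]
        for i in S:
            for j in S:
                total = (1.0 - W[i]) if i == j else 0.0   # no jump by time tau
                for k in S:
                    for l in range(1, tau + 1):           # first jump at time l
                        total += h(i, k, l) * g[tau - l][k][j]
                g_tau[i][j] = total
        g.append(g_tau)
    return g


# Transition mass functions h_ij(tau) = coef * base**(tau - 1) of Example 1.2.
coef = [[0.24, 0.04], [0.12, 0.07]]
base = [[0.7, 0.8], [0.6, 0.9]]


def h(i, j, tau):
    return coef[i][j] * base[i][j] ** (tau - 1)


g = transition_functions(h, 2, 20)
print(g[1][0], g[20][0])
```

Each row of $g(\tau)$ should sum to 1, which is a convenient sanity check on both the recursion and the mass functions.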

Example 1.2
There are two libraries. Readers can borrow books from and return them to
either of them, as shown in Figure 1.2. The statistics of the libraries show
that the transition mass functions are

$$h_{1,1}(\tau) = 0.24 \times 0.7^{\tau-1}, \qquad h_{1,2}(\tau) = 0.04 \times 0.8^{\tau-1},$$
$$h_{2,1}(\tau) = 0.12 \times 0.6^{\tau-1}, \qquad h_{2,2}(\tau) = 0.07 \times 0.9^{\tau-1}.$$

[Figure 1.2: state transition diagram with two states, 1 and 2, whose arcs are labeled with the transition mass functions $h_{1,1}(\tau)$, $h_{1,2}(\tau)$, $h_{2,1}(\tau)$, and $h_{2,2}(\tau)$ given above.]

Figure 1.2 The semi-Markov process.
There are two states. Each state can transit to itself or the other one with transition mass functions. For example,
state 1 can transit to state 1 with transition mass function $h_{1,1}(\tau)$ and state 2 with $h_{1,2}(\tau)$.

Then we can get the probability that a book is borrowed from library
$i$ and returned to library $j$ by $a_{ij} = \sum_{\tau=1}^{\infty} h_{ij}(\tau)$. It yields that

$$a_{1,1} = 0.24/0.3 = 0.8, \qquad a_{1,2} = 0.04/0.2 = 0.2,$$
$$a_{2,1} = 0.12/0.4 = 0.3, \qquad a_{2,2} = 0.07/0.1 = 0.7.$$

The holding time probabilities after a book is borrowed from library $i$
and decided to return to library $j$ are $f_{ij}(\tau) = h_{ij}(\tau)/a_{ij}$, that is,

$$f_{1,1}(\tau) = 0.3 \times 0.7^{\tau-1}, \qquad f_{1,2}(\tau) = 0.2 \times 0.8^{\tau-1},$$
$$f_{2,1}(\tau) = 0.4 \times 0.6^{\tau-1}, \qquad f_{2,2}(\tau) = 0.1 \times 0.9^{\tau-1}.$$

The mean holding time that a book is borrowed from library $i$ and
returned to library $j$ is $\bar{d}_{ij} = \sum_{\tau=1}^{\infty} f_{ij}(\tau)\,\tau$. That is,

$$\bar{d}_{1,1} = 1/0.3 \approx 3.33, \qquad \bar{d}_{1,2} = 1/0.2 = 5,$$
$$\bar{d}_{2,1} = 1/0.4 = 2.5, \qquad \bar{d}_{2,2} = 1/0.1 = 10.$$

These show that a book borrowed from library 2 and returned to
library 2 often has a longer holding period. Thus the mean holding times
that a book is borrowed from different libraries are

$$\bar{d}_1 = a_{1,1}\bar{d}_{1,1} + a_{1,2}\bar{d}_{1,2} \approx 3.67,$$
$$\bar{d}_2 = a_{2,1}\bar{d}_{2,1} + a_{2,2}\bar{d}_{2,2} = 7.75.$$

These show that people hold a book borrowed from library 2 for a
longer time.
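
The figures in this example can be checked numerically. The sketch below truncates the infinite sums at an arbitrarily chosen `TAU_MAX`, which is an assumption of the sketch (the neglected tail is far below floating-point precision for these geometric terms). The variable names are illustrative, and indices 0 and 1 stand for libraries 1 and 2.

```python
coef = [[0.24, 0.04], [0.12, 0.07]]    # h_ij(tau) = coef * base**(tau - 1)
base = [[0.7, 0.8], [0.6, 0.9]]
TAU_MAX = 2000                          # truncation point for the infinite sums


def h(i, j, tau):
    return coef[i][j] * base[i][j] ** (tau - 1)


taus = range(1, TAU_MAX + 1)
# a_ij: probability that a book borrowed from library i is returned to j.
a = [[sum(h(i, j, t) for t in taus) for j in range(2)] for i in range(2)]
# Mean holding time given both the borrowing and the returning library.
d_pair = [[sum(t * h(i, j, t) for t in taus) / a[i][j] for j in range(2)]
          for i in range(2)]
# Mean holding time given only the borrowing library.
d_state = [sum(a[i][j] * d_pair[i][j] for j in range(2)) for i in range(2)]

print([[round(v, 2) for v in row] for row in a])
print([[round(v, 2) for v in row] for row in d_pair])
print([round(v, 2) for v in d_state])
```

Each $f_{ij}$ here is a geometric distribution with parameter $p$, whose mean is $1/p$, which is why the mean holding times reduce to $1/0.3$, $1/0.2$, $1/0.4$, and $1/0.1$.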

1.2 HIDDEN MARKOV MODELS


In this section, we briefly review the hidden Markov model (HMM).
An HMM is defined as a doubly stochastic process. The underlying
stochastic process is a discrete-time finite-state homogeneous Markov
chain. The state sequence is not observable and so is called hidden. It
influences another stochastic process that produces a sequence of
observations. The HMM was first proposed by Baum and Petrie
(1966) in the late 1960s. An excellent tutorial on HMMs can be found
in Rabiner (1989), a theoretical overview of HMMs can be found
in Ephraim and Merhav (2002), and a discussion of learning and
inference in HMMs in the context of Bayesian networks (BNs) is
presented in Ghahramani (2001).

Assume a homogeneous (i.e., time-invariant) discrete-time Markov
chain with a set of (hidden) states $S = \{1, \ldots, M\}$. The state sequence is
denoted by $S_{1:T} \equiv (S_1, \ldots, S_T)$, where $S_t \in S$ is the state at time $t$.
A realization of $S_{1:T}$ is denoted as $s_{1:T}$. Define $a_{ij} \equiv P[S_t = j \mid S_{t-1} = i]$
as the transition probability from state $i$ to state $j$, and $\pi_j \equiv P[S_0 = j]$
the initial distribution of the state.

Denote the observation sequence by $O_{1:T} \equiv (O_1, \ldots, O_T)$, where
$O_t \in V$ is the observation at time $t$ and $V = \{v_1, v_2, \ldots, v_K\}$ is the set of
observable values. The emission probability of observing $v_k$ while
transiting from state $i$ to state $j$ is denoted by $b_{ij}(v_k) \equiv P[v_k \mid S_{t-1} = i, S_t = j]$.
It is often assumed in the literature that the observation is independent
of the previous state and hence

$$b_{ij}(v_k) = b_j(v_k) \equiv P[v_k \mid S_t = j].$$

Therefore, the set of model parameters of an HMM is
$\lambda \equiv \{a_{ij}, b_{ij}(v_k), \pi_j : i, j \in S, v_k \in V\}$. The standard HMM is explained in
Figure 1.3.
[Figure 1.3: panels showing the graphical model, the observation sequence $o_1, o_2, \ldots, o_T$ with emission probabilities $b_{s_0 s_1}(o_1), b_{s_1 s_2}(o_2), \ldots, b_{s_{T-1} s_T}(o_T)$, the trellis over states 1 to $M$, the initial state distribution $\pi$, the transition probability matrix $A$, and the state transition diagram.]

Figure 1.3 Standard hidden Markov model.
In the graphical model, an HMM has one discrete hidden node and one discrete or continuous observed node
per slice. Circles denote continuous nodes, squares denote discrete nodes, clear means hidden, shaded means
observed. The arc from node U to node V indicates that U “causes” V. An instance of the hidden Markov
process is shown in the trellis, where the thick line represents a state path, and the thin lines represent the available
transitions of states. The state transition probabilities are specified in the transition probability matrix $A$ with
the initial state $s_0$ selected according to the initial state distribution $\pi$. Equivalent to the state transition
probability matrix $A$, the underlying Markov chain of the HMM can be expressed by the state transition diagram.
The process produces observation $o_1$ with emission probability $b_{s_0,s_1}(o_1)$ while transiting from state $s_0$ to $s_1$, $o_2$
with $b_{s_1,s_2}(o_2)$ while from $s_1$ to $s_2$, $\ldots$, until the final observation $o_T$.

Given the set of model parameters $\lambda$ and an instance of the
observation $O_{1:T} = o_{1:T}$, the probability that this observed sequence is generated
by the model is $P[o_{1:T} \mid \lambda]$. A computation method for $P[o_{1:T} \mid \lambda]$ using the
sum–product expression is

$$\begin{aligned}
P[o_{1:T} \mid \lambda] &= \sum_{j \in S} P[S_t = j, o_{1:T} \mid \lambda] \\
&= \sum_{j \in S} P[S_t = j, o_{1:t} \mid \lambda]\, P[o_{t+1:T} \mid S_t = j, \lambda] \\
&= \sum_{j \in S} \alpha_t(j)\, \beta_t(j),
\end{aligned}$$

where

$$\alpha_t(j) \equiv P[S_t = j, o_{1:t} \mid \lambda]$$

is the forward variable defined as the joint probability of $S_t = j$ and the
partial observed sequence $o_{1:t}$, and

$$\beta_t(j) \equiv P[o_{t+1:T} \mid S_t = j, \lambda]$$

the backward variable defined as the probability of future observations
given the current state. In the derivation of the sum–product
expression, the Markov property that the future observations are dependent
on the current state and independent of the partial observed sequence
$o_{1:t}$ given the current state is applied, that is,

$$P[o_{t+1:T} \mid S_t = j, o_{1:t}, \lambda] = P[o_{t+1:T} \mid S_t = j, \lambda].$$
Then the forward–backward algorithm (also called Baum–Welch
algorithm) for HMM is

$$\begin{aligned}
\alpha_t(j) &= \sum_{i \in S} P[S_{t-1} = i, S_t = j, o_{1:t} \mid \lambda] \\
&= \sum_{i \in S} P[S_{t-1} = i, o_{1:t-1} \mid \lambda]\, P[S_t = j, o_t \mid S_{t-1} = i, \lambda] \\
&= \sum_{i \in S} \alpha_{t-1}(i)\, a_{ij}\, b_{ij}(o_t), \qquad 1 \le t \le T,\ j \in S,
\end{aligned} \tag{1.1}$$

$$\beta_t(j) = \sum_{i \in S} a_{ji}\, b_{ji}(o_{t+1})\, \beta_{t+1}(i), \qquad 0 \le t \le T - 1,\ j \in S, \tag{1.2}$$

with the initial conditions $\alpha_0(j) = \pi_j$ and $\beta_T(j) = 1$, $j \in S$, where in order
to make Eqn (1.2) true for $\beta_{T-1}(j) = P[o_T \mid S_{T-1} = j, \lambda] = \sum_{i \in S} a_{ji}\, b_{ji}(o_T)$,
it is assumed $\beta_T(j) = 1$.

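The maximum a posteriori probability (MAP) estimate $\hat{s}_t$ of state
$S_t$ at a given time $t$ after the entire sequence $o_{1:T}$ has been observed
can be determined by

$$\begin{aligned}
\hat{s}_t &= \arg\max_{i \in S} P[S_t = i \mid o_{1:T}, \lambda] = \arg\max_{i \in S} \frac{P[S_t = i, o_{1:T} \mid \lambda]}{P[o_{1:T} \mid \lambda]} \\
&= \arg\max_{i \in S} P[S_t = i, o_{1:T} \mid \lambda] \\
&= \arg\max_{i \in S} \alpha_t(i)\, \beta_t(i),
\end{aligned}$$

for $0 \le t \le T$.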
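
The recursions (1.1) and (1.2) and the MAP rule above can be sketched in plain Python as follows. This is a minimal, unscaled illustration (long sequences would need scaling or log-domain arithmetic to avoid underflow). The function names and the two-state model values are hypothetical, and observations are assumed to be encoded as integer symbol indices.

```python
def forward_backward(pi, a, b, obs):
    """Forward and backward variables for an HMM with emissions b_ij(v).

    pi[j] = P[S_0 = j], a[i][j] = P[S_t = j | S_{t-1} = i],
    b[i][j][v] = P[v | S_{t-1} = i, S_t = j], obs = [o_1, ..., o_T].
    Returns alpha[t][j] and beta[t][j] for t = 0..T.
    """
    M, T = len(pi), len(obs)
    alpha = [list(pi)] + [[0.0] * M for _ in range(T)]   # alpha_0(j) = pi_j
    for t in range(1, T + 1):                            # Eqn (1.1)
        o = obs[t - 1]                                   # this is o_t
        for j in range(M):
            alpha[t][j] = sum(alpha[t - 1][i] * a[i][j] * b[i][j][o]
                              for i in range(M))
    beta = [[0.0] * M for _ in range(T)] + [[1.0] * M]   # beta_T(j) = 1
    for t in range(T - 1, -1, -1):                       # Eqn (1.2)
        o = obs[t]                                       # this is o_{t+1}
        for j in range(M):
            beta[t][j] = sum(a[j][i] * b[j][i][o] * beta[t + 1][i]
                             for i in range(M))
    return alpha, beta


def map_states(alpha, beta):
    """MAP estimate of S_t for t = 0..T: argmax_i alpha_t(i) * beta_t(i)."""
    return [max(range(len(al)), key=lambda i: al[i] * be[i])
            for al, be in zip(alpha, beta)]


# Hypothetical two-state, two-symbol model (values chosen only for illustration).
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b_state = [[0.9, 0.1], [0.2, 0.8]]      # b_j(v): emission depends on S_t only
b = [[b_state[j] for j in range(2)] for i in range(2)]
obs = [0, 1, 0]                          # o_1, o_2, o_3 as symbol indices

alpha, beta = forward_backward(pi, a, b, obs)
likelihoods = [sum(x * y for x, y in zip(al, be)) for al, be in zip(alpha, beta)]
print(likelihoods)                       # P[o_{1:T} | lambda], evaluated at each t
print(map_states(alpha, beta))
```

Since $\sum_{j} \alpha_t(j)\beta_t(j)$ equals $P[o_{1:T} \mid \lambda]$ for every $t$, the entries of `likelihoods` should agree up to rounding, which gives a simple consistency check on an implementation.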
A limitation of the MAP estimation is that the resulting state
sequence may not be a valid state sequence. This can be easily proved.
Let $C$ be a matrix with the elements $c_{ij} = \alpha_t(i)\, a_{ij}\, b_{ij}(o_{t+1})\, \beta_{t+1}(j)$ and
$c_{i'j'} = 0$, that is, the transition from state $i'$ to $j'$ at time $t$ is not valid. But
it is still possible that $i' = \arg\max_i Ce$ and $j' = \arg\max_j e^T C$, that is,
those two states could be the best choice based on the MAP estimation,
where $e$ is an all-ones vector and $e^T$ its transpose. However, this
limitation does not affect its successful application in vast areas including the
area of digital communications. A famous algorithm in the digital
communication area is the BCJR algorithm (Bahl et al., 1974), which
replaces $b_{ij}(o_t)$ with $\sum_{x_t} e_{ij}(x_t)\, R(o_t, x_t)$ in the forward–backward
formulas (1.1) and (1.2) for decoding, where $x_t$ is the output associated with a
state transition at time $t$, $e_{ij}(x_t) = P[x_t \mid S_{t-1} = i, S_t = j, \lambda]$ is the output
probability, $o_t$ is the observation of $x_t$ and $R(o_t, x_t) = P[o_t \mid x_t]$ is the
channel transition probability. The difference between the Baum–Welch
algorithm and the BCJR algorithm is shown in Figure 1.4.

[Figure 1.4 Baum–Welch and BCJR models. (a) Baum–Welch model: Markov source → $o_t$ → Detector. (b) BCJR model: Markov source → $x_t$ → Channel (with noise, $P[o_t \mid x_t]$) → $o_t$ → Detector.]

(a) The hidden Markov source produces signal $o_t$ at time $t$, for $t = 1, \ldots, T$, according to the emission probabilities $b_{s_{t-1},s_t}(o_t)$ while the source transits from state $s_{t-1}$ to $s_t$. The sequence of signals is detected by the detector, which is used to estimate the states of the source.
(b) The hidden Markov source produces signal $x_t$ at time $t$, for $t = 1, \ldots, T$, according to the emission probabilities $e_{s_{t-1},s_t}(x_t)$ while the source transits from state $s_{t-1}$ to $s_t$. Due to noise of the channel, the signal $x_t$ is transformed to $o_t$ according to the transition probability $P[o_t \mid x_t]$ while transferring over the channel. The sequence $o_{1:T}$ observed by the detector is used to estimate the states of the source.

Similar to the derivation of the forward formula for the HMM, the Viterbi algorithm can be readily derived. Define

\[ \delta_t(j) \triangleq \max_{S_{1:t-1}} P[S_{1:t-1}, S_t = j, o_{1:t} \mid \lambda]. \]

Then from the similarity we can see that by replacing the sum-product of Eqn (1.1) with the max-product the Viterbi algorithm of HMM is yielded by:

\[ \delta_t(j) = \max_{i \in \mathcal{S}} \{\delta_{t-1}(i)\, a_{ij}\, b_{ij}(o_t)\}, \quad 1 \le t \le T,\ j \in \mathcal{S}, \]

where the initial value $\delta_0(j) = \pi_j$, and $S_{1:t-1} = (S_1, \ldots, S_{t-1})$ is the partial state sequence. The variable $\delta_t(j)$ represents the score of the surviving sequence among all possible sequences that enter the state $j$ at time $t$. Therefore, by tracing back from $\max_{j \in \mathcal{S}} \delta_T(j)$ one can find the optimal state sequence $(S_1, \ldots, S_T)$ that maximizes the joint probability $P[S_{1:T}, o_{1:T} \mid \lambda]$; when the prior distribution of the state sequence $S_{1:T}$ is uniform, this is also the sequence that maximizes the likelihood function $P[o_{1:T} \mid S_{1:T}, \lambda]$.
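
The max-product recursion and the trace-back step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `viterbi`, the array layout (`b[i, j, o]` for the transition-dependent emission $b_{ij}(o)$), and the toy numbers are not from the text, and the returned path includes the initial state $s_0$ drawn from $\pi$.

```python
import numpy as np

def viterbi(pi, a, b, obs):
    """Max-product recursion delta_t(j) = max_i delta_{t-1}(i) a[i,j] b[i,j,o_t].

    pi  : (M,)      initial values delta_0(j) = pi_j
    a   : (M, M)    transition probabilities a_ij
    b   : (M, M, K) transition-dependent emission probabilities b_ij(o)
    obs : observation indices o_1..o_T
    Returns the best path (s_0, s_1, ..., s_T) and its joint probability.
    """
    delta = np.asarray(pi, dtype=float)
    back = []
    for o in obs:
        scores = delta[:, None] * a * b[:, :, o]   # scores[i, j]
        back.append(scores.argmax(axis=0))          # best predecessor of each j
        delta = scores.max(axis=0)
    # Trace back from argmax_j delta_T(j).
    path = [int(delta.argmax())]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return path[::-1], float(delta.max())

# Toy model (hypothetical numbers): 2 states, 2 observation symbols.
pi = np.array([0.6, 0.4])
a = np.array([[0.7, 0.3], [0.4, 0.6]])
b = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.9, 0.1], [0.2, 0.8]]])
obs = [0, 1, 1]
path, prob = viterbi(pi, a, b, obs)
```

Keeping one back-pointer array per time step is what allows the surviving sequence to be recovered without enumerating all state sequences.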

1.3 DYNAMIC BAYESIAN NETWORKS


A BN is defined by a directed acyclic graph (DAG) in which nodes
correspond to random variables having conditional dependences on
the parent nodes, and arcs represent the dependencies between
variables. The dynamic Bayesian network (DBN), first proposed by
Dean and Kanazawa (1989), extends the BN with an explicit discrete
temporal dimension, using arcs to establish dependencies between
variables in different time slices. DBNs can be used to represent
very complex temporal processes.
Various HMMs and hidden semi-Markov models (HSMMs) as well as
state space models (SSMs) can be expressed using the DBN models
(Murphy, 2002b). For example, an HMM can be represented as a
DBN with a single state variable and a single observation variable in
each slice, as shown in Figure 1.3. In contrast, each slice of a general
DBN can have any number of state variables and observation
variables. Because there are procedures for learning the parameters
and structures, DBNs have been applied in many areas, such as
motion processes in computer vision, speech recognition, genomics,
and robot localization (Russell and Norvig, 2010).
Let $U_t$ denote the set of variables/nodes in the DBN at time slice $t$, and $X_t \in U_t$ be one of its variables. Denote the parents of $X_t$ by $Pa(X_t) = Pa_t(X_t) \cup Pa_{t-1}(X_t)$, where $Pa_t(X_t)$ are its parents in the same slice and $Pa_{t-1}(X_t)$ the others in the previous slice. Let $U_t^{(k)}$ be a subset of $U_t$, and define the set of its parents by

\[ Pa(U_t^{(k)}) = \{Pa(X_t) : X_t \in U_t^{(k)}\}. \]

Similarly define $Pa_t(U_t^{(k)})$ and $Pa_{t-1}(U_t^{(k)})$ such that $Pa(U_t^{(k)}) = Pa_t(U_t^{(k)}) \cup Pa_{t-1}(U_t^{(k)})$.
Because the DBN is a DAG, $U_t$ can be divided into non-overlapped subsets $U_t^{(1)}, \ldots, U_t^{(K)}$ such that for each $X_t^{(k+1)} \in U_t^{(k+1)}$, there exist $Pa_t(X_t^{(k+1)}) \subseteq \bigcup_{l=1}^{k} U_t^{(l)}$ and $Pa_t(X_t^{(k+1)}) \cap U_t^{(k)} \ne \text{null}$. That is, the intra-slice parents of each $X_t^{(k+1)} \in U_t^{(k+1)}$ belong to the previous subsets $U_t^{(1)}, \ldots, U_t^{(k)}$, and it has at least one parent belonging to $U_t^{(k)}$.

Based on this definition, the start subset $U_t^{(1)}$ has no intra-slice parents, that is, $Pa_t(U_t^{(1)}) = \text{null}$. Each variable $X_t^{(2)} \in U_t^{(2)}$ has at least one parent belonging to $U_t^{(1)}$. Therefore, there exists at least one directed path of length 1 from $U_t^{(1)}$ to $X_t^{(2)}$. Hence, we can conclude that for each $X_t^{(k+1)} \in U_t^{(k+1)}$, there exists at least one directed path of length $k$ that passes through $U_t^{(1)}, U_t^{(2)}, \ldots, U_t^{(k)}$ and ends at $X_t^{(k+1)}$. But this does not prevent $X_t^{(k+1)}$ from also having a shorter path starting from any of the previous subsets. In other words, the maximum length of an intra-slice directed path to $X_t^{(k+1)} \in U_t^{(k+1)}$ must be $k$. The last subset $U_t^{(K)}$ therefore contains the variables reached by the longest intra-slice paths, of length $K - 1$, and has no intra-slice children. Any subset $U_t^{(k)}$, $k = 1, \ldots, K$, can have inter-slice parents, that is, $Pa_{t-1}(U_t^{(k)}) \ne \text{null}$.
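
The partition described above amounts to grouping the nodes of a slice by the length of their longest intra-slice path. The following Python sketch illustrates this; the function name `layer_partition`, the dictionary representation of the graph, and the four-node example are assumptions made for illustration only.

```python
from functools import lru_cache

def layer_partition(intra_parents):
    """Split the nodes of one slice into subsets U(1), ..., U(K).

    intra_parents maps each node to the list of its parents in the same
    slice (inter-slice parents are ignored here). A node is placed in
    U(k+1) when its longest intra-slice path from a parentless node has
    k arcs, so it has at least one parent in U(k) and none in later subsets.
    """
    @lru_cache(maxsize=None)
    def depth(node):
        parents = intra_parents[node]
        return 0 if not parents else 1 + max(depth(p) for p in parents)

    layers = {}
    for node in intra_parents:
        layers.setdefault(depth(node) + 1, []).append(node)
    return [sorted(layers[k]) for k in sorted(layers)]

# Hypothetical slice: A and B have no intra-slice parents,
# C depends on A, and D depends on B and C.
slice_graph = {"A": [], "B": [], "C": ["A"], "D": ["B", "C"]}
subsets = layer_partition(slice_graph)   # U(1), U(2), U(3)
```

In this example D has a parent in U(1) as well as one in U(2); it is still assigned to U(3) because its longest path has two arcs, which mirrors the remark that shorter paths are not excluded.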
As shown in Figure 1.5, a general DBN can be expressed by a left-to-right network, in which each “node” represents a subset of nodes of the original DBN and each “arc” denotes that the child subset has at least one parent in the parent subset. The solid arrows represent the intra-slice arcs and the dotted arrows the inter-slice arcs.

[Figure 1.5 Left-right expression of a general DBN. Three consecutive slices ($t-1$, $t$, $t+1$), each containing the subsets $U^{(1)}, U^{(2)}, \ldots, U^{(K)}$; legend: intra-slice arcs, inter-slice arcs, cut-set of arcs.]

Each of the circles represents a subset of nodes of the DBN, and the arcs represent the dependency that a node in the child subset is dependent on its parents in the parent subset. All the slices, except the first and the last slices (not shown in the figure), have the same structure of DBN. In slice $t$, the first subset of nodes, $U_t^{(1)}$, has inter-slice parents in, for instance, $U_{t-1}^{(2)}$ of the previous slice $t-1$ without intra-slice parents. The other subsets of nodes in slice $t$ may have both inter-slice and intra-slice parents. For example, the subset of nodes, $U_t^{(K)}$, has inter-slice parents in $U_{t-1}^{(1)}$ and intra-slice parents in $U_t^{(1)}$, $U_t^{(2)}$, and $U_t^{(K-1)}$. In contrast, the subset of nodes, $U_t^{(2)}$, has no inter-slice parents in the previous slice $t-1$. Besides, each subset of nodes, $U_t^{(k)}$, must have at least one parent in $U_t^{(k-1)}$, for $k = 2, \ldots, K$.

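Based on the partition of DAG, the transition probabilities from $U_{t-1}$ to $U_t$ are:

\[
\begin{aligned}
P[U_t \mid U_{t-1}] &= P[U_t^{(1)} \mid U_{t-1}] \cdots P\Big[U_t^{(K)} \,\Big|\, U_{t-1}, \bigcup_{k=1}^{K-1} U_t^{(k)}\Big] \\
&= \prod_{X_t^{(1)} \in U_t^{(1)}} P[X_t^{(1)} \mid Pa_{t-1}(X_t^{(1)})] \cdots \prod_{X_t^{(K)} \in U_t^{(K)}} P[X_t^{(K)} \mid Pa_{t-1}(X_t^{(K)}), Pa_t(X_t^{(K)})] \\
&= \prod_{X_t \in U_t} P[X_t \mid Pa(X_t)].
\end{aligned}
\tag{1.3}
\]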

By unrolling the network until $T$ slices, the joint distribution for a sequence of length $T$ can be obtained:

\[ P[U_{1:T}] = P[U_0] \cdot \prod_{t=1}^{T} P[U_t \mid U_{t-1}], \]

where $P[U_0]$ is the initial distribution of the random variables. Let $u_t = \{X_t = x_t : X_t \in U_t\}$ be a realization of $U_t$. Then $P[u_t]$ can be calculated recursively by

\[ P[u_t] = \sum_{Pa_{t-1}(u_t)} P[Pa_{t-1}(u_t)]\, P[u_t \mid Pa_{t-1}(u_t)], \tag{1.4} \]

where $Pa_{t-1}(u_t)$ is the set of boundary nodes going out of slice $t-1$ to $t$.

For each realization $x_t$ and an instance $Pa(x_t)$, there is a model parameter $P[x_t \mid Pa(x_t)]$. Then by Eqn (1.3) the transition probabilities $P[u_t \mid u_{t-1}]$ can be determined with $N$ multiplications, where $N$ is the number of random variables contained in $U_t$. For example, if every random variable has $D$ parents and can take $M$ discrete values, then for a variable $X_t$ the DBN requires $M^{D+1}$ parameters to represent the transition probabilities $P[X_t \mid Pa(X_t)]$. Hence, the DBN requires $N M^{D+1}$ parameters to represent the transition probabilities $P[U_t \mid U_{t-1}]$.
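
As a small illustration of Eqn (1.3), the slice transition probability can be evaluated as a product of local conditional probabilities. The two-variable slice below, its tables `cpt_x` and `cpt_y`, and the function `slice_transition` are hypothetical and chosen only to make the factorization concrete.

```python
import numpy as np

# Hypothetical slice with N = 2 binary variables:
#   X_t depends on X_{t-1};  Y_t depends on Y_{t-1} and on X_t (intra-slice).
# Each table is indexed by the parent values first, then by the child's value.
cpt_x = np.array([[0.9, 0.1],
                  [0.3, 0.7]])                    # P[X_t | X_{t-1}]
cpt_y = np.array([[[0.8, 0.2], [0.5, 0.5]],
                  [[0.4, 0.6], [0.1, 0.9]]])      # P[Y_t | Y_{t-1}, X_t]

def slice_transition(u_prev, u_curr):
    """P[u_t | u_{t-1}] as a product of the N local parameters, Eqn (1.3)."""
    x_prev, y_prev = u_prev
    x, y = u_curr
    return cpt_x[x_prev, x] * cpt_y[y_prev, x, y]

values = [(x, y) for x in (0, 1) for y in (0, 1)]
# For every previous realization the probabilities over u_t sum to one.
row_sums = [sum(slice_transition(u, v) for v in values) for u in values]
n_local_parameters = cpt_x.size + cpt_y.size      # 4 + 8 = 12 table entries
```

Here the two local tables hold 12 entries, whereas a single table over the four joint values of $(X_t, Y_t)$ would hold 16.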

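Denote $U_t^{(E)}$ as the set of entry nodes from slice $t-1$ into slice $t$ by letting

\[ U_t^{(E)} \triangleq \{X_t : Pa_{t-1}(X_t) \ne \text{null}\}. \]

Let $U_t^{(I)} = U_t \setminus U_t^{(E)}$ be the set of inner nodes of slice $t$. Then for an inner node $X_t$ of $U_t^{(k+1)}$, that is, $X_t \in U_t^{(I)} \cap U_t^{(k+1)}$, we have $Pa_{t-1}(X_t) = \text{null}$ and

\[ Pa(X_t) = Pa_t(X_t) \subseteq \bigcup_{l=1}^{k} U_t^{(l)} \subseteq U_t^{(E)} \cup \bigcup_{l=1}^{k} \big(U_t^{(I)} \cap U_t^{(l)}\big). \]

Therefore,

\[
\begin{aligned}
P[U_t^{(I)} \mid U_t^{(E)}] &= P\Big[\bigcup_{k=1}^{K} \big(U_t^{(I)} \cap U_t^{(k)}\big) \,\Big|\, U_t^{(E)}\Big] \\
&= P\big[U_t^{(I)} \cap U_t^{(1)} \mid U_t^{(E)}\big] \cdot P\Big[\bigcup_{k=2}^{K} \big(U_t^{(I)} \cap U_t^{(k)}\big) \,\Big|\, U_t^{(E)} \cup \big(U_t^{(I)} \cap U_t^{(1)}\big)\Big] \\
&\;\;\vdots \\
&= \prod_{k=1}^{K} P\Big[U_t^{(I)} \cap U_t^{(k)} \,\Big|\, U_t^{(E)} \cup \bigcup_{l=1}^{k-1} \big(U_t^{(I)} \cap U_t^{(l)}\big)\Big] \\
&= \prod_{k=1}^{K} \prod_{X_t \in U_t^{(I)} \cap U_t^{(k)}} P[X_t \mid Pa(X_t)] \\
&= \prod_{X_t \in U_t^{(I)}} P[X_t \mid Pa(X_t)].
\end{aligned}
\]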
That is, $P[u_t^{(I)} \mid u_t^{(E)}]$ can be determined by $|U_t^{(I)}|$ multiplications of the corresponding model parameters. This implies that the transition from slice $t-1$ to slice $t$ has two stages: inter-slice transition from slice $t-1$ to slice $t$ with probability $P[u_t^{(E)} \mid u_{t-1}]$, and intra-slice transition within slice $t$ with probability $P[u_t^{(I)} \mid u_t^{(E)}]$.

Because $Pa_t(U_t^{(k+1)}) \subseteq \bigcup_{l=1}^{k} U_t^{(l)}$ according to the definition of $U_t^{(k+1)}$, we have $Pa_t(U_t^{(k+1)}) \cap U_t^{(k+1)} = \text{null}$ and $Pa_t\big(\bigcup_{l=1}^{k} U_t^{(l)}\big) \cap U_t^{(k+1)} = \text{null}$. In other words, any variable $X_t \in U_t^{(k+1)}$ cannot be a parent of $U_t^{(1)}, \ldots, U_t^{(k+1)}$. Therefore, the transition probabilities $P[u_t^{(I)} \mid u_t^{(E)}]$ for given $u_t^{(E)}$ satisfy:
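\[
\begin{aligned}
\sum_{u_t^{(I)}} P[u_t^{(I)} \mid u_t^{(E)}] &= \sum_{u_t^{(I)}} \prod_{k=1}^{K} \prod_{x_t \in u_t^{(I)} \cap u_t^{(k)}} P[x_t \mid Pa(x_t)] \\
&= \sum_{u_t^{(I)} \setminus u_t^{(K)}} \prod_{k=1}^{K-1} \prod_{x_t \in u_t^{(I)} \cap u_t^{(k)}} P[x_t \mid Pa(x_t)] \sum_{u_t^{(K)}} \prod_{x'_t \in u_t^{(I)} \cap u_t^{(K)}} P[x'_t \mid Pa(x'_t)] \\
&= \sum_{u_t^{(I)} \setminus u_t^{(K)}} \prod_{k=1}^{K-1} \prod_{x_t \in u_t^{(I)} \cap u_t^{(k)}} P[x_t \mid Pa(x_t)] \\
&\;\;\vdots \\
&= 1.
\end{aligned}
\]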

Since $P[U_t^{(I)} \mid U_t^{(E)}, U_{t-1}] = P[U_t^{(I)} \mid U_t^{(E)}]$, we have a forward formula

\[
\begin{aligned}
P[u_t^{(E)}] &= \sum_{u_{t-1}^{(E)},\, u_{t-1}^{(I)},\, u_t^{(I)}} P[u_{t-1}^{(E)}, u_{t-1}^{(I)}, u_t^{(E)}, u_t^{(I)}] \\
&= \sum_{u_{t-1}^{(E)},\, u_{t-1}^{(I)},\, u_t^{(I)}} P[u_{t-1}^{(E)}]\, P[u_{t-1}^{(I)} \mid u_{t-1}^{(E)}]\, P[u_t^{(E)} \mid u_{t-1}]\, P[u_t^{(I)} \mid u_t^{(E)}] \\
&= \sum_{u_{t-1}^{(E)},\, u_{t-1}^{(I)}} P[u_{t-1}^{(E)}]\, P[u_{t-1}^{(I)} \mid u_{t-1}^{(E)}]\, P[u_t^{(E)} \mid u_{t-1}^{(D)}],
\end{aligned}
\tag{1.5}
\]

where $u_{t-1}^{(D)} \triangleq Pa_{t-1}(u_t^{(E)})$ is the set of boundary nodes going out of slice $t-1$ to $t$. Then $P[u_t] = P[u_t^{(E)}]\, P[u_t^{(I)} \mid u_t^{(E)}]$. It can be seen that when $|U_t^{(E)}| < N$ and $P[u_t^{(E)} \mid u_{t-1}^{(D)}]$ can be easily obtained, Eqn (1.5) has fewer dimensions than Eqn (1.4).
As shown in Figure 1.5, any cut-set of arcs divides the DBN into a left part and a right part. If the starting/ending nodes of the arcs in the cut-set are given, such as given $u_{t-1}^{(D)}$ or $u_t^{(E)}$, the transition probabilities to the nodes in the right part can be completely determined. For instance, for given $u_t^{(E)}$, the transition probabilities $P[u_t^{(I)} \mid u_t^{(E)}]$ can be determined and $\sum_{u_t^{(I)}} P[u_t^{(I)} \mid u_t^{(E)}] = 1$. Based on this observation, we can try to find a cut-set of arcs, which can be inter-slice, intra-slice, or mixed ones, so that they have the minimum number of starting/ending nodes. Let $U_t^{(Min)}$ denote the set of nodes at one end of the cut-set of arcs. Then $P[U_t^{(Min)}]$ may have the minimum number of dimensions.

Suppose one random variable in $U_t$ is the observation $o_t$ and the others are states. Denote the set of states by $S_t$. The observation $o_t$ is given and cannot be a parent of a state. The observation probabilities are $P[o_t \mid Pa(o_t)]$. Therefore, if we define the forward variables by

\[
\begin{aligned}
\alpha_t(s_t) \triangleq P[o_{1:t}, s_t] &= \sum_{s_{t-1}} P[o_{1:t}, s_{t-1}, s_t] \\
&= \sum_{s_{t-1}} \alpha_{t-1}(s_{t-1})\, P[s_t \mid s_{t-1}]\, P[o_t \mid o_{1:t-1}, s_{t-1}, s_t] \\
&= \sum_{s_{t-1}} \alpha_{t-1}(s_{t-1})\, P[s_t \mid s_{t-1}^{(D)}]\, P[o_t \mid Pa(o_t)]
\end{aligned}
\tag{1.6}
\]

then the DBN becomes an HMM. Similarly, the forward variables can be defined with reduced dimensions by

\[
\begin{aligned}
\alpha_t(s_t^{(E)}) &\triangleq P[o_{1:t}, s_t^{(E)}] \\
&= \sum_{s_{t-1}^{(E)},\, s_{t-1}^{(I)},\, s_t^{(I)}} \alpha_{t-1}(s_{t-1}^{(E)})\, P[s_{t-1}^{(I)} \mid s_{t-1}^{(E)}]\, P[s_t^{(E)}, s_t^{(I)} \mid s_{t-1}^{(D)}]\, P[o_t \mid Pa(o_t)] \\
&= \sum_{s_{t-1}^{(E)},\, s_{t-1}^{(I)},\, s_t^{(I)}} \alpha_{t-1}(s_{t-1}^{(E)})\, P[s_{t-1}^{(I)} \mid s_{t-1}^{(E)}]\, P[s_t^{(E)} \mid s_{t-1}^{(D)}]\, P[s_t^{(I)} \mid s_t^{(E)}]\, P[o_t \mid Pa(o_t)].
\end{aligned}
\tag{1.7}
\]

The backward variables for HMM can be defined by $\beta_t(s_t) \triangleq P[o_{t+1:T} \mid s_t]$ or $\beta_t(s_t^{(E)}) \triangleq P[o_{t+1:T} \mid s_t^{(E)}]$, and so the backward formula can be readily derived.
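
For the plain HMM case of Eqn (1.6), the forward and backward recursions can be sketched as follows. The sketch assumes a single discrete state variable, state-dependent emissions `b[j, o]`, and the initial condition $\alpha_0 = \pi$; the function name `forward_backward` and the toy numbers are illustrative, not taken from the text.

```python
import numpy as np

def forward_backward(pi, a, b, obs):
    """alpha[t, j] = P[o_{1:t}, S_t = j],  beta[t, j] = P[o_{t+1:T} | S_t = j]."""
    T, M = len(obs), len(pi)
    alpha = np.zeros((T + 1, M))
    beta = np.ones((T + 1, M))                  # beta_T(j) = 1
    alpha[0] = pi                               # assumed initial condition
    for t in range(1, T + 1):
        # alpha_t(j) = sum_i alpha_{t-1}(i) a_ij b_j(o_t)
        alpha[t] = (alpha[t - 1] @ a) * b[:, obs[t - 1]]
    for t in range(T - 1, -1, -1):
        # beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
        beta[t] = a @ (b[:, obs[t]] * beta[t + 1])
    return alpha, beta

pi = np.array([0.6, 0.4])
a = np.array([[0.7, 0.3], [0.4, 0.6]])
b = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 1, 1, 0]
alpha, beta = forward_backward(pi, a, b, obs)
likelihood = alpha[-1].sum()                    # P[o_{1:T}]
```

A useful sanity check is that $\sum_j \alpha_t(j)\beta_t(j)$ gives the same likelihood for every $t$.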

Compared with Eqns (1.4) and (1.5), we can see that in the discrete cases the computational complexity of Eqns (1.6) and (1.7) and the number of model parameters for an HMM are consistent with those of the corresponding DBN. In the literature, it is generally argued that the HMM would require $M^{2N}$ model parameters to represent the transition probabilities $P[U_t \mid U_{t-1}]$ and thus would be much more complex than the corresponding DBN model when every random variable has few parents.
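
The two parameter counts quoted in this section, $N M^{D+1}$ for the factored DBN and $M^{2N}$ for an unstructured transition matrix over the joint state, can be compared numerically. The helper names and the sizes chosen below are hypothetical.

```python
def dbn_parameter_count(n_vars, n_values, n_parents):
    """N * M**(D+1): one table of M**(D+1) entries per variable."""
    return n_vars * n_values ** (n_parents + 1)

def flat_hmm_parameter_count(n_vars, n_values):
    """M**(2N): a full transition matrix over the M**N joint states."""
    return n_values ** (2 * n_vars)

# Hypothetical sizes: N = 10 binary variables, each with D = 3 parents.
dbn = dbn_parameter_count(10, 2, 3)       # 10 * 2**4
flat = flat_hmm_parameter_count(10, 2)    # 2**20
```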
Using the forward–backward algorithms of HMMs, the model parameters of a DBN can be estimated. Based on the given set of model parameters $\lambda$, a variety of inference problems can be solved including: filtering $P[S_t \mid o_{1:t}, \lambda]$, predicting $P[S_{t+\tau} \mid o_{1:t}, \lambda]$ for $\tau > 0$, fixed-lag smoothing $P[S_{t-\tau} \mid o_{1:t}, \lambda]$, fixed-interval smoothing $P[S_t \mid o_{1:T}, \lambda]$ for given observation sequence, most likely path finding $\arg\max_{S_{1:t}} P[S_{1:t} \mid o_{1:t}, \lambda]$, and likelihood computing $P[o_{1:t} \mid \lambda] = \sum_{S_{1:t}} P[S_{1:t}, o_{1:t} \mid \lambda]$. Therefore, for a discrete-state DBN the simplest inference method is to apply the forward–backward algorithms of HMMs. Although the per-update time and space are then “constant”, that is, independent of the sequence length, this constant is almost always exponential in the number of state variables.

For exact inference in DBNs, another simple method is to unroll the DBN until it accommodates the whole sequence of observations and then use any algorithm for inference in BNs, such as variable elimination or clustering methods. However, a naive application of unrolling would not be particularly efficient for a long sequence of observations.

In general, one can use DBNs to represent very complex temporal processes. However, even in the cases that have sparsely connected variables, one cannot reason efficiently and exactly about those processes, so approximate methods are usually applied. Among a variety of approximation methods, particle filtering (Gordon et al., 1993) is a sequential importance re-sampling method. It is very commonly used to estimate the posterior density of the state variables given the observation variables. Particle filtering is consistent and efficient, and can be used for discrete, continuous, and hybrid DBNs.
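
The sequential importance re-sampling idea can be illustrated on the simplest possible case. The sketch below is a bootstrap particle filter for a discrete HMM rather than a general DBN; the function name `particle_filter`, the number of particles, the fixed seed, and the toy model are all assumptions made for illustration, and the weights are assumed never to be all zero.

```python
import numpy as np

def particle_filter(pi, a, b, obs, n_particles=5000, seed=0):
    """Bootstrap (sequential importance re-sampling) filter for a discrete HMM.

    Returns a (T, M) array whose row t approximates P[S_t | o_{1:t}].
    """
    rng = np.random.default_rng(seed)
    M = len(pi)
    cum_a = np.cumsum(a, axis=1)
    particles = rng.choice(M, size=n_particles, p=pi)    # samples of S_0
    estimates = []
    for o in obs:
        # 1. Propagate each particle through the transition model
        #    (inverse-CDF sampling from the row a[particle, :]).
        u = rng.random(n_particles)
        nxt = (u[:, None] > cum_a[particles]).sum(axis=1)
        particles = np.minimum(nxt, M - 1)
        # 2. Weight each particle by the likelihood of the new observation.
        w = b[particles, o]
        w = w / w.sum()
        # 3. Re-sample in proportion to the weights.
        particles = rng.choice(particles, size=n_particles, p=w)
        estimates.append(np.bincount(particles, minlength=M) / n_particles)
    return np.array(estimates)

pi = np.array([0.6, 0.4])
a = np.array([[0.7, 0.3], [0.4, 0.6]])
b = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 1, 1, 0]
est = particle_filter(pi, a, b, obs)
```

For a model this small the exact filtered distribution is available from the forward recursion, so the particle estimate can be checked against it; the approximation becomes worthwhile only when the joint state space is too large to enumerate.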

1.4 CONDITIONAL RANDOM FIELDS


Different from the directed graphical model of DBNs, conditional random fields (CRFs) are a type of undirected probabilistic graphical model whose nodes can be divided into exactly two disjoint sets, that is, the observations $O$ and states $S$. Therefore, a DBN is a model of the joint distribution $P[O, S]$, while a CRF is a model of the conditional distribution $P[S \mid O]$ because $O$ need not be included in the same graphical structure as $S$. Hence, in a CRF, only $S$ is assumed to be indexed by the vertices of an undirected graph $G = (V, E)$, and is globally conditioned on the observations $O$. The rich and global features of the observations can then be used, while dependencies among the observations do not need to be explicitly represented. In the literature, the directed graphical model of DBNs is referred to as a generative model, in which the observations $O$ usually cannot be a parent of a state and are considered to be probabilistically generated by the states $S$. In contrast, the undirected graphical model is termed a discriminative model, in which a model of $P[O]$ is not required. CRFs have been applied in shallow parsing, named entity recognition, gene finding, object recognition, and image segmentation (Sutton and McCallum, 2006).

A CRF is generally defined by $P[s \mid o]$ if for any fixed $o$, the distribution $P[s \mid o]$ factorizes according to a factor graph $G$ of $S$ (Sutton and McCallum, 2006). If the factor graph $G = (V, E)$ of $S$ is a chain or a tree, the conditional distribution $P[s \mid o]$ has the form

\[ P[s \mid o] = \frac{1}{Z(o)} \exp\Big( \sum_{e \in E,\, k} \lambda_k f_k(e, s_e, o) + \sum_{v \in V,\, k'} \mu_{k'} g_{k'}(v, s_v, o) \Big), \tag{1.8} \]

where $s_e$ and $s_v$ are the component sets of $s$ associated with the vertices of edge $e$ and vertex $v$, respectively, the feature functions $f_k$ and $g_{k'}$ are given and fixed, and $Z(o)$ is an instance-specific normalization function defined by

\[ Z(o) = \sum_{s} \exp\Big( \sum_{e \in E,\, k} \lambda_k f_k(e, s_e, o) + \sum_{v \in V,\, k'} \mu_{k'} g_{k'}(v, s_v, o) \Big). \tag{1.9} \]

Example 1.3 From an HMM to a CRF

For an instance of observation sequence $o = (o_1, \ldots, o_T)$ and state sequence $s = (s_1, \ldots, s_T)$, the set of vertices of the factor graph $G$ is $V = \{s_1, \ldots, s_T\}$, and the set of edges is $E = \{(s_{t-1}, s_t) : t = 2, \ldots, T\}$, where $s_1$ is assumed to be the initial state. Then the conditional probability factorizes as

\[
\begin{aligned}
P[s \mid o] &= \frac{1}{P[o]} P[s, o] = \frac{1}{P[o]} \prod_{t=1}^{T} P[s_t \mid s_{t-1}]\, P[o_t \mid s_t] \\
&= \frac{1}{P[o]} \exp\Big( \sum_{t=1}^{T} \log P[s_t \mid s_{t-1}] + \sum_{t=1}^{T} \log P[o_t \mid s_t] \Big) \\
&= \frac{1}{P[o]} \exp\Big( \sum_{t=1}^{T} \sum_{i,j} \log a_{ij} \cdot I((s_{t-1}, s_t) = (i, j)) + \sum_{t=1}^{T} \sum_{j,x} \log b_j(x) \cdot I(s_t = j) \cdot I(o_t = x) \Big),
\end{aligned}
\]

where $P[s_1]$ is represented by $P[s_1 \mid s_0]$ for simplicity, and $I(x = x')$ denotes an indicator function of $x$ which takes the value 1 when $x = x'$ and 0 otherwise. Let $Z(o) = P[o]$, $\lambda_k = \log a_{ij}$, $k \leftrightarrow (i, j)$, $f_k(e = (s_{t-1}, s_t), s_e, o) = I((s_{t-1}, s_t) = (i, j))$, $\mu_{k'} = \log b_j(x)$, $k' \leftrightarrow (j, x)$, and $g_{k'}(v = s_t, s_v, o) = I(s_t = j) \cdot I(o_t = x)$. Then the HMM becomes a linear-chain CRF, as given by Eqn (1.8). $Z(o)$ is determined by Eqn (1.9).
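
The correspondence in this example can be checked numerically on a toy HMM. In the sketch below the weights are set to $\lambda = \log a_{ij}$ and $\mu = \log b_j(x)$, the indicator features select one weight per edge and per vertex, and the normalizer of Eqn (1.9) is obtained by brute-force enumeration. The model parameters and variable names are hypothetical, and the initial term $\log P[s_1]$ is treated as an extra vertex weight.

```python
import itertools
import numpy as np

pi = np.array([0.6, 0.4])                  # P[s_1], written P[s_1 | s_0] in the text
a = np.array([[0.7, 0.3], [0.4, 0.6]])     # a_ij
b = np.array([[0.9, 0.1], [0.2, 0.8]])     # b_j(x)
obs = [0, 1, 1]

lam = np.log(a)    # edge weights   lambda_k,  k  <-> (i, j)
mu = np.log(b)     # vertex weights mu_k',     k' <-> (j, x)

def crf_score(s):
    """Exponent of Eqn (1.8): sum of the weights picked out by the indicators."""
    score = np.log(pi[s[0]]) + mu[s[0], obs[0]]
    for t in range(1, len(obs)):
        score += lam[s[t - 1], s[t]] + mu[s[t], obs[t]]
    return score

sequences = list(itertools.product(range(2), repeat=len(obs)))
weights = np.array([np.exp(crf_score(s)) for s in sequences])
Z = weights.sum()            # Eqn (1.9); equals P[o] for this choice of weights
posterior = weights / Z      # P[s | o] for every state sequence s
```

Brute-force enumeration is used here only because the chain is tiny; for realistic lengths $Z(o)$ would be computed with a forward-type recursion.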

From this example, we can see that an HMM can be expressed by a CRF with specific feature functions. CRFs extend the HMM to contain any number of feature functions that can inspect the entire observation sequence. The feature functions $f_k$ and $g_{k'}$ need not have a probabilistic interpretation.

When the graph $G$ is a chain or a tree, algorithms analogous to the forward–backward algorithms and the Viterbi algorithm for HMMs can be used to yield exact inference in a CRF. However, for a general graph, the problem of exact inference in a CRF is intractable, and algorithms such as loopy belief propagation can be used to obtain approximate solutions.

To estimate the parameters $\theta = \{\lambda_k\} \cup \{\mu_{k'}\}$ using training data $o^{(1)}, \ldots, o^{(N)}$ and $s^{(1)}, \ldots, s^{(N)}$, maximum likelihood learning is used, where $o^{(i)} = (o_1^{(i)}, \ldots, o_T^{(i)})$ is the $i$th observation sequence and $s^{(i)} = (s_1^{(i)}, \ldots, s_T^{(i)})$ is the corresponding $i$th state sequence. For the CRFs expressed by Eqn (1.8) where all nodes have exponential family distributions and $s^{(1)}, \ldots, s^{(N)}$ observe all nodes of the graph, this optimization is convex. Gradient descent algorithms or quasi-Newton methods, such as the L-BFGS algorithm, can be used to solve the problem. In the case that some states are unobserved, those states have to be inferred.

1.5 HIDDEN SEMI-MARKOV MODELS


Due to the non-zero probability of self-transition of a nonabsorbing
state, the state duration of an HMM implicitly follows a geometric
distribution. This limits the HMM in some applications.
An HSMM is traditionally defined by allowing the underlying process
to be a semi-Markov chain. Each state has a variable duration, which
is associated with the number of observations produced while in the
state. The HSMM is also called explicit duration HMM (Ferguson,
1980; Rabiner, 1989), variable-duration HMM (Levinson, 1986a;
Russell and Moore, 1985; Rabiner, 1989), HMM with explicit duration
(Mitchell et al., 1995), HSMM (Murphy, 2002a), generalized HMM
(Kulp et al., 1996), segmental HMM (Russell, 1993), and segment
model (Ostendorf and Roukos, 1989; Ostendorf et al., 1996) in the
literature, depending on their assumptions and their application areas.
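
The geometric duration implied by self-transitions, mentioned at the start of this section, can be made concrete: a state with self-transition probability $a_{ii}$ lasts exactly $d$ steps with probability $a_{ii}^{d-1}(1 - a_{ii})$. The short sketch below evaluates this distribution; the function name and the value $a_{ii} = 0.9$ are illustrative assumptions.

```python
import numpy as np

def hmm_duration_pmf(a_ii, max_d):
    """P[duration = d] = a_ii**(d-1) * (1 - a_ii) for d = 1, ..., max_d."""
    d = np.arange(1, max_d + 1)
    return a_ii ** (d - 1) * (1.0 - a_ii)

pmf = hmm_duration_pmf(a_ii=0.9, max_d=200)
# Mean of the (truncated) distribution, close to 1 / (1 - a_ii).
mean_duration = float((np.arange(1, 201) * pmf).sum())
```

The probabilities decrease monotonically with $d$, so the shortest duration is always the most likely one; an explicit duration distribution in an HSMM removes this restriction.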

In the simplest case when the observations are conditionally independent of each other for given states, an HSMM can be expressed using an HMM, DBN, or CRF. However, a general HSMM cannot be expressed by a graphical model of fixed structure, because the number of state nodes is itself random and each of them emits a segment of observations of random length. For example, for a given length $T$ of observation sequence $(o_1, \ldots, o_T)$, the length $N \le T$ of the state sequence $(S_1, \ldots, S_N)$ is a random variable due to the variable state durations. We do not know in advance how to divide the observation sequence into $N$ segments corresponding to the state sequence, nor which state variable each $o_t$ should be conditioned on.

A general HSMM is shown in Figure 1.6. In the figure, the actual sequence of events is taken to be:

1. The first state $i_1$ and its duration $d_1$ are selected according to the state transition probability $a_{(i_0,d_0)(i_1,d_1)}$, where $(i_0, d_0)$ is the initial state and duration. State $i_1$ lasts for $d_1 = 2$ time units in this instance.
2. It produces two observations $(o_1, o_2)$ according to the emission probability $b_{i_1,d_1}(o_1, o_2)$.
3. It transits, according to the state transition probability $a_{(i_1,d_1)(i_2,d_2)}$, to state $i_2$ with duration $d_2$.
4. State $i_2$ lasts for $d_2 = 4$ time units in this instance, which produces four observations $(o_3, o_4, o_5, o_6)$ according to the emission probability $b_{i_2,d_2}(o_3, o_4, o_5, o_6)$.
5. $(i_2, d_2)$ then transits to $(i_3, d_3)$, and $(i_3, d_3)$ transits to $\ldots$, $(i_N, d_N)$ until the final observation $o_T$ is produced. The last state $i_N$ lasts for $d_N$ time units, where $\sum_{n=1}^{N} d_n = T$ and $T$ is the total number of observations.

[Figure 1.6 A general HSMM. Rows: observations $o_1, \ldots, o_T$; time $1, \ldots, T$; durations $d_1, \ldots, d_n$; state sequence $i_1, \ldots, i_n$; transitions.]

The time is discrete. At each time interval, there is an observation, which is produced by the hidden/unobservable state. Each state lasts for a number of time intervals. For example, state $i_2$ lasts for 4 time intervals, producing 4 observations: $o_3$, $o_4$, $o_5$, and $o_6$. When a state ends at time $t$, it transits to the next state at time $t + 1$. For example, state $i_1$ ends at time 2 and transits to $i_2$ at time 3. The transition is instantaneously finished at the jump time from time 2 to 3. The state sequence $(i_1, d_1), \ldots, (i_n, d_n)$ corresponds to the observation sequence $(o_1, \ldots, o_T)$ with $\sum_{m=1}^{n} d_m = T$, where $T$ is the length of the observation sequence, and $n$ is the number of state jumps/transitions (including from initial state to the first state).

Note that the underlying stochastic process is not observable, that


is, the states and their durations are hidden and there is no external
demarcation between the observations arising from state i1 and those
arising from state i2 . Only the sequence of observations ðo1 ; o2 ; . . .; oT Þ
can be observed.
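
The generative sequence of events listed above can be sketched as a sampler. The code below is a simplified illustration, not the general model: it assumes that observations within a segment are conditionally independent with per-state emission probabilities `b[j, o]`, that the first state starts at time 1 with distribution `Pi[j, d-1]`, and that the last segment is truncated so that the durations sum to $T$. The function name `sample_hsmm`, the array layouts, and the random toy parameters are hypothetical.

```python
import numpy as np

def sample_hsmm(Pi, a, b, T, seed=0):
    """Draw (states, durations, observations) with sum(durations) == T.

    Pi : (M, D)        distribution of the first state and its duration
    a  : (M, D, M, D)  a[i, h-1, j, d-1] = a_(i,h)(j,d), zero whenever j == i
    b  : (M, K)        per-state emission probabilities (assumed conditional
                       independence of the observations within a segment)
    """
    rng = np.random.default_rng(seed)
    M, D = Pi.shape
    states, durations, obs = [], [], []
    probs = Pi
    while len(obs) < T:
        # Select the next state and its duration jointly.
        flat = rng.choice(M * D, p=probs.ravel())
        j, d_index = divmod(int(flat), D)
        d = min(d_index + 1, T - len(obs))      # duration in 1..D, cut at T
        states.append(j)
        durations.append(d)
        # Emit d observations while staying in state j.
        obs.extend(rng.choice(b.shape[1], size=d, p=b[j]).tolist())
        probs = a[j, d - 1]                     # row a_(j,d)(., .)
    return states, durations, obs

# Hypothetical random model: M = 3 states, D = 4, K = 5 symbols.
M, D, K = 3, 4, 5
rng = np.random.default_rng(1)
a = rng.random((M, D, M, D))
for i in range(M):
    a[i, :, i, :] = 0.0                         # no self-transitions
a /= a.sum(axis=(2, 3), keepdims=True)          # each (i, h) row sums to 1
Pi = np.full((M, D), 1.0 / (M * D))
b = rng.dirichlet(np.ones(K), size=M)
states, durations, obs = sample_hsmm(Pi, a, b, T=50)
```

Only `obs` would be available to an observer; `states` and `durations` are the hidden quantities that the inference algorithms try to recover.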

Example 1.4
The observed workload data of a Web site consists of $o_t$, the number of user requests per second, with the maximum observed value $K = \max\{o_t\} = 74$ requests/s. The total number of observations is $T = 3600$ s (over 1 h) in the workload data set. The user request arrivals are governed by an underlying hidden semi-Markov process. The hidden state represents the arrival rate, which corresponds to the number of users that are browsing the website. For instance, state 1 corresponds to arrival rate of 13 requests/s, state 10, 30.1 requests/s, and state 20, 60 requests/s. For given state or arrival rate, the number of user requests per second, $O_t$, is a random variable. Therefore, from an observed value one cannot tell the actual arrival rate. In other words, the state is hidden. Figure 1.7 plots the observed number of requests per second together with the estimated hidden states (i.e., arrival rates). There were 41 state transitions during the period of 3600 s. Some states last for a long period of time, with the maximum duration of $D = 405$ s (Yu et al., 2002).

[Figure 1.7 Data (requests/s) and the hidden states of the workload. Panel title: Peak hour traffic; left axis: Requests/s; right axis: State; horizontal axis: Time (minute).]
The grey line is the trace of the workload data (number of arrivals per second) that we observed. The black line is the hidden state sequence estimated. The hidden state represents the arrival rate, and the state duration represents the dwell time of the arrival rate. For given arrival rate, the number of arrivals per second is a random variable.

The issues related to a general HSMM include:

1. computation of the predicted/filtered/smoothed probabilities,


expectations, and the likelihood of observations;
2. the MAP estimation of states and the estimation of the most likely
state sequence; and
3. parameter estimation/update.

Different models have different capability in modeling applications


and have different computational complexity in solving those issues.

1.6 HISTORY OF HIDDEN SEMI-MARKOV MODELS


The first approach to HSMM was proposed by Ferguson (1980), which is partially included in the survey paper by Rabiner (1989). This approach is called the explicit duration HMM in contrast to the implicit duration of the HMM. It assumes that the state duration is generally distributed depending on the current state of the underlying semi-Markov process. It also assumes the “conditional independence” of outputs. Levinson (1986a) replaced the probability mass functions of duration with continuous probability density functions to form a continuously variable duration HMM. As Ferguson (1980) pointed out, an HSMM can be realized in the HMM framework in which both the state and its sojourn time since entering the state are taken as a complex HMM state. This idea was exploited in 1991 by a 2-vector HMM (Krishnamurthy et al., 1991) and a duration-dependent state transition model (Vaseghi, 1991). Since then, similar approaches have been proposed in many applications. They go by different names, such as inhomogeneous HMM (Ramesh and Wilpon, 1992), nonstationary HMM (Sin and Kim, 1995), and triplet Markov chains (Pieczynski et al., 2002). These approaches, however, share the problem of high computational complexity in some applications. A more efficient algorithm was proposed by Yu and Kobayashi (2003a), in which the forward–backward variables are defined using the notion of a state together with its remaining sojourn (or residual life) time. This makes the algorithm practical in many applications.
The HSMM has been successfully applied in many areas. The
most successful application is in speech recognition. The first
application of HSMM in this area was made by Ferguson (1980). Since
then, more than one hundred such papers have been published
in the literature. It is the application of HSMM in speech
recognition that has enriched the theory of HSMM and produced many
of its algorithms.
Since the beginning of the 1990s, the HSMM has been applied in
many other areas. In that decade, the main application area of
HSMMs was handwritten/printed text recognition (see, e.g., Chen et al.,
1993a). Other application areas of HSMMs include electrocardiograph
(ECG) (Thoraval et al., 1992), network traffic characterization (Leland
et al., 1994), recognition of human genes in DNA (Kulp et al., 1996),
language identification (Marcheret and Savic, 1997), ground target
tracking (Ke and Llinas, 1999), document image comparison, and
classification at the spatial layout level (Hu et al., 1999).
From 2000 to 2009, the HSMM attracted increasing attention
from a wide range of application areas. In this decade, the main
applications were human activity recognition (see, e.g., Hongeng and
Nevatia, 2003) and speech synthesis (see, e.g., Moore and Savic, 2004).
Other application areas include change-point/end-point detection
for semiconductor manufacturing (Ge and Smyth, 2000a), protein
structure prediction (Schmidler et al., 2000), analysis of branching and
flowering patterns in plants (Guedon et al., 2001), rain events time
series model (Sansom and Thomson, 2001), brain functional MRI
sequence analysis (Faisan et al., 2002), Internet traffic modelling (Yu
et al., 2002), event recognition in videos (Hongeng and Nevatia, 2003),
image segmentation (Lanchantin and Pieczynski, 2004), semantic
learning for a mobile robot (Squire, 2004), anomaly detection for
network security (Yu, 2005), symbolic plan recognition (Duong et al.,
2005a), terrain modeling (Wellington et al., 2005), adaptive cumulative
sum test for change detection in noninvasive mean blood pressure
trend (Yang et al., 2006), equipment prognosis (Bechhoefer et al.,
2006), financial time series modeling (Bulla and Bulla, 2006), remote
sensing (Pieczynski, 2007), classification of music (Liu et al., 2008),
and prediction of particulate matter in the air (Dong et al., 2009).

Since 2010, the main application areas of
HSMMs have been equipment prognosis/diagnosis (see, e.g., Dong and Peng,
2011) and animal activity modeling (see, e.g., O’Connell et al., 2011).
Other application areas include machine translation (Bansal
et al., 2011), network performance (Wang et al., 2011), deep brain
stimulation (Taghva, 2011), image recognition (Takahashi et al.,
2010), icing load prognosis (Wu et al., 2014), irrigation behavior
(Andriyas and McKee, 2014), dynamics of geyser (Langrock, 2012),
anomaly detection of spacecraft (Tagawa et al., 2011), and prediction
of earthquake (Beyreuther and Wassermann, 2011).
CHAPTER 2
General Hidden Semi-Markov Model

This chapter provides a unified description of hidden semi-Markov


models, and discusses important issues related to inference in the HSMM.

2.1 A GENERAL DEFINITION OF HSMM


An HSMM allows the underlying process to be a semi-Markov chain with a variable duration or sojourn time for each state. State duration $d$ is a random variable and assumes an integer value in the set $\mathcal{D} = \{1, 2, \ldots, D\}$, where $D$ is the maximum duration of a state and can be infinite in some applications. Each state can emit a series of observations, and the number of observations produced while in state $i$ is determined by the length of time spent in the state, that is, the duration $d$. Now we provide a unified description of HSMMs.

Assume a discrete-time semi-Markov process with a set of (hidden) states $\mathcal{S} = \{1, \ldots, M\}$. The state sequence $(S_1, \ldots, S_T)$ is denoted by $S_{1:T}$, where $S_t \in \mathcal{S}$ is the state at time $t$. A realization of $S_{1:T}$ is denoted as $s_{1:T}$. For simplicity of notation in the following sections, we denote:
• $S_{t_1:t_2} = i$: state $i$ that the system stays in during the period from $t_1$ to $t_2$. In other words, it means $S_{t_1} = i, S_{t_1+1} = i, \ldots,$ and $S_{t_2} = i$. Note that the previous state $S_{t_1-1}$ and the next state $S_{t_2+1}$ may or may not be $i$.
• $S_{[t_1:t_2]} = i$: state $i$ that starts at time $t_1$ and ends at $t_2$ with duration $d = t_2 - t_1 + 1$. This implies that the previous state $S_{t_1-1}$ and the next state $S_{t_2+1}$ must not be $i$.
• $S_{[t_1:t_2} = i$: state $i$ that starts at time $t_1$ and lasts till $t_2$, with $S_{[t_1} = i, S_{t_1+1} = i, \ldots, S_{t_2} = i$, where $S_{[t_1} = i$ means that at $t_1$ the system entered state $i$ from some other state, that is, the previous state $S_{t_1-1}$ must not be $i$. The next state $S_{t_2+1}$ may or may not be $i$.
• $S_{t_1:t_2]} = i$: state $i$ that lasts from $t_1$ to $t_2$ and ends at $t_2$ with $S_{t_1} = i, S_{t_1+1} = i, \ldots, S_{t_2]} = i$, where $S_{t_2]} = i$ means that at time $t_2$ the state will end and transit to some other state at time $t_2 + 1$, that is, the next state $S_{t_2+1}$ must not be $i$. The previous state $S_{t_1-1}$ may or may not be $i$.
Hidden Semi-Markov Models. DOI: http://dx.doi.org/10.1016/B978-0-12-802767-7.00002-4
© 2016 Elsevier Inc. All rights reserved.
Based on these definitions, $S_{[t]} = i$ means state $i$ starting and ending at $t$ with duration 1, $S_{[t} = i$ means state $i$ starting at $t$, $S_{t]} = i$ means state $i$ ending at $t$, and $S_t = i$ means the state at $t$ being state $i$.
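
To see how the bracket notation relates to an ordinary per-time-step state sequence, the sketch below splits a realization $s_{1:T}$ into maximal runs $(i, t_1, t_2)$. The helper name `segments` and the example sequence are hypothetical. A run strictly inside the sequence corresponds to $S_{[t_1:t_2]} = i$; for the first and last runs only one boundary is known from the data, so they correspond to $S_{1:t_2]} = i$ and $S_{[t_1:T} = i$, respectively.

```python
def segments(states):
    """Split s_1..s_T into maximal runs [(state, t1, t2), ...], 1-based times."""
    runs, start = [], 0
    for t in range(1, len(states) + 1):
        # Close the current run at the end of the sequence or at a state change.
        if t == len(states) or states[t] != states[start]:
            runs.append((states[start], start + 1, t))
            start = t
    return runs

runs = segments([2, 2, 1, 1, 1, 1, 3])
```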

Denote the observation sequence $(O_1, \ldots, O_T)$ by $O_{1:T}$, where $O_t \in \mathcal{V}$ is the observation at time $t$ and $\mathcal{V} = \{v_1, v_2, \ldots, v_K\}$ is the set of observable values. For observation sequence $O_{1:T}$, the underlying state sequence is $S_{1:d_1]} = i_1$, $S_{[d_1+1:d_1+d_2]} = i_2, \ldots, S_{[d_1+\cdots+d_{N-1}+1:d_1+\cdots+d_N} = i_N$, and the state transitions are $(i_n, d_n) \to (i_{n+1}, d_{n+1})$, for $n = 1, \ldots, N-1$, where $\sum_{n=1}^{N} d_n = T$, $i_1, \ldots, i_N \in \mathcal{S}$, and $d_1, \ldots, d_N \in \mathcal{D}$. Note that the first state $i_1$ is not necessarily starting at time 1 associated with the first observation $O_1$ and the last state $i_N$ is not necessarily ending at time $T$ associated with the last observation $O_T$. As the states are hidden, the number $N$ of hidden states in the underlying state sequence is also hidden/unknown.
We note that the observable values can be discrete, continuous, or have infinite support, and the observation $O_t \in \mathcal{V}$ can be a value, a vector, a symbol, or an event. The length $T$ of the observation sequence can be very large, but is usually assumed to be finite except in the case of online learning. There are usually multiple observation sequences in practice, but we do not always explicitly mention this fact unless it is required. The formulas derived for the single observation sequence usually cannot be directly applied to multiple observation sequences, because the sequences differ in length and therefore in their likelihood functions. Therefore, while applying the formulas derived for the single observation sequence to the case of multiple observation sequences, the formulas must be divided by the likelihood functions $P[o^{(l)}_{1:T_l}]$ if they have not yet appeared in the formulas, where $o^{(l)}_{1:T_l}$ is the $l$th observation sequence of length $T_l$.

Suppose the current time is $t$, the process has made $n - 1$ jumps, and the time spent since the previous jump is $X_t = t - \sum_{l=1}^{n-1} d_l$. As explained in Section 1.1.2, the process $(S_t, X_t)_{t \ge 1}$ is a discrete-time homogeneous Markov process. Its subsequence $(i_n, d_n)_{n \ge 1}$ is also a Markov process based on the Markov property. Then we can define the state transition probability from state $i$ having duration $h$ to state $j \ne i$ having duration $d$ by

\[ a_{(i,h)(j,d)} \triangleq P[S_{[t+1:t+d]} = j \mid S_{[t-h+1:t]} = i], \]

which is assumed independent of time $t$, for $i, j \in \mathcal{S}$, $h, d \in \mathcal{D}$. The transition probabilities must satisfy $\sum_{j \in \mathcal{S} \setminus \{i\}} \sum_{d \in \mathcal{D}} a_{(i,h)(j,d)} = 1$, for all given $i \in \mathcal{S}$ and $h \in \mathcal{D}$, with zero self-transition probabilities $a_{(i,h)(i,d)} = 0$, for all $i \in \mathcal{S}$ and $h, d \in \mathcal{D}$. In other words, when a state ends at time $t$, it cannot transit to the same state at the next time $t + 1$ because the state durations are explicitly specified by some distributions other than geometric or exponential distributions. From the definition we can see that the previous state $i$ started at $t - h + 1$ and ended at $t$, with duration $h$. Then it transits to state $j$ having duration $d$, according to the state transition probability $a_{(i,h)(j,d)}$. State $j$ will start at $t + 1$ and end at $t + d$. This means both the state and the duration are dependent on both the previous state and its duration. While in state $j$, there will be $d$ observations $O_{t+1:t+d}$ being emitted. Denote this emission/observation probability by

\[ b_{j,d}(o_{t+1:t+d}) \triangleq P[o_{t+1:t+d} \mid S_{[t+1:t+d]} = j] \]

which is assumed to be independent of time $t$, where $o_{t+1:t+d}$ is the
observed values of $O_{t+1:t+d}$. Let the distribution of the first state be

\[ \Pi_{j,d} \equiv P[S_{[1:d]} = j] \]

or

\[ \Pi_{j,d} \equiv P[S_{1:d]} = j] \]

depending on the model assumption that the first state is starting at
$t = 1$ or before. We can equivalently let the initial distribution of the
state be

\[ \pi_{j,d} \equiv P[S_{[t-d+1:t]} = j], \quad t \le 0. \]

It represents the probability of the initial state and its duration
before time $t = 1$ or before the first observation $o_1$ obtained. The
relationship between the two definitions of initial state distribution is
$\Pi_{j,d} = \sum_{\tau=d-D+1}^{1} \sum_{i,h} \pi_{i,h}\, a_{(i,h)(j,d-\tau+1)}$, where if the starting time of the
first state must be $t = 1$ then $\tau = 1$; otherwise, if the first state can start
at or before $t = 1$, then $1 \ge \tau \ge -(D-d-1)$. Usually, the second
definition of the initial state distribution, $\{\pi_{j,d}\}$, makes the computation of
the forward variables in the HSMM algorithms simpler. Then the set
of model parameters for the HSMM is defined by

\[ \lambda \equiv \{ a_{(i,h)(j,d)},\ b_{j,d}(v_{k_1:k_d}),\ \pi_{j,d} : i, j \in \mathbf{S};\ h, d \in \mathbf{D};\ v_{k_d} \in \mathbf{V} \}, \]

or

\[ \lambda \equiv \{ a_{(i,h)(j,d)},\ b_{j,d}(v_{k_1:k_d}),\ \Pi_{j,d} : i, j \in \mathbf{S};\ h, d \in \mathbf{D};\ v_{k_d} \in \mathbf{V} \}, \]

where $v_{k_1:k_d}$ represents an observable substring of length $d$ for
$v_{k_1} \cdots v_{k_d} \in \mathbf{V}^d = \mathbf{V} \times \cdots \times \mathbf{V}$. This general HSMM is shown in Figure 1.6.
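The two constraints on the transition probabilities (rows summing to one over $j \ne i$ and $d$, and zero self-transitions) can be checked mechanically. The following is a minimal sketch, not part of the original text: the dictionary layout `a[((i, h), (j, d))]`, the function name `valid_transitions`, and the toy model are all assumptions made for illustration, with states numbered 0..M-1 and durations 1..D.

```python
from itertools import product

def valid_transitions(a, M, D, tol=1e-9):
    """Check the two constraints on a[((i, h), (j, d))] stated in the text.

    States are 0..M-1 and durations are 1..D; missing keys count as 0.
    """
    pairs = list(product(range(M), range(1, D + 1)))
    for i, h in pairs:
        row = {(j, d): a.get(((i, h), (j, d)), 0.0) for j, d in pairs}
        if any(p != 0.0 for (j, _), p in row.items() if j == i):
            return False          # self-transitions must have probability 0
        if abs(sum(row.values()) - 1.0) > tol:
            return False          # each row must sum to 1 over j != i and d
    return True

# Toy model (an assumption for illustration): M = 2 states, D = 2 durations,
# and the next duration is uniform regardless of (i, h).
M, D = 2, 2
a = {((i, h), (j, d)): 1.0 / D
     for i, h in product(range(M), range(1, D + 1))
     for j, d in product(range(M), range(1, D + 1)) if j != i}
print(valid_transitions(a, M, D))
```

A sparse dictionary is used here only because it mirrors the index pairs $(i,h)(j,d)$ directly; a dense four-dimensional array would serve equally well.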

The general HSMM is reduced to specific models of HSMM
depending on the assumptions made. For instance,

1. If the state duration is assumed to be independent of the previous
state, then the state transition probability can be further specified as
$a_{(i,h)(j,d)} = a_{(i,h)j}\, p_j(d)$, where
\[ a_{(i,h)j} \equiv P[S_{[t+1} = j \mid S_{[t-h+1:t]} = i] \tag{2.1} \]
is the transition probability from state $i$ that has stayed for duration
$h$ to state $j$ that will start at $t+1$, and
\[ p_j(d) \equiv P[S_{t+1:t+d]} = j \mid S_{[t+1} = j] \tag{2.2} \]
is the probability of duration $d$ that state $j$ will take. This is the
model proposed by Marhasev et al. (2006). Compared with the general
HSMM, the number of model parameters is reduced from
$M^2 D^2$ to $M^2 D + MD$, and the state duration distributions $p_j(d)$ can
be explicitly expressed using probability density functions (e.g.,
Gaussian distributions) or a probability mass function.
2. If a state transition is assumed to be independent of the duration of
the previous state, then the state transition probability from $(i,h)$ to
$(j,d)$ becomes $a_{(i,h)(j,d)} = a_{i(j,d)}$, where
\[ a_{i(j,d)} \equiv P[S_{[t+1:t+d]} = j \mid S_{t]} = i] \tag{2.3} \]
is the transition probability that state $i$ ended at $t$ and transits to
state $j$ having duration $d$. If it is assumed that a state transition for
$i \ne j$ is $(i,1) \to (j,d)$ and a self-transition is $(i,d+1) \to (i,d)$, for
$d \in \mathbf{D}$, then the model becomes the residual time HMM (Yu and
Kobayashi, 2003a). In this model, the starting time of the state is not
of concern, but the ending time is of interest. Therefore, $d$ represents
the remaining sojourn (or residual life) time of state $j$. This model is
obviously appropriate to applications for which the residual life is of
the most concern. The number of model parameters is reduced to
$M^2 D$. More importantly, if the state duration is further assumed
to be independent of the previous state, then the state transition
probability can be specified as $a_{i(j,d)} = a_{i,j}\, p_j(d)$. In this case, the
computational complexity will be the lowest among all HSMMs.
The number of model parameters is further reduced to $M^2 + MD$.
3. If self-transition $(i,d) \to (i,d+1)$ is allowed and the state duration
is assumed to be independent of the previous state, then the state
transition probability becomes
\[ a_{(i,h)(j,d)} = a_{(i,h)j} \left[ \prod_{\tau=1}^{d-1} a_{jj}(\tau) \right] [1 - a_{jj}(d)], \]
where $a_{(i,h)j} \equiv P[S_{[t+1} = j \mid S_{[t-h+1:t]} = i]$; $a_{jj}(d)$ is the self-transition
probability when state $j$ has stayed for $d$ time units, that is,
\[ a_{jj}(d) \equiv P[S_{t+d+1} = j \mid S_{[t+1:t+d} = j], \]
and $1 - a_{jj}(d) = P[S_{t+d]} = j \mid S_{[t+1:t+d} = j]$ is the probability that state
$j$ ends with duration $d$. This is the variable transition HMM
(Krishnamurthy et al., 1991; Vaseghi, 1991). In this model, a state
transition is either $(i,d) \to (j,1)$ for $i \ne j$ or $(i,d) \to (i,d+1)$ for a
self-transition. This process is similar to the standard discrete-time
semi-Markov process. The concept of the discrete-time semi-Markov
process can thus be used in modeling an application. This model has
$M^2 D + MD$ model parameters. The computational complexity is
relatively high compared with other conventional HSMMs.
4. If a transition to the current state is independent of the duration of
the previous state and the duration of the current state is only
conditioned on the current state itself, then
\[ a_{(i,h)(j,d)} = a_{ij}\, p_j(d), \]
where $a_{ij} \equiv P[S_{[t+1} = j \mid S_{t]} = i]$ is the transition probability from state
$i$ to state $j$, with the self-transition probability $a_{ii} = 0$. This is the
explicit duration HMM (Ferguson, 1980), with $M^2 + MD$ model
parameters and lower computational complexity. This is the simplest
and the most popular model among all HSMMs, with easily
understandable formulas and modeling concepts.
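The factorizations in cases 3 and 4 above can be made concrete with a short sketch. It is illustrative only: the function names, the list-based layouts `a_ij[i][j]`, `p[j][d-1]`, and `a_jj[d-1]`, and the labels used in `parameter_counts` are assumptions, not notation from the text. The counts returned are the ones quoted in the list.

```python
import math

def explicit_duration_transitions(a_ij, p):
    """Case 4: a[((i, h), (j, d))] = a_ij[i][j] * p[j][d - 1] for every h."""
    M, D = len(a_ij), len(p[0])
    return {((i, h), (j, d)): a_ij[i][j] * p[j][d - 1]
            for i in range(M) for h in range(1, D + 1)
            for j in range(M) for d in range(1, D + 1)}

def duration_factor_from_self_transitions(a_jj):
    """Case 3: prod_{tau<d} a_jj(tau) * (1 - a_jj(d)), with a_jj[d - 1] = a_jj(d)."""
    return [math.prod(a_jj[:d - 1]) * (1.0 - a_jj[d - 1])
            for d in range(1, len(a_jj) + 1)]

def parameter_counts(M, D):
    """Numbers of transition/duration parameters as quoted in the text."""
    return {"general": M**2 * D**2,
            "case 1": M**2 * D + M * D,
            "case 2 (residual time)": M**2 * D,
            "case 3 (variable transition)": M**2 * D + M * D,
            "case 4 (explicit duration)": M**2 + M * D}

# Toy values, assumed for illustration only.
a = explicit_duration_transitions([[0.0, 1.0], [1.0, 0.0]],
                                  [[0.3, 0.7], [0.5, 0.5]])
print(parameter_counts(3, 10))
```

Expanding the explicit duration model into the general index set $(i,h)(j,d)$ is wasteful in practice, but it lets the same general recursion be reused unchanged for every special case.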
Besides, the general form $b_{j,d}(v_{k_1:k_d})$ of observation distributions can
be simplified and dedicated to applications. They can be parametric
(e.g., a mixture of Gaussian distributions) or nonparametric (e.g., a
probability mass function), discrete or continuous, and dependent
on or independent of the state durations. The observations can
be assumed dependent or conditionally independent for given states,
that is, $b_{j,d}(v_{k_1:k_d}) = \prod_{\tau=1}^{d} b_j(v_{k_\tau})$. The conditional independence makes
HSMMs simpler and so is often assumed in the literature.

2.2 FORWARD-BACKWARD ALGORITHM FOR HSMM

For an observation sequence $o_{1:T}$, the likelihood function for given
model parameters $\lambda$ is

\[ P[o_{1:T} \mid \lambda] = \sum_{S_{1:T}} P[S_{1:T}, o_{1:T} \mid \lambda]. \]

Suppose $S_{1:T} = (i_1, d_1) \cdots (i_N, d_N)$, satisfying $\sum_{n=1}^{N} d_n = T$. Let $t_n = \sum_{m=1}^{n} d_m$.
Then

\[ P[S_{1:T}, o_{1:T} \mid \lambda] = \prod_{n=1}^{N} a_{(i_{n-1},d_{n-1})(i_n,d_n)}\, b_{i_n,d_n}(o_{t_{n-1}+1:t_n}), \]

where $a_{(i_0,d_0)(i_1,d_1)} = \Pi_{i_1,d_1}$ for simplicity. To sum over all possible $S_{1:T}$, for all $N \ge 1$,
$i_1, \ldots, i_N \in \mathbf{S}$ and $d_1, \ldots, d_N \in \mathbf{D}$, the computational amount involved will
be huge. Therefore, a sum-product form algorithm, that is, a
forward-backward algorithm, is usually used in the literature. Now we
define the forward and backward variables.
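To see why direct summation is expensive, the product formula can be evaluated for one path and then summed over every admissible path. The sketch below does this by brute force. It assumes conditionally independent observations, so that $b_{j,d}$ is a product of per-symbol probabilities, and it assumes that the first state starts at $t = 1$ and the last state ends at $t = T$. The function names and the toy model are hypothetical.

```python
import math

def path_probability(path, obs, Pi, a, b):
    """P[S_{1:T}, o_{1:T} | lambda] for path = [(i_1, d_1), ..., (i_N, d_N)].

    Pi[(j, d)]          : probability that the first state is j with duration d
    a[((i, h), (j, d))] : transition probability (missing keys count as 0)
    b[j][v]             : per-symbol emission probability of value v in state j
    """
    if sum(d for _, d in path) != len(obs):
        raise ValueError("durations must sum to T")
    prob, t, prev = 1.0, 0, None
    for state, dur in path:
        step = Pi.get((state, dur), 0.0) if prev is None else a.get((prev, (state, dur)), 0.0)
        prob *= step * math.prod(b[state][v] for v in obs[t:t + dur])
        t, prev = t + dur, (state, dur)
    return prob

def all_paths(T, M, D, last_state=None):
    """Yield every segmentation of 1..T with durations <= D and no self-transitions."""
    if T == 0:
        yield []
        return
    for j in range(M):
        if j == last_state:
            continue
        for d in range(1, min(D, T) + 1):
            for rest in all_paths(T - d, M, D, j):
                yield [(j, d)] + rest

def brute_force_likelihood(obs, Pi, a, b, M, D):
    return sum(path_probability(p, obs, Pi, a, b) for p in all_paths(len(obs), M, D))

# Toy model, assumed for illustration only.
M, D = 2, 2
Pi = {(j, d): 0.25 for j in range(M) for d in (1, 2)}
a = {((i, h), (j, d)): 0.5 for i in range(M) for h in (1, 2)
     for j in range(M) if j != i for d in (1, 2)}
b = [{"x": 0.9, "y": 0.1}, {"x": 0.2, "y": 0.8}]
print(brute_force_likelihood(["x", "y"], Pi, a, b, M, D))
```

The number of paths grows exponentially with $T$, so this enumeration is useful only as a correctness check on very short sequences.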

The forward variables for HSMM are defined by

\[ \alpha_t(j,d) \equiv P[S_{[t-d+1:t]} = j, o_{1:t} \mid \lambda] \tag{2.4} \]

and the backward variables by

\[ \beta_t(j,d) \equiv P[o_{t+1:T} \mid S_{[t-d+1:t]} = j, \lambda]. \tag{2.5} \]

Based on the Markov property, the current/future observations are
dependent on the current state, for example,

\[ P[o_{t-d+1:t} \mid S_{[t-d-h+1:t-d]} = i, S_{[t-d+1:t]} = j, \lambda] = P[o_{t-d+1:t} \mid S_{[t-d+1:t]} = j, \lambda] \]

and

\[ P[o_{t+1:T} \mid S_{[t-d+1:t]} = j, S_{[t+1:t+h]} = i, \lambda] = P[o_{t+1:T} \mid S_{[t+1:t+h]} = i, \lambda], \]

and independent of the previous observations, for example,

\[ P[S_{[t-d+1:t]} = j, o_{t-d+1:t} \mid S_{[t-d-h+1:t-d]} = i, o_{1:t-d}, \lambda]
 = P[S_{[t-d+1:t]} = j, o_{t-d+1:t} \mid S_{[t-d-h+1:t-d]} = i, \lambda] \]

and

\[ P[o_{t+h+1:T} \mid S_{[t+1:t+h]} = i, o_{t+1:t+h}, \lambda] = P[o_{t+h+1:T} \mid S_{[t+1:t+h]} = i, \lambda]. \]
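Using these equations, it is easy to obtain the forward-backward
algorithm for a general HSMM:

\[
\begin{aligned}
\alpha_t(j,d) &= \sum_{i \ne j,\, h} P[S_{[t-d-h+1:t-d]} = i, S_{[t-d+1:t]} = j, o_{1:t} \mid \lambda] \\
&= \sum_{i \ne j,\, h} \alpha_{t-d}(i,h) \cdot P[S_{[t-d+1:t]} = j, o_{t-d+1:t} \mid S_{[t-d-h+1:t-d]} = i, \lambda] \\
&= \sum_{i \ne j,\, h} \alpha_{t-d}(i,h) \cdot a_{(i,h)(j,d)} \cdot P[o_{t-d+1:t} \mid S_{[t-d+1:t]} = j, \lambda] \\
&= \sum_{i \ne j,\, h} \alpha_{t-d}(i,h) \cdot a_{(i,h)(j,d)} \cdot b_{j,d}(o_{t-d+1:t}),
\end{aligned}
\tag{2.6}
\]

for $t > 0$, $d \in \mathbf{D}$, $j \in \mathbf{S}$, and

\[
\begin{aligned}
\beta_t(j,d) &= \sum_{i \ne j,\, h} P[S_{[t+1:t+h]} = i, o_{t+1:T} \mid S_{[t-d+1:t]} = j, \lambda] \\
&= \sum_{i \ne j,\, h} a_{(j,d)(i,h)} \cdot P[o_{t+1:T} \mid S_{[t+1:t+h]} = i, \lambda] \\
&= \sum_{i \ne j,\, h} a_{(j,d)(i,h)} \cdot b_{i,h}(o_{t+1:t+h}) \cdot P[o_{t+h+1:T} \mid S_{[t+1:t+h]} = i, \lambda] \\
&= \sum_{i \ne j,\, h} a_{(j,d)(i,h)} \cdot b_{i,h}(o_{t+1:t+h}) \cdot \beta_{t+h}(i,h),
\end{aligned}
\tag{2.7}
\]

for $t < T$, $d \in \mathbf{D}$, $j \in \mathbf{S}$.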

2.2.1 Symmetric Form of the Forward-Backward Algorithm

Though the backward formula (2.7) seems a little bit different from
the forward formula (2.6), it can be transformed to the same form
as the forward one. Now we derive the symmetric form of the
forward-backward algorithm. If we use the starting time of the
given state to express the backward variables, that is, let
$\beta''_{t-d+1}(j,d) = b_{j,d}(o_{t-d+1:t})\, \beta_t(j,d)$, then the backward formula (2.7)
becomes

\[ \beta''_{t-d+1}(j,d) = b_{j,d}(o_{t-d+1:t}) \sum_{i \ne j,\, h} a_{(j,d)(i,h)}\, \beta''_{t+1}(i,h) \]

or

\[ \beta''_t(j,d) = b_{j,d}(o_{t:t+d-1}) \sum_{i \ne j,\, h} a_{(j,d)(i,h)}\, \beta''_{t+d}(i,h). \]

If we further denote the backward variables in the reverse time
order, that is, let $t' = T - t + 1$, $\beta'_t(j,d) = \beta''_{T-t+1}(j,d)$, $o'_t = o_{T-t+1}$,
and $a'_{(i,h)(j,d)} = a_{(j,d)(i,h)}$, then the backward formula becomes

\[
\begin{aligned}
\beta'_t(j,d) &= b_{j,d}(o_{T-t+1:T-t+1+d-1}) \sum_{i \ne j,\, h} a_{(j,d)(i,h)}\, \beta''_{T-t+1+d}(i,h) \\
&= b_{j,d}(o'_{t:t-d+1}) \sum_{i \ne j,\, h} a_{(j,d)(i,h)}\, \beta'_{t-d}(i,h) \\
&= \sum_{i \ne j,\, h} \beta'_{t-d}(i,h)\, a'_{(i,h)(j,d)}\, b_{j,d}(o'_{t-d+1:t}).
\end{aligned}
\]

We can see that the backward recursion is exactly the same as the
forward formula (2.6) when it is expressed in the reverse time order.
This can potentially reduce the requirement for the silicon area on a
chip if the backward logic module uses the forward one. A symmetric
forward-backward algorithm for the residual time model was introduced
by Yu and Kobayashi (2003a).

2.2.2 Initial Conditions


The initial conditions generally can have two different assumptions:

1. The general assumption of boundary conditions assumes that the first
state begins at or before observation $o_1$ and the last state ends at or
after observation $o_T$. In this case, we can assume that the process
starts at $-\infty$ and terminates at $+\infty$. The observations out of the
sampling period $[1, T]$ can be any possible values, that is, $b_{j,d}(\cdot) = 1$
for any $j \in \mathbf{S}$, $d \in \mathbf{D}$. Therefore, in the forward formula (2.6)
$b_{j,d}(o_{t-d+1:t})$ is replaced with the distribution $b_{j,d}(o_{1:t})$ if $t-d+1 \le 1$
and $t \ge 1$, and in the backward formula (2.7) $b_{i,h}(o_{t+1:t+h})$ is replaced
with $b_{i,h}(o_{t+1:T})$ if $t+1 \le T$ and $t+h \ge T$. We then have the initial
conditions for the forward recursion formula (2.6) as follows:

\[ \alpha_\tau(j,d) = P[S_{[\tau-d+1:\tau]} = j \mid \lambda] = \pi_{j,d}, \quad \tau \le 0,\ d \in \mathbf{D}, \tag{2.8} \]

where $\{\pi_{j,d}\}$ can be the equilibrium distribution of the underlying
semi-Markov process. Because, for $t+h \ge T$,

\[ P[S_{[t+1:t+h]} = i, o_{t+1:T} \mid S_{[t-d+1:t]} = j, \lambda] = a_{(j,d)(i,h)}\, b_{i,h}(o_{t+1:T}), \]

then from the backward recursion formula (2.7) we can see that
$\beta_{t+h}(i,h) = 1$, for $t+h \ge T$. Therefore, the initial conditions for the
backward recursion formula (2.7) are as follows:

\[ \beta_\tau(i,d) = 1, \quad \tau \ge T,\ d \in \mathbf{D}. \tag{2.9} \]

If the model assumes that the first state begins at $t = 1$ and the last
state ends at or after observation $o_T$, it is a right-censored HSMM
introduced by Guedon (2003). Because this is desirable for many
applications, it is taken as a basis for an R package for analyzing
HSMMs (Bulla et al., 2010).
2. The simplified assumption of boundary conditions assumes that the
first state begins at time 1 and the last state ends at time $T$. This is
the most popular assumption one can find in the literature. In this
case, the initial conditions for the forward recursion formula (2.6) are

\[ \alpha_0(j,d) = \pi_{j,d}, \quad d \in \mathbf{D}; \qquad \alpha_\tau(j,d) = 0, \quad \tau < 0,\ d \in \mathbf{D}; \]

and the initial conditions for the backward recursion formula (2.7)
are

\[ \beta_T(i,d) = 1, \quad d \in \mathbf{D}; \qquad \beta_\tau(i,d) = 0, \quad \tau > T,\ d \in \mathbf{D}. \]

Note that the initial distributions of states can be assumed as
$\Pi_{j,d} \equiv P[S_{[1:d]} = j \mid \lambda]$. Therefore, the initial conditions for the forward
recursion formula can be changed to $\alpha_d(j,d) = \Pi_{j,d}\, b_{j,d}(o_{1:d})$, for $d \in \mathbf{D}$,
and all others $\alpha_t(j,d)$, for $t \ne d$ and $t \le D$, being zeros.
Therefore, the forward-backward algorithm for the general HSMM
is as follows, where the self-transition probabilities $a_{(i,h)(i,d)} = 0$, for all
$i$, $h$, and $d$:

Algorithm 2.1 Forward-Backward Algorithm for the General HSMM
The Forward Algorithm
1. For $j = 1, \ldots, M$ and $d = 1, \ldots, D$, let $\alpha_0(j,d) = \pi_{j,d}$;
2. If the simplified assumption that the first state must start at $t = 1$
is assumed, let $\alpha_\tau(j,d) = 0$ for $\tau < 0$; otherwise let $\alpha_\tau(j,d) = \pi_{j,d}$
for $\tau < 0$;
3. For $t = 1, \ldots, T$ {
     for $j = 1, \ldots, M$ and $d = 1, \ldots, D$ {
       $\alpha_t(j,d) = \sum_{i,h} \alpha_{t-d}(i,h)\, a_{(i,h)(j,d)}\, b_{j,d}(o_{t-d+1:t})$;
     }
   }
The Backward Algorithm
1. For $j = 1, \ldots, M$ and $d = 1, \ldots, D$, let $\beta_T(j,d) = 1$;
2. If the simplified assumption that the last state must end at $t = T$ is
assumed, let $\beta_\tau(j,d) = 0$ for $\tau > T$; otherwise, let $\beta_\tau(j,d) = 1$ for $\tau > T$.
3. For $t = T-1, \ldots, 1$ {
     for $j = 1, \ldots, M$ and $d = 1, \ldots, D$ {
       $\beta_t(j,d) = \sum_{i,h} a_{(j,d)(i,h)}\, b_{i,h}(o_{t+1:t+h})\, \beta_{t+h}(i,h)$;
     }
   }
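Algorithm 2.1 can be sketched in a few lines for the simplified boundary assumption (first state starts at $t = 1$, last state ends at $t = T$). This is an illustrative sketch rather than a reference implementation: it assumes conditionally independent discrete observations, dictionary-based parameters, and states numbered from 0, and every name in it is hypothetical. The backward step uses $o_{t+1:t+h}$ and $\beta_{t+h}(i,h)$, in agreement with formula (2.7).

```python
import math

def seg_prob(b, j, segment):
    """b_{j,d}(segment) under the conditional-independence assumption."""
    return math.prod(b[j][v] for v in segment)

def forward(obs, pi, a, b, M, D):
    """alpha[t][(j, d)] for t = 0..T; the first state must start at t = 1."""
    pairs = [(j, d) for j in range(M) for d in range(1, D + 1)]
    alpha = {0: {jd: pi.get(jd, 0.0) for jd in pairs}}   # alpha_0 = pi
    for t in range(1, len(obs) + 1):
        alpha[t] = {}
        for j, d in pairs:
            if t - d < 0:                                # alpha_tau = 0 for tau < 0
                alpha[t][(j, d)] = 0.0
                continue
            inflow = sum(alpha[t - d][ih] * a.get((ih, (j, d)), 0.0) for ih in pairs)
            alpha[t][(j, d)] = inflow * seg_prob(b, j, obs[t - d:t])
    return alpha

def backward(obs, a, b, M, D):
    """beta[t][(j, d)] for t = 1..T; the last state must end at t = T."""
    T = len(obs)
    pairs = [(j, d) for j in range(M) for d in range(1, D + 1)]
    beta = {T: {jd: 1.0 for jd in pairs}}
    for t in range(T - 1, 0, -1):
        beta[t] = {}
        for jd in pairs:
            beta[t][jd] = sum(
                a.get((jd, (i, h)), 0.0) * seg_prob(b, i, obs[t:t + h]) * beta[t + h][(i, h)]
                for i, h in pairs if t + h <= T)         # beta_tau = 0 for tau > T
    return beta

# Toy model, assumed for illustration only.
M, D = 2, 2
pi = {(j, d): 0.25 for j in range(M) for d in (1, 2)}
a = {((i, h), (j, d)): 0.5 for i in range(M) for h in (1, 2)
     for j in range(M) if j != i for d in (1, 2)}
b = [{"x": 0.9, "y": 0.1}, {"x": 0.2, "y": 0.8}]
obs = ["x", "y"]
alpha = forward(obs, pi, a, b, M, D)
beta = backward(obs, a, b, M, D)
likelihood = sum(alpha[len(obs)].values())   # valid because the last state ends at T
```

Self-transitions need no special handling here because the dictionary `a` simply has no entries with $j = i$. Each time step costs on the order of $(MD)^2$ operations, plus the cost of evaluating the segment probabilities.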

2.2.3 Probabilities
After the forward variables $\{\alpha_t(j,d)\}$ and the backward variables
$\{\beta_t(j,d)\}$ are determined, all other probabilities of interest can be
computed. For instance, the filtered probability that state $j$ starts at
$t-d+1$ and ends at $t$, with duration $d$, given partial observed
sequence $o_{1:t}$, can be determined by

\[ P[S_{[t-d+1:t]} = j \mid o_{1:t}, \lambda] = \frac{\alpha_t(j,d)}{P[o_{1:t} \mid \lambda]}, \]

and the predicted probability that state $j$ will start at $t+1$ and end at
$t+d$, with duration $d$, given partial observed sequence $o_{1:t}$ by

\[ P[S_{[t+1:t+d]} = j \mid o_{1:t}, \lambda] = \frac{\sum_{i \ne j,\, h} \alpha_t(i,h)\, a_{(i,h)(j,d)}}{P[o_{1:t} \mid \lambda]}, \]

where

\[
\begin{aligned}
P[o_{1:t} \mid \lambda] &= \sum_j P[S_t = j, o_{1:t} \mid \lambda]
 = \sum_j \sum_d \sum_{0 \le k \le D-d} P[S_{[t-d+1:t+k]} = j, o_{1:t} \mid \lambda] \\
&= \sum_j \sum_d \sum_{0 \le k \le D-d} \sum_{o_{t+1:t+k}} P[S_{[t-d+1:t+k]} = j, o_{1:t+k} \mid \lambda] \\
&= \sum_j \sum_d \sum_{i \ne j,\, h} \alpha_{t-d}(i,h) \sum_{0 \le k \le D-d} a_{(i,h)(j,d+k)} \sum_{o_{t+1:t+k}} b_{j,d+k}(o_{t-d+1:t+k}).
\end{aligned}
\]

These readily yield the filtered probability of state $j$ ending at $t$,
$P[S_{t]} = j \mid o_{1:t}, \lambda] = \sum_d P[S_{[t-d+1:t]} = j \mid o_{1:t}, \lambda]$, and the predicted
probability of state $j$ starting at $t+1$, $P[S_{[t+1} = j \mid o_{1:t}, \lambda] = \sum_d P[S_{[t+1:t+d]} = j \mid o_{1:t}, \lambda]$.
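The normalizing term $P[o_{1:t} \mid \lambda]$ differs from $\sum_{j,d} \alpha_t(j,d)$ because the state occupied at $t$ may continue past $t$. The sketch below evaluates it for a model in which the first state starts at $t = 1$. It assumes conditionally independent observations, in which case the sum over the unseen future observations $o_{t+1:t+k}$ collapses to the product of per-symbol probabilities over $o_{t-d+1:t}$. The function names and the toy model are assumptions made for illustration.

```python
import math

def seg_prob(b, j, segment):
    """b_{j,d}(segment) as a product of per-symbol probabilities (an assumption)."""
    return math.prod(b[j][v] for v in segment)

def forward(obs, pi, a, b, pairs):
    """alpha[t][(j, d)], t = 0..T, with alpha_0 = pi and alpha_tau = 0 for tau < 0."""
    alpha = {0: {jd: pi.get(jd, 0.0) for jd in pairs}}
    for t in range(1, len(obs) + 1):
        alpha[t] = {}
        for j, d in pairs:
            if t - d < 0:
                alpha[t][(j, d)] = 0.0
                continue
            inflow = sum(alpha[t - d][ih] * a.get((ih, (j, d)), 0.0) for ih in pairs)
            alpha[t][(j, d)] = inflow * seg_prob(b, j, obs[t - d:t])
    return alpha

def partial_likelihood(t, alpha, obs, a, b, pairs, D):
    """P[o_{1:t} | lambda], allowing the state occupied at t to continue past t."""
    total = 0.0
    for j, d in pairs:                       # d: time already spent in state j at t
        if t - d < 0:
            continue
        emit = seg_prob(b, j, obs[t - d:t])  # observations after t sum out to 1
        for ih in pairs:
            stay = sum(a.get((ih, (j, d + k)), 0.0) for k in range(D - d + 1))
            total += alpha[t - d][ih] * stay * emit
    return total

def filtered_ending(t, alpha, norm):
    """P[S_{[t-d+1:t]} = j | o_{1:t}, lambda] for every (j, d)."""
    return {jd: p / norm for jd, p in alpha[t].items()}

def predicted_starting(t, alpha, a, norm, pairs):
    """P[S_{[t+1:t+d]} = j | o_{1:t}, lambda] for every (j, d)."""
    return {jd: sum(alpha[t][ih] * a.get((ih, jd), 0.0) for ih in pairs) / norm
            for jd in pairs}

# Toy model, assumed for illustration only.
M, D = 2, 2
pairs = [(j, d) for j in range(M) for d in range(1, D + 1)]
pi = {jd: 0.25 for jd in pairs}
a = {(ih, jd): 0.5 for ih in pairs for jd in pairs if jd[0] != ih[0]}
b = [{"x": 0.9, "y": 0.1}, {"x": 0.2, "y": 0.8}]
obs = ["x", "y"]
alpha = forward(obs, pi, a, b, pairs)
norm = partial_likelihood(2, alpha, obs, a, b, pairs, D)
```

Because `filtered_ending` covers only the event that a state ends exactly at $t$, its values need not sum to one; the remaining mass belongs to states that continue beyond $t$.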

The smoothed or posterior probabilities, such as $P[S_t = j \mid o_{1:T}, \lambda]$,
$P[S_{t]} = i, S_{[t+1} = j \mid o_{1:T}, \lambda]$ and $P[S_{[t-d+1:t]} = j \mid o_{1:T}, \lambda]$, for given entire
observation sequence $o_{1:T}$ and model parameters $\lambda$ can be determined
by the following equations, for $h, d \in \mathbf{D}$, $i, j \in \mathbf{S}$, $i \ne j$, and $t = 1, \ldots, T$,

\[ \eta_t(j,d) \equiv P[S_{[t-d+1:t]} = j, o_{1:T} \mid \lambda] = \alpha_t(j,d)\, \beta_t(j,d), \tag{2.10} \]

representing the joint probability that the observation sequence is $o_{1:T}$
and the state is $j$ having duration $d$ by time $t$ given the model,

\[
\begin{aligned}
\xi_t(i,h;j,d) &\equiv P[S_{[t-h+1:t]} = i, S_{[t+1:t+d]} = j, o_{1:T} \mid \lambda] \\
&= \alpha_t(i,h)\, a_{(i,h)(j,d)}\, b_{j,d}(o_{t+1:t+d})\, \beta_{t+d}(j,d),
\end{aligned}
\tag{2.11}
\]

representing the joint probability that the observation sequence is $o_{1:T}$
and the transition from state $i$ of duration $h$ to state $j$ of duration $d$
occurs at time $t$ given the model,

\[ \xi_t(i,j) \equiv P[S_{t]} = i, S_{[t+1} = j, o_{1:T} \mid \lambda] = \sum_{h \in \mathbf{D}} \sum_{d \in \mathbf{D}} \xi_t(i,h;j,d), \tag{2.12} \]

representing the joint probability that the observation sequence is
$o_{1:T}$ and the transition from state $i$ to state $j$ occurs at time $t$ given
the model,

\[
\begin{aligned}
\gamma_t(j) &\equiv P[S_t = j, o_{1:T} \mid \lambda] \\
&= \sum_{\tau, d:\ \tau \ge t \ge \tau-d+1} P[S_{[\tau-d+1:\tau]} = j, o_{1:T} \mid \lambda] \\
&= \sum_{\tau \ge t} \sum_{d=\tau-t+1}^{D} \eta_\tau(j,d),
\end{aligned}
\tag{2.13}
\]

representing the joint probability that the observation sequence is $o_{1:T}$
and the state is $j$ at time $t$ given the model, and

\[ P[o_{1:T} \mid \lambda] = \sum_{j \in \mathbf{S}} P[S_t = j, o_{1:T} \mid \lambda] = \sum_{j \in \mathbf{S}} \gamma_t(j), \]

being the likelihood probability that the observed sequence $o_{1:T}$ is
generated by the model $\lambda$. Then, the smoothed probabilities can be
obtained by letting:

$\bar{\eta}_t(j,d) = \eta_t(j,d) / P[o_{1:T} \mid \lambda]$ be the smoothed probability of being
in state $j$ having duration $d$ by time $t$ given the model and the
observation sequence;
$\bar{\xi}_t(i,h;j,d) = \xi_t(i,h;j,d) / P[o_{1:T} \mid \lambda]$ the smoothed probability of
transition at time $t$ from state $i$ occurred with duration $h$ to state $j$
having duration $d$ given the model and the observation sequence;
$\bar{\xi}_t(i,j) = \xi_t(i,j) / P[o_{1:T} \mid \lambda]$ the smoothed probability of transition at
time $t$ from state $i$ to state $j$ given the model and the observation
sequence; and
$\bar{\gamma}_t(j) = \gamma_t(j) / P[o_{1:T} \mid \lambda]$ the smoothed probability of being in state $j$
at time $t$ given the model and the observation sequence.
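Equations (2.10), (2.11), and (2.13) translate almost directly into code once $\alpha$ and $\beta$ are available. The sketch below assumes the simplified boundary conditions, conditionally independent discrete observations, and dictionary-based parameters. The function names and the toy model are hypothetical. It returns the unnormalized joint quantities; dividing by the returned likelihood gives the smoothed probabilities.

```python
import math

def seg_prob(b, j, segment):
    return math.prod(b[j][v] for v in segment)

def forward(obs, pi, a, b, pairs):
    alpha = {0: {jd: pi.get(jd, 0.0) for jd in pairs}}
    for t in range(1, len(obs) + 1):
        alpha[t] = {}
        for j, d in pairs:
            if t - d < 0:
                alpha[t][(j, d)] = 0.0
                continue
            inflow = sum(alpha[t - d][ih] * a.get((ih, (j, d)), 0.0) for ih in pairs)
            alpha[t][(j, d)] = inflow * seg_prob(b, j, obs[t - d:t])
    return alpha

def backward(obs, a, b, pairs):
    T = len(obs)
    beta = {T: {jd: 1.0 for jd in pairs}}
    for t in range(T - 1, 0, -1):
        beta[t] = {jd: sum(a.get((jd, (i, h)), 0.0) * seg_prob(b, i, obs[t:t + h])
                           * beta[t + h][(i, h)] for i, h in pairs if t + h <= T)
                   for jd in pairs}
    return beta

def smooth(obs, pi, a, b, M, D):
    """Return eta (2.10), a function xi (2.11), gamma (2.13), and P[o_{1:T} | lambda]."""
    T = len(obs)
    pairs = [(j, d) for j in range(M) for d in range(1, D + 1)]
    alpha, beta = forward(obs, pi, a, b, pairs), backward(obs, a, b, pairs)
    eta = {t: {jd: alpha[t][jd] * beta[t][jd] for jd in pairs} for t in range(1, T + 1)}

    def xi(t, ih, jd):
        j, d = jd
        if t + d > T:                      # the next state would end after T
            return 0.0
        return (alpha[t][ih] * a.get((ih, jd), 0.0)
                * seg_prob(b, j, obs[t:t + d]) * beta[t + d][jd])

    # gamma_t(j): sum eta over all segments of state j that cover time t
    gamma = {t: [sum(eta[tau][(j, d)]
                     for tau in range(t, min(T, t + D - 1) + 1)
                     for d in range(tau - t + 1, D + 1))
                 for j in range(M)]
             for t in range(1, T + 1)}
    return eta, xi, gamma, sum(alpha[T].values())

# Toy model, assumed for illustration only.
M, D = 2, 2
pairs = [(j, d) for j in range(M) for d in range(1, D + 1)]
pi = {jd: 0.25 for jd in pairs}
a = {(ih, jd): 0.5 for ih in pairs for jd in pairs if jd[0] != ih[0]}
b = [{"x": 0.9, "y": 0.1}, {"x": 0.2, "y": 0.8}]
eta, xi, gamma, likelihood = smooth(["x", "y", "y"], pi, a, b, M, D)
```

A useful sanity check is that $\sum_j \gamma_t(j)$ takes the same value, the likelihood, at every $t$, and that $\sum_{j,d} \xi_t(i,h;j,d) = \eta_t(i,h)$ for $t < T$.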

Obviously, the conditional factor $P[o_{1:T} \mid \lambda]$ is common for all
the smoothed/posterior probabilities. Therefore, it is often omitted
for simplicity in the literature. Similarly, in the rest of this book,
Another random document with
no related content on Scribd:
wait for him until he had completed his business at Rome and then
continue with him the journey to the Philippines. This was the
beginning of a warm friendship between Bishop Brent and ourselves,
and no one can have lived in the Philippines since, or have been
familiar with the affairs of the Islands, without knowing what a
blessing his work and presence have been to the Philippine people,
and how much he has aided the Government in its task.
We engaged passage on the steamship Trave, sailing from New
York to Gibraltar about the middle of May; the day for our departure
was close at hand; many good-byes had been said; and, altogether,
the immediate future was looking bright, when suddenly I found
myself once more within the orbit of my unlucky star. My son Robert
chose this opportune moment to develop a case of scarlet fever. Of
course that left me and the children out of all the plans and I was
compelled to accept a hastily made arrangement which provided for
my remaining behind and following my husband and his interesting
party on a later ship. Fortunately Robert was not with the other
children when he contracted the disease. He was visiting friends in
another part of town and I had him removed immediately to the
Good Samaritan Hospital, then settled down to my vigil which might
be long or short as fortune decreed.
My husband’s mother was in Millbury while all these things were
transpiring and he called her up on the long distance telephone to
tell her about Bobby’s illness and to say good-bye.
“Then Nellie cannot go with you?” said Mrs. Taft.
“No, I’m sorry to say she can’t,” said my husband.
“But you have now an extra stateroom, have you not?”
“Yes, Mother.”
“Well, Will, I don’t think you ought to make such a trip alone when
you are so far from strong, so I just think I’ll go with you in Nellie’s
place,” said my mother-in-law.
And she did. The intrepid old lady of seventy-four packed her
trunks and was in New York ready to sail within twenty-four hours,
and my husband wrote that she acted altogether with an energy and
an enterprise which filled him with pleasure and pride. On the
steamer, and later at the hotel Quirinal in Rome, she presided with
dignity for more than a month over a table at which daily gathered a
company composed of a Colonial Governor, a Supreme Judge, a
Roman Bishop, an Anglican Bishop and a United States Army officer.
Her activity and fearlessness kept her family and friends in a state
of astonishment a good part of the time. She went wherever she liked
and it never seemed to occur to her that it was unusual for a woman
of her age to travel everywhere with so much self-reliance. She
thought nothing of crossing the American continent every year to
visit her daughter or sister on the Pacific Coast, and out in Manila we
used to laugh at the possibility of her appearing on the scene at any
moment. In fact, she very seriously considered coming at one time. I
was glad that she could go with my husband to Rome because she
really could be a comfort and a help and not at all a responsibility.
Robert was not nearly as ill as we expected he would be and in a
few weeks I was able to make definite plans for joining my husband.
My sister, Mrs. Anderson, was going to Paris so I took advantage of
the opportunity to enjoy her companionship on the voyage and sailed
with her on the fourth of June, landing in France and going by train
to Rome.
That the record of our ill-luck may be quite complete I must add
that on the way across the Atlantic my son Charlie managed to pick
up whooping-cough, and that by the time we reached Rome he had
passed it on to Helen. Her first remark to her father was a plaintive
query: “Papa, why is it we can never go anywhere without catching
something?”
I devoutly hoped that we had caught everything there was to catch
and that we might now venture to predict a period of peace.
I found my party very comfortably bestowed. They were occupying
a whole floor at the Quirinal, the largest hotel then open in the city,
and were keeping what appeared to me to be considerable “state.” It
looked as if they had the entire building to themselves, but that was
because it was midsummer when few tourists visit Rome and when
all Roman society is supposed to flock to its mountain homes and to
northern resorts. However, midsummer though it was, a good many
members of the “Black,” or Vatican division of society, still lingered
in the city and I found them evincing every desire to make our stay
both pleasant and memorable. Before I arrived Mr. Taft had already
“met, called upon, taken tea with and dined with Cardinals, Princes,
counts, marquises, and distinguished Englishmen and Americans
resident in Rome,” to quote from one of his own letters, but he had a
good many things to do over again in my honour. He had also had an
audience with Pope Leo XIII, and was deep in the rather distracting
uncertainties and intricacies of his negotiations.
He did not have the pleasure of seeing the King of Italy whom he
had a great desire to meet, because, even though the American
Ambassador had made all the arrangements, etiquette did not permit
such an audience until his relations with the Vatican had terminated,
and by that time the King had gone to the military manœuvres in
North Italy.
My husband’s position was one of very great delicacy. By the
nature of our national institutions it is not possible for us to send a
representative to the Vatican in a diplomatic capacity no matter what
the emergency may be, and Mr. Roosevelt in sending this
Commission to Rome had no intention that its office should be
construed into a formal recognition of the Vatican, which could not
fail to raise a storm of protest and opposition in this country. So the
instructions given to Mr. Taft by Secretary Root were made very
definite on this point. After reviewing the necessity for taking such
action on the part of our government and covering the favourable
reports on the proposed negotiations submitted by the Philippine
Committees of the House and Senate, the instructions began with
paragraph one:
One of the controlling principles of our government is the complete separation of
church and state, with the entire freedom of each from any control or interference
by the other. This principle is imperative wherever American jurisdiction extends,
and no modification or shading thereof can be a subject of discussion.
Following this in numbered paragraphs, a tentative plan for the
adjustment of the Friar difficulties is outlined and the instructions
end with paragraph nine:
Your errand will not be in any sense or degree diplomatic in its nature, but will
be purely a business matter of negotiation by you as Governor of the Philippines
for the purchase of property from the owners thereof, and the settlement of land
titles in such a manner as to contribute to the best interests of the people of the
Islands.
These instructions were easier to receive than to carry out, since
from the beginning the Vatican made every possible effort to give the
mission a diplomatic aspect and to cast upon it the glamour of great
official solemnity, and Mr. Taft had constantly to keep his mind alert
to the danger of accidental acquiescence in a misinterpretation of his
position. To take a position which would soothe the feelings of
American Catholics and yet not shock the conscience of any
Protestant was something like being ground between the proverbial
millstones. However, Cardinal Rampolla very graciously met the
business-like ideas of the Commission and arranged a private
audience with Pope Leo at which the propositions of the Philippine
government were to be outlined to him.
My husband’s memory of this now historic mission to Rome seems
to include little which was not directly connected with the business in
hand, but Judge Smith displays a more impressionable bent. In
answer to an inquiry as to what he recalls of the visit he wrote Mr.
Taft a most interesting letter. All his memoranda of the trip,
including letters, journals and souvenirs, were destroyed in the San
Francisco fire, but he says:
“After our arrival there was a long wait that arrangements might
be made for an audience with the Holy Father, but finally the date
was fixed and the Commission, at high noon, in evening dress and
top hats, went to the Vatican and passed up the long staircase, lined
with Swiss Guards, which leads to the State apartments. We were
received by the Chamberlain and several other functionaries and
were conducted from one apartment to another until finally we were
ushered into the presence of Leo XIII, to whom you made a
statement of the matters which were to be made the subject of
negotiation.
“This statement had been previously translated into French by
Bishop O’Gorman and Colonel Porter, and you will remember there
were some things about Bishop O’Gorman’s French which did not
meet with the entire approval of Colonel Porter. Whether you
arbitrated the matter and selected the appropriate phrase which
should have been used I do not know, but I do know that at one time
there was danger of the severance of the friendly relations which had
theretofore prevailed between the good Bishop and the good old
Colonel.
“My recollection of the Holy Father is that his face was like
transparent parchment, that he had the brilliant eyes of a young man
and that he was wonderfully alert of mind, although bent over by the
weight of years.
“Of course, none of us could forget Cardinal Rampolla,—tall,
slender, straight, vigorous in both mind and body, impenetrable, and
cold as fate. A man evidently of wonderful intellect and fully equal to
any demands that might be put upon him as the diplomat of the
Vatican.”
I might add that the first part of my husband’s speech, a copy of
which I have, consisted of a few remarks appropriate to the
presentation of a gift from President Roosevelt to the Pope. This gift
was a specially bound set of Mr. Roosevelt’s own works.
When the formal interview was at an end the Pope came down
from the dais on which he sat and indulged in a fifteen or twenty
minute personal conversation with the members of the Commission.
“He asked for the pleasure of shaking my hand,” writes my husband
to his brother Charles, in the usual vein of humour which obtains
between them, adding, “a privilege which I very graciously accorded
him.” He also joked about Mr. Taft’s proportions, saying that he had
understood he had been very ill, but from observation he saw no
reason to suppose that the illness had been serious. He poked gentle
fun at Bishop O’Gorman and made kindly inquiries of Judge Smith
and Major Porter; then he walked with the party to the door and
bowed them out, a courtesy which I believe was unprecedented.
“He had a great deal more vigour of motion,” writes Mr. Taft, “and
a great deal more resonance of voice than I had been led to suppose.
I had thought him little more than a lay figure, but he was full of
lively interest and gesture, and when my address was being read he
smiled and bowed his head in acquiescence.”
“We visited the catacombs,” says Judge Smith, “St. Peter’s, St.
Paul’s beyond the walls, and a few of the basilicas of ancient Rome
now dedicated to Christian worship. The Borghese and various other
art galleries left their impression, as did some of the interesting old
palaces, notably the one which was then threatening to fall into the
Tiber, and the ceiling of which bears the famous fresco of Cupid and
Psyche.
“One day during our first wait we had dinner out at the American
College as guests of Monsignor Kennedy, where you (Mr. Taft) made
a speech which brought much applause from the students in red
cassocks, and everybody was happy. After dinner some of us made a
visit to a villa by the Orsini on the hills overlooking the Campagna,
which villa had recently been purchased by the college as a summer
home.

“You will remember our call on Cardinal Martinelli and the dinner
we had with good old Cardinal Satolli who took such a pride in the
wine produced by his own vineyards, a wine, by the way, which was
not unreservedly approved by the owners of other vineyards. One of
the most delightful experiences of all was our dinner with the good
Episcopal Rector, Dr. Nevin, when ox-tongue done in the Russian
style was served as the pièce de résistance. You cannot forget how
shocked were some of the circles in Rome to find Bishop O’Gorman
and myself at such a festal board under such circumstances, and how
Pope Leo showed his thorough understanding of American
institutions by saying that American Catholics might very properly
do things which would be very much misunderstood if done by
Romans. The Episcopal Rector was a mighty hunter, a great traveller,
and gifted with a fund of anecdote which made him a most delightful
host.”
I found this highly social and sociable party rather impatiently
awaiting a reply to their formal, written proposals to the Vatican
which had been turned over to a Commission of Cardinals. They
were giving a fine imitation of outward leisurely poise, but among
themselves they were expressing very definite opinions of the
seemingly deliberate delays to which they were being subjected. Mr.
Taft was anxious to sail for Manila on the 10th of July, and already
had his passage booked on the Koenig Albert, but the immediate
prospect seemed to be that he would be held in Rome for the rest of
the summer.
He did not have the greatest confidence that he would succeed in
the mission which meant so much to his future course in the Islands,
and, indeed, it was quite evident that he would not succeed without
prolonged effort to be continued after he left Rome. The various
Cardinals lost no opportunity to assure him that the Vatican was in
full sympathy with the proposals made and that he might expect a
very early and satisfactory termination of the business, but he
decided not to believe anything until he should see the signatures to
the contract. The factions and the politics of the Vatican were most
perplexing. The monastic orders were the conservative element in
the negotiations, being willing enough to sell the Friars’ lands at a
valuation to be decided upon by a board of five members, two
representing the church, two representing the United States
government and the fifth to be selected from some other country, but
they were not willing to consent to the withdrawal of the Friars from
the Philippine Islands. Then there were wheels within wheels; Papal
candidates and candidates for Cardinals who thrust into the
negotiations considerations for agreeing or not agreeing which
greatly puzzled the purely business-like representatives of the
American government.
But I was not particularly annoyed by the delay. I found much to
interest me in Rome, and I saw my husband improving in general
health and gaining the strength he needed for a re-encounter with
the difficulties in tropic Manila. Prominent Republican leaders had
aroused his impatience at different times by publicly announcing
that, in all probability, he was “going out to the Philippines to die.”
He wrote to his brother from Rome:
“I dislike being put in such an absurd position before the country
as that of playing the martyr. I’m not asking any favours on account
of health or any other cause, nor am I taking the position that I am
making any sacrifice. I think that a great and unusual opportunity
has been offered me and if I can improve it, all well and good, but I
don’t want any sympathy or emotional support.”
He was easily aroused to resentment on the subject, but, just the
same, it was gratifying to observe him quite rapidly regaining his
normal vigour and buoyancy.
My mother-in-law was having a most wonderful time. She was
comfortably established at the Quirinal in rooms next to ours, and
was enjoying the devoted attention of every man in the party whether
he wore ecclesiastical frock, military uniform or plain citizens’
clothes. She went everywhere and saw everything and was as
indefatigable in her enjoyment as any of us. She met old-time friends
whom she had known when she and Judge Taft were in the
diplomatic corps abroad, and with them she indulged in pleasant
reminiscence. After I arrived she became more energetic than ever
and led me a lively pace at sightseeing and shopping, because, as she
wrote to another daughter-in-law, Mrs. Horace Taft, “Nellie is not at
all timid and as she speaks French we can go anywhere.”
I soon found that in spite of official and personal protest to the
contrary we were considered quite important personages, and the
elaborate hospitality we were offered kept us busy at nearly all hours
when hospitality is at all in order. There were teas and luncheons,
dinners and receptions, and functions of every description, and we
met a great many renowned and interesting people, both Roman and
foreign. Mr. W. T. Stead, the correspondent for the London Times
who was lost on the Titanic, was one of them. Then there was Mr.
Laffan, proprietor of the New York Sun, and Mrs. Laffan, and Dr.
Hillis of Brooklyn who was in Rome with his son. An attractive
personality, who interested us very much and whose hospitality we
enjoyed, was Princess Rospigliosi, the wife of an Italian nobleman,
who lived in an enchanting house. She had a very beautiful daughter
who was at that time keenly interested in the controversy as to
whether or not Catholics should vote in Rome. She was strongly in
favour of their doing so and, with extraordinary directness, carried
her advocacy straight to the Pope and insisted that it was a great
mistake for Catholics not to take advantage of the ballot and by that
means secure the political rights to which they were entitled. Pope
Leo, although very much impressed by what she said, insisted that it
was not yet time to urge the reform suggested, and wound up by
saying, “My good daughter, you go altogether too fast for me!” I don’t
doubt that by this time the young Princess is a warm supporter of
woman’s suffrage.
Also, we were entertained by a Mr. McNutt who had been in our
diplomatic corps at one time in Madrid and Constantinople, at
another time had been tutor to the sons of the Khedive of Egypt, and
was then one of the Papal Chamberlains. He had married a woman of
wealth, a Miss Ogden of New York.
Mr. McNutt had one of the most elaborate and beautiful palaces I
ever saw. He had studied the customs of Roman society in the
picturesque days of the Medicis and the Borgias, had rented the
Pamphili Palace and restored it to its pristine glory, and it was here
that he entertained us at a dinner, with cards afterward.
I felt like an actor in a mediæval pageant whose costume had not
been delivered in time for the performance. Cardinals in their
gorgeous robes, with gold snuff-boxes, gave to the scene a high
colour among the soberer tones of Bishops and Archbishops and
uniformed Ambassadors. Then there were Princes and Princesses
and other nobilities of Roman society, the men displaying gay
ribands and decorations, the women in elaborate costumes, and all
in a “stage setting” as far removed from modernity as a magnificent
old-world palace could be. To make this reproduction of old customs
complete our host made a point of having liveried attendants with
flaming torches to light the Cardinals to and from their carriages.
Before I reached Rome, Mr. Taft and his associates had been
present at a Papal consistory at which the Pope presided over the
College of Cardinals. They were the guests of the Pope and occupied
the Diplomatic Box. I was sorry to miss this exceptional privilege, but
we were given ample opportunities for seeing and hearing several
noteworthy religious festivals both at St. Peter’s and the church of St.
John of Lateran. I was educated in the strictest Presbyterianism,
while my husband’s mother was a Unitarian, and Puritan in her
training and in all her instincts. We could not help feeling that we
had been led into a prominent position in a strange environment.
But, unshaken though we were in our religious affiliations, we
appreciated the real beauty of the ceremonies and knew that we
should rejoice in the unusual privilege accorded us which would
never be ours again.
It was near the end of our stay in Rome that we had our audience
with the Pope,—Mrs. Taft, Robert, Helen and I. I wore a black
afternoon gown with a black veil on my head, while Mrs. Taft wore
her widow’s veil as usual. Helen, I dressed in white and, to her very
great excitement, she wore a white lace veil. Bishop O’Gorman
accompanied us and when we reached the door of the Vatican under
the colonnade at the right of St. Peter’s, we were met by some
members of the Swiss Guard in their curious uniforms, conducted
through endless corridors and rich apartments until we came to a
small waiting-room where we were left for a few moments by
ourselves. We had only time to adjust our veils and compose
ourselves when the door on one side opened and we were
ceremoniously ushered into the presence of Leo XIII who sat on a
low chair under a simple canopy at the far end of the room. He rose
to greet us as we entered, and as we were presented one by one he
extended his hand over which we each bowed as we received his
blessing.
He began speaking to me in French and finding that I could
answer him in that language he talked with me for perhaps half an
hour with a most charmingly graceful manner of comment and
compliment. He spoke of Mr. Roosevelt’s present and wished that he
knew English so that he might read the books. He referred to Mr.
Roosevelt as “President Roomvine” which was as near as he seemed
to be able to get to that very un-Latin name; said that he himself, in
his youth, had been devoted to the chase and would like very much to
read “The Strenuous Life.”
Later he called Robert to his side and gave him a special blessing,
saying that he hoped the little boy would follow in the footsteps of
McKinley and Roosevelt. He asked Bob what he expected to be when
he grew up and my self-confident son replied that he intended to be
Chief Justice of the Supreme Court. I suppose he had heard the Chief
Justiceship talked about by his father until he thought it the only
worthy ambition for a self-respecting citizen to entertain.
When we arose to go, His Holiness escorted us to the door and
bowed us out with a kindly smile in his fine young eyes that I shall
never forget.
Shortly after this I left Rome. It was getting hot and my husband
persuaded me to take the children away, promising to join us for a
short breath of mountain air before he sailed for Manila. It had been
decided that I should remain in Europe for a month or so and I was
to choose the place best suited for recuperation. I went first to
Florence for a week, then to the Grande Albergo Castello de
Aquabella at Vallombrosa. The sonorous name of this hotel should
have been a sufficient warning to me of the expense of living there,
but I was not in a mood to anticipate any kind of unpleasant
experience.
It is a beautiful place reached by a funicular railway from a station
about fifteen miles from Florence, and is where Milton wrote parts of
Paradise Lost. The hotel was an old castle remodelled, and as we
were almost the only guests and were attended by relays of most
obsequious servants we managed to feel quite baronial. We spent our
time being as lazy as we liked, or driving in the dense black forests of
pine which cover the mountains and through vistas of which we
could catch fascinating glimpses of the beautiful, town-dotted valley
of the Arno some thousands of feet below.
On the 20th of July my husband came up and joined us in this
delightful retreat. He had just received his final answer from the
Vatican and, while he was disappointed at not being able to settle the
matter then, he was hopeful that a way had been found which,
though it would entail much future labour, would lead to a
satisfactory solution of the problems. An Apostolic Delegate,
representing the Vatican, was to be sent to Manila to continue the
negotiations on the ground, and Pope Leo assured Mr. Taft that he
would receive instructions to bring about such an adjustment as the
United States desired. This assurance was carried out, but only after
Leo’s long pontificate had come to an end.
The final note was written by Cardinal Rampolla who rendered
“homage to the great courtesy and high capacity” with which Mr. Taft
had filled “the delicate mission,” and closed by declaring his
willingness to concede that “the favourable result” must in a large
measure be attributed to my husband’s “high personal qualities.”
I had hoped to have Mr. Taft with us at Vallombrosa for a week or
so before he sailed, but the time allotted in our plans for this was
taken up by delays in Rome, so that when he did arrive he had only
twenty-four hours to stay. His final audience with the Pope was
arranged for the following Monday, there were a number of minor
details to be attended to, and he was to sail Thursday morning from
Naples on the Princess Irene, to which he had been obliged to
transfer from the Koenig Albert.
The last audience with His Holiness consisted chiefly in an
exchange of compliments and expressions of thanks for courtesies
extended, but it had additional interest in that the Pope chose to
make it the occasion for personally presenting to the members of the
party certain small gifts, or souvenirs, which he had selected for
them. He had previously sent an inquiry through Bishop O’Gorman
as to whether or not the Commissioners would accept decorations,
but Mr. Taft replied that the American constitution forbids the
acceptance of such honours without the consent of Congress, so
nothing more was said about it.
The presents he did receive were a handsome Jubilee medal
displaying a portrait of His Holiness in bas relief, and a gold pen in
the form of a large feather with the papal arms on it. To me the Pope
sent a small piece of old German enamel showing a copy of an
ancient picture of St. Ursula and her virgins, framed in silver and
gold beautifully wrought. Smaller gold medals were given to each of
the other Commissioners, while President Roosevelt received a copy
in mosaic of a picture of a view of Rome from a corner in the Vatican
gardens in which the Pope is seen seated with three or four Cardinals
in attendance. This, together with letters from His Holiness and
Cardinal Rampolla to the President and Mr. Hay, the Secretary of
State, was given to Bishop O’Gorman to be delivered when he arrived
in the United States.
My husband sailed from Naples on the 24th of July, and I, with the
three children and their French governess, started north by Venice
and Vienna to spend a few weeks in the mountains of Switzerland
before returning to Manila.
There were rather terrifying reports of a cholera epidemic raging
in the Philippines and I dreaded the prospects of going into it with
my children, but I knew that heroic efforts were being made to check
it and I felt confident that, in Manila at least, it would have run its
course before I should arrive, so I booked passage on the German
steamer Hamburg and on the 3rd of September sailed for the East
and the tropics once more.

SCENES ATTENDING GOVERNOR TAFT’S ARRIVAL IN MANILA AFTER HIS FIRST ABSENCE

CHAPTER XII
LAST DAYS IN THE PHILIPPINES

When Mr. Taft reached Manila he found the city en fete and in a
state of intense excitement which had prevailed for two days during
which the people had expected every hour to hear the great siren on
the cold storage plant announce that the little Alava, the government
coastguard boat which had been sent to Singapore to get him, had
been sighted off Corregidor.
When the announcement finally came, everything in the harbour
that could manage to do so steamed down the Bay to meet him, and
when the launch to which he had transferred from the Alava came
up to the mouth of the Pásig River and under the walls of old Fort
Santiago, seventeen guns boomed out a Governor’s salute, while
whistles and bells and sirens all over the bay and river and city filled
the air with a deafening din.
Wherever his eyes rested he saw people,—crowding windows,
roofs, river banks and city walls, all of them cheering wildly and
waving hats or handkerchiefs. And the thing which moved him most
was the fact that the welcoming throng was not just representative of
the wealthy and educated class, but included thousands of the
people, barefooted and in calicoes, who had come in from the
neighbouring and even the far provinces to greet him.
Mrs. Moses asked Mr. Benito Legarda, one of the Filipino
members of the Commission, whether or not there had ever been a
like demonstration in honour of the arrival of a Spanish Governor,
and his answer was:
“Yes, there were demonstrations always, but the government paid
the expenses.”
In this case the very opposite was true. The government had no
money to waste on celebrations and all government buildings, such
as the City Hall, the Post Office and the Ayuntamiento, were
conspicuously bare. Their nakedness was positively eloquent of
economy in the midst of the riot of gay bunting, the flags, the
pennants and the palm leaves in which the rest of the city was
smothered. Then there were extraordinary and elaborate arches
spanning the streets through which the Governor was to be
conducted. One of these, erected by the Partido Federal, displayed a
huge allegorical picture which had a peculiar significance. Filipina, a
lovely lady draped in flowing gauze, was seen, in an attitude which
combined appeal with condescension, presenting to Columbia a
single star, implying that she desired to be accepted as one of the
States of the Union.
I am indebted to the descriptive art of Mrs. Moses, to photographs
and to my own knowledge of the Filipino way of doing things for the
mental picture I have of this celebration.
At the landing near the Custom House my husband found a great
procession in line, ready to escort him to the Ayuntamiento where
the speeches of welcome were to be made. There were regiments of
cavalry, infantry and artillery, as well as platoon after platoon of
native and American police with as many bands as there were
divisions of the procession. Picked men from the volunteer regiments
acted as a special guard for the Governor’s carriage and they must
have added much to the impressive array, because I know of my own
observation that the volunteers were always as fine a looking body of
men as it would be possible to find anywhere.
When Mr. Taft reached the Ayuntamiento he listened to glowing
speeches of tribute and welcome in the Marble Hall, then he stood
for hours shaking hands with the people who, in orderly file, passed
in and out of the building which was large enough to hold only a very
small fraction of them. When this was over and his audience had
settled down he proceeded to tell them in a clear and simple way all
about his experiences in Rome and how far the negotiations with the
Vatican had proceeded. This was a matter of paramount importance
to the Filipinos and they listened with an intensity of interest which
Mr. Taft said seemed to promise serious consequences if the business
could not be carried to a successful conclusion.
However, despite the joy and festivity with which he was greeted
upon his return, the Governor did not find general conditions in the
islands either prosperous or happy.
Everything that could possibly happen to a country had happened
or was happening. The cholera epidemic was still raging, and while it
had abated to a considerable extent in Manila it was at its worst in
Iloilo and other provinces. There had been from seventy to eighty
cases a day in Manila for a long time, and the quarantine regulations
had incensed the ignorant people to a point where force had to be
used to secure obedience. They did not understand sanitary
measures and wanted none of them; they clung to their superstitious
beliefs, and were easily made to accept as truth wild statements to
the effect that the Americans were poisoning the wells and rivers and
had stopped transportation and business with the sole purpose of
starving or otherwise destroying the entire population. Even the
educated ones were not without their time-honoured prejudices in
this regard, for while Mr. Taft was in Rome he received a cabled
protest from Filipino members of the Commission with a request
that he order the quarantine raised.
When he arrived in Manila the cholera cases had fallen to between
ten and twenty a day and business had been resumed to a certain
extent, but the situation was still critical and a fresh outbreak on
account of polluted water was to be expected at any time. All the
sources of water supply were patrolled by American soldiers day and
night and every precaution was taken; whole sections of the city were
burned in an attempt to stamp out the pestilence, but the disease had
to run its course and it was months before it was completely
eradicated.
While the people were dying of cholera the carabaos, the only
draught and farm animals in the Islands, were dying by thousands of
an epidemic of rinderpest. This scourge, too, was fought with all the
force of both the civil and military arms of the government, but
before it could be checked it had carried off a large majority of the
carabaos in the Archipelago with the result that agriculture and all
other industries dependent upon this mode of transportation were
paralysed. A general drought in China made a rice famine a practical
certainty, even if the people should have money to buy rice, so the
future looked black indeed.
The cholera and rinderpest had greatly reduced government
revenues and many plans for much needed public works had to be
modified or abandoned, while the condition of the currency added to
the general chaos. There was no gold standard and the fluctuations
in the value of silver made it necessary for the Governor to issue a
proclamation about once a week fixing a new rate of exchange. In
this way it was calculated that the government, with insufficient
income at the best, lost a round million dollars gold during a period
of ten months.
To cap all and add exasperation to uneasiness the ladrones had
become increasingly active with hard times and were harrying the
districts around Manila to such an extent that the people were in
constant terror. The ravages of the rinderpest had made the carabao
a very valuable animal and the chief object of the ladrones was to
steal such as were left and drive them off to be sold in distant
provinces. Nor were they at all particular about their highwaymen’s
methods or chary of sacrificing human life. There was a veritable
hotbed of ladronism at Caloocan, a suburb of Manila, which was
augmented by the roughs and toughs from the crowded and
miserable districts in the lower city, while across the Bay in Cavite
province, known as the “mother of insurrection,” there were several
hundred rifles in the hands of marauders who hid away in the hills
and jungles and made conditions such that Mr. Taft was asked by the
Director of Constabulary to suspend the writ of habeas corpus, thus
declaring the province in a practical state of siege. Mr. Taft would not
do this, saying that he thought the only course was to “hammer away
with the constabulary until the abuse was stamped out by the regular
methods of supposedly peaceful times,” but the worst feature of the
situation was that wherever ladronism showed its head there would
be cohorts of “irreconcilables”—posing in everyday life as loyal
citizens—ready, within the limits of personal safety, to encourage and
assist it. Anything to hamper and harass the government.
Shortly after Mr. Taft’s arrival in Manila, the vice-Governor,
General Wright, and Mrs. Wright left the Islands for a well-earned
vacation and my husband wrote that the amount of work which
confronted him was staggering. He took on General Wright’s
department in addition to his own duties, and if it hadn’t been that
he had at least half way learned not to try to “hurry the East” he
probably would not have lasted long.
Among the first steps to be taken was to provide against the
inevitable famine, and to do this it was necessary for the Government
to send to China and Saigon for large quantities of rice to be stored in
public godowns. They bought and brought to Manila something like
forty million pounds of this first of all necessities to an oriental
people, and the intention was to sell it at cost when the market
supply began to run low and prices began to soar beyond the poor
man’s reach. A certain degree of paternalism has always been, is
now, and probably always will be necessary in the government of the
Filipino people.
Mr. Taft besought the United States Congress to appropriate a sum
to be used for the importation of work animals, for the purchase of
rice and the furnishing of work on public improvements. The
animals were not to be given away, but were eventually to be sold at
reasonable prices. Three millions were appropriated and spent.
Congress was also petitioned to establish a gold standard of
currency, and this too was done, to the inexpressible relief of
everybody interested in the Philippine welfare, in the following
January. The currency now is as sound as our own, every silver peso,
which corresponds to the old “dollar Mex,” being worth fifty cents
gold.
When I arrived in Manila in early October I found the situation
more interesting than it had ever been, even though it was
distracting to the men who had to deal with it. My first necessity was,
of course, to settle myself once more at Malacañan. During my
absence the old Palace had been all done over, painted and patched
and cleaned and redecorated until it was quite unlike its quaint, old
dilapidated self. Some of the colours were a shade too pronounced
and some of the decorations ran a little more to “graceful patterns”
than suited my taste, but I was glad of the added comfort and
cleanliness.
It was difficult in the beginning to accustom myself to cholera
conditions. The disease was communicated to very few Americans or
other white foreigners, but safety was secured at the price of eternal
vigilance. Water could not be drunk unless it was boiled under one’s
personal supervision; nothing uncooked could be eaten, not even a
piece of imported fruit, unless it had first been washed in a carbolic
solution, a process, I may say, which added nothing desirable to its
flavour; a good many other precautions were necessary which made
us feel as if we were living always in the lowering shadow of some
dreadful catastrophe, but, even so, we were surprisingly calm about
it—everybody was—and managed to come through the experience
without any visible ill-effects.
There was one new thing for me, and that was a live cow. For two
long years we had manfully striven to make ourselves believe that we
liked canned milk and condensed cream just as much as we liked the
fresh milk we had been used to all our lives. In fact, we were fond of
declaring that we couldn’t tell the difference. But we could. And in
our secret hearts we all welcomed as the most delectable treat an
occasional gift of skimmed milk from a friend who had been a
pioneer in the momentous venture of importing an Australian cow.
The importation of our cow was a real event, and she straightway
took up a position of great dignity and importance in our
establishment. She roamed at will about the grounds of the Palace
and her general conduct was the subject of daily comment in the
family circle. A number of people brought in cows about this time,
but very few of them lived long enough to prove their dairy worth.
Our cow flourished and gave forth large quantities of milk, and this
fact became the subject of what was supposed to be a huge joke.
Mr. Worcester, who was the high chief health authority in the
Islands, decreed that all animals as they were brought in should be
inoculated for rinderpest, tuberculosis, and a number of other things,
—“including prickly heat,” said General Wright,—but it just so
happened that a great majority of these scientifically treated beasts
died almost immediately, and General Wright could always arouse
the wrath of Mr. Worcester—a thing he loved to do—by suggesting
that the only reason our cow lived was because “she had not been
inoculated.”
The presence of the cow having given me a true farmer spirit—at
least, I suppose it was the cow—I decided to have a garden. There
were very few vegetables that the Filipinos knew how to raise at that
time, and our longing for fresh things was constant and intense. I
selected a promising looking spot behind the Palace, had it prepared
for planting, then I bought a supply of fresh American seeds and
carefully buried them in places where I thought they might develop
into something. The result was positively astonishing. The soil was
rich and the sun was hot, and in an incredibly short time we were
having quantities of beans and cauliflower and big red tomatoes and
all kinds of things.
My ambition grew with success and I branched out into poultry.
The first thing anybody knew I had a big screened yard full of
chickens and turkeys little and big, which were a source of great
enjoyment to us all both in their noisy feathered state in the chicken
yard and done up in a variety of Ah Sing styles on our very well
supplied table. I wonder how my cook made up the “squeeze” out of
which he was cheated by my industry and thrift.
But, dwelling on these minor details I am getting far ahead of my
story. There were many things in the meanwhile engaging my
attention, the most important of which, I suppose, was the great
church schism.
Gregorio Aglipay, an Ilocano priest of the Roman Catholic Church,
joined the original insurrection against Spain, or the Friars rather, at
its inception and was excommunicated. He became an insurgent
leader with a reputation for great cruelty, and continued in the field
against Spain, and subsequently against the United States, until
resistance was no longer possible. He was among the last insurrecto
chiefs to surrender in northern Luzon. When peace was restored he
began immediately to solicit the interest and aid of other Filipino
priests, of politicians and influential men in a plan for organising an
Independent Filipino Catholic Church, and his temporary success
must have surprised even him.
ARCH ERECTED BY THE PARTIDO FEDERAL REPRESENTING FILIPINA OFFERING ANOTHER STAR TO THE AMERICAN FLAG

While the people loved catholicism, the failure of the Vatican to
accede to their wishes with respect to the Friars, as expressed by the
American Commission to Rome, added impetus to the rebellious
movement and when the announcement of the new organisation was
made it was found to be based on the strongest kind of support.
Aglipay constituted himself Obispo Maximo, assumed a fine regalia,
and conferred upon fifteen or more of his lieutenants the regular
church dignities and titles of a lesser order. He offered the people the
same ceremonies, the same relief, the same confessional, and the
same faith generally to which they had always been accustomed, so
they found it easy enough to transfer their allegiance, and the new
