Machine Learning Methods For Classifying Human Physical Activity From On-Body Accelerometers
ARTS Lab, Scuola Superiore Sant'Anna, Piazza Martiri della Libertà 33, 56124 Pisa, Italy
* Author to whom correspondence should be addressed; Tel.: +39-050-883415; Fax: +39-050-883101.
Received: 31 December 2009; in revised form: 26 January 2010 / Accepted: 26 January 2010 /
Published: 1 February 2010
Abstract: The use of on-body wearable sensors is widespread in several academic and industrial
domains. Of great interest are their applications in ambulatory monitoring and pervasive
computing systems; here, some quantitative analysis of human motion and its automatic
classification are the main computational tasks to be pursued. In this paper, we discuss how
human physical activity can be classified using on-body accelerometers, with a major
emphasis devoted to the computational algorithms employed for this purpose. In particular,
we motivate our current interest in classifiers based on Hidden Markov Models (HMMs).
An example is illustrated and discussed by analysing a dataset of accelerometer time series.
Keywords: wearable sensors; accelerometers; motion analysis; human physical activity; machine
learning; statistical pattern recognition; Hidden Markov Models
1. Introduction
The availability of a system capable of automatically classifying the physical activity
performed by a human subject is extremely attractive for many applications in the field
of healthcare monitoring and in developing advanced human-machine interfaces. By the
term physical activity, we mean either static postures, such as standing, sitting, lying, or
dynamic motions, such as walking, running, stair climbing, cycling, and so forth. More
precisely, we distinguish in this paper between primitives, namely elementary activities
like the ones just mentioned, and composite activities, namely sequences of primitives,
e.g., sitting-standing-walking-standing-sitting, in much the same way as we
distinguish between words and sentences in a spoken language.
The information on the human physical activity is valuable in the long-term
assessment of biomechanical parameters and physiological variables. Think, for
instance, of the limitations when the metabolic energy expenditure of a human subject
is estimated using indirect methods: serious estimation errors may occur when wearable
sensor systems composed of motion sensors, such as accelerometers, are used without
any regard to what she/he is actually doing [1,2]. The information on the physical
activity is also valuable as a source of contextual knowledge [3]. Provided that this
information is available, human-machine interaction could become richer and more
complex [4]. In robotics, several applications require the robot to recognise the
user's intent; one example is the field of rehabilitation
engineering, where smart walking support systems are currently being developed to assist
motor-impaired persons and the elderly while they attempt to stand or to walk [5–7].
Mostly, the physical interaction between the user and the walking aid takes place
through handles instrumented with force/torque sensors [8]; the signals acquired from
these sensors can be exploited not only for guidance purposes, but also for gaining
some form of contextual awareness [9]. In some cases, proximity/range sensing or even
inertial sensing is used to detect incipient gait instabilities of the user [10,11], so
that the walking aid controller can respond promptly, e.g.,
to minimise the risk of a fall [11].
In this paper the most common approaches to automatic classification of human
physical activity are introduced and discussed. In regard to the problem stated above,
the main steps regarding sensor selection, data acquisition, feature selection, extraction
and classification are reviewed by tracing the diagram of Figure 1. As for the machine
learning techniques needed for classification, particular emphasis is given here to
Markov modelling. Although identifying context without external
supervision seems better suited to building intelligent systems [12], most current
approaches in the field are based on supervised machine learning techniques. The
use of Hidden Markov Models (HMMs) is attractive, although they are known to be
potentially plagued by severe difficulties in parameter estimation. In this paper we
exploit an annotated dataset of signals from on-body accelerometers in order to test
several classification algorithms, including HMMs with supervised learning. Results of
a validation study are presented.
compromising the user's comfort and allowing her/him to perform under unrestrained
conditions as much as possible. Although ultrasonic or electromagnetic localisation systems
[13], opto-electronic marker-based [14] or markerless systems [15] all represent possible
choices, common to all of them is the limitation that external sources are generally
required, which restricts their sensing range and leads to additional difficulties, i.e.,
occlusions and interference. Inertial sensors are an interesting choice, since they are self-contained and immune to occlusions and interference, although the processing is seriously
limited by sensor noise and drift, which prevent them from delivering accurate
position/orientation data beyond a few seconds or minutes, unless very sophisticated and
complex filtering is applied to the raw sensor signals [16]. This is true especially for those
technologies that are the most promising in terms of cost, burden, and power consumption,
namely microelectromechanical systems (MEMS) accelerometers and gyros [17]. Most
features of MEMS inertial sensors seem to fit well with the requirements of motion
sensors for biomechanical applications, which motivates their growing use and the great
interest amongst practitioners in the field [18]. The main reason for their widespread
acceptance is that they allow, in principle, quantitative functional assessment to be performed in
unrestrained conditions: tested subjects do not easily incur the behavioural artefacts
that are typical when standard motion analysis technology is used in a specialised
laboratory [19].
Historically, accelerometers entered the biomechanical arena well in advance of gyros.
A few pioneering contributions [20,21] highlight the idea that the acceleration field of
any rigid part of the human body can be measured and reconstructed by user-worn
accelerometers, which may ultimately allow the pose and orientation of this part to be computed.
Interesting works reported in the literature over the years concern, among other aspects, the
estimation of head motions [22], and the estimation of spatio-temporal parameters of gait
[23]. More recently, the availability of miniature MEMS vibrating gyros has fostered
several research reports, where they are used for applications in gait analysis, either alone
or in combination with accelerometers [24,25]. Moreover, recent developments concern the
integration of triads of accelerometers and gyros with mutually orthogonal sensitive axes
within three-dimensional strap-down inertial navigation systems that are proposed for
applications in virtual reality, pedestrian navigation, robotics, and so forth [18]; oftentimes,
they are used in combination with additional navigation aids, including Global Positioning
System (GPS) receivers and magnetometers, to provide position/velocity and attitude
navigation data [26].
Interestingly, using accelerometers is also commonplace in many other biomedical
applications, such as tremor analysis [27], assessment of physical activity [28] and
quantification of metabolic energy expenditure [29], where the computational techniques of
interest do not require error-prone numerical integration of systems of nonlinear differential
equations from noisy data and uncertain initial conditions. In these applications, the
computational techniques of interest have to do mainly with the implementation of machine
learning algorithms, which are often aimed at performing nonlinear multivariate regressions
and pattern recognition.
2.2. Feature evaluation
A pattern recognition machine does not perform its classification tasks working
directly on the raw sensor data. Usually, the classification is pursued after a data
representation is built in terms of feature variables. The choice of features with high
information content for classification purpose is both a fundamental step in the
development of any pattern recognition machine and a highly problem-dependent task.
An accelerometer (the sensor of main interest in this paper) measures the projection
along its sensitive axis of the specific force f applied to the body to which it is fastened. The
specific force additively combines the linear acceleration component a, due to body
motion, and the gravitational acceleration component g, both projected along the
sensitive axis of the accelerometer [18]. In common parlance, the high-frequency
component, aka the AC component, is related to the dynamic motion the subject is
performing, e.g., walking, hand waving, head shaking, and so forth, while the low-frequency component of the acceleration signal, aka the zero-frequency (DC)
component, is related to the influence of gravity, hence it can be exploited to identify
static postures [30]. This is a key point in specifying the feature variables of interest,
which are usually evaluated from the raw sensor data within sliding windows with
finite and constant width, henceforth called data frames.
Although the choice of features is problem-specific, and different researchers may
pursue different approaches for their identification and computation [31], the features
proposed in this paper are quite popular amongst the practitioners in the field [32].
The DC component of acceleration is estimated by taking the signal average from the
data samples within each frame. Since each accelerometer axis provides a data frame,
the DC component feature vector can be conveniently used to get an idea about how the
body is oriented in space with respect to the gravity direction. The DC component is
thus well suited to classify postures.
Simple statistical descriptors, such as the variance, are widely used; the variance is
computed by taking the average of the squared detrended data samples within each
frame. The signal energy and the distribution of signal energy over the frequency
domain are other popular choices. Frequency-domain features can be derived from the
coefficients of time-frequency transforms, like the Short-Time Fourier Transform
(STFT), the Continuous or the Discrete Wavelet Transform (CWT, DWT) [32–34]. Besides
their role as motion signatures, energy features can also be used to assess the strength of
the motor act, the importance of which in assessing the energy expenditure incurred by
the subject is well recognised in the literature [14,35].
The frequency-domain entropy is helpful in discriminating primitives that differ in
complexity. As a matter of fact, walking and cycling can be difficult to discriminate
based on the DC component and energy features; however, the walking entropy turns
out to be much higher than the cycling entropy, mainly because of the foot impacts with
ground occurring during walking, which give rise to the distinctive high-frequency
coloured noise-like signatures typically observed in the signals from on-body
accelerometers. In this paper, the STFT coefficients are used to compute
the frequency-domain entropy [32].
The correlation coefficients between each pair of accelerometer signals are also useful
features. They are obtained by computing the dot product of pairs of frame vectors,
normalised to their length, and are highly helpful in discriminating activities that
involve motions of several body parts [36].
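As an illustration only, the frame-level features described in this section could be computed along the lines of the NumPy sketch below. The function name `frame_features`, the use of a plain FFT of the whole frame in place of STFT coefficients, and the definition of entropy as the Shannon entropy of the normalised power spectrum are assumptions made here for brevity; they are not necessarily the exact procedure followed in this paper.

```python
import numpy as np

def frame_features(frame):
    """Features of one data frame with shape (n_samples, n_channels)."""
    frame = np.asarray(frame, dtype=float)
    dc = frame.mean(axis=0)                       # DC component per channel
    detrended = frame - dc
    variance = (detrended ** 2).mean(axis=0)      # mean of squared detrended samples
    # One-sided power spectrum of the detrended frame (assumed stand-in for the STFT).
    power = np.abs(np.fft.rfft(detrended, axis=0)) ** 2
    energy = power.sum(axis=0) / frame.shape[0]
    # Entropy of the normalised power spectrum (assumed definition).
    p = power / np.maximum(power.sum(axis=0), 1e-12)
    entropy = -(p * np.log2(np.maximum(p, 1e-12))).sum(axis=0)
    # Correlation: dot products of detrended channels normalised to unit length.
    unit = detrended / np.maximum(np.linalg.norm(detrended, axis=0), 1e-12)
    corr = unit.T @ unit
    pairs = np.triu_indices(frame.shape[1], k=1)  # distinct channel pairs only
    return dc, variance, energy, entropy, corr[pairs]
```

With this definition a pure sinusoid yields an entropy close to zero, whereas a broadband, noise-like signal yields a much higher value, which is the property exploited to separate cycling from walking.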
identification of the optimal feature set is not always feasible because of the high
computational costs connected to searching through an inordinate number of m-dimensional subsets (1 ≤ m ≤ d). Usually, the feature selection step is implemented via
sub-optimal search algorithms, such as, for instance, the branch-and-bound search, the
sequential forward-backward selection (SFS-SBS), and the Pudil algorithm based on a
sequential forward-backward floating search (SFFS-SFBS). Of particular interest are the
sequential search algorithms; these are iterative procedures that add and/or remove a
fixed or variable number of features at each step, while assessing the effects of these
modifications according to pre-defined quantitative criteria. One such criterion is
based on computing the Euclidean distances between each pair of feature vectors in the
training set (k-Nearest Neighbour, k-NN). The ratio between inter-class and intra-class
distances is then maximised across the various feature subsets. Other criteria can be
devised by analysing the classifier output: the computational costs of these criteria are
generally high; however, the assessment procedure is then oriented towards the very
goal of the classification process.
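A minimal sketch of a sequential search driven by a distance-based criterion is given below. It implements plain sequential forward selection only (no backward or floating steps), and the criterion is assumed to be the mean inter-class Euclidean distance divided by the mean intra-class distance; the names `distance_ratio` and `forward_selection` are hypothetical.

```python
import numpy as np

def distance_ratio(X, y):
    """Mean inter-class distance divided by mean intra-class distance."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    same = y[:, None] == y[None, :]
    off_diag = ~np.eye(len(y), dtype=bool)        # ignore zero self-distances
    intra = d[same & off_diag].mean()
    inter = d[~same].mean()
    return inter / max(intra, 1e-12)

def forward_selection(X, y, m):
    """Greedily add, one at a time, the feature that maximises the criterion."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < m:
        scores = [distance_ratio(X[:, selected + [j]], y) for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

A floating variant would, after each addition, also test whether removing a previously selected feature improves the criterion.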
The feature extraction approach revolves around the idea that data representations
can be constructed in subspaces with reduced dimension, while at the same time retaining, if
not increasing, the discriminative capability of the new set of feature variables [37]. This
may happen at the expense of losing their physical meaning. By far, the most popular
feature extractor is principal component analysis (PCA), or the Karhunen-Loève
transform, which transforms the feature variables into a smaller number of uncorrelated
variables called principal components. In this approach, upon eigenvalue analysis of
the d × d data covariance matrix, the new feature vectors are obtained by projecting the data
onto the eigenvectors associated with the m largest eigenvalues. Another approach, similar in concept, is the independent
component analysis (ICA), often applied in problems of blind source separation: a PCA
is followed by a data whitening transformation, with the aim of finding the independent
components of a process, namely the attempt is made to reduce the process to its
additive components [34].
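The PCA step described above can be sketched as follows; the function name `pca_project` is hypothetical, and the sketch assumes that the d feature variables are stored as the columns of a data matrix with at least two columns.

```python
import numpy as np

def pca_project(X, m):
    """Project the rows of X onto the m leading principal directions."""
    Xc = X - X.mean(axis=0)                       # centre each feature variable
    cov = np.cov(Xc, rowvar=False)                # d x d data covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]         # indices of the m largest
    W = eigvecs[:, order]
    return Xc @ W, eigvals[order]
```

The returned components are mutually uncorrelated, and the variance of each one equals the corresponding eigenvalue.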
Feature selection and feature extraction are not necessarily cascaded in some
predefined order. Oftentimes, for instance, a feature selection algorithm is applied either
to data that have previously undergone dimensionality reduction by
feature extraction, or on its own, without a subsequent extraction step.
In the geometric approach the classification is performed based upon the construction
of decision boundaries in the feature space that specify regions for each class. Decision
boundaries are constructed during the training session via iterative procedures or
geometrical considerations. As a matter of fact, Artificial Neural Networks (ANN) are
based on iteratively tessellating the feature space [33], whereas k-Nearest Neighbour (k-NN) classifiers and Nearest Mean (NM) classifiers work directly on the geometrical
distances between feature vectors from different classes [39]. Finally, Support Vector
Machines (SVM) classifiers are geometric-based classifiers that construct boundaries
maximising the margins between the nearest features relative to two distinct classes
[14]. Another popular approach is the threshold-based classifier, as noted in Table 1.
A carefully handcrafted setting of thresholds is required in order to separate the
various classes under examination. For instance, a threshold based on an energy-related
feature, or simply the data variance, helps discriminate between presence and absence
of motion. The main disadvantage of this approach is its potential sensitivity to intra-
and inter-individual variations and to the precise placement of sensors. In this sense,
extensive handcrafting of classifier parameters is believed to be detrimental to
achieving good generalisation properties of the classifier itself [46].
The template matching approach is based on the concept of similarity between
observed data and activity templates, either defined by the designer or obtained during
the training session. The editing and condensing techniques, customarily applied to k-NN classifiers, can be useful for defining the templates. A classification that is based on
individual reference patterns appears to be less susceptible to such variations than threshold-based
classification, although careful sensor placement is critical for achieving good test-retest
reliability. Applications of the template matching approach can be found, e.g., in [19]. Although
they are widely used in classifying human physical activities, threshold-based
and template matching methods are not tested in this paper.
Finally, there exist so-called binary classifiers, where the classification process is
articulated in several different steps. At each step, different strategies, based on either
$$\pi_i = \Pr[X(t_0) = S_i], \quad i = 1, \ldots, Q$$
hidden and only a second-level process is actually observable. The observable outputs
are called emissions.
If the assumption is made that the emissions are discrete, an alphabet containing a
finite number W of possible emissions $Z_i$, $i = 1, \ldots, W$ is dealt with. The statistical model
is called Hidden Markov Model (HMM); its specification requires a $Q \times W$ stochastic
matrix $B$ that contains the probabilities $b_{ij}$ of getting an emission $Z_j$ at time $t_n$ from the
state $S_i$:
$$b_{ij} = \Pr[Z(t_n) = Z_j \mid X(t_n) = S_i] \qquad (4)$$
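where:
$$\lambda = (\pi, A, B)$$
$$b_j = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(\mu_{jm}, \Sigma_{jm}), \quad j = 1, \ldots, Q \qquad (7)$$
where:
$$\sum_{m=1}^{M} c_{jm} = 1, \quad j = 1, \ldots, Q \qquad (8)$$
A mixture of M multivariate normal distributions $\mathcal{N}(\mu_{jm}, \Sigma_{jm})$ with mean value $\mu_{jm}$,
covariance matrix $\Sigma_{jm}$ and mixing parameters $c_{jm}$ is used to model the emissions from each
state in the chain.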
The HMM modelling framework requires that three main problems are solved, (a) through
(c). Given an observation sequence $Z = [Z(t_1)\, Z(t_2) \ldots Z(t_T)]$ and a model $\lambda$: (a) evaluate the
conditional probability $P(Z \mid \lambda)$; (b) find the most likely sequence of states $X = [X(t_1)\, X(t_2) \ldots X(t_T)]$
occupied by the system; (c) identify the parameters of the model $\lambda$. The Viterbi algorithm is
the most widespread solver of problem (b) and the Baum-Welch algorithm is popular for
tackling problem (c). An excellent reference source for HMMs and algorithms for their
learning and testing is [49].
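To make problem (b) concrete, a compact Viterbi decoder for a discrete-emission HMM is sketched below. It is an illustrative sketch rather than the implementation used in this work: the function name `viterbi` is hypothetical, states and emission symbols are assumed to be encoded as integers starting from 0, and log-probabilities are used to avoid numerical underflow on long sequences.

```python
import math

def viterbi(obs, pi, A, B):
    """Most likely state path for a discrete-emission HMM (pi, A, B)."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    Q = len(pi)
    # Initialisation: log-probability of starting in each state and emitting obs[0].
    delta = [log(pi[i]) + log(B[i][obs[0]]) for i in range(Q)]
    back = []
    for z in obs[1:]:
        prev, delta, step = delta, [], []
        for j in range(Q):
            # Best predecessor of state j at this time step.
            best = max(range(Q), key=lambda i: prev[i] + log(A[i][j]))
            step.append(best)
            delta.append(prev[best] + log(A[best][j]) + log(B[j][z]))
        back.append(step)
    # Backtracking from the most likely final state.
    state = max(range(Q), key=lambda i: delta[i])
    path = [state]
    for step in reversed(back):
        state = step[state]
        path.append(state)
    return path[::-1]
```

With "sticky" transition probabilities, an isolated observation that disagrees with its neighbours is typically absorbed into the surrounding state, which is the smoothing effect that makes sequential classification attractive.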
2.6. HMM-based sequential classifiers
Currently, HMMs are applied in a large number of pattern recognition problems. For
many years, speech recognition has been considered the killer application for HMMs [49].
More recently, other applications have been investigated with remarkable achievements,
just to mention a few of them, in developing systems for hand gesture recognition [50], sign
language recognition [51], and functional assessment of human skills [52]. Specific
applications to classification of human physical activity as pursued in this paper are
relatively scarce [53]. Indeed, they seem to be more elusive as compared with the previous
ones, in the face of the great variety of human motor behaviours [32]. Nonetheless, it is
tempting to assume that primitives combine in time to form a composite activity as
prescribed by a simple Markov model.
Suppose that the training set presents only a relatively limited number of examples. A
sensible approach to deal with the difficulty of parameter estimation may be to train,
separately, different subsets of them. We propose to train the transition
parameters, i.e., $\pi$ and $A$, separately from the emission parameters, i.e., $\mu$, $\Sigma$, and $C$, by
exploiting the annotations available in the dataset. As for the transition parameters,
since their labelling is known, the composite activities in the training set are assumed
generated by a Q-state OMM, the state and transition probabilities of which can be
estimated by event counting. The emission parameters specify the Gaussian
multivariate PDFs in the same way as the class-conditional PDFs are specified in
probabilistic classifiers, such as, for instance, GMMs. As a whole, we refer to this
initialisation phase as the first-level training phase. The values of the parameters
estimated during the first-level training phase can be further refined by on-the-fly runs
of the Baum-Welch algorithm (second-level training phase); this trick may help adapt
the cHMM behaviour, in particular, to unexpected TPM changes.
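The event-counting estimate of the transition parameters in the first-level training phase can be sketched as follows. The function name `estimate_chain` is hypothetical, labels are assumed to be integers in the range 0 to Q-1, the initial-state probabilities are assumed to be estimated from the first label of each annotated sequence, and rows of the transition matrix for states that never occur fall back to a uniform distribution; these choices are illustrative assumptions.

```python
def estimate_chain(label_sequences, Q):
    """Estimate (pi, A) of a Q-state Markov chain by counting labelled events."""
    start = [0] * Q
    trans = [[0] * Q for _ in range(Q)]
    for seq in label_sequences:
        start[seq[0]] += 1                       # count initial states
        for i, j in zip(seq, seq[1:]):
            trans[i][j] += 1                     # count transitions i -> j
    pi = [c / sum(start) for c in start]
    A = []
    for row in trans:
        total = sum(row)
        # States never left in the training data get a uniform row (assumption).
        A.append([c / total for c in row] if total else [1.0 / Q] * Q)
    return pi, A
```

These counts provide the initial values that subsequent Baum-Welch runs may refine.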
Finally, an interesting feature of the classifier we have developed resides in its
capability of managing spurious data. A difficulty arises for the classifier when
activity primitives that have no examples in the training set are presented during
operation. Our approach to this problem consists of
computing the likelihood of each feature vector, given the GMM structure that models
the cHMM emissions. A simple threshold-based detector flags anomalous
feature vectors, preventing them from being actually presented to the classifier. Figure
3 shows the block diagram of the sequential cHMM-based classifier; in the figure we
also indicated the block for spurious frame rejection.
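The spurious frame rejection step can be illustrated by the sketch below, which evaluates the log-likelihood of a feature vector under a Gaussian mixture and compares it with a threshold. The function names, the use of a single mixture for all states, and the threshold value are assumptions made for illustration; how the threshold is actually tuned is not specified here.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate normal distribution at x."""
    diff = np.asarray(x, dtype=float) - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)      # squared Mahalanobis distance
    return -0.5 * (len(diff) * np.log(2 * np.pi) + logdet + maha)

def gmm_loglik(x, weights, means, covs):
    """Log-likelihood of x under a Gaussian mixture (log-sum-exp over components)."""
    comps = [np.log(w) + gaussian_logpdf(x, m, c)
             for w, m, c in zip(weights, means, covs)]
    return np.logaddexp.reduce(comps)

def is_spurious(x, weights, means, covs, threshold):
    """Flag a feature vector whose log-likelihood falls below the threshold."""
    return gmm_loglik(x, weights, means, covs) < threshold
```

Feature vectors flagged in this way would simply be withheld from the sequential classifier.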
3. Validation Study
At present, we are developing a wearable sensor system for indoor-outdoor
pedestrian navigation, which embodies the following sub-systems: an on-body network
of four tri-axial accelerometers, an on-foot fully integrated Inertial Measurement Unit
(IMU) that includes a triad of magnetometers, and finally a waist-worn GPS receiver.
Since the hardware and firmware components of this system are currently undergoing
their production phase, the validation of the classification methods studied in this
paper is based on analysing a dataset of acceleration waveforms, made available to us
by Prof. Intille and associates at MIT [32].
3.1. Dataset for physical activity classification
The classification methods were applied to the dataset described in [32]. Acceleration
data, sampled at 76.25 Hz, were acquired from five bi-axial accelerometers, located at the
hip, wrist, arm, ankle, and thigh. The original protocol was based on testing 20 subjects,
who were requested to perform 20 activities (Figure 4). In this paper, 13 subjects were
randomly selected for further analysis, in order to ease the development work. Moreover,
because of our interest in personal navigation based on on-foot inertial sensing, we
considered just the seven activities shown in Table 2, which involved primarily the use of
the subjects' lower limbs.
Since the research goal in [32] was exclusively to test single-frame classifiers, the
available data for each subject concerned acceleration time series that were known to
correspond to each primitive. Their work was thus at the level of motor words. Our
validation study of single-frame classifiers followed their approach, although we opted for
a subject-specific training, i.e., a distinct classifier was trained for each individual subject.
As for the cHMM-based sequential classifier, we built a Q-state OMM with known model
($\pi$, $A$) in order to generate motor sentences from the vocabulary of motor words in Table
2 (Q = 7). The simulation of a composite activity by a single subject (virtual experiment)
was made by associating, for each tested subject, one data frame to each OMM state. The
associated data frame was randomly sampled (with replacement) from the maximum
number N of frames available in the reduced dataset for each primitive and subject (18 ≤
N ≤ 58). A number S = 20 of virtual experiments was synthesised, each
composed of T = 300 data frames. A subset of P virtual experiments was included in the
training set.
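The synthesis of a virtual experiment described above can be sketched as follows: a state sequence is drawn from the Markov chain, and each state is paired with one frame sampled with replacement from the frames available for that primitive. The function name `synthesise_experiment` and the dictionary `frames_by_state` are hypothetical, and the frames are represented by arbitrary placeholder objects.

```python
import random

def synthesise_experiment(pi, A, frames_by_state, T, rng):
    """Draw T states from the chain (pi, A) and one stored frame per state."""
    n_states = len(pi)
    states = rng.choices(range(n_states), weights=pi)          # initial state
    while len(states) < T:
        # Next state drawn from the row of A indexed by the current state.
        states += rng.choices(range(n_states), weights=A[states[-1]])
    # Sampling with replacement: the same frame may be reused.
    frames = [rng.choice(frames_by_state[s]) for s in states]
    return states, frames
```

For example, `synthesise_experiment(pi, A, frames_by_state, 300, random.Random(0))` would produce one sequence of T = 300 labelled frames.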
The procedure of synthesising virtual experiments in the manner described above
implied the existence of clear-cut borders between data frames associated to different
primitives, which were managed by data cropping in creating the original dataset [32]. Of
course, real-life composite activities would be more complex, due to, say, fuzzy postural
transitions in the data. In the attempt to get a more realistic picture of the cHMM-based
sequential classifier performance, data frames from the original dataset not included in the
reduced dataset were thus randomly interspersed in the tested data sequences generated by
the OMM, in variable proportions, from null to 1:3 (max.). The resulting garbage was
managed in our system by labelling data frames as spurious if their likelihood given the
GMM structure that models the cHMM emissions was below a suitably chosen threshold, as
described in Section 2.6.
3.2. Feature vectors
The feature vectors were built from 50%-overlapping sliding windows with 512 samples.
Since the sampling frequency was 76.25 Hz, each data frame lasted 6.7 seconds, with a
new frame available every 3.35 s. The DC component, the energy, the frequency-domain
entropy, and the correlation coefficients were calculated for inclusion in the feature vector.
In order to evaluate the entropy, the PDF of the STFT coefficients was estimated using an
Epanechnikov kernel density estimator. Since five dual-axis accelerometers were
considered in the experimental setup, each feature vector comprised 30 components,
namely the DC component, energy and entropy for each of the 10 data channels, plus 55
correlation coefficients (d = 85). Different selection algorithms were considered using
the k-NN criterion. The maximum value for the criterion of selection was obtained by the
SFFS method (Pudil algorithm), yielding an optimal subset of d = 17 feature components,
amongst which 4 DC components and 13 correlation coefficients were found. The 17-dimensional feature vectors were not submitted to any feature extraction step.
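The windowing arithmetic at the start of this subsection can be checked with the short sketch below; the helper name `sliding_frames` is hypothetical, while the window width, overlap and sampling frequency are those quoted in the text.

```python
def sliding_frames(n_samples, width=512, overlap=0.5):
    """Start/stop sample indices of fixed-width frames with fractional overlap."""
    step = int(width * (1 - overlap))
    return [(s, s + width) for s in range(0, n_samples - width + 1, step)]

fs = 76.25                        # sampling frequency in Hz
frame_seconds = 512 / fs          # duration of one frame, about 6.7 s
hop_seconds = 256 / fs            # interval between frames, about 3.36 s
```

For a record of 1024 samples, `sliding_frames(1024)` yields three frames starting at samples 0, 256 and 512.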
The single-frame classification algorithms included in Table 3 were trained and tested
using data frames from the reduced dataset.