Machine Learning Methods For Classifying Human Physical Activity From On-Body Accelerometers


http://www.mdpi.com/1424-8220/10/2/1154/htm

Sensors 2010, 10(2), 1154-1175; doi:10.3390/s100201154


Article

Machine Learning Methods for Classifying Human Physical Activity from On-Body Accelerometers
Andrea Mannini and Angelo Maria Sabatini

ARTS Lab, Scuola Superiore Sant'Anna, Piazza Martiri della Libertà 33, 56124 Pisa, Italy
* Author to whom correspondence should be addressed; Tel.: +39-050-883415; Fax: +39-050-883101.
Received: 31 December 2009; in revised form: 26 January 2010 / Accepted: 26 January 2010 /
Published: 1 February 2010

Abstract: The use of on-body wearable sensors is widespread in several academic and industrial
domains. Of great interest are their applications in ambulatory monitoring and pervasive
computing systems; here, some quantitative analysis of human motion and its automatic
classification are the main computational tasks to be pursued. In this paper, we discuss how
human physical activity can be classified using on-body accelerometers, with a major
emphasis devoted to the computational algorithms employed for this purpose. In particular,
we motivate our current interest in classifiers based on Hidden Markov Models (HMMs).
An example is illustrated and discussed by analysing a dataset of accelerometer time series.
Keywords: wearable sensors; accelerometers; motion analysis; human physical activity; machine
learning; statistical pattern recognition; Hidden Markov Models

1. Introduction
The availability of a system capable of automatically classifying the physical activity
performed by a human subject is extremely attractive for many applications in the field
of healthcare monitoring and in developing advanced human-machine interfaces. By the
term physical activity, we mean either static postures, such as standing, sitting, lying, or
dynamic motions, such as walking, running, stair climbing, cycling, and so forth. More
precisely, we distinguish in this paper between primitives, namely elementary activities
like the ones just mentioned, and composite activities, namely sequences of primitives,
e.g., sitting-standing-walking-standing-sitting, in much the same way as we
distinguish between words and sentences in a spoken language.
The information on the human physical activity is valuable in the long-term
assessment of biomechanical parameters and physiological variables. Think, for
instance, of the limitations when the metabolic energy expenditure of a human subject
is estimated using indirect methods: serious estimation errors may occur when wearable
sensor systems composed of motion sensors, such as accelerometers, are used without
any regard to what she/he is actually doing [1,2]. The information on the physical
activity is also valuable as a source of contextual knowledge [3]. Provided that this
information is available, human-machine interaction would be more complex and
richer [4]. In robotics, several applications require the robot to have some capability
of recognising the user's intent, for instance in the field of rehabilitation
engineering, where smart walking support systems are currently being developed to assist
motor-impaired persons and the elderly while they attempt to stand or to walk [5–7].
Mostly, the physical interaction between the user and the walking aid takes place
through handles instrumented with force/torque sensors [8]; the signals acquired from
these sensors can be exploited not only for guidance purposes, but also for gaining
some form of contextual awareness [9]. In some cases, proximity/range sensing or even
inertial sensing is used to detect incipient gait instabilities of the user [10,11], so
that the walking aid controller can respond promptly, e.g., to minimise the risk of a
fall [11].
In this paper the most common approaches to automatic classification of human
physical activity are introduced and discussed. In regard to the problem stated above,
the main steps regarding sensor selection, data acquisition, feature selection, extraction
and classification are reviewed by tracing the diagram of Figure 1. As for the machine
learning techniques needed for classification, particular emphasis is given here to
Markov modelling. Although identifying context without external
supervision seems better suited to building intelligent systems [12], most current
approaches in the field are based on supervised machine learning techniques. The
use of Hidden Markov Models (HMMs) is attractive, although they are known to be
potentially plagued by severe difficulties of parameter estimation. In this paper we
exploit an annotated dataset of signals from on-body accelerometers in order to test
several classification algorithms, including HMMs with supervised learning. Results of
a validation study are presented.

Figure 1. Conceptual scheme of a generic classification system with supervised learning.

2. Methods for Automatic Classification of Human Physical Activity


2.1. Wearable sensors and data acquisition

The first important aspect to be considered in building a system for automatic
classification of human physical activity concerns the choice of sensors. Wearable sensors
should be small and lightweight, so that they can be fastened to the human body without
compromising the user's comfort, allowing her/him to perform under unrestrained
conditions as much as possible. Although ultrasonic or electromagnetic localisation systems
[13], opto-electronic marker-based [14] or markerless systems [15] all represent possible
choices, common to all of them is the limitation that external sources are generally
required, which restricts their sensing range and leads to additional difficulties, such as
occlusions and interference. Inertial sensors are an interesting choice, since they are
self-contained and immune to occlusions and interference, although the processing is seriously
limited by sensor noise and drift, which prevent them from delivering accurate
position/orientation data beyond a few seconds or minutes, unless very sophisticated and
complex filtering is applied to the raw sensor signals [16]. This is true especially for those
technologies that are the most promising in terms of cost, burden, and power consumption,
namely microelectromechanical systems (MEMS) accelerometers and gyros [17]. Most
features of MEMS inertial sensors seem to fit well with the requirements of motion
sensors for biomechanical applications, which motivates their growing use and the great
interest amongst practitioners in the field [18]. The main reason for their widespread
acceptance is that they allow, in principle, quantitative functional assessment to be
performed in unrestrained conditions: tested subjects are less likely to exhibit the
behavioural artefacts that are typical when standard motion analysis technology is used
in a specialised laboratory [19].
Historically, accelerometers entered the biomechanical arena well before gyros.
A few pioneering contributions [20,21] highlight the idea that the acceleration field of
any rigid part of the human body can be measured and reconstructed by user-worn
accelerometers, which may ultimately allow the pose and orientation of this part to be computed.
Interesting works reported in the literature over the years concern, among other aspects, the
estimation of head motions [22], and the estimation of spatio-temporal parameters of gait
[23]. More recently, the availability of miniature MEMS vibrating gyros has fostered
several research reports, where they are used for applications in gait analysis, either alone
or in combination with accelerometers [24,25]. Moreover, recent developments concern the
integration of triads of accelerometers and gyros with mutually orthogonal sensitive axes
within three-dimensional strap-down inertial navigation systems that are proposed for
applications in virtual reality, pedestrian navigation, robotics, and so forth [18]; oftentimes,
they are used in combination with additional navigation aids, including Global Positioning
System (GPS) receivers and magnetometers, to provide position/velocity and attitude
navigation data [26].
Interestingly, using accelerometers is also commonplace in many other biomedical
applications, such as tremor analysis [27], assessment of physical activity [28] and
quantification of metabolic energy expenditure [29], where the computational techniques of
interest do not require the error-prone integration of systems of nonlinear differential
equations from noisy data and uncertain initial conditions. In these applications, the
computational techniques of interest have to do mainly with the implementation of machine
learning algorithms, which are often aimed at performing nonlinear multivariate regression
and pattern recognition.
2.2. Feature evaluation
A pattern recognition machine does not perform its classification tasks working
directly on the raw sensor data. Usually, classification is performed after a data
representation is built in terms of feature variables. The choice of features with high
information content for classification purposes is both a fundamental step in the
development of any pattern recognition machine and a highly problem-dependent task.
An accelerometer (the sensor of main interest in this paper) measures the projection
along its sensitive axis of the specific force f applied to the body to which it is fastened.
The specific force additively combines the linear acceleration component a, due to body
motion, and the gravitational acceleration component g, both projected along the
sensitive axis of the accelerometer [18]. In common parlance, the high-frequency
component, aka the AC component, is related to the dynamic motion the subject is
performing, e.g., walking, hand waving, head shaking, and so forth, while the
low-frequency component of the acceleration signal, aka the zero-frequency (DC)
component, is related to the influence of gravity, hence it can be exploited to identify
static postures [30]. This is a key point in specifying the feature variables of interest,
which are usually evaluated from the raw sensor data within sliding windows with
finite and constant width, henceforth called data frames.
Although the choice of features is problem-specific, and different researchers may
pursue different approaches for their identification and computation [31], the features
proposed in this paper are quite popular amongst the practitioners in the field [32].
The DC component of acceleration is estimated by taking the signal average from the
data samples within each frame. Since each accelerometer axis provides a data frame,
the DC component feature vector can be conveniently used to get an idea about how the
body is oriented in space with respect to the gravity direction. The DC component is
thus well suited to classify postures.
Simple statistical descriptors, such as the variance, are widely used; the variance is
computed by taking the average of the squared detrended data samples within each
frame. The signal energy and the distribution of signal energy over the frequency
domain are other popular choices. Frequency-domain features can be derived from the
coefficients of time-frequency transforms, like the Short-Time Fourier Transform
(STFT), the Continuous or the Discrete Wavelet Transform (CWT, DWT) [32–34]. Besides
their role as motion signatures, energy features can also be used to assess the strength of
the motor act, the importance of which in assessing the energy expenditure incurred by
the subject is well recognised in the literature [14,35].
The frequency-domain entropy is helpful in discriminating primitives that differ in
complexity. As a matter of fact, walking and cycling can be difficult to discriminate
based on the DC component and energy features; however, the walking entropy turns
out to be much higher than the cycling entropy, mainly because of the foot impacts with
ground occurring during walking, which give rise to the distinctive high-frequency
coloured noise-like signatures typically observed in the signals from on-body
accelerometers. In this paper, the coefficients of the STFT are used to compute
the frequency-domain entropy [32].
The correlation coefficients between each pair of accelerometer signals are also useful
features. They are obtained by computing the dot product of pairs of frame vectors,
normalised to their length, and are highly helpful in discriminating activities that
involve motions of several body parts [36].
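The features described in this section can be sketched for a single data frame as follows. This is a minimal illustration, not the implementation used in the paper: the function name `frame_features` is hypothetical, the entropy is computed from the normalised power spectrum of the whole frame (a simplified stand-in for an estimate based on STFT coefficients), and the channels are assumed to be mean-removed before the normalised dot product is taken.

```python
import numpy as np

def frame_features(frame):
    """Compute per-frame features; `frame` has shape (n_samples, n_channels)."""
    frame = np.asarray(frame, dtype=float)
    dc = frame.mean(axis=0)                      # DC component: per-channel mean
    detrended = frame - dc                       # remove the mean (AC part)
    variance = (detrended ** 2).mean(axis=0)     # mean of squared detrended samples
    # Spectral power of the AC part; the zero-frequency bin is dropped.
    power = np.abs(np.fft.rfft(detrended, axis=0))[1:] ** 2
    energy = power.sum(axis=0) / frame.shape[0]
    # Entropy of the normalised power spectrum (assumed, simplified estimator).
    p = power / np.maximum(power.sum(axis=0), 1e-12)
    entropy = -(p * np.log2(p + 1e-12)).sum(axis=0)
    # Correlation: dot products of detrended channels normalised to unit length.
    norms = np.maximum(np.linalg.norm(detrended, axis=0), 1e-12)
    corr = (detrended / norms).T @ (detrended / norms)
    iu = np.triu_indices(frame.shape[1], k=1)    # one coefficient per channel pair
    return dc, variance, energy, entropy, corr[iu]
```

For a frame with n channels the function returns n values for each of the first four features and n(n-1)/2 correlation coefficients, one per pair of distinct channels.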

2.3. Feature selection and extraction


When the dimension of the feature space is high, learning the parameters of a
classifier becomes a difficult task, especially when the size of the training set is small
(the curse of dimensionality). Usually, one identifies, empirically or based on
theoretically sound considerations, as many features as needed to deal with the
classification problem at hand. The available dataset is then divided into a training set
and a test set. As a rule of thumb, the ratio n/d between the number of
instances n available in the training set and the dimension d of the feature space should be
at least ten. Since the achievable performance of a classification algorithm tends to
critically depend on the dimension of the feature space, methods for reduction of
dimensionality are oftentimes considered in developing the classifier. These methods
are based on two main approaches: feature selection and feature extraction [37].
The feature selection approach consists of detecting and discarding the features that
are shown to contribute least to a correct response by the classifier. The
identification of the optimal feature set is not always feasible because of the high
computational costs connected to searching through an inordinate number of
m-dimensional subsets (1 ≤ m ≤ d). Usually, the feature selection step is implemented via
sub-optimal search algorithms, such as, for instance, the branch-and-bound search, the
sequential forward-backward selection (SFS-SBS), and the Pudil algorithm based on a
sequential forward-backward floating search (SFFS-SFBS). Of particular interest are the
sequential search algorithms; these are iterative procedures that add and/or remove a
fixed or variable number of features at each step, while assessing the effects of these
modifications according to pre-defined quantitative criteria. One such criterion is
based on computing the Euclidean distances between each pair of feature vectors in the
training set (k-Nearest Neighbour, k-NN). The ratio between inter-class and intra-class
distances is then maximised across the various feature subsets. Other criteria can be
devised by analysing the classifier output: the computational costs of these criteria are
generally high; however, the assessment procedure is then oriented towards the very
goal of the classification process.
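A sequential search driven by a distance-based criterion can be sketched as below. The sketch implements plain sequential forward selection (SFS), not the floating variant, and the criterion is taken to be the ratio of the mean inter-class to the mean intra-class Euclidean distance; the function names and this exact form of the criterion are assumptions made for illustration.

```python
import numpy as np

def distance_ratio(X, y):
    """Mean inter-class distance divided by mean intra-class distance."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))       # all pairwise Euclidean distances
    same = y[:, None] == y[None, :]
    off_diag = ~np.eye(len(y), dtype=bool)        # exclude zero self-distances
    intra = dist[same & off_diag].mean()
    inter = dist[~same].mean()
    return inter / max(intra, 1e-12)

def forward_selection(X, y, m):
    """Greedily add the feature that most increases the criterion, m times."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < m:
        best = max(remaining, key=lambda j: distance_ratio(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```

A floating search would additionally try to remove previously selected features after each addition, keeping the removal whenever the criterion improves.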
The feature extraction approach revolves around the idea that data representations
can be constructed in subspaces with reduced dimension, while at the same time retaining, if
not increasing, the discriminative capability of the new set of feature variables [37]. This
may happen at the expense of losing their physical meaning. By far, the most popular
feature extractor is the principal component analysis (PCA) or Karhunen-Loève
transform, which transforms feature variables into a smaller number of uncorrelated
variables called principal components. In this approach, upon eigenvalue analysis of
the d × d data covariance matrix, the new feature variables are obtained by projecting
the data onto the eigenvectors associated with the m largest eigenvalues. Another
approach, similar in concept, is the independent
component analysis (ICA), often applied in problems of blind source separation: a PCA
is followed by a data whitening transformation, with the aim of finding the independent
components of a process, that is, an attempt is made to reduce the process to its
additive components [34].
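The PCA step can be illustrated with a short sketch based on the eigen-decomposition of the covariance matrix. The function names `pca_fit` and `pca_transform` are hypothetical, and no whitening or ICA stage is included.

```python
import numpy as np

def pca_fit(X, m):
    """Return the mean and the m leading eigenvectors of the covariance matrix."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)          # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]         # indices of the m largest
    return mean, eigvecs[:, order]

def pca_transform(X, mean, components):
    """Project centred data onto the retained eigenvectors."""
    return (X - mean) @ components
```

The projected variables are uncorrelated by construction, since the covariance matrix is diagonal in the basis of its own eigenvectors.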
Feature selection and feature extraction are not necessarily cascaded in some
predefined order. For instance, a feature selection algorithm may be applied to data
that have previously undergone dimensionality reduction by feature extraction, or it
may be applied on its own, without any subsequent extraction step.

2.4. Taxonomy of classifiers


A taxonomy of classifiers can be built according to different criteria [37]. First, a
distinction is between supervised and unsupervised classifiers. In supervised classifiers,
the training set is labelled, namely the membership of the feature vectors to a given
class is known to the system in advance. According to an unsupervised approach, only
the number of classes C is known, and the system responds to the instances in the
training set by assigning a label to each of them. Second, single-frame and sequential
approaches to classification can be distinguished. A single-frame classifier works by
assigning a label to each data frame it receives at its input, in isolation from the history
of previous assignments. Conversely, a sequential classifier takes the past classifications
into account in order to orient the decision on the current feature vector. The classifiers
can be further divided according to three main approaches: probabilistic, geometric,
and template matching. Table 1 summarises the state of the art for classification of
human physical activity; succinct information is also included on sensor and feature
type and number; method of classification; number of activities and tested subjects;
and accuracy of classification.

In accordance with the rules of the probabilistic approach, a feature vector x is classified
as belonging to the class that yields the maximum value of the class-conditional
PDFs $p(x|C_i)$, $i = 1, \ldots, C$. The class-conditional PDF denotes how likely a feature
vector is to belong to a given class. An example of probabilistic classifiers is the optimal
Bayesian classifier. Since class-conditional PDFs are usually not known, suboptimal
implementations have to be considered, e.g., naive Bayesian, logistic, Parzen and
Gaussian Mixture Model (GMM) classifiers [32]. The Parzen classifier provides an
estimate of the class-conditional PDF by, e.g., applying a kernel density estimator to the
labelled feature vectors in the training set, while a GMM classifier estimates
class-conditional PDFs using mixtures of multivariate normal PDFs [38].
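The decision rule of the probabilistic approach can be sketched with the simplest possible density model. The example below assumes one Gaussian with diagonal covariance per class (a naive Bayesian classifier with equal priors); the function names are hypothetical, and a GMM or Parzen estimate could replace `log_pdf` without changing the rule.

```python
import numpy as np

def fit_gaussians(X, y):
    """Per-class mean and variance (diagonal covariance, i.e., naive Bayes)."""
    return {c: (X[y == c].mean(axis=0), X[y == c].var(axis=0) + 1e-9)
            for c in np.unique(y)}

def log_pdf(x, mean, var):
    """Log of a diagonal-covariance Gaussian density evaluated at x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def classify(x, model):
    """Pick the class whose class-conditional PDF is largest at x."""
    return max(model, key=lambda c: log_pdf(x, *model[c]))
```

Working with log-densities avoids numerical underflow when the feature space has many dimensions.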

In the geometric approach the classification is performed based upon the construction
of decision boundaries in the feature space that specify regions for each class. Decision
boundaries are constructed during the training session via iterative procedures or
geometrical considerations. As a matter of fact, Artificial Neural Networks (ANN) are
based on iteratively tessellating the feature space [33], whereas k-Nearest Neighbour
(k-NN) classifiers and Nearest Mean (NM) classifiers work directly on the geometrical
distances between feature vectors from different classes [39]. Finally, Support Vector
Machine (SVM) classifiers are geometric-based classifiers that construct boundaries
maximising the margins between the nearest feature vectors of two distinct classes
[14]. Another popular approach is the threshold-based classifier, as noted in Table 1.
A carefully handcrafted setting of thresholds is required in order to separate the
various classes under examination. For instance, a threshold based on an energy-related
feature, or simply the data variance, helps discriminate between presence and absence
of motion. The main disadvantage of this approach is its potential sensitivity to intra-
and inter-individual variations and to the precise placement of sensors. In this sense,
extensive handcrafting of classifier parameters is believed to be detrimental to
achieving good generalisation properties of the classifier itself [46].
The template matching approach is based on the concept of similarity between
observed data and activity templates, either defined by the designer or obtained during
the training session. The editing and condensing techniques, customarily applied to
k-NN classifiers, can be useful for defining the templates. A classification that is based on
individual reference patterns appears to be less susceptible to such variations than
threshold-based classification, although careful sensor placement is critical for achieving
good test-retest reliability. Applications of the template matching approach can be found,
e.g., in [19]. Although they are widely used in classifying human physical activities,
threshold-based and template matching methods are not tested in this paper.
Finally, there exist so-called binary classifiers, where the classification process is
articulated in several different steps. At each step, different strategies, based on either
threshold-based or template-matching detectors, are followed to reach a binary
decision. For instance, in hierarchical binary decision trees each node is capable of
discriminating between two states, and the classification becomes progressively more
refined as the tree is descended along its branches [28].

2.5. Background on Markov models and Hidden Markov Models


Although the single-frame methods are quite widespread for classification of human
physical activity, a possibly better way to deal with this problem is to exploit the
decisions taken by the classifier in the past (sequential approach to classification). If we
turn our attention to a sequential classification approach, a composite activity (motor
sentence) can be conveniently viewed as the result of chaining a number of primitives
(motor words). The knowledge about the way humans organise the functional tasks they
are involved in during their daily life (motor language) can help describe the
statistical properties of this chaining process. The sequential approach calls quite
naturally for Markov modelling [48]. Henceforth, we assume that a composite activity
can be modelled as a first-order Markov chain, composed of a finite number Q of
states $S_i$; each state accounts for a primitive. The time evolution of a first-order Markov
chain is governed by the following quantities:

prior probability vector $\pi$, with size (1 × Q); it is composed of the
probabilities of each state $S_i$ of being the state X at the initial time $t_0$:

$\pi_i = \Pr[X(t_0) = S_i], \quad i = 1, \ldots, Q$    (1)

Figure 2. Graphical representation of a six-state Markov chain: the nodes are the states
of the chain; the oriented arcs between nodes denote state-to-state transitions, including
self-transitions.

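transition probability matrix (TPM) A, with size (Q × Q); it is composed of the
probabilities $a_{ij}$ of moving from state $S_i$ at time $t_n$ to state $S_j$ at time $t_{n+1}$:

$a_{ij} = \Pr[X(t_{n+1}) = S_j \mid X(t_n) = S_i], \quad i, j = 1, \ldots, Q$    (2)

Elementary considerations of probability calculus yield the following constraints for
the transition probabilities:

$a_{ij} \geq 0, \quad \sum_{j=1}^{Q} a_{ij} = 1$    (3)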
The prior and transition probabilities needed to create the Observable Markov Model
(OMM) $(\pi, A)$ associated with the Markov chain can be empirically determined based on
observations of the activity behaviour of a subject. If the TPM and the state at the
current time are known, then the most likely state that will follow is probabilistically
determined. In a more practical sense, each primitive can only be observed through a
set of raw sensor signals (the measured time series from on-body accelerometers, in the
present case). We would like to infer the hidden state from the available noisy
observations, and to trace the time history of how the primitives have evolved up to the
present time, in order to estimate the composite activity. In other words, the states are
hidden and only a second-level process is actually observable. The observable outputs
are called emissions.
If the assumption is made that the emissions are discrete, an alphabet containing a
finite number W of possible emissions $Z_j$, $j = 1, \ldots, W$, is dealt with. The statistical model
is called Hidden Markov Model (HMM); its specification requires a Q × W stochastic
matrix B that contains the probabilities $b_{ij}$ of getting an emission $Z_j$ at time $t_n$ from the
state $S_i$:

$b_{ij} = \Pr[Z(t_n) = Z_j \mid X(t_n) = S_i]$    (4)

where:

$b_{ij} \geq 0, \quad \sum_{j=1}^{W} b_{ij} = 1$    (5)

Finally, an HMM is modelled by a parameter set $\lambda$ that accounts for prior, transition
and emission probabilities:

$\lambda = (\pi, A, B)$    (6)

If the emissions are continuous, continuous PDFs are to be assigned, instead of
probability mass functions (continuous emissions HMM, aka cHMM). The most common
approach to the problem of modelling continuous emissions is parametric. A given
distribution family is assumed for the emissions, and the parameters associated with the
family are used to fully specify them. For instance, for a Gaussian cHMM we have:

$b_j = \sum_{m=1}^{M} c_{jm}\, N(\mu_{jm}, \Sigma_{jm}), \quad j = 1, \ldots, Q$    (7)

where:

$\sum_{m=1}^{M} c_{jm} = 1, \quad j = 1, \ldots, Q$    (8)

A mixture of M multivariate normal distributions $N(\mu_{jm}, \Sigma_{jm})$ with mean value $\mu_{jm}$,
covariance matrix $\Sigma_{jm}$ and mixing parameters $c_{jm}$ is used to model the emissions from each
state in the chain.

The HMM modelling framework requires that three main problems, (a) through (c), be
solved. Given an observation sequence $Z = [Z(t_1)\, Z(t_2) \ldots Z(t_T)]$ and a model $\lambda$: (a) evaluate the
conditional probability $P(Z|\lambda)$; (b) find the most likely sequence of states $X = [X(t_1)\, X(t_2) \ldots X(t_T)]$
occupied by the system; (c) identify the parameters of the model $\lambda$. The Viterbi algorithm is
the most widespread solver of problem (b) and the Baum-Welch algorithm is popular for
tackling problem (c). An excellent reference source for HMMs and algorithms for their
learning and testing is [49].
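As an illustration of problem (b), the Viterbi recursion can be sketched in the log domain as follows. The function name and argument layout are assumptions: `log_pi` and `log_A` hold the logarithms of the prior and transition probabilities, and `log_B[t, i]` holds the log-likelihood of the observation at time step t under state i, however the emissions are modelled.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most likely state path; log_B[t, i] = log-likelihood of frame t in state i."""
    T, Q = log_B.shape
    delta = np.empty((T, Q))                 # best log-score of a path ending in i at t
    psi = np.zeros((T, Q), dtype=int)        # back-pointers to the best predecessor
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # scores[i, j]: move from i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):           # backtrack from the final state
        path[t] = psi[t + 1, path[t + 1]]
    return path
```

With strongly diagonal transition matrices, an isolated frame that weakly favours another state tends to be absorbed into the surrounding run, which is the smoothing effect a sequential classifier is expected to provide.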
2.6. HMM-based sequential classifiers

Currently, HMMs are applied in a large number of pattern recognition problems. For
many years, speech recognition has been considered the killer application for HMMs [49].
More recently, other applications have been investigated with remarkable achievements,
including, to mention just a few, hand gesture recognition [50], sign
language recognition [51], and functional assessment of human skills [52]. Specific
applications to classification of human physical activity as pursued in this paper are
relatively scarce [53]. Indeed, they seem to be more elusive as compared with the previous
ones, in the face of the great variety of human motor behaviours [32]. Nonetheless, it is
tempting to assume that primitives combine in time to form a composite activity as
prescribed by a simple Markov model.

In this paper we propose to build a sequential classifier composed of a Gaussian
cHMM. A potential problem with this approach is the huge number of parameters we
need to estimate. In fact, a Gaussian cHMM trained in a d-dimensional feature
space, with Q primitives to be classified and M components for each mixture, requires the
specification of the following parameters:

$\pi$, prior probability vector, 1 × Q;

A, transition probability matrix, Q × Q;

$\mu$, set of mean value matrices, Q × M × d;

$\Sigma$, set of covariance matrices, Q × M × d × d;

C, set of mixing parameters, Q × M.
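To give a feeling for the size of the estimation problem, the array sizes listed above can simply be added up. The sketch below counts raw array entries, ignoring the normalisation constraints and the symmetry of the covariance matrices; Q = 7 and d = 17 are the values used in the validation study of Section 3, while M = 1 is an assumption made only for this example.

```python
def chmm_parameter_count(Q, M, d):
    """Count the entries of pi, A, mu, Sigma and C for a Gaussian cHMM."""
    prior = Q                     # pi: 1 x Q
    transition = Q * Q            # A: Q x Q
    means = Q * M * d             # mu: Q x M x d
    covariances = Q * M * d * d   # Sigma: Q x M x d x d
    mixing = Q * M                # C: Q x M
    return prior + transition + means + covariances + mixing

print(chmm_parameter_count(Q=7, M=1, d=17))   # → 2205
```

The covariance term dominates: it accounts for 2023 of the 2205 entries in this example and grows quadratically with the dimension of the feature space.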

Suppose that the training set presents only a relatively limited number of examples. A
sensible approach to deal with the difficulty of parameter estimation may be to train,
separately, different subsets of them. We propose to train the transition
parameters, i.e., $\pi$ and A, separately from the emission parameters, i.e., $\mu$, $\Sigma$, and C, by
exploiting the annotations available in the dataset. As for the transition parameters,
since their labelling is known, the composite activities in the training set are assumed
to be generated by a Q-state OMM, the state and transition probabilities of which can be
estimated by event counting. The emission parameters specify the Gaussian
multivariate PDFs in the same way as the class-conditional PDFs are specified in
probabilistic classifiers, such as, for instance, GMMs. As a whole, we refer to this
initialisation phase as the first-level training phase. The values of the parameters
estimated during the first-level training phase can be further refined by on-the-fly runs
of the Baum-Welch algorithm (second-level training phase); this trick may help adapt
the cHMM behaviour, in particular, to unexpected TPM changes.
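Estimation of the transition parameters by event counting can be sketched as follows. The function name `estimate_omm` is hypothetical, states are assumed to be coded as integers from 0 to Q-1, and the uniform fallback for states that are never left in the training data is an assumption of this sketch.

```python
def estimate_omm(sequences, Q):
    """Estimate prior vector and TPM by counting events in labelled sequences."""
    prior = [0.0] * Q
    counts = [[0.0] * Q for _ in range(Q)]
    for seq in sequences:
        prior[seq[0]] += 1                     # initial-state occurrences
        for i, j in zip(seq, seq[1:]):
            counts[i][j] += 1                  # transitions from state i to state j
    prior = [p / len(sequences) for p in prior]
    tpm = []
    for row in counts:
        total = sum(row)
        # Rows never visited fall back to a uniform distribution (an assumption).
        tpm.append([c / total for c in row] if total else [1.0 / Q] * Q)
    return prior, tpm
```

For example, `estimate_omm([[0, 0, 1, 1], [1, 0, 0, 0]], Q=2)` counts three 0-to-0 transitions and one 0-to-1 transition, so the first TPM row is [0.75, 0.25].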
Finally, an interesting feature of the classifier we have developed resides in its
capability of managing spurious data. One difficulty for the classifier arises when
activity primitives for which no examples are included in the training set are
presented during operation. Our approach to dealing with this problem consists of
computing the likelihood of each feature vector, given the GMM structure that models
the cHMM emissions. A simple threshold-based detector makes it possible to flag anomalous
feature vectors, preventing them from being actually presented to the classifier. Figure 3
shows the block diagram of the sequential cHMM-based classifier; the figure
also indicates the block for spurious frame rejection.
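A possible form of the spurious frame detector is sketched below. It is only one reading of the description above: each state is assumed to carry its own Gaussian mixture given as a `(weights, means, covs)` triple, a frame is scored by the best log-likelihood over all states, and both function names and the threshold value are hypothetical.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, covs):
    """Log-density of x under a mixture of multivariate normal distributions."""
    d = len(x)
    log_terms = []
    for w, mu, cov in zip(weights, means, covs):
        diff = x - mu
        maha = diff @ np.linalg.solve(cov, diff)          # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(cov)
        log_terms.append(np.log(w) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha))
    return np.logaddexp.reduce(log_terms)                 # log of the weighted sum

def is_spurious(x, state_gmms, threshold):
    """Flag x when no state's emission model explains it well enough."""
    best = max(gmm_log_likelihood(x, *g) for g in state_gmms)
    return best < threshold
```

Frames flagged in this way would be withheld from the classifier, so the threshold trades missed detections of unknown activities against the rejection of legitimate frames.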

3. Validation Study
At present, we are developing a wearable sensor system for indoor-outdoor
pedestrian navigation, which embodies the following sub-systems: an on-body network
of four tri-axial accelerometers, an on-foot fully integrated Inertial Measurement Unit
(IMU) that includes a triad of magnetometers, and finally a waist-worn GPS receiver.
Since the hardware and firmware components of this system are currently undergoing
their production phase, the validation of the classification methods studied in this
paper is based on analysing a dataset of acceleration waveforms, made available to us
by Prof. Intille and associates at MIT [32].
3.1. Dataset for physical activity classification

The classification methods were applied to the dataset described in [32]. Acceleration
data, sampled at 76.25 Hz, were acquired from five bi-axial accelerometers, located at the
hip, wrist, arm, ankle, and thigh. The original protocol was based on testing 20 subjects,
who were requested to perform 20 activities (Figure 4). In this paper, 13 subjects were
randomly selected for further analysis, in order to ease the development work. Moreover,
because of our interest in personal navigation based on on-foot inertial sensing, we
considered just the seven activities shown in Table 2, which involved primarily the use of
the subjects' lower limbs.

Since the research goal in [32] was exclusively to test single-frame classifiers, the
available data for each subject concerned acceleration time series that were known to
correspond to each primitive. Their work was thus at the level of motor words. Our
validation study of single-frame classifiers followed their approach, although we opted for
a subject-specific training, i.e., a distinct classifier was trained for each individual subject.
As for the cHMM-based sequential classifier, we built a Q-state OMM with known model
$(\pi, A)$ in order to generate motor sentences from the vocabulary of motor words in Table 2
(Q = 7). The simulation of a composite activity by a single subject (virtual experiment)
was made by associating, for each tested subject, one data frame to each OMM state. The
associated data frame was randomly sampled (with replacement) from the maximum
number N of frames available in the reduced dataset for each primitive and subject (18 ≤
N ≤ 58). A number S = 20 of virtual experiments was synthesised, each
composed of T = 300 data frames. A subset of P virtual experiments was included in the
training set.
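The synthesis of a virtual experiment can be sketched in a few lines. The function names are hypothetical, `frames_by_state` stands for whatever container holds the frames available for each primitive, and the standard-library generator is used only for illustration.

```python
import random

def sample_chain(prior, tpm, T, rng):
    """Draw a state sequence of length T from a first-order Markov chain."""
    states = list(range(len(prior)))
    seq = [rng.choices(states, weights=prior)[0]]
    for _ in range(T - 1):
        seq.append(rng.choices(states, weights=tpm[seq[-1]])[0])
    return seq

def virtual_experiment(prior, tpm, frames_by_state, T, rng):
    """Pair every sampled state with a frame drawn with replacement."""
    seq = sample_chain(prior, tpm, T, rng)
    return seq, [rng.choice(frames_by_state[s]) for s in seq]
```

Calling `virtual_experiment(prior, tpm, frames_by_state, 300, random.Random(0))` twenty times with different seeds would mirror the S = 20 sequences of T = 300 frames described above.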
The procedure of synthesising virtual experiments in the manner described above
implied the existence of clear-cut borders between data frames associated to different
primitives, which were managed by data cropping in creating the original dataset [32]. Of
course, real-life composite activities would be more complex, due to, say, fuzzy postural
transitions in the data. In an attempt to get a more realistic picture of the cHMM-based
sequential classifier performance, data frames from the original dataset not included in the
reduced dataset were randomly interspersed in the tested data sequences generated by
the OMM, in variable proportions, from null to 1:3 (max.). The resulting garbage was
managed in our system by labelling data frames as spurious if their likelihood, given the
GMM structure that models the cHMM emissions, was below a suitably set threshold, as
described in Section 2.6.
3.2. Feature vectors

The feature vectors were built from 50%-overlapping sliding windows with 512 samples.
Since the sampling frequency was 76.25 Hz, each data frame lasted 6.7 seconds, with every
new frame available every 3.35 s. The DC component, the energy, the frequency-domain
entropy, and the correlation coefficients were calculated for inclusion in the feature vector.
In order to evaluate the entropy, the PDF of the STFT coefficients was estimated using an
Epanechnikov kernel density estimator. Since five dual-axis accelerometers were
considered in the experimental setup, each feature vector was composed of 30 components,
which yielded the DC component, energy and entropy for the 10 data channels, plus 55
correlation coefficients (d = 85). Different selection algorithms were considered using
the k-NN criterion. The maximum value for the criterion of selection was obtained by the
SFFS method (Pudil algorithm), yielding an optimal subset of d = 17 feature components,
amongst which 4 DC components and 13 correlation coefficients were found. The 17-dimensional feature vectors were not submitted to any feature extraction step.
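The framing scheme and the timing figures quoted above can be checked with a short sketch. The function name `sliding_frames` is hypothetical; the window width, overlap and sampling frequency are those given in the text.

```python
def sliding_frames(n_samples, width=512, overlap=0.5):
    """Return (start, end) index pairs of overlapping frames; end is exclusive."""
    step = int(width * (1.0 - overlap))
    return [(s, s + width) for s in range(0, n_samples - width + 1, step)]

fs = 76.25                       # sampling frequency in Hz, from the text
frame_duration = 512 / fs        # about 6.7 s
frame_period = 256 / fs          # about 3.35 s between successive frames
```

A recording of 1024 samples, for instance, yields three frames starting at samples 0, 256 and 512.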

3.3. Single-frame classification algorithms

The single-frame classification algorithms included in Table 3 were trained and tested
using data frames from the reduced dataset.
