Abstract
Purpose
The task of identifying activity classes from sensor information in smart homes is very challenging because of the imbalanced nature of such data sets, where some activities occur much more frequently than others. Probabilistic models such as the Hidden Markov Model (HMM) and Conditional Random Fields (CRF) are commonly employed for this purpose. This paper aims to discuss these issues.
Design/methodology/approach
In this work, the authors propose a robust strategy combining the Synthetic Minority Over-sampling Technique (SMOTE) with Cost-Sensitive Support Vector Machines (CS-SVM), together with an adaptive tuning of the cost parameter, in order to handle the imbalanced data problem.
Findings
The results demonstrate the usefulness of the approach through comparison with state-of-the-art approaches, including HMM, CRF, the traditional C-Support Vector Machines (C-SVM) and the Cost-Sensitive SVM (CS-SVM), for classifying activities using binary and ubiquitous sensors.
Originality/value
Performance metrics used in the experiments include Accuracy, Precision/Recall and F-measure.
Citation
M’hamed Abidine, B., Fergani, B., Oussalah, M. and Fergani, L. (2014), "A new classification strategy for human activity recognition using cost sensitive support vector machines for imbalanced data", Kybernetes, Vol. 43 No. 8, pp. 1150-1164. https://doi.org/10.1108/K-07-2014-0138
Publisher
Emerald Group Publishing Limited
Copyright © 2014, Emerald Group Publishing Limited
1. Introduction
As the number of elderly people continuously increases in western as well as third-world countries as a result of enhanced health care and living standards, the costs of supporting the elderly population in performing basic Activities of Daily Living (ADL) such as cooking, brushing, dressing, cleaning, bathing and so on have substantially increased as well (Wallace, 2007; Tapia et al., 2004; Kasteren et al., 2008; Fleury et al., 2010). In order to ensure their comfort, and because healthcare infrastructures are unlikely to accommodate the drastic growth of the elderly population, it is suggested to assist this population at home, thus enabling them to live longer on their own. However, this requires minimal monitoring, which can minimize staff intervention and thereby reduce the cost. For this purpose, a wireless sensor network needs to be installed in the house, whose data, analyzed using machine learning techniques, provide an indication of the type of activity the person is involved in. This requires building activity models and performing further pattern recognition (Bishop, 2006; Vapnik, 2000). The learning of such models is usually done in a supervised manner (human labeling) and requires large annotated data sets recorded in different settings (Tapia et al., 2004; Kasteren et al., 2008; Fleury et al., 2010).
However, activity recognition data sets are generally imbalanced, meaning that certain activities occur more frequently than others (e.g. sleeping is generally done once a day, while toileting is done several times a day). This can negatively influence the learning process due to the known effect of the minority class, which, in turn, biases the outcome and may yield disastrous consequences for the elderly person. Recently, the class imbalance problem has been recognized as a crucial problem in the machine learning community (Chawla, 2010; Chawla et al., 2003, 2004). Indeed, most classifiers assume a balanced distribution of classes and roughly equal misclassification costs for each class and, therefore, perform poorly in predicting the minority class of imbalanced data (Weiss, 2004; Weiss and Provost, 2003). Compared with other standard classifiers, Support Vector Machines (SVM) are acknowledged for their good performance when dealing with moderately imbalanced data (Vapnik, 2000), due to their ability to discount samples located far away from the decision boundary. However, it has also been recognized that the separating hyperplane of an SVM model developed from an imbalanced data set can be skewed toward the minority class (Akbani et al., 2004), which, in turn, can degrade the classification performance on the minority class. This has motivated extensive research aiming to improve the effectiveness of SVM on imbalanced classification (Chawla et al., 2004; Akbani et al., 2004; Raskutti and Kowalczyk, 2004; Wu and Chang, 2005; Chen et al., 2005). In particular, approaches for addressing the imbalanced training-data problem can be categorized into two main streams: data processing approaches, referred to as external methods, and algorithmic approaches, referred to as internal methods. At the data level, these solutions can be divided into oversampling (Raskutti and Kowalczyk, 2004), where new samples are created for the minority class, undersampling, where samples are eliminated from the majority class, or some combination of the two. Vilarino et al. (2005) used the Synthetic Minority Oversampling Technique (SMOTE) proposed in Chawla et al. (2002) together with random undersampling for SVM modeling on an imbalanced intestinal contractions detection task. At the algorithmic level, the solutions include adjusting the costs associated with misclassification (Thai-Nghe et al., 2010; Veropoulos et al., 1999; Abidine et al., 2013), tuning the probabilistic estimate at the tree leaf (when working with decision trees) as well as the decision threshold, and employing a recognition-based approach (i.e. learning from one class) rather than discrimination-based learning (i.e. distinguishing two distinct classes) (Raskutti and Kowalczyk, 2004). Akbani et al. (2004) proposed the SMOTE with Different Costs algorithm (SDC), which conducts SMOTE oversampling on the minority class with different error costs. Wu and Chang (2005) proposed the Kernel Boundary Alignment algorithm (KBA), which adjusts the boundary toward the majority class by modifying the kernel matrix. Many of these solutions are discussed in Chawla et al. (2003, 2004) and Weiss (2004). In addition to naturally occurring class imbalance, an imbalanced data situation may also arise from the one-against-rest scheme in multiclass classification. Therefore, even when the training data are balanced, issues related to the class imbalance problem can still occur.
The main contributions of this work are twofold. First, a new classification strategy is suggested, combining SMOTE with the discriminative soft-margin Support Vector Machines (C-SVM) (Vapnik, 2000) and a new criterion (Abidine et al., 2013) for the selection of the cost parameter C, in order to appropriately tackle the problem of imbalanced classes. Second, the performances of the proposed strategy on data issued from sensor networks in smart homes (Tapia et al., 2004; Kasteren et al., 2008) are quantified and compared to the traditional C-SVM, Cost-Sensitive SVM (CS-SVM), Hidden Markov Model (HMM) (Bishop, 2006; Rabiner, 1989) and Conditional Random Fields (CRF) (Sutton and McCallum, 2006), where HMM and CRF have recently gained popularity in the activity recognition field (Tapia et al., 2004; Kasteren et al., 2008).
The rest of this paper is organized as follows. Section 2 briefly describes the main modeling algorithms employed in our analysis, namely HMM, CRF, C-SVM and CS-SVM. Section 3 discusses our suggested classification strategy. Section 4 presents the setup and discusses the results acquired through a series of experiments using different highly imbalanced data sets under different metrics. Finally, in Section 5 we summarize our findings and outline future work.
2. Generative vs discriminative classifiers for activity recognition
2.1 Problem formulation and notations
First, the raw sensor data are divided into time slices of constant interval Δt. At time t, sensor i is assigned the binary value $x_t^i = 1$ to indicate that sensor i has been activated at least once in the time interval [t, t+Δt], and $x_t^i = 0$ otherwise. Given a set of N sensors, the observation at time t is the binary vector $\vec{x}_t = (x_t^1, x_t^2, \ldots, x_t^N)^T$. An activity at time slice t is given by $y_t \in \{1, \ldots, K\}$, where K stands for the total number of daily living activities. Examples of such activities include Leaving, Toileting, Showering, Sleeping, Drinking, Preparing Dinner, etc. The recognition task consists of finding a mapping between a sequence of observations $x_{1:T} = (\vec{x}_1, \ldots, \vec{x}_T)$ and a sequence of labels $y_{1:T} = (y_1, \ldots, y_T)$ for a total of T time steps. For the sake of readability, we shall also use the notations x and y to refer to $x_{1:T}$ and $y_{1:T}$.
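As an illustration of this discretization step, the following Python sketch builds the binary observation matrix from a list of raw activation events; the event format (sensor index with activation and deactivation timestamps, in seconds) is an assumption made for this example, not the format of the original data sets.

```python
import numpy as np

def binarize_events(events, num_sensors, t_start, t_end, dt=60):
    """Turn raw sensor events into binary observation vectors x_t.

    events: iterable of (sensor_index, on_time, off_time) tuples
            (hypothetical format); dt: slice length in seconds.
    Returns a (T, N) matrix whose entry [t, i] is 1 iff sensor i was
    active at least once during time slice t.
    """
    T = int((t_end - t_start) // dt)
    X = np.zeros((T, num_sensors), dtype=int)
    for sensor, on, off in events:
        first = max(0, int((on - t_start) // dt))
        last = min(T - 1, int((off - t_start) // dt))
        X[first:last + 1, sensor] = 1
    return X
```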
2.2 HMM
The HMM is a classic way of probabilistically modeling a sequential process, consisting of a hidden variable y and an observable variable x at each time step. In our case, the observable variable is the vector of sensor readings and the hidden variable is the activity to recognize. The model endorses two common dependency assumptions:
The observable variable at time t, namely $x_t$, depends only on the hidden variable $y_t$; the hidden variable corresponds in this case to the activity performed, while the observable variable represents the binary sensor readings.
The hidden variable at time t, namely $y_t$, depends only on the previous hidden variable $y_{t-1}$ (Markov assumption; Rabiner, 1989).
With these assumptions, the joint probability of the hidden and observable variables, $p(y_{1:T}, x_{1:T})$, can be factorized as:

$$p(y_{1:T}, x_{1:T}) = \prod_{t=1}^{T} p(x_t \mid y_t)\, p(y_t \mid y_{t-1}) \qquad (1)$$

where $p(y_1 \mid y_0)$ is taken to be the initial state distribution $p(y_1)$.
Model (1) is fully determined through knowledge of the initial state distribution $p(y_1)$, the transition distribution $p(y_t \mid y_{t-1})$, representing the probability of passing from one state to the next, and the observation distribution $p(x_t \mid y_t)$, which is computed by assuming that each sensor reading is modeled as an independent Bernoulli distribution. Finally, the associated parameters are learned from training data using a maximum likelihood approach. Consequently, given a new sequence of observations $x_{1:T}$, the best sequence of activities is the one that maximizes the joint probability (1). It can be found efficiently using a Viterbi-like algorithm (Rabiner, 1989).
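To make the decoding step concrete, the following minimal Python sketch implements Viterbi decoding for model (1); it works in log-space for numerical stability and is illustrative only, not the code used in our experiments.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most likely activity sequence under the HMM factorization (1).

    log_pi: (K,) log initial distribution log p(y_1);
    log_A:  (K, K) log transitions, log_A[i, j] = log p(y_t=j | y_{t-1}=i);
    log_B:  (T, K) log observation likelihoods log p(x_t | y_t=k).
    """
    T, K = log_B.shape
    delta = log_pi + log_B[0]          # best log-score ending in each state
    psi = np.zeros((T, K), dtype=int)  # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + log_A   # previous state x next state
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):      # trace the best path backwards
        path[t - 1] = psi[t, path[t]]
    return path
```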
2.3 CRF
In a CRF, the nodes corresponding to observable and hidden variables are connected via an undirected graph, so that, unlike in an HMM, conditional probabilities can no longer be used to represent the interaction between nodes; potentials are used instead. Besides, in the same spirit as Kasteren et al. (2008), one assumes an exponential model for the conditional probability of the entire sequence of labels $y_{1:T}$ given an input observation sequence $x_{1:T}$. The CRF is defined by a weighted sum of K feature functions $f_i$ that return either 0 or 1, indicating whether the underlying feature is accounted for depending on the values of the input variables. Each feature function is assigned a weight $\lambda_i$ that quantifies the overall contribution of the feature, which also corresponds to the actual potential value. These weights are the parameters we want to find when learning the model. Unlike HMM, where the parameters are learned by maximizing the joint probability distribution, CRF model parameters are learned using an iterative gradient method by maximizing the conditional probability distribution defined as (Kasteren et al., 2008):

$$p(y_{1:T} \mid x_{1:T}) = \frac{1}{Z(x_{1:T})} \exp\left( \sum_{t=1}^{T} \sum_{i=1}^{K} \lambda_i f_i(y_{t-1}, y_t, x_{1:T}, t) \right) \qquad (2)$$

where $Z(x_{1:T})$ is a normalization term (the partition function).
One of the main consequences of this choice is that, while learning the parameters of a CRF, we avoid modeling the distribution of the observations, $p(x_{1:T})$. As a result, we can only use a CRF to perform inference (and not to generate data), which is a characteristic of discriminative models. To find the labels y for new observed features, we take the maximum of the conditional probability as in (3). Such inference can also be performed efficiently using an adapted version of the Viterbi algorithm:

$$y^{*} = \arg\max_{y_{1:T}} p(y_{1:T} \mid x_{1:T}) \qquad (3)$$
3. Proposed integrated classification strategy
3.1 Introduction
Despite its popularity in machine learning, the SVM technique has not been extensively used in activity recognition studies, as pointed out in Banos et al. (2012), Chathuramali and Rodrigo (2012) and Palaniappan et al. (2012). However, bearing in mind the remarkable accuracy rates obtained in other contexts, this suggests possible success in activity recognition as well. Strictly speaking, the few published works using SVM for activity recognition rely mainly on the standard SVM, which is then extended to the multi-class case using a one-versus-all-like approach or an alternative method. Typically, SVM maximizes the margin of a hyperplane separating the classes. Nevertheless, it is overwhelmed by the majority class instances in the case of imbalanced data sets. CS-SVM (Akbani et al., 2004) has been suggested as a candidate solution for such a purpose: through the use of different error costs for the positive and negative classes, the underlying hyperplane can be pushed away from the positive instances. However, it is still an open debate how to accurately choose these costs, which motivates the research in this paper. First, CS-SVM is employed. Second, a new method for eliciting the cost factors associated with CS-SVM is put forward. Third, more effort is put on the preprocessing in order to reduce the effect of the minority class. For this purpose, SMOTE is employed. This technique creates new instances through "phantom-transduction": for each positive instance, its nearest positive neighbors are identified, and new positive instances are created and placed randomly between the instance and its neighbors. Fourth, a one-versus-all approach is employed in order to extend the approach to the multiple classes related to the various activities that need to be recognized. Figure 1 highlights the overall architecture of the solution.
Specifically, in the training phase, each feature vector, constituted of the binary sensor vector for the corresponding time slice, is associated with the underlying ADL class. More precisely, two types of feature representation are employed. The former is the aforementioned change-point representation, where a sensor is assigned the value one when its reading changes during the time slice. The latter is the last representation, where the last sensor that changed state continues to give 1 until a different sensor changes state. A concatenation of both features then forms the overall feature vector. Next, the class imbalance is corrected using the SMOTE strategy. This provides a refined and enhanced data set that is input to CS-SVM, where the cost parameters are appropriately tuned using our suggested criterion. The trained CS-SVM is then used to process new observations during the testing phase, where the associated ADL class is predicted. Each feature vector in the test data is therefore classified into an estimated ADL class. A simplified sketch of this pipeline is given below.
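The following Python sketch approximates the training phase of Figure 1 with off-the-shelf components, assuming scikit-learn (whose SVC wraps LIBSVM) and the imbalanced-learn package are available; it is a simplified stand-in for our Matlab/LIBSVM implementation, the helper name train_strategy is hypothetical, and the per-class weights follow the criterion of Section 3.5.

```python
from collections import Counter
from imblearn.over_sampling import SMOTE  # assumes imbalanced-learn is installed
from sklearn.svm import SVC

def train_strategy(X, y, sigma=1.0, k_neighbors=4):
    """Sketch of the proposed pipeline: SMOTE, then cost-sensitive RBF SVM."""
    # Per-class costs from the ORIGINAL class proportions (Section 3.5):
    # majority classes keep C = 1, minority classes get C scaled up.
    counts = Counter(y)
    m_max = max(counts.values())
    weights = {c: max(1, m_max // n) for c, n in counts.items()}
    # Over-sample minority classes along minority nearest-neighbor segments.
    X_res, y_res = SMOTE(k_neighbors=k_neighbors).fit_resample(X, y)
    clf = SVC(kernel="rbf", gamma=1.0 / (2 * sigma ** 2),  # gamma = 1/(2*sigma^2)
              C=1.0, class_weight=weights)
    return clf.fit(X_res, y_res)
```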
3.2 SMOTE
This approach effectively broadens the decision region of the minority class in the training set. The SMOTE algorithm generates artificial data based on the feature-space similarities between existing minority examples. Synthetic examples are introduced along the line segment joining each minority class example $x_i \in S_{min}$ and one of its k nearest minority-class neighbors. The k-nearest neighbors (k-NN) are defined as the k elements of $S_{min}$ whose Euclidean distance to the instance $x_i$ under consideration is smallest along the n dimensions of the feature space X. To create a synthetic sample, one of the k nearest neighbors $\hat{x}_i$ is chosen at random, the corresponding feature vector difference is multiplied by a random number $\delta \in [0, 1]$, and the result is added to $x_i$:

$$x_{new} = x_i + \delta \times (\hat{x}_i - x_i)$$
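A minimal sketch of this interpolation, assuming the minority instances are stored as rows of a NumPy array, could look as follows; it illustrates the formula above rather than reproducing the full SMOTE implementation.

```python
import numpy as np

def smote_sample(x_i, minority, k=4, rng=None):
    """Create one synthetic sample x_new = x_i + delta * (x_hat - x_i)."""
    rng = rng or np.random.default_rng()
    d = np.linalg.norm(minority - x_i, axis=1)
    neighbors = minority[np.argsort(d)[1:k + 1]]     # k-NN, excluding x_i itself
    x_hat = neighbors[rng.integers(len(neighbors))]  # one neighbor at random
    delta = rng.random()                             # delta drawn from U[0, 1]
    return x_i + delta * (x_hat - x_i)
```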
3.3 C-SVM
SVM, developed by Vapnik (2000), is based on statistical learning theory and has gained popularity because of its many attractive characteristics and good performance. We utilize the current standard soft-margin C-SVM (Vapnik, 2000). SVM classifies data by determining a hyperplane in a higher-dimensional space (the feature space). For a two-class problem, we assume a training set $\{(x_i, y_i)\}_{i=1}^{m}$, where $x_i \in \mathbb{R}^n$ are the observations and $y_i$ are class labels, either 1 or −1. Given some kernel function K, SVM determines the optimal $\alpha_i$ for each $x_i$ so as to maximize the margin between the hyperplane and the closest instances to it, as seen in Figure 2. The class prediction for a new test instance x is given by (Bishop, 2006):

$$f(x) = \mathrm{sign}\Big( \sum_{i} \alpha_i y_i K(x_i, x) + b \Big)$$
where $\alpha_i > 0$ are Lagrange multipliers and b is the translation factor of the hyperplane from the origin. The training samples with $\alpha_i > 0$ are called support vectors. The $\alpha_i$ and b are determined by solving the following primal optimization problem, which simultaneously maximizes the margin $2/\|w\|$ between the two classes (see Figure 2) and minimizes the total amount of misclassification (training errors) $\sum_i \xi_i$:

$$\min_{w, b, \xi} \;\; \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{m} \xi_i \quad \text{subject to} \quad y_i\big(w^{T}\varphi(x_i) + b\big) \ge 1 - \xi_i, \;\; \xi_i \ge 0$$
where w is a weight vector perpendicular to the hyperplane and φ(·) is a non-linear function that maps the input space into the feature space, defined implicitly through the kernel matrix $K(x_i, x_j) = \varphi(x_i)^{T}\varphi(x_j)$ of the input space. The penalty constant C represents the trade-off between the empirical error ξ and the margin; it corresponds to the cost parameter of the SVM. The Gaussian radial basis function (RBF) kernel is quite popular: $K(x_i, x_j) = \exp\big(-\|x_i - x_j\|^{2}/(2\sigma^{2})\big)$, where σ is the width parameter. In our context, a cross-validation technique was used to select the width parameter.
The regularization parameter C controls the trade-off between maximizing the margin width and minimizing the number of training errors on non-separable samples, in order to avoid overfitting (Bishop, 2006). A small value of C will increase the number of training errors, while a large C will lead to behavior similar to that of a hard-margin SVM. In practice, both parameters σ and C are varied over a wide range of values and the optimal setting is assessed using a cross-validation technique on the training set only (Taylor and Cristianini, 2004).
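As a hedged illustration of this tuning procedure, the snippet below sweeps C and σ (through gamma = 1/(2σ²)) with cross-validation on the training set using scikit-learn; the grid values are placeholders, and plain k-fold splitting is used here instead of the leave-one-day-out protocol of Section 4.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

sigmas = np.array([0.1, 0.5, 1.0, 1.5, 2.0])   # candidate width parameters
param_grid = {
    "gamma": list(1.0 / (2 * sigmas ** 2)),    # gamma = 1/(2*sigma^2)
    "C": [0.1, 1, 10, 100, 1000, 10000],       # candidate cost values
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
# search.fit(X_train, y_train)   # fit on the training days only
# best (sigma, C) pair: search.best_params_
```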
Although SVM often produces effective solutions for balanced data sets, it is sensitive to imbalanced training data and produces sub-optimal models, because the dual constraint $\sum_i \alpha_i y_i = 0$ imposes equal total influence from the positive and negative support vectors. Wu and Chang (2003) pointed out that an imbalanced training-data ratio means the positive instances may lie further away from the "ideal" boundary than the negative instances. As a result of this phenomenon, SVM learns a boundary that is too close to, and skewed toward, the positive instances. They hypothesized that, as a result of this imbalance, the neighborhood of a test instance close to the boundary is more likely to be dominated by negative support vectors, and hence the decision function is more likely to classify a boundary point as negative.
3.4 CS-SVM
Because, in the case of imbalanced data sets, the learned boundary is too close to the positive instances, there is a need to bias the SVM in a way that pushes the boundary away from them. For this purpose, Veropoulos et al. (1999) suggested using different error costs for the positive (C+) and negative (C−) classes.
In order to deal with an imbalanced training data set, the SVM soft-margin objective function is modified to assign two different penalty constants C+ and C− to the positive and negative classes, respectively, yielding the quadratic optimization below:

$$\min_{w, b, \xi} \;\; \frac{1}{2}\|w\|^{2} + C^{+} \sum_{i : y_i = +1} \xi_i + C^{-} \sum_{i : y_i = -1} \xi_i \quad \text{subject to} \quad y_i\big(w^{T}\varphi(x_i) + b\big) \ge 1 - \xi_i, \;\; \xi_i \ge 0 \qquad (7)$$
Here, C− is the higher misclassification cost, assigned to the negative (minority) class, while C+ is the lower misclassification cost, assigned to the positive (majority) class. Optimization (7) allows us to handle the imbalanced data set through different error costs for the positive and negative classes, such that the hyperplane is pushed away from the minority instances. The SVM dual formulation gives the same Lagrangian as the original soft-margin C-SVM, but with different constraints on the Lagrange multipliers $\alpha_i$, as follows:

$$0 \le \alpha_i \le C^{+} \;\; \text{for} \;\; y_i = +1, \qquad 0 \le \alpha_i \le C^{-} \;\; \text{for} \;\; y_i = -1$$
In the construction of a cost-sensitive SVM, the cost parameter plays an indispensable role. Regarding the cost information, several authors (Thai-Nghe et al., 2010; Veropoulos et al., 1999; Abidine et al., 2013) have proposed adjusting different penalty parameters for the different classes, which effectively improves the low classification accuracy caused by imbalanced samples. For example, it is quite possible to achieve high overall classification accuracy by simply assigning every sample to the majority class (positive observations), in which case the minority class (negative observations) accounts for all the training error. Veropoulos et al. (1999) proposed increasing the misclassification cost associated with the minority class (i.e. C− > C+) to counteract the imbalance effect, although no specific guidelines were provided by the authors for the detailed quantification of these cost factors. An approach for this purpose is provided in the next subsection.
3.5 Cost sensitive criterion
Our proposed criterion advocates a simple, intuitive heuristic for selecting the penalty constants C+ and C−. For this purpose, one considers the number of samples $m_i$ in the ith class of the training data, i.e. the proportion of that class. Following Veropoulos's reasoning, whereby the cost C− associated with the smallest class must be large enough to improve the low classification accuracy caused by imbalanced samples, one sets C+ to one and determines C− as the proportion of majority samples $m^{+}$ over minority ones $m^{-}$; namely:

$$C^{+} = 1, \qquad C^{-} = \left[ \frac{m^{+}}{m^{-}} \right]$$
where [·] denotes the integer part of the quantity under the brackets. Notice that it always holds that $C^{-} \ge C^{+} = 1$. Therefore, (7) is translated into:

$$\min_{w, b, \xi} \;\; \frac{1}{2}\|w\|^{2} + \sum_{i : y_i = +1} \xi_i + \left[ \frac{m^{+}}{m^{-}} \right] \sum_{i : y_i = -1} \xi_i \quad \text{subject to} \quad y_i\big(w^{T}\varphi(x_i) + b\big) \ge 1 - \xi_i, \;\; \xi_i \ge 0$$
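For a binary task the criterion reduces to a few lines of Python; the class counts in the usage example below are hypothetical.

```python
def cost_parameters(labels):
    """Return (C+, C-) per the criterion above for labels in {+1, -1},
    with +1 the majority class and -1 the minority class."""
    m_pos = sum(1 for y in labels if y == +1)
    m_neg = len(labels) - m_pos
    return 1, max(1, m_pos // m_neg)   # C+ = 1, C- = [m+/m-]

# Hypothetical example: 950 majority vs 50 minority samples -> (1, 19)
print(cost_parameters([+1] * 950 + [-1] * 50))
```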
In this study, the algorithms are implemented in the Matlab environment, and the software package LIBSVM (Chang and Lin, 2001) was used to implement the multiclass SVM classifier; it uses the one-vs-one method (Vapnik, 2000).
4. Experimental results and discussion
In this section, we first give a brief description of the data sets and provide details of our experimental setup and then we present and discuss the results.
4.1 Data sets
We used data gathered by Kasteren et al. (2008), who deployed digital sensors in houses with different layouts and different numbers of sensors (Kasteren et al., 2011). Each sensor is attached to a wireless sensor network node. Sensors were attached to doors, cupboards, a refrigerator and a toilet flush. The data were divided into slices of constant length (60 seconds) without overlapping. The sensors were installed in everyday objects such as drawers, refrigerators and containers to record activation/deactivation events (opening/closing events) as the subject carried out everyday activities. The data were collected by a base station (BS) and annotated using a personal digital assistant (PDA) that asked the subject every 15 minutes what activity he or she was performing; the annotation was later corrected by inspection of the sensor data. Times where no activity is annotated are referred to as Idle (Tables I-III).
4.2 Setup and performance measures
We separate the data into test and training sets using a leave-one-day-out cross-validation approach. This produces unbiased but high-variance error estimates. In this approach, each day of sensor readings is used in turn for testing while the remaining days are used for training; this is repeated l times, with training sets of size (l−1) days, and the average performance measure is reported. In this way, we obtain inferred labels for the whole data set by concatenating the resulting labels acquired for each test day. A vector of features was generated for each time slice.
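A sketch of this leave-one-day-out protocol, using per-slice day indices as group labels with scikit-learn, might look as follows; fit_fn and score_fn are placeholders for the model-specific training and evaluation routines.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

def leave_one_day_out(X, y, days, fit_fn, score_fn):
    """Each recorded day serves once as the test set; scores are averaged.

    days: array assigning every time slice to its recording day.
    """
    scores = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=days):
        model = fit_fn(X[train_idx], y[train_idx])
        scores.append(score_fn(model, X[test_idx], y[test_idx]))
    return float(np.mean(scores))
```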
Sensor outputs are binary and represented in a feature space which is used by the model to recognize the activities performed. The vector contains one entry for each sensor; two-state sensor values (0 or 1) are used, and the features are the states of all sensors. We do not use the raw sensor data representation as observations; instead we use the "Change point" and "Last" representations described in Section 3.1, which have been shown to give much better results in activity recognition (Kasteren et al., 2008).
As the activity instances are imbalanced between classes, we evaluate the performance of our models using the F-measure, which is calculated from the precision and recall scores. We are dealing with a multi-class classification problem and therefore define the notions of true positives (TP), false negatives (FN) and false positives (FP) for each class separately, where N is the total number of classes. With a highly skewed data distribution, the overall accuracy metric in (11) is no longer sufficient, because it does not take into account differences in the frequency of activities. These measures are calculated as follows:

$$\mathrm{Precision} = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + FP_i}, \qquad \mathrm{Recall} = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + FN_i}$$

$$F\text{-}measure = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad \mathrm{Accuracy} = \frac{1}{m} \sum_{n=1}^{m} \big[\, \mathrm{inferred}(n) = \mathrm{true}(n) \,\big] \qquad (11)$$

in which [a = b] is a binary indicator giving 1 when true and 0 when false, and m is the total number of samples.
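A direct transcription of these formulas into Python (macro-averaged over the N classes) could read as follows; it is a sketch for clarity, not the evaluation code used in the experiments.

```python
import numpy as np

def evaluate(true, inferred, num_classes):
    """Macro-averaged precision/recall, F-measure and overall accuracy."""
    true, inferred = np.asarray(true), np.asarray(inferred)
    precisions, recalls = [], []
    for c in range(num_classes):
        tp = np.sum((inferred == c) & (true == c))
        fp = np.sum((inferred == c) & (true != c))
        fn = np.sum((inferred != c) & (true == c))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    p, r = float(np.mean(precisions)), float(np.mean(recalls))
    f = 2 * p * r / (p + r) if p + r else 0.0
    acc = float(np.mean(inferred == true))   # the [a = b] indicator, averaged
    return p, r, f, acc
```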
4.3 Results
We compared the performance of HMM, CRF, C-SVM, CS-SVM and Our strategy on the imbalanced data set of house A(1), in which the minority classes are all classes that appear at most 1 percent of the time, while the others are majority classes that typically have a longer duration (e.g. Leaving and Sleeping). Recall that Our strategy integrates SMOTE with CS-SVM, using the heuristic of Section 3.5 to choose the cost-sensitive parameters; standard C-SVM relies only on the default parameters implemented in the LIBSVM package, without SMOTE or heuristic choice of the cost parameters, and standard CS-SVM employs our adaptive cost parameter but without SMOTE.
In our experiments, for Our strategy, the same C-SVM hyper-parameters (σ_opt, C_opt) = (1, 5) were optimized over all decision functions, searching the ranges (0.1-2) and (0.1-10000), respectively, so as to maximize the class accuracy under the leave-one-day-out cross-validation technique. The minority class examples were over-sampled using k = 4 nearest neighbors in SMOTE. Then, for CS-SVM, we derived the cost-sensitive parameters adapted to the different classes of house A(1) by applying the criterion of Section 3.5. Notice that the minority classes require a large value of C compared to the majority classes; this induces a bias in the classifier that gives more importance to the minority ones.
The summary of the accuracy and the class accuracy obtained with the concatenated "Changepoint+Last" representation for HMM, CRF, C-SVM, CS-SVM and CS-SVM+SMOTE is presented in Table IV. It shows that Our strategy performs better in terms of F-measure, while the CRF and C-SVM methods perform better in terms of accuracy.
We report in Figure 3 the classification results in terms of the accuracy measure for each class with HMM, CRF, C-SVM, CS-SVM and Our strategy. One notices that HMM and Our strategy perform better for the minority activities Toileting, Showering, Breakfast, Dinner and Drink compared to the other methods, while CRF and C-SVM perform marginally better only for the majority activities (the other classes). The CS-SVM method is shown to outperform CRF and C-SVM in classifying the minority activities. Nevertheless, the results also show that Our strategy underperforms on the Idle class. This is partly explained by the low variability within the Idle class, which makes a cost-sensitive approach less appropriate.
In order to find out which activities are relatively harder to recognize, we analyzed the confusion matrices of the two best classifiers, HMM and Our strategy, in Tables V and VI, respectively. In Table V, the activities Leaving, Toileting, Showering and Sleeping give the highest accuracy. The activities Idle and Breakfast appear ill-handled by the HMM method. Most confusion takes place in the Idle activity and the three kitchen activities Breakfast, Dinner and Drink.
We can see in Table VI that, with Our strategy, the activity Idle performs worst, but the activities Leaving, Toileting, Showering and Sleeping are much better recognized. Strictly speaking, kitchen activities are acknowledged to be hard to recognize in general, yet Our strategy exhibits good performance on them compared to the other methods.
Finally, we present all results compactly in one place, allowing a quick comparison between HMM, CRF, C-SVM, CS-SVM and Our strategy, all using the same "Changepoint" representation on the four real-world data sets TK26M, TK28M, TK57M and Tap80F. We utilized the leave-one-day-out cross-validation technique for the selection of the width parameter, identifying σ_opt = 1, 1, 2 and 2 for these data sets, respectively. Our results give early experimental evidence that the proposed strategy works better for model classification; it consistently outperforms the other methods in terms of the F-measure on all data sets.
The above results show that Our strategy performs better than the other methods in terms of Precision and F-measure for all houses. These results show the effect of taking class imbalance into account in the evaluation metric, whereas HMM and CRF managed to correctly recognize a higher number of time slices.
4.4 Discussions
Our experiments on these large real-world data sets showed that the class accuracy obtained for house TK57M is lower than for the other houses under all recognition methods, because house TK57M includes more classes than the other houses.
In terms of comparing the performance of the employed classifiers (HMM, CRF, C-SVM, CS-SVM and Our strategy) on house A(1), one shall notice the following. First, HMM is trained by splitting the training data so that a separate model is learned for each class, i.e. the parameters are learned for each class separately. That partly explains why HMM performs better for the minority activities. The CRF model does not model each action class individually, but rather uses a single model for all classes. As a result, classes that are dominantly present in the data have a bigger weight after CRF optimization, which partly results in better performance for the majority activities ("Idle," "Leaving" and "Sleeping"). On the other hand, the regular multiclass C-SVM trains several binary classifiers to differentiate the classes according to the class labels and optimizes a single parameter C for all classes. As the weighting is performed in the same way regardless of the type of class in the C-SVM formulation, this provides an edge to the classification of majority classes. The CS-SVM with a heuristically tuned cost parameter for each class, although it may seem naive at first glance, provides higher performance accuracy than the C-SVM or CRF methods. Moreover, introducing the extra SMOTE step together with the cost-sensitive support vector machine CS-SVM and the adaptive cost parameter results in enhanced performances that outperform all other methods. This shows the importance of dealing with the imbalanced data set through the SMOTE approach in the overall classification strategy, which provides more robustness in classifying the minority class(es). Strictly speaking, from the dependency perspective, one shall mention that the HMM approach explicitly accounts for the dependency between the hidden and observable variables, as pointed out in Section 2.2. Although such reasoning is missing in the case of SVM and its variants, the use of the one-versus-one method to extend the pairwise comparison of SVM to multiple classes partially allows the system to account for such dependency.
The recognition of the three kitchen activities "Breakfast," "Dinner" and "Drink" is lower compared to the other activities for all methods. On the other hand, the analysis of the confusion matrix demonstrates the difficulty of recognizing the "Idle" class, although from a regulator's viewpoint the recognition of this activity is not so important. The data set showed that this activity is by far the most common, and it makes the data set particularly hard because some activities were missed during annotation, so that training data labeled as Idle might actually belong to one of the other classes. It might therefore be useful to assign a smaller weight to this activity. The kitchen activities, consisting mainly of food-related tasks, are acknowledged to be difficult to recognize because most instances of these activities were performed in the same location (the kitchen) using the same set of sensors. For example, "Toileting" and "Showering" are more separable because they occur in two different rooms, which makes the information from the door sensors sufficient to separate the two activities. Therefore, the location of the sensors is of great importance for the performance of the recognition system.
5. Conclusions and future work
Our experiments on real-world data sets showed that Our strategy can significantly increase the recognition performance when classifying multiclass sensory data, and can improve the prediction of the minority activities. It significantly outperforms the typical methods HMM, CRF, C-SVM and CS-SVM used in activity recognition. We expect these results to contribute to further advances in state-of-the-art methods. Developing classifiers that are robust and skew-insensitive, or hybrid algorithms, can be the main point of interest for future research on imbalanced activity recognition data sets.
The activity Idle is by far the most common and makes the data set particularly hard. In order to measure the effect of the dominant Idle class, it would be interesting to repeat our experiments on these data sets with the Idle activity omitted.
In this study, we have assumed an offline inference strategy, in which the activities can only be inferred once a full day has passed. In future work, it would be interesting to apply Our strategy to online inference, which is significantly harder but necessary for certain applications.
Corresponding author
Dr Belkacem Fergani can be contacted at: [email protected]
References
Abidine, M., Fergani, B. and Clavier, L. (2013), "Importance-weighted the imbalanced data for C-SVM classifier to human activity recognition", 8th International Workshop on Systems, Signal Processing and their Applications, WOSSPA’13, pp. 318-323.
Akbani, R., Kwek, S. and Japkowicz, N. (2004), "Applying support vector machines to imbalanced datasets", Proceedings of the 15th European Conference on Machine Learning (ECML 2004), pp. 39-50.
Banos, O., Damas, M., Pomares, H., Prieto, A. and Rojas, I. (2012), "Daily living activity recognition based on statistical feature quality group selection", Expert Systems with Applications, Vol. 39 No. 16, pp. 8013-8021.
Bishop, C. (2006), Pattern Recognition and Machine Learning, Springer, New York, NY.
Chang, C.C. and Lin, C.J. (2001), "LIBSVM: a library for support vector machines", available at: www.csie.ntu.edu.tw/~cjlin/libsvm/
Chathuramali, K.G.M. and Rodrigo, R. (2012), "Faster human activity recognition with SVM", Proc. of International Conference on Advances in ICT for Emerging Regions, pp. 197-203.
Chawla, N.V. (2010), "Data mining for imbalanced datasets: an overview", Data Mining and Knowledge Discovery Handbook, pp. 875-886.
Chawla, N.V., Japkowicz, N. and Kolcz, A. (2003), Proceedings of the ICML’2003 Workshop on Learning from Imbalanced Data Sets.
Chawla, N.V., Japkowicz, N. and Kotcz, A. (2004), "Editorial: special issue on learning from imbalanced data sets", SIGKDD Explorations, Vol. 6 No. 1, pp. 1-6.
Chawla, N.V., Bowyer, K.W., Hall, L.O. and Kegelmeyer, W.P. (2002), "SMOTE: synthetic minority over-sampling technique", Journal of Artificial Intelligence Research, Vol. 16, pp. 321-357.
Chen, X., Gerlach, B. and Casasent, D. (2005), "Pruning support vectors for imbalanced data classification", Proc. of International Joint Conference on Neural Networks, pp. 1883-1888.
Fleury, A., Vacher, M. and Noury, N. (2010), "SVM-based multi-modal classification of activities of daily living in health smart homes: sensors, algorithms and first experimental results", IEEE Transactions on Information Technology in Biomedicine, Vol. 14 No. 2, pp. 274-283.
Kasteren, T.V., Noulas, A., Englebienne, G. and Kröse, B. (2008), "Accurate activity recognition in a home setting", UbiComp ’08, ACM, New York, NY, pp. 1-9.
Kasteren, T.V., Alemdar, H. and Ersoy, C. (2011), "Effective performance metrics for evaluating activity recognition methods", Proc. International Conference on Computer Systems, pp. 301-310.
Palaniappan, A., Bhargavi, R. and Vaidehi, V. (2012), "Abnormal human activity recognition using SVM based approach", Proceedings of ICRTIT 2012, pp. 97-102.
Rabiner, L.R. (1989), "A tutorial on hidden Markov models and selected applications in speech recognition", Proceedings of the IEEE, Vol. 77 No. 2, pp. 257-286.
Raskutti, B. and Kowalczyk, A. (2004), "Extreme re-balancing for SVMs: a case study", SIGKDD Explorations, Vol. 6 No. 1, pp. 60-69.
Sutton, C. and McCallum, A. (2006), "An introduction to conditional random fields for relational learning", in Lise, G. and Ben, T. (Eds), Introduction to Statistical Relational Learning, The MIT Press.
Tapia, E.M., Intille, S.S. and Larson, K. (2004), "Activity recognition in the home using simple and ubiquitous sensors", Proc. Pervasive Computing, Lecture Notes in Computer Science Vol. 3001, Vienna, pp. 158-175.
Taylor, S. and Cristianini, N. (2004), Kernel Methods for Pattern Analysis, Cambridge University Press.
Thai-Nghe, N., Gantner, Z. and Schmidt-Thieme, L. (2010), "Cost-sensitive learning methods for imbalanced data", Proceedings of Int. Joint Conf. on Neural Networks.
Vapnik, V.N. (2000), The Nature of Statistical Learning Theory. Statistics for Engineering and Information Science, 2nd ed., Springer Verlag.
Veropoulos, K., Campbell, C. and Cristianini, N. (1999), "Controlling the sensitivity of support vector machines", Proceedings of the International Joint Conference on AI, pp. 55-60.
Vilarino, F., Spyridonos, P., Vitria, J. and Radeva, P. (2005), "Experiments with SVM and stratified sampling with an imbalanced problem: detection of intestinal contractions", Proc. of the 3rd International Conference on Advances in Pattern Recognition (ICAPR 2005), pp. 783-791.
Wallace, M. (2007), "Best practices in nursing care to older adults", Try This, Issue No. 2, Hartford Institute for Geriatric Nursing.
Weiss, G.M. (2004), "Mining with rarity: a unifying framework", SIGKDD Explorations, Vol. 6 No. 1, pp. 7-19.
Weiss, G.M. and Provost, F. (2003), "Learning when training data are costly: the effect of class distribution on tree induction", Journal of Artificial Intelligence Research, Vol. 19, pp. 315-354.
Wu, G. and Chang, E. (2003), "Class-boundary alignment for imbalanced dataset learning", Proc. of ICML Workshop on Learning from Imbalanced Data Sets II, Washington, DC.
Wu, G. and Chang, E.Y. (2005), "KBA: kernel boundary alignment considering imbalanced data distribution", IEEE Transactions on Knowledge and Data Engineering, Vol. 17 No. 6, pp. 786-795.