1. Introduction
Stress can be defined as a biological and psychological response to a combination of external or internal stressors [1,2], which can be chemical or biological agents or environmental stimuli that cause stress to an organism [3]. Stress is, in essence, the body’s coping mechanism for any kind of foreign demand or threat. At the molecular level, in a stressful situation the Sympathetic Nervous System (SNS) triggers the release of stress hormones, such as cortisol, which then, via a cascade of events, increase the available sources of energy [4]. This energy fuels a series of physiological responses, such as increasing the metabolic rate, increasing the heart rate and dilating the blood vessels in the heart and other muscles [5], while suppressing non-essential functions such as immune activity and digestion. Once stressors no longer pose a threat to the body, the brain activates the Parasympathetic Nervous System (PNS), which is in charge of restoring the body to homeostasis. However, if the PNS fails to achieve homeostasis, chronic stress can result, causing a continual and prolonged activation of the stress response [6]. Conversely, during acute stress, the stress response develops immediately and is short-lived.
Studies in this field suggest that stress can lead to abnormalities in the cardiac rhythm, which could in turn lead to arrhythmia [7]. Stress has not only physical implications but can also be detrimental to mental health; in fact, chronic stress can increase the chances of developing depression. For these reasons, it is important to develop a system that can detect and measure stress in an individual non-invasively, so that stress can be regulated or relieved via personalised medical interventions or simply by alerting users to their stressful state.
Furthermore, stress has been identified as one of the major causes of automobile crashes, which lead to high rates of fatalities and injuries each year [8]. As reported by the Virginia Tech Transportation Institute (VTTI) and the National Highway Traffic Safety Administration (NHTSA), lack of attention and stress were the leading causes of traffic accidents in the US, with a rate of ~80%. Therefore, accurately monitoring stress in drivers could significantly reduce the number of road traffic accidents and consequently increase public road safety.
Given that stress is regulated by the Autonomic Nervous System, it can be measured via physiological measurements such as electrocardiogram (ECG), Galvanic Skin Response (GSR), electromyogram (EMG), heart rate variability (HRV), heart rate (HR), blood pressure, respiration rate and temperature [9]. These are considered accurate sources of biosignal recordings, as they cannot be masked or conditioned by voluntary human actions. This study, however, focuses mainly on HRV, which is controlled by the PNS and SNS; an imbalance in any function regulated by these two branches of the nervous system will therefore affect HRV [10]. HRV is the variation in the duration of successive normal RR (or NN) intervals [11]; it is derived from an ECG reading and is measured by calculating the time interval between two consecutive heartbeat peaks [12]. As explained in [11], the RR intervals are obtained by calculating the time difference between the R waves of two consecutive QRS complexes.
HRV can be subdivided into time domain and frequency domain metrics, as described in Table 1.
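As an illustration of the time domain metrics, the following minimal sketch computes AVNN, SDNN, RMSSD and pNN50 from a list of RR intervals using their standard definitions. The function name `time_domain_hrv`, the use of milliseconds and the example values are assumptions made for illustration only; they are not taken from the study's implementation.

```python
import math
from statistics import mean, stdev


def time_domain_hrv(rr_ms):
    """Compute common time-domain HRV metrics from RR (NN) intervals in ms.

    Hypothetical helper for illustration; assumes artefact-free intervals.
    """
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")

    # Differences between each pair of adjacent RR intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]

    avnn = mean(rr_ms)    # average NN interval (ms)
    sdnn = stdev(rr_ms)   # sample standard deviation of NN intervals (ms)
    # Root mean square of successive differences (ms).
    rmssd = math.sqrt(mean([d * d for d in diffs]))
    # Percentage of successive differences larger than 50 ms.
    pnn50 = 100.0 * sum(1 for d in diffs if abs(d) > 50) / len(diffs)
    mean_hr = 60000.0 / avnn  # mean heart rate (beats per minute)

    return {
        "avnn": avnn,
        "sdnn": sdnn,
        "rmssd": rmssd,
        "pnn50": pnn50,
        "mean_hr": mean_hr,
    }


# Example RR series in milliseconds (made-up values).
example_rr = [800, 810, 790, 850, 780]
metrics = time_domain_hrv(example_rr)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Frequency domain metrics such as LF and HF power additionally require a spectral estimate of the RR series and are not covered by this sketch.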
HRV is traditionally obtained from ECG and requires computational software for its calculation; this process is limited to laboratory or clinical settings and requires a certain degree of technical knowledge for interpretation and calculation. Thanks to advances in technology, however, commercially available portable devices and wearables can now monitor and record HRV measurements. Dobbs et al. (2019) performed a systematic review and meta-analysis of studies that compared the quality of HRV measurements acquired from ECG with those obtained from portable devices, such as Elite HRV, Polar H7 and Motorola Droid [13]. Across twenty-three studies, HRV measurements obtained from portable devices showed a small absolute error when compared to ECG; this error was considered acceptable, as this method of acquiring HRV is more practical and cost-effective, requiring no laboratory or clinical apparatus [13].
Furthermore, the Apple Watch is one of the best-selling and most popular smartwatches on the market. Shcherbina and colleagues [14] demonstrated that the Apple Watch was the best HR-estimating smartwatch at one-minute granularity, with the lowest overall median error (below 3%), while the Samsung Gear S2 reported the highest error. It is also important to validate the HRV estimation of the Apple Watch. Currently, the best way to obtain raw RR values from the Apple Watch is via the Breathe app developed by Apple. The authors of [15] validated the Apple Watch with respect to HRV measurements derived during mental stress in 20 healthy subjects. In that study, the RR interval series provided by the Apple Watch was validated against the RR intervals obtained from the Polar H7 [15]. Subsequently, the HRV parameters were compared and their ability to identify the Autonomic Nervous System (ANS) response to mild mental stress was analysed [15]. The results revealed that the Apple Watch HRV measurements had good reliability and that the HRV parameters were able to indicate changes caused by mild mental stress, showing a significant decrease in HF power and RMSSD in the stress condition compared to the relaxed state [15]. That work therefore suggests that the Apple Watch is a potential non-invasive and reliable tool for stress monitoring and detection. In the present study, raw RR intervals from beat-to-beat measurements obtained from the Breathe app are considered for stress classification.
This study aims to develop a predictive model that can accurately classify stress levels from ECG-derived HRV features obtained from automobile drivers, testing different machine learning methods: K-Nearest Neighbour (KNN), Support Vector Machines (SVM), Multilayer Perceptron (MLP), Random Forest (RF) and Gradient Boosting (GB). Moreover, the models with the highest predictive power will be used as a reference for the development of a machine learning model that classifies stress from HRV features derived from heart rate measurements obtained from wearable devices in an unsupervised system-based web application.
The paper is organised as follows. Section 2 discusses related work in the literature. Section 3 describes the experimental methodology of the study, including a description of the dataset, pre-processing, hyperparameter tuning and the design protocol used for the development of a simple stress detection web application based on Apple Watch derived data. Section 4 presents the experimental results and Section 5 provides an in-depth discussion of the results obtained. Lastly, Section 6 provides the concluding remarks of the study, as well as proposed future work.
2. Related Work
As stress levels change, so does HRV, and it has been shown that HRV decreases as stress increases [11]. This is because HRV provides a measure of ANS activity and can therefore provide a measure of stress [16]. The authors of [16] explored the interaction between HRV and mental stress. They took ECG recordings during rest and during a mental task, which was meant to reflect a stressful state. Linear HRV measures were then analysed to provide information on how the heart responds to a stressful task. The results demonstrated that the mean RR interval was significantly lower during the mental task than in the rest condition [16]. The difference was significant only for the time domain parameter pNN50 and the mean RR interval; the frequency domain measures did not show a significant difference, although LF/HF was elevated in the stressed condition [16]. As LF is associated with the SNS and HF with the PNS, the increased LF/HF ratio does suggest higher sympathetic activity in the stress condition than in the resting state [16].
Furthermore, investigations have been carried out to accurately classify stress in drivers via HRV measurements. For example, the authors of [17] aimed to classify ECG data into highly stressed and normal physiological states of drivers using extracted parameters. They extracted time domain, frequency domain and nonlinear parameters from HRV, obtained by extracting RR intervals from QRS complexes. These features were fed into the following machine learning classifiers: K-Nearest Neighbour (KNN), radial basis function (RBF) and Support Vector Machine (SVM). The results showed that SVM with an RBF kernel gave the best results, with 83.33% accuracy when applied to time domain and nonlinear parameters, and 66.66% accuracy with frequency domain parameters [17]. This agrees with the result obtained in [16], where the frequency domain parameters did not show a significant difference between rest and mental tasks.
In this study, instead of analysing how each HRV measure is affected by the onset of stress, we considered the combination of time and frequency domain HRV features and how these aid stress classification with machine learning models. The performance of the models was evaluated using the Area Under the Receiver Operating Characteristic Curve (AUROC), Recall/Sensitivity and F1 score, rather than relying only on accuracy. Furthermore, we detected stress in a non-invasive manner using the Apple Watch, from which we extracted heart rate data obtained from volunteers subjected to different mental state conditions.
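The kind of comparison described above can be sketched with scikit-learn as follows. This is only an illustrative outline under stated assumptions: the data are synthetic (generated with `make_classification`) rather than real HRV features, and the hyperparameters, the feature scaling step and the five-fold cross-validation scheme are placeholder choices, not those used in this study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a table of HRV features (rows = windows, columns = features).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Placeholder hyperparameters; distance- and gradient-based models get scaled inputs.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["roc_auc", "recall", "f1"]  # evaluate beyond plain accuracy

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # Average each metric over the cross-validation folds.
    results[name] = {m: float(scores[f"test_{m}"].mean()) for m in scoring}

for name, row in results.items():
    print(
        f"{name}: AUROC={row['roc_auc']:.2f} "
        f"Recall={row['recall']:.2f} F1={row['f1']:.2f}"
    )
```

Reporting AUROC, Recall and F1 side by side makes it visible when a model with good overall ranking ability still misses a large share of the positive (stress) class.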
5. Discussion
Stress has been identified as one of the major causes of automobile crashes [8] and an important factor in the development of cardiac arrhythmia [7]; it is therefore important to be able to detect and measure stress in a non-invasive and efficient manner. In this study, we addressed the stress detection problem by using traditional machine learning algorithms trained on ECG-derived HRV metrics obtained from automobile drivers [18,19].
In this paper, stress classification was performed mainly using HRV-derived features, as studies have shown that HRV is affected by changes in stress levels, given that it is closely controlled by the ANS [10]. Moreover, other investigations found RMSSD, AVNN and SDNN to be the most reliable HRV metrics for distinguishing between stressful and non-stressful situations [28]. Those findings were confirmed in this study, as shown in Table 2, where AVNN, RMSSD and SDNN were the HRV features with the highest RFE feature importance scores. They were therefore considered to be the features that contribute the most to the stress classification performance of the model. This further supports HRV features as viable markers for stress detection.
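For readers unfamiliar with recursive feature elimination (RFE), the sketch below shows how such a ranking can be obtained with scikit-learn. It is an assumption-laden illustration: the data are synthetic, the feature names are attached to random columns purely as labels, and the choice of a Random Forest as the underlying estimator and of three retained features is hypothetical rather than a description of the configuration used in this study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Labels only: the synthetic columns below do not carry real HRV information.
feature_names = ["AVNN", "SDNN", "RMSSD", "pNN50", "LF", "HF", "LF_HF"]

X, y = make_classification(
    n_samples=300,
    n_features=len(feature_names),
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

# RFE repeatedly fits the estimator and discards the least important feature
# until only n_features_to_select remain.
selector = RFE(
    RandomForestClassifier(n_estimators=100, random_state=0),
    n_features_to_select=3,
)
selector.fit(X, y)

# Rank 1 marks the retained features; larger ranks were eliminated earlier.
ranking = sorted(zip(selector.ranking_.tolist(), feature_names))
selected = [name for name, keep in zip(feature_names, selector.support_) if keep]

for rank, name in ranking:
    print(f"rank {rank}: {name}")
print("selected:", selected)
```

On real data, the same procedure would be run on the extracted HRV feature table, and the retained features could then be compared with those reported in the literature.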
Following hyperparameter tuning, we were able to produce stress classification models with high predictive power. As shown in Table 3, the best three models for the classification task on the original-dataset were MLP, RF and GB, with AUROC of 83%, 85% and 85%, respectively; thus, these classifiers have a ~84% probability of ranking a randomly chosen stress instance above a randomly chosen no-stress instance. In addition, MLP and RF gave Recall scores of 81% and GB of 80%, indicating that ~80% of the actual stress instances were correctly identified. Furthermore, these scores were statistically significantly greater than those of the Naïve Bayes baseline model (p < 0.05), as illustrated in Table 4.
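To make the interpretation of these metrics concrete, the following self-contained sketch computes Recall, Precision, F1 and AUROC from their definitions on a small set of made-up labels and scores. The function names, the example data and the 0.5 decision threshold are illustrative assumptions and are unrelated to the results reported in the tables.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 = stress and 0 = no stress."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def recall(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def precision(y_true, y_pred):
    """Fraction of predicted positives that are actual positives."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall."""
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r)


def auroc(y_true, scores):
    """Probability that a random positive scores higher than a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up labels and classifier scores for ten windows.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.55, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("recall:", recall(y_true, y_pred))
print("precision:", precision(y_true, y_pred))
print("f1:", f1_score(y_true, y_pred))
print("auroc:", auroc(y_true, scores))
```

Recall depends on the chosen decision threshold, whereas AUROC summarises ranking quality over all thresholds, which is why the two can differ noticeably for the same model.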
Very few studies have addressed stress classification in drivers using HRV-derived features [17,18]. Although each study took a different approach to the classification problem, they yielded similar results. For instance, [17] investigated KNN, SVM-RBF and Linear SVM as potential classifiers for stress detection. Their results suggested that SVM with an RBF kernel was the best performing model, with an accuracy of 83% [17]. However, more extensive investigation that also considers other classification metrics is necessary to corroborate this finding.
It is also important to note that stress results from a combination of external (e.g., environment) and internal (e.g., mental health) factors. Thus, stress can be perceived as a subjective mental state; for example, situations such as driving in the city or on the highway might not induce the same level of stress in every individual. For instance, individuals suffering from anxiety could feel stressed in such conditions. Additionally, stress could be induced by the invasive apparatus used, such as the electrodes placed on different parts of the body and the sensor placed around the diaphragm in [18]; the fact that subjects are aware that they are being monitored for changes in their mental state could also affect their stress levels. For this reason, it is important to use less intrusive, everyday devices such as smartwatches or mobile phones that are already an essential part of modern life.
In this paper, we also aimed to develop a classification model that detects stress from data obtained from the Apple Watch. For this purpose, the best classifiers trained on the original-dataset were tested on the modified-dataset, whose features mimic those derived from the wearable device.
Table 5 shows that the best overall model for classifying stress from HRV features derived from wearable-obtained RR intervals is MLP, with an AUROC of 75% and a Recall of 80%. This was determined based on the Recall score, as in this stress classification task there is a high cost associated with False Negatives. For instance, if an individual who is actually stressed is predicted as not stressed, the consequences can be serious, especially in a medical or driving context, where this could lead to a misdiagnosis or a car accident, respectively. Therefore, it is imperative to select the model with the highest sensitivity.
Figure 8 shows the user interface (UI) of the simple stress detection web application. Its purpose was simply to provide a visual UI to demonstrate the software functionality. This could then be implemented in a mobile or in-car application that alerts the user when stress is detected and prompts them to relax or take a break.
The blind-dataset, obtained from the volunteers, served as a blind test for the MLP classifier in order to measure its predictive power on unseen data in an unsupervised application system.
When classifying a stressful task, the web application was able to correctly predict stress conditions with a 71% prediction probability. Additionally, it was able to achieve a prediction probability of 79% when the model was presented with a relaxing state. However, it is important to further improve the model’s performance by investigating multiple stress levels in order to obtain more accurate stress detection.
6. Conclusions
In this paper, we carried out a comparative study to determine the viability of HRV features as physiological markers for stress detection. This was achieved by training different supervised machine learning models to determine which model can be used to analyse data extracted using wearable devices. The MLP model was considered the most suitable algorithm for stress classification due to its 80% sensitivity score. The predictive power of this classifier was found to be statistically significantly greater than that of the baseline model created with the Naïve Bayes algorithm (p = 0.001). This model was then implemented in the unsupervised stress detection application, where stress can be detected from a blind dataset of HRV features extracted from real users wearing wearable devices under different stress conditions.
This study responds to the need for technologies that monitor stress in drivers in order to reduce car crashes, as lack of attention and stress have been reported to account for nearly 80% of road incidents. This project could be an initial step towards tackling this problem. In fact, the algorithm produced in this study could be implemented in smart cars, so that when drivers experience episodes of stress, the automobile could switch to autopilot and alert the driver to their state. Such an implementation could substantially reduce traffic accidents and the number of fatalities and injuries caused by car crashes.
However, the benefit of this study can also be extended to all applications in which it is important to monitor stress levels, e.g., in post-incident physical rehabilitation, in temporary or chronic anxiety, in mental health conditions, and in many ageing-related conditions. The use of smartwatches is growing in the population and people appreciate their functionalities. Wearable devices therefore offer a great opportunity to extract health parameters without an uncomfortable and invasive approach.
Future work should improve the classification models by exploring a wider range of parameter values during hyperparameter tuning. Additionally, a Deep Learning approach could be implemented in order to compare its performance with that of the supervised models used in this study.
Moreover, we propose the development of a classifier able to distinguish between different levels of stress: high, medium and low. In addition, we suggest collecting new real-world ECG data, from which HRV features could be extracted, in order to gain better insight into the predictive power of the models obtained in this study. This would also provide a more up-to-date dataset than that used in this study, which dates from 2005 [18]. As technologies have advanced, more accurate ECG recordings could be acquired, making the classification more accurate and relevant to real-world implementations.
Therefore, a natural evolution of this work will require the acquisition of a large dataset through smartwatches and an extensive number of tests involving human subjects, e.g., through a driving simulator. Furthermore, it will be important to test the model in other domains focused on the elderly and health care.