Stress Detection in Computer Users From Keyboard and Mouse Dynamics


IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, VOL. 67, NO. 1, FEBRUARY 2021

Lucia Pepa, Member, IEEE, Antonio Sabatelli, Lucio Ciabattoni, Member, IEEE, Andrea Monteriù, Member, IEEE, Fabrizio Lamberti, Senior Member, IEEE, and Lia Morra, Senior Member, IEEE

Abstract—Detecting stress in computer users, while technically challenging, is of the utmost importance in the workplace, especially now that remote working scenarios are becoming ubiquitous. In this context, cost-effective, subject-independent systems are needed that can be embedded in consumer devices and classify users’ stress in a reliable and unobtrusive fashion. Leveraging keyboard and mouse dynamics is particularly appealing in this context as it exploits readily available sensors. However, available studies are mostly performed in laboratory conditions, and there is a lack of on-field investigations in closer-to-real-world settings. In this study, keyboard and mouse data from 62 volunteers were experimentally collected in-the-wild using a purpose-built Web application, designed to induce stress by asking each subject to perform 8 computer tasks under different stressful conditions. The application of Multiple Instance Learning (MIL) to Random Forest (RF) classification allowed the devised system to successfully distinguish 3 stress-level classes from keyboard (76% accuracy) and mouse (63% accuracy) data. Classifiers were further evaluated via confusion matrix, precision, recall, and F1-score.

Index Terms—Stress classification, machine learning, keyboard, mouse, in-the-wild study.

Manuscript received June 18, 2020; revised November 17, 2020; accepted December 8, 2020. Date of publication December 16, 2020; date of current version February 26, 2021. (Corresponding author: Lucio Ciabattoni.)
Lucia Pepa, Lucio Ciabattoni, and Andrea Monteriù are with the Department of Information Engineering, Università Politecnica delle Marche, 60131 Ancona, Italy (e-mail: [email protected]; [email protected]; [email protected]).
Antonio Sabatelli is with the Department of Research and Development, Revolt SRL, 60131 Ancona, Italy (e-mail: [email protected]).
Fabrizio Lamberti and Lia Morra are with the Dipartimenti di Automatica ed Informatica, Politecnico di Torino, 10129 Torino, Italy (e-mail: [email protected]; [email protected]).
Digital Object Identifier 10.1109/TCE.2020.3045228

I. INTRODUCTION

PROVIDING computer-based systems with the capability to recognize emotions is an ongoing subject of study. Should consumer devices like, e.g., laptops, smartphones, in-car entertainment systems and home appliances be capable of achieving an accurate reading of individuals’ affective states, they could make appropriate decisions about how to interact with them, and adapt system’s responses accordingly [1]. Applications are plentiful in fields like human-computer interaction, robotics, entertainment, learning, and healthcare.
Among emotional states that could be tackled, there is one that deserves special attention, given the key role that it plays in work environments and for human health [22]: stress. Stress is a physiological response to a situation perceived to be challenging or threatening. While moderate levels of stress can actually be beneficial to work performance, chronic stress has been shown to be highly detrimental. Chronic stressors may lead to burnout, a growing concern in both western and developing countries with an estimated lifetime prevalence of 4% [40]. Evidence shows that workers make more errors when overly stressed, leading to a loss of productivity and, in the case of critical infrastructures, potentially fatal consequences [2]. Furthermore, stress may negatively impact the immune and cardiovascular systems [38]. Adding to this situation, the COronaVIrus Disease 2019 (COVID-19) outbreak led to a massive shift towards a Working From Home (WFH) operating modality, and public announcements by major tech companies are sparking a debate on the potential opportunities and perils of resorting to WFH on a permanent basis. In fact, despite its appeal, WFH may expose workers to new forms of stress and burnout, as the lines between professional and personal lives become blurry and workers struggle to preserve healthy boundaries between the two [39].
It is therefore crucial to equip workers and managers alike with tools to enable proper stress management in a remote workforce, starting with methods to detect stress and other emotional states based on users’ observation [28]. In recent years, different approaches have been investigated for stress detection [28], [37]. Even though some of them achieved quite impressive results, there are still serious problems limiting their applicability. First, most of the proposed methods rely on sensors directly attached to the users’ skin or body [22], or use external recording sensors such as webcams, microphones, or even thermal cameras [14].
Both methods have side effects: first, users are aware of being monitored, which could alter their affective states and be itself a source of additional stress; second, it is unlikely that users can wear or use monitoring devices continuously during everyday activities [28]. Specialized hardware can be expensive and is unlikely to become commonplace in the short term or in a WFH scenario. Last but not least, both raise major privacy concerns, especially in a work environment.
Thus, the challenge appears to be the development of cost-effective, subject-independent systems that can be embedded in consumer devices and that are able to detect users’ stress in a reliable and unobtrusive fashion. In this article, a possible solution to this challenge is proposed by leveraging the analysis of keystroke and mouse dynamics (K&MD). Many workers use a computer on a daily basis; thus, this solution would not require
1558-4127 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA. Downloaded on January 09,2022 at 10:48:24 UTC from IEEE Xplore. Restrictions apply.
PEPA et al.: STRESS DETECTION IN COMPUTER USERS FROM KEYBOARD AND MOUSE DYNAMICS 13

Fig. 1. Taxonomy of stress detection approaches.

dedicated hardware, could be readily deployed in a traditional office or WFH setting, and would have minimal risks from a privacy viewpoint. Furthermore, affective states evaluation could be readily integrated in the work environment, e.g., to remind users to take pauses when overworked or overstressed.
Even though K&MD-based methods were proven suitable to identify several emotions with good performance [7], [21], [36], their practical application is still an open problem. Previous studies were mostly conducted in laboratory conditions, and there is a lack of on-field studies closer to actual professional settings [28]. In-the-wild studies face difficulties in inducing the intended stress levels, as well as collecting and labelling data, as the experimenter cannot directly interact with the subjects.
The present work tries to address the above challenges by designing a stress classification method based on K&MD that leverages real-world, in-the-wild data acquired in an uncontrolled setting resembling traditional office or WFH scenarios.
Specifically, the contribution of this article is threefold:
• a Web-based stress induction setup for collecting K&MD data in the wild: users are asked to engage in several tasks, representative of various computer-based activities, under different stressful conditions;
• fine-grained 3-level stress detection based on a variety of K&MD features;
• a cross-subjects validation design: while most previous works evaluated their algorithms through subject-dependent validations, which ensure higher accuracy [28], cross-subject validation is essential to quantify algorithm robustness, especially prior to its deployment in production environments [37].
This work extends a preliminary investigation [36] that demonstrated the feasibility of these objectives through a controlled study of stress detection, where just 2 classes were considered.

II. BACKGROUND

A. Overview of Stress Detection Techniques
Stress manifests itself in a plurality of ways which can be broadly classified as psychological, behavioral and physiological. While psychological effects may be evaluated through direct interaction with the user, e.g., through questionnaires or chatbots, stress detection is most commonly performed by detecting behavioral and physiological alterations through a variety of sensors, briefly categorized in Fig. 1.
Stress-related physiological processes, mediated by the autonomic nervous system, are largely involuntary changes in cardiovascular, muscular and electrodermal activity, respiratory rate, skin temperature, and eye movements [22]. They can be observed using a variety of sensors including wearable sensors [4], [9], [22] and, less commonly, eye tracking devices [12], [15] and thermal infrared imaging [22]. Recent advances in wearable sensors allow physiological signals to be recorded in an increasingly unobtrusive, yet accurate fashion, yielding reliable stress measurements. Nonetheless, as users may not be willing to wear or use monitoring devices constantly, it is important to investigate complementary strategies.
Behavioral alterations in response to stress include bodily gestures (e.g., facial expressions, body pose) [10], [13] and speech [14], which can be detected using cameras, microphones and 3D cameras (e.g., Kinect) in combination with computer vision and speech analysis algorithms [14]. Another important line of research is detecting stress from daily life activity, such as eating [41], computer interactions [5], [35], [36], or driving. Besides the cost of deploying such sensors, there are significant privacy and acceptability issues associated with constantly recording subjects. In contrast, analyzing naturally occurring interactions with electronic devices, such as smartphones [17] and computers [1], [7], [20], [36], does not require additional and potentially intrusive hardware.
Another important distinction is related to the setting in which stress detection is carried out, e.g., during everyday home activities [4], [9], working [5], [8], [17], driving, and in outdoor places [4], [9]. Although some detection methods can target more than one setting, special-purpose approaches are much more common. We focus here in particular on professional and office environments. In [5], various technologies for monitoring office workers’ emotions,


including stress and mental load, were compared, and mouse and keyboard obtained the highest scores in most of the analyzed dimensions, including use of common hardware, cost-effectiveness, intuitiveness, availability and privacy compliance. Nonetheless, in another review on automatic stress recognition for office environments [35], only a handful out of the two hundred references cited in that work were actually based on K&MD. Additionally, lower accuracy is generally obtained compared to methods based on computer vision and wearable devices.

B. Stress Detection From K&MD
Early studies of K&MD focused on simple tasks like password entering. When dealing with more sophisticated tasks (in terms of interactions), many factors may influence typing rhythm or mouse movements, including individuals’ age, gender, handedness, skills, physical and mental state, and familiarity with the task, as well as external conditions, like hardware or software used, presence of disturbing elements, etc. [26]. Unsurprisingly, this fact pushed researchers to work under laboratory-controlled conditions or, alternatively, to simplify on-field studies by focusing on specific tasks, such as computer programming [6], [7], [23]. An exception is reported in [1], in which participants of a user study were simply requested to carry out their usual daily activities (like, e.g., using a word processor, or an email application) and to rate their emotions from time to time. The main drawback of this approach is the lack of control on the actual emotions tested or the number of collected samples.
Both on-field and laboratory studies have to deal with the fundamental issue of how to induce and rate affective states [33].
A common solution is to rely on ad hoc tasks, often drawn from psychological literature, to raise participants’ memory load, irritation, anxiety, pressure, or similar cognitive stress states. Participants are requested, e.g., to perform mental calculations [16], [26], play math- or logic-related games (e.g., Tower of Hanoi) [18], or stressing exercises (e.g., Stroop’s color-word interference test [11]) [13], [15], remember a number of words or digits (n-back memory task) [34], answer questions about a given clip or text [21], etc. A smaller number of works leveraged tasks closer to everyday activities, like transcribing a text [19] or searching an item in a website [29].
Different stress levels can be induced by varying task complexity [12] and/or introducing external stressors [9]. For instance, the characteristics of the test environment can be changed from comfortable, to neutral and stressful through relaxing music, silence, or loud noise [24]. Other stressors include varying level of guidance [6], introducing time constraints, random disturbing events (faults, interruptions, etc.), monetary compensation [19] and social pressure [17], [20]. These stressors are similar to those typical of work environments, such as dealing with constant noise or interruptions or meeting strict deadlines.
Data labeling, i.e., rating of perceived (level of) emotions is another critical factor, in particular for on-field studies. While in some cases, external raters or recordings are used [31], the most frequent option is self-rating [1], [30]. The latter approach is also more suited to in-the-wild data collection, where additional sensors are not easy to deploy.
A dimension that further distinguishes works in this field concerns the diversity of features used. This is partly due to the fact that keyboard- and mouse-based features may be task-specific [26]. Early research focused on keyboard activity, as password typing had been extensively studied for authentication purposes [3]. More recent work leveraged keystroke pause length, time per keystroke, time between keystrokes, and frequency of deletion and navigation keys [16], as well as mouse speed and directions [32].
Results showed that, in laboratory conditions, it is feasible to classify stress conditions with comparable accuracy to other affective states (75% with k-NN). Better results have only been obtained by using ad hoc hardware or combining K&MD with other sensing techniques [27].
Starting from the above review, several research gaps were identified and addressed in this study. First, few works addressed in-the-wild setups [22], [37], and even fewer carried out a proper stress classification. Some had to fall back to a more general valence [23] or emotion [1], [6] classification, mainly because of the lack of data. Khan et al. [7] performed just a regression analysis on data collected in-the-wild, suggesting the potential for a future stress detection. Many works focused on 2-class stress detection; however, a stress classification of at least 3 levels would be closer to clinical evaluations and more useful in real-life applications [38]. Finally, algorithm validation can be performed through within-subject or between-subjects cross-validation methods. Within-subject validation often leads to better results, but the ability of an algorithm to generalize over unseen users is crucial to quantify its robustness and suitability to real scenarios [9], [14], [37]. Additionally, training user-specific classifiers would require a great amount of data, thus limiting applicability in many real-world applications [6], [23]. The need to investigate between-subjects multiclass stress classification is much more evident for in-the-wild studies, since previous literature did not address this issue.

III. EXPERIMENTAL PROTOCOL FOR STRESS INDUCTION

A. Subjects Recruitment
Since our study focuses on stress detection in-the-wild, few constraints were set in recruiting subjects. In particular, an invitation was sent to students and teaching staff at the authors’ universities, who were asked to extend the invitation to relatives and friends. The study was conducted during the COVID-19 outbreak, when subjects were mostly working remotely. A total of 62 subjects were recruited. All of them had to be at least 18 years old (28 on average, σ = 8, 40 males and 22 females), native Italian speakers, and use computers daily. Subjects with previous history of cardiac, neurological or anxiety disorders, color blindness, or prescription drugs for sleep disorders were asked to self-exclude from the study. Subjects had to confirm that they had not consumed any alcohol or caffeine the day of the experiment, and any psychoactive drug within the 48 hours preceding the experiment.
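The between-subjects validation argued for in Section II-B implies that data are partitioned by subject rather than by sample, so that no recruited subject contributes to both training and test data. The following Python sketch illustrates one way this could be done for the 62 recruited subjects with the 5 folds mentioned in Section IV-C. The integer subject IDs, the shuffling seed, and the names `subject_folds` and `train_test_subjects` are assumptions made for illustration; they are not taken from the authors' MATLAB implementation.

```python
import random


def subject_folds(subject_ids, k=5, seed=0):
    """Partition subject IDs into k disjoint folds of near-equal size."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)       # reproducible shuffle (seed is an assumption)
    return [ids[i::k] for i in range(k)]   # round-robin assignment to folds


def train_test_subjects(folds, test_index):
    """Return (training subjects, test subjects) for one cross-validation round."""
    test = set(folds[test_index])
    train = {s for i, fold in enumerate(folds) if i != test_index for s in fold}
    return train, test


# Hypothetical IDs 0..61 standing in for the 62 recruited subjects.
folds = subject_folds(range(62), k=5)
for i in range(len(folds)):
    train, test = train_test_subjects(folds, i)
    assert train.isdisjoint(test)          # no subject appears on both sides
    print(f"fold {i}: {len(train)} training subjects, {len(test)} test subjects")
```

Since 62 is not a multiple of 5, the folds cannot be identical in size; the round-robin assignment keeps them within one subject of each other, which roughly matches an 80%/20% split.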


Fig. 2. Scheme of the experimental protocol. After an initial rest phase, the procedure comprises a sequence of 8 tasks (4 tasks, each repeated twice with
increasing difficulty). After each task, the subject assessed his/her stress level on a 1-to-10 scale.
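The sequence summarized in Fig. 2 can be written down as a small data structure, which makes the order of the steps and the range check on the self-assessment explicit. This is only an illustrative Python sketch: the names `Step`, `PROTOCOL`, `record_rating`, and `stress_class` are hypothetical and do not come from the Web application used in the study. The task order follows the list in Section III-C, and the three class boundaries (1–3, 4–7, 8–10) are those stated in Section IV-C.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step of the protocol (hypothetical structure, for illustration only)."""
    name: str
    difficulty: str  # "rest", "easy", or "difficult"


# A rest phase, then four easy tasks, then the four difficult versions.
PROTOCOL = (
    [Step("Rest", "rest")]
    + [Step(n, "easy") for n in
       ("Text typing", "Tower of Hanoi", "Simon Says", "Four-quadrant test")]
    + [Step(n, "difficult") for n in
       ("Tower of Hanoi", "Simon Says", "Four-quadrant test", "Text typing")]
)


def record_rating(ratings, step, value):
    """Store a self-assessed stress level, enforcing the 1-to-10 scale."""
    if not isinstance(value, int) or not 1 <= value <= 10:
        raise ValueError(f"rating must be an integer from 1 to 10, got {value!r}")
    ratings[step] = value


def stress_class(rating):
    """Map a 1-to-10 rating to low (1-3), medium (4-7), or high (8-10)."""
    if not 1 <= rating <= 10:
        raise ValueError(f"rating out of range: {rating!r}")
    if rating <= 3:
        return "low"
    if rating <= 7:
        return "medium"
    return "high"


ratings = {}
record_rating(ratings, PROTOCOL[0], 2)   # rating given after the rest phase
record_rating(ratings, PROTOCOL[5], 8)   # rating given after the difficult Tower of Hanoi
for step, value in ratings.items():
    print(step.name, step.difficulty, stress_class(value))
```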

B. Equipment
The experimental protocol was administered through a purpose-built Web application, allowing participants to complete the test whenever they wanted, from their homes or offices, and using their own equipment (keyboard, mouse, screen, etc.). The application, which could be accessed through a link provided with the invitation (https://www.revoltsrl.it/stress/#/welcome), was implemented using Angular, a TypeScript-based open-source front-end framework for Web development. The application is responsible both for administering the tasks and for collecting data concerning keyboard and mouse operations, as detailed in Section IV-A.

C. Procedure and Tasks
The experimental protocol was designed to induce stress by having subjects perform several computer tasks while distracted by sounds and other disturbances. Prior to the experiment, subjects were informed about the number of tasks, that each task had been designed to mimic common work or leisure activities, and that after each one they had to self-evaluate their perceived stress level on a scale from 1 (low) to 10 (high).
The experimental protocol started with a rest phase in order to set a baseline for stress measurement. Subjects were asked to watch a relaxing movie on underwater nature, included in the application, for five to ten minutes, or until they felt as relaxed as possible. After rating their perceived stress level, they were allowed to start the procedure.
The procedure comprises four tasks, each executed twice with increasing difficulty (Fig. 2):
• Text typing (easy), copying a short text;
• Tower of Hanoi (easy), with 3 disks;
• Simon Says game (easy), sequence of 5 sounds;
• Four-quadrant test (easy);
• Tower of Hanoi (difficult), with 5 disks;
• Simon Says game (difficult), sequence of 10 sounds;
• Four-quadrant test (difficult);
• Text typing (difficult), transcribing a dictated text.
The tasks were selected for their potential to raise cognitive load and anxiety based on a preliminary study conducted by the authors [36], as well as previous studies [22], [35].
In the first text typing task, which resembles everyday office activities [19], the instructions stated that it was neither an accuracy contest nor a race; subjects had to type at a normal pace, and take the time needed for fixing possible mistakes. Average duration of this task was 339 seconds (σ = 129).
In the second task, subjects were requested to solve a simple Tower of Hanoi game (like, e.g., in [18]). This is a well-known problem-solving task in experimental psychology, being a relatively straightforward puzzle that requires very simple instructions and no additional domain knowledge. It consists of 3 rods and a number of disks of different sizes, which can only be placed on top of larger disks. As shown in Fig. 2, the game starts with the disks properly stacked on the left rod. The objective is to move the entire stack to the rightmost rod while obeying the above constraint. With 3 disks, the puzzle can be solved in 7 moves. No time constraints were set. On average, subjects needed 71 seconds and 15 moves to solve the puzzle (σ = 38 and σ = 5, respectively).
The third task was aimed at collecting mouse data by inducing a given stress level through an n-back task. In order to approximate a common computer task, we implemented a Web-based version of Simon Says, a well-known electronic memory game [34]. The game requires the subjects to repeat a sequence of sounds, increasing the length by one every time they succeed, until the maximum number of sounds is reached. In case of errors, the sequence is repeated. On average, subjects made 4 errors (σ = 3) and spent 77 seconds (σ = 48) on this task.
The fourth task modeled the impact that interferences can have on task execution, increasing subjects’ reaction time and perceived stress. In this four-quadrant task [2] (a variant of the Stroop test [11]), subjects are shown the names of several colors displayed in differently colored fonts (e.g., the word “red” in a yellow font). Subjects have to click on the colored quadrant which corresponds to the spelled word (“red” in our example), instead of the font color (yellow in our example). In previous studies [2], subjects had instead to click on the quadrant corresponding to the font color and ignore the word. Preliminary experiments showed that a higher level of stress could be induced by the chosen implementation [36]. Subjects had a maximum of 3 seconds for clicking on the correct quadrant: after that, a new word and color were generated. Every time they did not click the correct quadrant in time, a strong and unpleasant buzz sound was played. The task lasted 90 seconds on average (σ = 54).
In the fifth task, the Tower of Hanoi game was used again. However, in order to induce much higher levels of stress, 5 disks were used (requiring at least 31 moves). Moreover, two different stressors were added, namely a disturbing tick-tock sound and a timer (set to 300 seconds). On average, subjects needed 215 seconds (σ = 82) and 80 moves (σ = 34).
Similarly, in the sixth task, the Simon Says game was used again, with sequences of 10 sounds. On average, subjects made 10 errors (σ = 9) and spent 146 seconds (σ = 52) on this task. In the seventh step, the four-quadrant task was repeated


with the addition of a tick-tock sound. The time limit for clicking the correct quadrant was also reduced to 2 seconds. Subjects took on average 135 seconds (σ = 43) to complete it.
In the last task, subjects were requested to type a dictated text. Dictation speed was rather high, and it was not possible to pause it. Subjects were requested to type at their fastest pace, trying to fix typos and errors if possible. The duration of this task was 320 seconds. At the end of the experiment, collected data were saved on the computer, and subjects were requested to send them for dataset creation and processing.

IV. PROPOSED METHOD FOR STRESS CLASSIFICATION

A. Data Collection
The data collected during the experimental protocol are divided into 3 categories: stress self-assessment, keyboard data, and mouse data. Self-assessment data consist of self-reported measures of the perceived stress level (on a 1 to 10 scale) collected after each task of the experimental protocol. Keyboard data were acquired during the 2 text typing tasks. For each keystroke, the software records the character typed, whether it was a keyup or keydown event (boolean value), the duration of the pressure (ms) and a timestamp (ms). Mouse data were acquired during the Tower of Hanoi, Simon Says, and four-quadrant tasks and included, for each click, the mouse coordinates (x and y), the presence of a press or release (boolean value, for both the left and right button), the click duration and dwell time (in ms), and a timestamp (in ms). All the data were registered by the Angular application, exported as CSV files and imported into MATLAB for data analysis and classification.

B. Feature Extraction
A sliding window of 5 seconds without overlap was applied on keyboard and mouse data, and feature extraction was performed on each window. For features that yield an array of values for each time window, the maximum, minimum, mean, standard deviation (std), and point-to-point (ptp) variation (difference between maximum and minimum) were extracted, for a total of 5 features. Other features were directly computed as a single value in the window. In the following, feature categories are referred to as either “array” or “single value”.
In summary, 15 keyboard features were computed [16], [25]:
• key dwell time (array): press-to-release time of each key (ms);
• key down-to-down time (array): time elapsed from the press of one key to the press of the next key (ms);
• key velocity (single value, mean): number of keys pressed per second;
• latency time (single value, mean): time elapsed from a key release to the press of the next key (ms);
• number of backspaces (single value, count);
• number of keys pressed (single value, count);
• key press (single value, percentage): amount of the window with at least one key pressed.
From mouse data, 22 features were computed [32]:
• mouse velocity (array): change in position per second;
• mouse acceleration (array): variation of velocity per second;
• mouse inactivity (single value, count): time for which the user is not moving the mouse (ms);
• number of clicks (single value, count);
• click dwell time (array): duration of each click (ms);
• click distance (array): time distance between two consecutive clicks (ms).
All the features were validated in previous literature, as well as in a preliminary experiment in laboratory conditions [36]. Some features may assume an invalid value in given windows (e.g., click distance needs at least two clicks in order to be significant): these windows were excluded from the analysis.

C. Stress Classification Algorithm
Keyboard and mouse are rarely used at the same time when working at a computer; hence, different classifiers were built in order to predict the stress level depending on what device the participant was using. Each time window was labelled as low (1–3), medium (4–7) or high (8–10) stress. After min-max normalization, feature selection was performed based on Neighborhood Component Analysis (NCA). In previous work, the best performance was achieved when limiting the analysis to the most discriminative features [36]. Building on (and further extending) our preliminary investigation [36], several ML techniques were compared in order to build the classifiers: k-Nearest Neighbour, Support Vector Machines, Decision Trees, and Random Forest (RF). Since RF reached the best results, the remainder of this work will focus on this method.
In order to deal with sparse and inaccurate labeling deriving from on-field data, a Multiple Instance Learning (MIL) approach was applied. MIL is a semi-supervised learning technique where the task is learned given labelled groups, or “bags”, each containing multiple training samples. Since MIL does not assume complete knowledge of training labels, it is particularly suited to in-the-wild data analysis. In our case, all the time windows from the same task share the same label; the classifier will learn through approximately classified time intervals (bags) rather than individual instances (single time windows). In this article, Majority-Voting RF is selected as the MIL extension of an RF classifier. Bag length was set to 90 seconds with an overlap of 50%.
A subject-independent 5-fold cross validation was adopted to test both classifiers: 80% of the participants were used for training, the remaining 20% for testing. The classifier is thus tested on never-seen participants, which is a condition close to real applications, as discussed in Section II. Classifier performance was evaluated on validation bags via confusion matrix, accuracy, precision, recall, and F1-score. Performance scores are averaged over the 5 folds.

V. RESULTS
All the participants successfully completed all the planned tasks, for a total of 496 tasks. Task recordings that were empty (38 tasks, 7.7%), clearly shorter than the minimum time required to complete the task (8), or much longer than the maximum plausible time (8) were excluded. After the exclusion of invalid trials, a total of 411 (100 low, 219 medium, 92 high)


Fig. 3. Stress levels self-assessed by participants for the tasks in the experi-
mental protocol. The easy and difficult versions of the same task are displayed Fig. 5. Distribution of the most discriminative typing features according to
in the same color (typing tasks: purple; Tower of Hanoi tasks: blue; Simon NCA feature selection. F2, F3, F4: key dwell time (minimum, mean, std). F6:
Says tasks: green; four-quadrant tasks: orange), rest phase is in gray. key velocity. F8, F10: key down-to-down time (minimum, std). F13: number
of keys. F14: key press percentage. F15: mean latency time.

Fig. 6. Confusion matrix for (a) keyboard and (b) mouse classifier. Columns:
true classes, rows: predicted classes.

TABLE I
R ECALL , P RECISION , AND F1-S CORE OF THE K EYBOARD (K) AND
M OUSE (M) C LASSIFIERS FOR THE 3 C LASSES , L OW (L),
Fig. 4. Distribution of the most discriminative mouse features according M EDIUM (M), AND H IGH (H) S TRESS
to NCA feature selection. F3, F4: mouse velocity (mean, std). F12, F15:
click dwell time (maximum, std). F17, F19, F20, F21: click time distance
(maximum, mean, std, ptp).

and 429 (120 low, 222 medium, 87 high) bags were extracted
from mouse and keyboard data respectively.
Participants’ stress levels were compared between the easy TABLE II
C OMPARISON W ITH THE L ITERATURE . ACRONYMS : FFNN
and difficult version of each task (Fig. 3) and between each (F EED -F ORWARD N EURAL N ETWORK ), KNN (K-N EAREST N EIGHBOUR ),
task and the rest phase. At one-way non-parametric ANOVA, SVM (S UPPORT V ECTOR M ACHINE ), RF (R ANDOM F OREST ),
median differences were statistically significant (p < 0.05) for (MIL) (M ULTIPLE I NSTANCE L EARNING )
all the tasks except the easy Tower of Hanoi and the rest phase.
The most discriminative mouse features (Fig. 4), as selected by the NCA, were: mouse velocity (mean, std), click dwell time (maximum, std), and click distance (maximum, mean, std, ptp). The most discriminative keyboard features (Fig. 5) were: key dwell time (maximum, minimum, std), key velocity, key down-to-down time (minimum, std), number of keys pressed, key press percentage, and latency time.
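As an illustration of the keyboard timing features listed above, the sketch below derives dwell, down-to-down, and latency times from raw key events. The definitions are those common in the keystroke-dynamics literature and are an assumption here, since this section does not restate them: dwell time is release minus press for one key, down-to-down time is the interval between consecutive presses, and latency is the interval from one key's release to the next key's press. The `KeyEvent` record, the function names, and the sample timings are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class KeyEvent:
    key: str
    t_down: float  # press timestamp in seconds
    t_up: float    # release timestamp in seconds


def summarize(values):
    """Return min, max, mean and sample std of a list of durations."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        # stdev needs at least two values; report 0.0 otherwise.
        "std": stdev(values) if len(values) > 1 else 0.0,
    }


def keystroke_features(events):
    """Compute summary statistics of timing features for one window of events."""
    if len(events) < 2:
        raise ValueError("at least two key events are needed for interval features")
    events = sorted(events, key=lambda e: e.t_down)
    dwell = [e.t_up - e.t_down for e in events]
    down_to_down = [b.t_down - a.t_down for a, b in zip(events, events[1:])]
    latency = [b.t_down - a.t_up for a, b in zip(events, events[1:])]
    return {
        "dwell": summarize(dwell),
        "down_to_down": summarize(down_to_down),
        "latency": summarize(latency),
        "n_keys": len(events),
    }


# Hypothetical window of three key presses (timings invented for illustration).
window = [
    KeyEvent("a", 0.00, 0.10),
    KeyEvent("b", 0.25, 0.33),
    KeyEvent("c", 0.50, 0.62),
]
features = keystroke_features(window)
print(features["dwell"]["mean"], features["n_keys"])
```

Each window of events yields one feature vector; how windows are grouped into the bags used for classification is not specified in this section and is left out of the sketch.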
Classification accuracy reached 63% and 76% for mouse and keyboard classifiers, respectively. Fig. 6 shows confusion matrices for the keyboard (Fig. 6(a)) and mouse (Fig. 6(b)) classifiers. Columns are the true classes, rows the predicted ones. The considered classes, i.e., low, medium, and high stress, are indicated with letters L, M, and H respectively. Classification performance in terms of recall, precision and F1-score is presented in Table I.

VI. DISCUSSION

Statistical analysis of participants’ stress self-assessment revealed that the selected tasks were able to increase the stress level from the rest condition, as well as from the easy to the difficult versions of each task. At the same time, it showed that the stress level varied from one task to the other, as confirmed by post-hoc interviews with participants. Indeed, the experimental protocol was devised to re-create a scenario similar to a common working day, in which different tasks need to be accomplished under varying stress levels. Ultimately, the experimental protocol induces an overall increase in perceived stress level which can be correctly detected by machine learning techniques.
Differences in tasks, data, classes, and algorithms make it difficult to directly compare the obtained results with existing literature. However, it is worth summarizing the current and previous contributions concerning affective states or stress detection using K&MD, as done in Table II, which reports the number of participants, setting, sensor, algorithm, number of classes and accuracy. The current work is the only one performing a ternary stress classification using in-the-wild data collected from a wide sample (62 participants) and reaching an accuracy above 75% (using KD). Several discriminative features were also found to be significant in previous works, such as key dwell time [1], [16], [21], key down-to-down time [1], [21], [23], key latency [1], [21], number of keys or key rates [1], [16], and mouse velocity [20], [23]. Decision trees and RF were also investigated in some previous works, leading to better performance with respect to other algorithms [1], [21]. However, the introduction of a MIL approach possibly contributed to better results by mitigating the effect of inaccurate labeling, typical of natural settings.
While in line with previous literature, our results also showed some weaknesses, especially in the prediction of the low stress class (F1-score is equal to 0.47 for the keyboard, and 0.3 for the mouse). In contrast, for both the medium and high classes, precision and recall are comparable, as indicated by the F1-score (keyboard: 0.84 for medium stress and 0.54 for high stress; mouse: 0.69 for medium stress and 0.75 for high stress), and well above the chance level. However, an error in classifying the low stress class has a much lower impact than an error in classifying a high stress class. These problems were also mentioned in previous works: the difficulty of obtaining high performance from K&MD is known in on-field studies [1], and it was considered a consequence of the lack of control over induced affective states and of data loss.
Overall, based on the above considerations, the outcomes of this study appear to be promising and particularly relevant for future developments, especially considering that results from real-world, in-the-wild setups are generally regarded as much more informative than those from controlled setups [22], [37].
Some limitations of the proposed approach, such as data loss, are typical of in-the-wild experiments. Other encountered limitations were also documented in controlled studies, like labeling uncertainty, class imbalance (due to the difficulty of eliciting high stress levels), and task selection (for instance, the Simon Says and the four-quadrant tasks tend to generate specific mouse patterns that may not generalize to different tasks). This last aspect could be further investigated by including tasks that are less clinically relevant as stressors, but that are more similar to everyday computer activities.

VII. CONCLUSION

The aim of this work was to achieve a subject-independent, multiclass stress classification in computer users in an uncontrolled environment through a non-intrusive, non-invasive, and cost-effective solution. To this purpose, data generated by common multimedia input peripherals were collected in-the-wild from 62 subjects using their own computer-based equipment. MIL applied to a RF algorithm reached the best results in classifying 3 stress levels. While confirming some of the limitations known in the literature, the findings of this study contribute to shedding further light on a challenging, though extremely important goal for this field of research.
Future work will explore how K&MD could be combined with other stress detection methods (e.g., vision-based methods or wearable sensors). The proposed methodology can be extended by exploring other tasks, more closely related to real-life tasks, and by exploring inter-subject as well as cross-subject designs. An open challenge is how to disentangle variations in K&MD patterns related to the task from those due to the users’ stress response. A further research direction to explore is how stress detection can be leveraged to enhance human-computer interfaces, e.g., by adapting the system behavior according to the user’s emotional states, or providing feedback to the users in order to increase their awareness of their cognitive and mental state. A computer or mobile device may integrate data acquired by multiple IoT devices and wearable sensors at different times of the day to build an accurate, fine-grained and dynamic picture of the user’s cognitive and emotional state.

REFERENCES

[1] C. Epp, M. Lippold, and R. Mandryk, “Identifying emotional states using keystroke dynamics,” in Proc. Conf. Hum. Factors Comput. Syst., 2011, pp. 715–724.
[2] S. H. Lau, “Stress detection for keystroke dynamics,” M.S. thesis, Dept. School Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA, 2018.
[3] D. L. Salil and P. Banerjee, “Biometric authentication and identification using keystroke dynamics: A survey,” J. Pattern Recognit. Res., vol. 7, no. 1, pp. 116–139, 2012.
[4] Y. S. Can, B. Arnrich, and C. Ersoy, “Stress detection in daily life scenarios using smart phones and wearable sensors: A survey,” J. Biomed. Informat., vol. 92, Apr. 2019, Art. no. 103139.
[5] D. Carneiro, P. Novais, J. C. Augusto, and N. Payne, “New methods for stress assessment and monitoring at the workplace,” IEEE Trans. Affective Comput., vol. 10, no. 2, pp. 237–254, Apr.–Jun. 2019.
[6] A. Kolakowska, “Towards detecting programmers’ stress on the basis of keystroke dynamics,” in Proc. Federated Conf. Comput. Sci. Inf. Syst., Gdansk, Poland, 2016, pp. 1621–1626.
[7] I. A. Khan, W. P. Brinkman, and R. Hierons, “Towards estimating computer users’ mood from interaction behaviour with keyboard and mouse,” Front. Comput. Sci., vol. 7, pp. 943–954, Oct. 2013.
[8] A. Belk, D. Portugal, P. Germanakos, J. Quintas, E. Christodoulou, and G. Samaras, “A computer mouse for stress identification of older adults at work,” in Proc. 1st Int. Workshop Hum. Aspects Adapt. Pers. Interact. Environ., 2016, pp. 1–4.
[9] L. Rachakonda, S. P. Mohanty, E. Kougianos, and P. Sundaravadivel, “Stress-lysis: A DNN-integrated edge device for stress level detection in the IoMT,” IEEE Trans. Consum. Electron., vol. 65, no. 4, pp. 474–483, Nov. 2019.
[10] I. Lefter, G. J. Burghouts, and L. J. M. Rothkrantz, “Recognizing stress using semantics and modulation of speech and gestures,” IEEE Trans. Affective Comput., vol. 7, no. 2, pp. 162–175, Apr.–Jun. 2016.
[11] J. R. Stroop, “Interference in serial verbal reactions,” J. Exp. Psychol., vol. 18, no. 6, pp. 643–661, 1935.
[12] C. Jyotsna and J. Amudha, “Eye gaze as an indicator for stress level analysis in students,” in Proc. Int. Conf. Adv. Comput. Commun. Informat., Bangalore, India, 2018, pp. 1–6.
[13] G. Giannakakis, D. Manousos, V. Chaniotakis, and M. Tsiknakis, “Evaluation of head pose features for stress detection and classification,” in Proc. IEEE EMBS Int. Conf. Biomed. Health Informat., Las Vegas, NV, USA, 2018, pp. 406–409.
[14] A. Riera et al., “Electro-physiological data fusion for stress detection,” Stud. Health Technol. Informat., vol. 181, pp. 228–232, Jan. 2012.
[15] F. Mokhayeri, M.-R. Akbarzadeh-T, and S. Toosizadeh, “Mental stress detection using physiological signals based on soft computing techniques,” in Proc. 18th Iran. Conf. Biomed. Eng., Tehran, Iran, 2011, pp. 232–237.
[16] L. M. Vizer, L. Zhou, and A. Sears, “Automated stress detection using keystroke and linguistic features: An exploratory study,” Int. J. Hum. Comput. Stud., vol. 67, no. 10, pp. 870–886, 2009.


[17] M. Ciman and K. Wac, “Individuals’ stress assessment using human-smartphone interaction analysis,” IEEE Trans. Affective Comput., vol. 9, no. 1, pp. 51–65, Jan.–Mar. 2018.
[18] L. Ciabattoni, F. Ferracuti, S. Longhi, L. Pepa, L. Romeo, and F. Verdini, “Real-time mental stress detection based on smartwatch,” in Proc. IEEE Int. Conf. Consum. Electron., Las Vegas, NV, USA, 2017, pp. 1–2.
[19] J. Hernandez, P. Paredes, A. Roseway, and M. Czerwinski, “Under pressure: Sensing stress of computer users,” in Proc. SIGCHI Conf. Hum. Factors Comput. Syst., 2014, pp. 51–60.
[20] T. Kowatsch, F. Wahle, and A. Filler, “Design and lab experiment of a stress detection service based on mouse movements,” in Proc. 11th Mediterr. Conf. Inf. Syst., 2017, pp. 1–17.
[21] R. Shikder, S. Rahaman, F. Afroze, and A. A. Islam, “Keystroke/mouse usage based emotion detection and user identification,” in Proc. Int. Conf. Netw. Syst. Security, Dhaka, Bangladesh, 2017, pp. 1–9.
[22] G. Giannakakis, D. Grigoriadis, K. Giannakaki, O. Simantiraki, A. Roniotis, and M. Tsiknakis, “Review on psychological stress detection using biosignals,” IEEE Trans. Affective Comput., early access, Jul. 9, 2019, doi: 10.1109/TAFFC.2019.2927337.
[23] H. Liu, O. Noel, N. Fernando, and J. C. Rajapakse, “Predicting affective states of programming using keyboard data and mouse behaviors,” in Proc. 15th Int. Conf. Control Autom. Robot. Vis., Singapore, 2018, pp. 1408–1413.
[24] D. E. Vargas Ligarreto and D. López De Luise, “Metrics design for keyboard and mouse: Assessing learning levels,” in Proc. Congr. Electron. Elect. Eng. Comput., Montevideo, Uruguay, 2017, pp. 1–4.
[25] K. Revett, F. Gorunescu, M. Gorunescu, M. Ene, S. Magalhaes, and H. Santos, “A machine learning approach to keystroke dynamics based user authentication,” Int. J. Electron. Security Digit. Forensics, vol. 1, no. 1, pp. 55–70, 2007.
[26] Y. M. Lim, A. Ayesh, and M. Stacey, “Detecting cognitive stress from keyboard and mouse dynamics during mental arithmetic,” in Proc. Sci. Inf. Conf., London, U.K., 2014, pp. 146–152.
[27] M. X. Huang, J. Li, G. Ngai, and H. V. Leong, “StressClick: Sensing stress from gaze-click patterns,” in Proc. 24th ACM Int. Conf. Multimedia, 2016, pp. 1395–1404.
[28] S. Koldijk, M. A. Neerincx, and W. Kraaij, “Detecting work stress in offices by combining unobtrusive sensors,” IEEE Trans. Affective Comput., vol. 9, no. 2, pp. 227–239, Apr.–Jun. 2018.
[29] A. van Drunen, E. L. van den Broek, A. J. Spink, and T. Heffelaar, “Exploring workload and attention measurements with uLog mouse data,” Behav. Res. Methods, vol. 41, no. 3, pp. 868–875, 2009.
[30] W. Maehr, EMotion: Estimation of the User’s Emotional State by Mouse Motions. Saarbrucken, Germany: VDM Verlag Dudweiler Landstr., 2005.
[31] A. Althothali, “Modeling user affect using interaction events,” M.S. thesis, Dept. Master Math. Comput. Sci., Univ. Waterloo, Waterloo, ON, Canada, 2011.
[32] G. Tsoulouhas, D. Georgiou, and A. Karakos, “Detection of learners’ affective state based on mouse movements,” J. Comput., vol. 3, no. 11, pp. 9–18, 2011.
[33] A. Kolakowska, “A review of emotion recognition methods based on keystroke dynamics and mouse movements,” in Proc. 6th Int. Conf. Hum. Syst. Interact., Sopot, Poland, 2013, pp. 548–555.
[34] N. Z. Gurel, H. Jung, S. Hersek, and O. T. Inan, “Fusing near-infrared spectroscopy with wearable hemodynamic measurements improves classification of mental stress,” IEEE Sensors J., vol. 19, no. 9, pp. 8522–8531, Oct. 2019.
[35] A. Alberdi, A. Aztiria, and A. Basarab, “Towards an automatic early stress recognition system for office environments based on multimodal measurements: A review,” J. Biomed. Informat., vol. 59, pp. 49–75, Feb. 2016.
[36] L. Ciabattoni, G. Foresi, F. Lamberti, A. Monteriù, and A. Sabatelli, “A stress detection system based on multimedia input peripherals,” in Proc. IEEE Int. Conf. Consum. Electron. (ICCE), Las Vegas, NV, USA, 2020, pp. 1–2.
[37] S. S. Panicker and P. Gayathri, “A survey of machine learning techniques in physiology based mental stress detection systems,” Biocybern. Biomed. Eng., vol. 39, no. 2, pp. 444–469, 2019.
[38] N. Sharma and T. Gedeon, “Objective measures, sensors and computational techniques for stress recognition and classification: A survey,” Comput. Methods Programs Biomed., vol. 108, no. 3, pp. 1287–1301, 2012.
[39] B. E. Ashforth, G. E. Kreiner, and M. Fugate, “All in a day’s work: Boundaries and micro role transitions,” Acad. Manag. Rev., vol. 25, no. 3, pp. 472–491, 2020.
[40] M. K. Wekenborg, B. von Dawans, L. K. Hill, J. F. Thayer, M. Penz, and C. Kirschbaum, “Examining reactivity patterns in burnout and other indicators of chronic stress,” Psychoneuroendocrinology, vol. 106, pp. 195–205, Aug. 2019.
[41] L. Rachakonda, S. P. Mohanty, and E. Kougianos, “iLog: An intelligent device for automatic food intake monitoring and stress detection in the IoMT,” IEEE Trans. Consum. Electron., vol. 66, no. 2, pp. 115–124, May 2020.

Lucia Pepa (Member, IEEE) received the master’s degree in electronic engineering in 2012, and the Ph.D. degree in E-learning - Technology Enhanced Learning from the Università Politecnica delle Marche, Italy, in 2016, where she is currently a Postdoctoral Researcher. Her primary research interests involve affective computing and movement analysis through consumer electronics devices.

Antonio Sabatelli received the M.Sc. degree in biomedical engineering from Università Politecnica delle Marche, Italy, in 2019. He is currently a Software Engineer with Revolt SRL, Ancona, Italy. His research interests include computational intelligence, biomedical signal processing, and consumer electronics devices.

Lucio Ciabattoni (Member, IEEE) received the M.Sc. and the Ph.D. degrees from Università Politecnica delle Marche, Italy, in 2010 and 2014, respectively, where he is currently an Assistant Professor with the Department of Information Engineering. His research interests include computational intelligence, AI, renewable energy solutions, and consumer electronics devices. He is the Chair of the IEEE Italy Section CE Society Chapter.

Andrea Monteriù (Member, IEEE) received the M.Sc. degree in electronic engineering and the Ph.D. degree in artificial intelligence systems from the Università Politecnica delle Marche, Italy, in 2003 and 2006, respectively, where he is currently an Associate Professor. His research interests mainly focus on the areas of fault diagnosis and fault tolerant control applied on robotic, unmanned, and artificial intelligent systems. He is the Vice-Chair of the IEEE Italy Section CE Society Chapter.

Fabrizio Lamberti (Senior Member, IEEE) received the M.Sc. and Ph.D. degrees in computer engineering from the Politecnico di Torino, Italy, in 2000 and 2005, respectively, where he is currently a Full Professor. His research interests include computer graphics, human-machine interaction, and intelligent systems. He is serving as Associate Editor of IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, IEEE TRANSACTIONS ON COMPUTERS, IEEE TRANSACTIONS ON LEARNING TECHNOLOGIES, and IEEE Consumer Electronics Magazine.

Lia Morra (Senior Member, IEEE) received the M.Sc. and the Ph.D. degrees in computer engineering from the Politecnico di Torino, Italy, in 2002 and 2006, respectively, where she is currently a Senior Postdoctoral Fellow with the Dipartimento di Automatica e Informatica. Her research interests include computer vision, pattern recognition, and machine learning. She is serving as an Associate Editor of the IEEE Consumer Electronics Magazine.
