A Review On Recognizing Depression in Social Networks

Journal of Ambient Intelligence and Humanized Computing



A review on recognizing depression in social networks: challenges and opportunities
Felipe T. Giuntini1 · Mirela T. Cazzolato1 · Maria de Jesus Dutra dos Reis2 · Andrew T. Campbell3 ·
Agma J. M. Traina1 · Jó Ueyama1

Received: 18 July 2019 / Accepted: 17 January 2020

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Social networks have become another resource for supporting mental health specialists in making inferences and finding indications of mental disorders, such as depression. This paper addresses the state of the art in the recognition of depressive mood disorders in social networks through approaches and techniques of sentiment and emotion analysis. The systematic review focused on social networks and social media, and examined the most employed techniques, feelings, and emotions analyzed to find precursors of a depressive disorder. The research gaps identified are discussed with the aim of improving the effectiveness of the analysis process and bringing the analysis closer to the user's reality. Twitter, Facebook, Blogs and Forums, Reddit, Live Journal, and Instagram are the most employed social networks for the identification of depressive mood disorders, and the most used information was text, followed by emoticons, user log information, and images. The selected studies usually employ classic off-the-shelf classifiers for the analysis of the available information, combined with lexicons such as the NRC Word-Emotion Association Lexicon, WordNet-Affect, ANEW, and the LIWC tool. The challenges include the analysis of temporal information and the combination of different types of information.

Keywords Depressive disorders · Affective computing · Mental health · Sentiment analysis · Emotion recognition · Social
media · Social networks · User behavior

* Felipe T. Giuntini
[email protected]

1 Institute of Mathematical and Computer Sciences, University of São Paulo, São Carlos, SP, Brazil
2 Department of Psychology, Federal University of São Carlos, São Carlos, SP, Brazil
3 Department of Computer Science, Dartmouth College, Hanover, NH, USA

1 Introduction

The area of data science has emerged and expanded to meet the growing volume of data and the computational analysis capability it requires. Approaches and algorithms of machine learning and data mining enable the extraction of information from large complex data sets. Such approaches have been reoriented to this new environment and used for both data interpretation and creation of predictive models in financial (Idrees et al. 2019), political (Jungherr 2016), medical (Cazzolato et al. 2019), and criminal (Huang et al. 2018) domains. In the areas of physical and mental health, data science methodologies have also enabled the extraction of large sets of data for identification of patterns and accumulation of significant knowledge (Cortés et al. 2015; Giuntini et al. 2019; Herland et al. 2014).

Depression is widely discussed and popularly known as the disease of the century, and researchers have long pointed to it as a concern. For instance, CDC (2001) and Luoma et al. (2002) highlighted that depression affected approximately 27 million Americans and could be related to over 30,000 suicides each year. According to a projection for the following 20 years (Mathers and Loncar 2006), depression would become the leading cause of disability in high-income countries such as the United States, and by 2014 it was one of the most costly diseases worldwide (Centers for Disease Control and Prevention and others 2014). Despite its overwhelming impact on human conditions, its high co-morbidity with suicidal ideation and behavior, and its high financial cost, the investment in evaluation and intervention strategies has been too small. In a study about the economic return of investment in the treatment of depression, Chisholm et al. (2016) showed that in countries such as Brazil a contribution of 86% or more of the current investment would be necessary to deal with this issue.

According to RC et al. (2003) and González et al. (2010), although several primary care programs have been designed for detection and treatment, the majority of Americans with symptoms of depression did not receive treatment or the treatment was insufficient. Additionally, minority ethnic groups, such as Mexican Americans and African Americans, are significantly less likely to receive anti-depression therapies than other ethnic groups. In 2017, the World Health Organization (WHO) reported that the total number of people living with depression worldwide was 322 million, and that the estimated total number of people living with depression increased by 18.4% between 2005 and 2015 (World Health Organization and others 2017).

Data analysis techniques can automatically identify disturbances, based on indicators and symptoms of depressive disorders (American Psychiatric Association and others 2013). Abnormal patterns in behavior can be recognized in online social networks through data mining techniques, sentiment analysis, and recognition of emotions (Moreno et al. 2011). Although the current performance of predictive models is considered suboptimal, reliable models could eventually detect depressive mood disorders early, thus paving the way for fast interventions and promotion of relevant public health solutions.

The work in Guntuku et al. (2017) presents a previous review of the use of social media for the detection of depression and mental illness. The authors focused on Facebook and Twitter. Although their findings are promising, they only considered the analysis of depression and did not discuss the sentiments, emotions, and other disorders that usually come along with it, as discussed in this paper.

The current paper investigates the state of the art of how sentiment and emotion analysis approaches can identify depressive disorders in social networks. Figure 1 summarizes the steps and goals of our literature review process. After retrieving the studies from the leading digital libraries, we applied selection and exclusion criteria and obtained the studies that were further analyzed. We aim to answer the following three questions:

1. Which types of social media information were adopted for the identification of depressive mood disorders?
2. Which social networks were explored in the identification of depressive mood disorders?
3. What state-of-the-art techniques have been employed for the identification and classification of depression for each social media?

Fig. 1 Summary of the current study. The green boxes show the number of studies moving from one step to the next. Red boxes inform the number of excluded studies by each criterion

Paper outline. The remaining parts of the paper are organized as follows: Section 2 describes the relevant background on depressive disorders. Section 3 presents the method applied in the systematic review, which includes search strategy, selection criteria, and data extraction. Section 4 provides the systematic review results and an overview of the studies conducted, the social media and social networks employed, and the sentiments and emotions analyzed. Section 5 discusses research questions and opportunities in the area of recognition of depressive mood disorders from a computational and psychological point of view. Finally, Sect. 6 gives the conclusions.

2 Background: depressive disorders

The World Health Organization and others (2017) characterize depressive disorders by sadness, loss of interest or pleasure, feelings of guilt or low self-esteem, disturbed sleep or appetite, feelings of tiredness, and lack of concentration. Besides, depression can be long-lasting or recurrent, substantially impairing an individual's ability to function at work or school or deal with daily life. In its most severe form, depression can lead to suicide.

The American Psychiatric Association recommends the use of the most recent edition of the Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association and others 2013), widely known as DSM-5, for the evaluation and diagnosis of the depressive status by mental health professionals. Duailibi and da Silva (2014) state that major depressive disorder is characterized by at least five of the following symptoms, present for at least two weeks and representing a change from the prior functioning of the individual. At least one of the symptoms must be (1) depressed mood or (2) loss of interest or pleasure.

• Depressed mood most of the day, almost every day (e.g., feeling sad, empty or hopeless), as reported by the individual or observed by third parties;
• Marked decrease in pleasure or interest in all or almost all activities most of the day, almost every day;
• Significant loss or gain in weight without being on a diet (e.g., change in more than 5% of body weight in a month), or increased or decreased appetite almost every day. In children, consider the inability to present the expected weight gains;
• Insomnia or hypersomnia almost every day;
• Psychomotor restlessness or retardation almost every day;
• Fatigue or loss of energy almost every day;
• Feeling of worthlessness or excessive or inappropriate guilt, almost every day;
• Reduced ability to think or concentrate, or indecision, almost every day;
• Recurrent thoughts of death, recurrent suicidal ideation without a specific plan, a suicide attempt, or a specific plan to commit suicide.

Additionally, the authors emphasize that further information, called specifiers, can be added in the diagnostic process. These specifiers allow a better characterization and prognosis of each case:

– Anxiety characteristics: Requires the presence of at least two of the following symptoms on most days: feeling tense; feeling restless; difficulty concentrating due to concerns; fear that something terrible will happen; feelings of loss of control. With two symptoms, the case is considered mild; with three, moderate; with four or five, moderate to severe; and with motor agitation, severe.
– Mixed characteristics: At least three of the following symptoms of mania and hypomania should be present almost every day during an episode of major depressive disorder: elevated mood; high self-esteem; more speech than usual or greater speech pressure; flight of ideas or subjective experience that thoughts are accelerated; increased energy for a specific activity (social, work, school or sexual); excessive involvement in activities with high potential for harmful consequences, such as excessive purchases, sexual indiscretion, or unplanned investment; sleeplessness.
– Melancholic characteristics: At least one of the following symptoms: loss of pleasure in most activities; lack of reaction to usually enjoyable activities. Also, three (or more) of the following symptoms: depressed mood characterized by profound dejection; symptoms that worsen in the morning; terminal insomnia; psychomotor restlessness or retardation; significant loss of appetite or anorexia; excessive or inappropriate guilt.
– Atypical characteristics: Mood reactivity (improvement with stimuli) together with two (or more) of the following symptoms: increased appetite or weight gain; hypersomnia; feeling of heaviness in the legs or the arms, besides lack of energy; a lasting pattern of sensitivity to social rejection, resulting in significant social impairment.
– With psychotic delusions or seasonal patterns: delusions related to depression, or depressive disorder occurring only in specific periods of the year.

This complex set of symptoms has been associated with a substantial loss in quality of life indicators. The symptoms usually cause significant distress or impairment in social, occupational, or other essential areas of the individual's life (Duailibi and da Silva 2014). Moreover, studies examining the relationship between depression and poverty conditions have indicated that the prevalence of depression may represent economic impacts similar to natural disasters, civil conflicts, or abrupt changes in society. One study indicates that higher scores on the depression indicator were related to less time devoted to work, lower total expenditure, and reduced investment in education, whereas tobacco consumption was higher in this population (Barrett et al. 2019).

The American Psychiatric Association and others (2013) offer several "emerging measures" for future research in clinical evaluations, aimed at a standardized and accurate diagnosis. These measures were developed to be administered at the initial interview with the patient and to monitor the progress of the treatment. They should also be used in research and evaluation for potentially improving clinical decision-making, and not as the sole basis for a clinical diagnosis. The PROMIS Emotional Distress–Depression form,1 for example, is a useful tool for the mental health professional to confirm the presence or absence of depressive disorder, as well as its level. The patient answers a set of eight questions on how frequently he/she has been bothered by a list of "negative feelings" during the last week. The answer choices are never, rarely, sometimes, often and always.

1 Available online at http://www.dsm5.org/Pages/Feedback-Form.aspx.
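
The symptom-count rule for major depressive disorder given earlier in this section (at least five of the nine listed symptoms, present for at least two weeks, with at least one of them being depressed mood or loss of interest or pleasure) can be sketched in a few lines of Python. This is only an illustration of the counting logic as stated in the text, not a diagnostic instrument; the symptom labels and the function name are hypothetical and chosen here for readability.

```python
# Illustrative labels (hypothetical names), one per symptom in the list above.
CORE_SYMPTOMS = frozenset({"depressed_mood", "loss_of_interest_or_pleasure"})
OTHER_SYMPTOMS = frozenset({
    "weight_or_appetite_change",
    "insomnia_or_hypersomnia",
    "psychomotor_change",
    "fatigue_or_loss_of_energy",
    "worthlessness_or_guilt",
    "reduced_concentration_or_indecision",
    "thoughts_of_death_or_suicide",
})
ALL_SYMPTOMS = CORE_SYMPTOMS | OTHER_SYMPTOMS


def meets_symptom_rule(present_symptoms, duration_days):
    """Return True if the five-of-nine counting rule from the text holds."""
    present = set(present_symptoms)
    unknown = present - ALL_SYMPTOMS
    if unknown:
        raise ValueError(f"unknown symptom labels: {sorted(unknown)}")
    enough_symptoms = len(present) >= 5              # at least five symptoms
    has_core_symptom = bool(present & CORE_SYMPTOMS)  # mood or interest/pleasure
    long_enough = duration_days >= 14                 # at least two weeks
    return enough_symptoms and has_core_symptom and long_enough
```

For example, five symptoms including depressed mood over 14 days satisfy the rule, whereas five symptoms that include neither of the two core symptoms do not.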

3 Method applied in present study

This section presents the method employed for the development of a systematic review on the identification of depressive mood disorders in online social networks. The procedure was supported by the Parsifal tool,2 which follows the guidelines reported in (Barbara and Charters 2007), and a protocol was defined for the development of this review.

2 Available at parsif.al.

3.1 Search strategy

The following search string was used to address the three research questions reported in Sect. 1:

((sentiment* OR emotion*)
OR low-spiritedness)
AND ("social network" OR
"online community*" OR "social media"))

The primary studies that identified depression in online social networks were retrieved, and the feelings and emotions recognized in the process, along with the techniques employed, were then obtained. The search was conducted in November 2018. The studies were collected from the following digital databases, with no restriction of publication period: ACM Digital Library, Compendex, IEEE Digital Library, ISI Web of Science, Science Direct, Scopus, Springer Link, and Taylor and Francis Online. For the Springer database specifically, we chose to filter studies from the area of computer science only, since more than 10 thousand studies were returned without this filter.

3.2 Selection criteria

After a systematic reading of the titles and abstracts of the studies, the inclusion criterion adopted was the selection of only primary studies that infer depression in social networks through the analysis of feelings and recognition of emotions. This single criterion guided the identification of studies that could provide direct evidence on the research questions. Studies that (1) do not deal with computational aspects, (2) do not characterize a primary study, (3) do not propose a computational approach or method that infers depressive disorders in online social networks, or (4) are duplicates, that is, have already been evaluated, were excluded.

3.3 Data extraction

After selecting the studies and obtaining the set of studies that had met the inclusion criteria, data were extracted through a complete reading of the selected studies. An extraction form with the following questions was defined: (1) To which social network was the study applied? (2) What social media were considered in the study? (3) Does the study identify emotions or feelings? If so, which ones? (4) Apart from depressive aspects, what other disorders are inferred by the study? (5) What is the main methodology employed?

4 Results

This section addresses the literature review conducted. First, we give an overview of the way the studies were obtained. We also present the exclusion criteria considered in this work. Each selected study is then presented, along with a brief description of its main objective. Since we are interested in studies regarding the recognition of depression in social networks, we introduce the social network and the information type used by the selected studies. Finally, we address the considered sentiments, emotions, and other disorders.

4.1 Overview of the studies

Figure 2 depicts the proportion of studies returned per digital database, according to the search string. A total of 1647 studies were obtained, most of which from Springer Link (36.1%), Scopus (24.1%) and ISI Web of Science (16.3%), followed by Taylor and Francis Online (3.7%), Ei Compendex (3.5%), IEEE Digital Library (1.5%) and ACM Digital Library (0.7%).

Fig. 2 Portion of studies retrieved from each digital library considered in our work

Figure 3 shows the PRISMA diagram (Liberati et al. 2009), which covers all stages of the selection process, following the methodology presented in Sect. 3. On the left side (green boxes) is the number of selected studies, and on the right side (red boxes), we show the studies removed in each step. Initially, 325 duplicate studies were removed. Duplication occurs because a study can be indexed in more than one digital database at the same time. A total of 1235 studies were excluded from the 1332 selected ones in the title and abstract reading step, which resulted in only 97 studies eligible for a full reading. Another 71 studies that did not meet the inclusion criteria during the full reading phase were excluded from the 97 initially eligible ones, and only 26 primary studies were included in the qualitative synthesis.

Fig. 3 PRISMA flow of the literature review

We examined the studies against the inclusion and exclusion criteria (described in Sect. 3.2). They were annually distributed as follows: 5 in 2013, 1 in 2014, 2 in 2015, 1 in 2016, 15 in 2017, and 2 in 2018. Table 1 shows the authors and the objective of each study. Table 2 summarizes the social networks, the type of media explored, the methods employed, and the sentiments, emotions, and other disorders approached in each study considered in the review. We discuss each of these aspects in the next subsections.

4.2 Social networks used in the studies

Figure 4 shows the most used social networks detected in the studies considered in our review. Twitter is the most explored social network. Studies that rely only on Twitter as a data source are (De Choudhury et al. 2013a, b; Wang et al. 2013; Birjali et al. 2017; Guangyao Shen 2017; Vedula and Parthasarathy 2017; Lachmar et al. 2017). In their work, Hassan et al. (2017) combined Twitter with another local social network, and Jung et al. (2017) built their dataset combining data from Twitter, Blogs and Forum, and another local social network.

Seabrook et al. (2018) explore data from Twitter and Facebook, and the studies (Park et al. 2013; Wee et al. 2017; Polignano et al. 2017; Ophir et al. 2017) use only Facebook, which is the second most adopted social network. Facebook has a large number of features available, and it also allows the analyst to verify user behavior by inspecting users' timeline information. Particularly for the behavior analysis area in the Psychology field, the information provided by Facebook allows a better understanding of depressive behavior, since it is possible to combine and merge different types of complex data. On the other hand, relying on Facebook can be challenging given that the corporation is continuously changing its privacy policies, which often requires periodic changes in the applications, making it harder to maintain an online solution for a long time without proper updates.

The works (Park et al. 2013; Leiva and Freire 2017; Park and Conway 2018) considered Reddit, which is a worldwide social network with communities of depression, anxiety, stress, and happiness. It enables studies focused on communities with specific themes, and its great advantage is that communities (called subreddits) use slightly more formal and structured English sentences as a way of communication. Such sentences facilitate scientific studies in the area of text analysis and natural language processing, since the pre-processing of the content becomes less tedious. Furthermore, data collection through the official Reddit API is easier than in other social networks.

Fig. 4 Social networks approached in the studies (horizontal axis: number of studies; categories: Blogs/Forum, Facebook, Instagram, Live Journal, Reddit, Twitter, Other)
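
The selection counts reported in Sect. 4.1 can be cross-checked with simple arithmetic. The sketch below uses only the figures stated in the text; the variable names are illustrative. Note that the first step does not reconcile exactly: 1647 retrieved studies minus 325 duplicates gives 1322, whereas the text reports 1332 studies entering the title and abstract step, and the text alone does not show which figure is misprinted. Likewise, the database shares listed in the text add up to 85.9%, so the remaining share (Science Direct is named in Sect. 3.1 but has no percentage in the text) is not itemized.

```python
# Counts as stated in Sect. 4.1 (variable names are illustrative).
retrieved = 1647
duplicates_removed = 325
screened = 1332                  # "the 1332 selected ones"
excluded_title_abstract = 1235
excluded_full_text = 71

eligible_full_text = screened - excluded_title_abstract
included = eligible_full_text - excluded_full_text

# Yearly distribution of the included studies, as stated in the text.
per_year = {2013: 5, 2014: 1, 2015: 2, 2016: 1, 2017: 15, 2018: 2}

# Database shares (percent) listed in the text for Fig. 2.
shares = [36.1, 24.1, 16.3, 3.7, 3.5, 1.5, 0.7]

# Difference between the reported screening count and retrieved minus duplicates.
screening_gap = screened - (retrieved - duplicates_removed)

print(eligible_full_text, included, sum(per_year.values()))
print(screening_gap, round(sum(shares), 1))
```

The later steps are consistent with each other: the full-text and inclusion counts follow from the screening count, and the yearly distribution sums to the number of included studies.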


Table 1  Overview of the selected studies

ID  Authors                          Objective
1   De Choudhury et al. (2013a)      Use of an SVM classifier for checking Twitter posts that indicate depression and proposal of a metric that measures the index of depression
2   Xu et al. (2013)                 Proposal and evaluation of a new theory on the contagion of depression in social networks
3   Wang et al. (2013)               Application of data mining methods for the detection of depressed users in social networking services
4   De Choudhury et al. (2013b)      Exploration of the potential use of social media for the detection and diagnosis of depressive disorders
5   Park et al. (2013)               Development of a Web application that identifies features related to depressive symptoms on Facebook
6   Chomutare (2014)                 Evaluation of text classification methods to identify patients with risk of depression
7   Semenov et al. (2015)            Investigation of methods and metrics published regarding the analysis of a community of depression in the Russian social network VKontakte
8   Karmen et al. (2015)             Development of a method that detects symptoms of depression in free text
9   Tung and Lu (2016)               Investigation of text mining methods that analyze and predict depression trends in web postings
10  Park and Conway (2017)           Investigation of longitudinal changes in psychological states, which manifest themselves through linguistic changes in members of a community of depression
11  Birjali et al. (2017)            Search for feelings of depression in Twitter users' activities for the estimation of their depressive tendencies
12  Wee et al. (2017)                Demonstration of the interaction between depression and personality based on behavioral data from Facebook users
13  Nguyen et al. (2017)             Exploration of the textual evidence of online communities interested in depression
14  Reece and Danforth (2017)        Application of machine learning tools for the successful identification of markers of depression in Instagram photos
15  Vedula and Parthasarathy (2017)  Observational study for the understanding of interactions between clinically depressed users and their ego-network, in contrast to a differential control group of normal users and their ego-network
16  Guangyao Shen (2017)             Timely depression detection via harvesting of social media data
17  Leiva and Freire (2017)          Analysis of the messages that a user posts online during a time period to detect the risk of depression
18  Hassan et al. (2017)             Detection of a person's depression level through the extraction of emotions from the text
19  Fatima et al. (2017)             Use of user-generated content for the identification of depression and further characterization of its degree of severity
20  Polignano et al. (2017)          Description of an architecture model that identifies some warning scenarios of blue feelings in Facebook
21  Ophir et al. (2017)              Comparison of the traditional offline clinical picture of depression with its online manifestations, and exploration of unique features of online depression that are less dominant offline
22  Lachmar et al. (2017)            Examination of the public discourse of the trending hashtag #MyDepressionLooksLike to look closely at how users talk about their depressive symptoms on Twitter
23  Cheng et al. (2017)              Exploration of whether computerized language analysis methods can assess one's suicide risk and emotional distress in the Chinese social media
24  Jung et al. (2017)               Refining of an adolescent depression ontology and terminology as a framework for analyses of social media data, and evaluation of description logic between classes and of the applicability of this ontology to sentiment analysis
25  Seabrook et al. (2018)           Report of associations between depression severity and variability and instability in emotion word expression on Facebook and Twitter across status updates
26  Park and Conway (2018)           Investigation of how written communication challenges manifest in online mental health communities focusing on depression, bipolar disorder, and schizophrenia

In Reece and Danforth (2017), the authors conducted a platform, to overcome the limitations of access. At MTurk,
study with the photo-sharing social networking Instagram, registered users get paid to answer surveys of their interest.
which poses some bureaucracies for the access to users’ con- The remaining studies were limited to the use online
tent. According to Statista (2018), Instagram gains approxi- virtual communities little-known worldwide, or of a more
mately 1 billion active users worldwide every month. The public restricted social network, such as the Live Journal
work conducted by Reece and Danforth (2017) needed the Platform,3 which was used in (Nguyen et al. 2017; Fatima
help of volunteers to answer a survey and share their data
using Amazon’s Mechanical Turk (MTurk), word cloud
The Live Journal Platform: livejournal.com.

A review on recognizing depression in social networks: challenges and opportunities

Fig. 5  Type of information

available in the social media
and used in the studies

et al. 2017); the Psycho-Babble Grief,4 used in Karmen users on Twitter, and compared their solution with Naive
et al. (2015); and VKontakte. We,5 a Russian network used Bayes classifier and Multiple Social Networking Learning
in (Semenov et al. 2015). Live Journal Platform is similar (MSNL), which was designed by Song et al. (2015).
to Reddit, since it has non-specific content. Psycho-Bable is User log information, i.e., the record of users’ activities
focused on the discussion of psychological problems, pro- in the network was used by studies 5 (Park et al. 2013),
viding mutual support, and VKontakte.We is similar to Face- 7 (Semenov et al. 2015), and 12 (Wee et al. 2017). In the
book regarding its supported resources and layout structure. work of Park et al. (2013), the authors used a Facebook
More generally, Tung and Lu (2016) explored web post- Web Application to collect data from a survey with Face-
ings as a whole, such as blogs and sites, and Chomutare book users, based on the self-report questionnaires CES-D
(2014) compared postings from a local specific diabetes (Steinfield et al. 2007) and BDI (Beck et al. 1961). They also
community. Xu et al. (2013); Cheng et al. (2017) did not performed a Face-to-Face interview and applied correlation
specify the social network used in their study. metrics to verify differences between the manifestation of
depression reported by users online and personally. Semenov
4.3 Social media and techniques employed et al. (2015) considered user attributes as age, gender, num-
in the studies ber of friends, and structural properties of their egocentric
networks and conducted a descriptive statistical analysis.
The recognition of mood depressive disorders taking advan- Wee et al. (2017) applied quantitative analysis to understand
tage of the analysis of social media content has grown in the behavior of users and obtain evidence or indications of
recent years. Figure 5 shows a summary of the use of social depressive behavior. The authors relayed on logging user
media in the selected primary studies. “Study ID” (repre- activity and texts, images, and emoticons. Although all
sented by the horizontal axis) refers to the number of each media types were used, no specific algorithm or technique
study, which corresponds to the ones reported in Table 1. was proposed by them, only quantitative statistical analysis.
The information types used in the studies and extracted from As mentioned before, Wee et al. (2017) used images in
social media are represented by different symbols, which their work, but only relative to the frequency of the use of
denote text, log files, images, and emoticons in the analysis. images in users’ posts, not as an analysis feature. On the
Text was the only media considered by all studies, except other hand, study number 14 (Reece and Danforth 2017)
for studies number 5 (Park et al. 2013) and 14 (Reece and explored the use of only images in the process of detect-
Danforth 2017). One of the reasons for this fact is that Twit- ing depression in Instagram posts. The authors obtained
ter is the main social network used, appearing in ten studies, 43,950 photos from Instagram users and used color analy-
which is mainly based on text information. sis, metadata components, and algorithmic face detection.
In studies 16 (Guangyao Shen 2017) and 20 (Polignano The authors used Face detection to recognize a human face
et al. 2017), the authors used emoticons to improve and in photographs and extracted Hue, Saturation, and Value
provide more meaning to the analysis of feelings regard- (HSV) features to find pixel-level averages. Their results
ing the text. The work of (Polignano et al. 2017) applied show that depressive users’ posts had HSV values shifted
their studies on the Facebook network using NRC Word- compared with photos posted by healthy individuals.
Emotion Association Lexicon, WordNet-Affect, as well as Concomitantly to the text, in study 1 (De Choudhury et al.
Naive Bayes and Multilayer Perceptron classifiers. More 2013a) analyzed users’ log information, while studies num-
recently, Guangyao Shen (2017) proposed a new multimodal ber 5 and 7 (Park et al. 2013; Semenov et al. 2015) analyzed
depressive dictionary learning model to detect depressed users’ profiles. De Choudhury et al. (2013a) analyzed users’
frequency records in specific activities, including posting
frequency, message sending, among others. In (Park et al.
The Psycho-Babble Grief: dr-bob.org/babble/grief/. 2013; Semenov et al. 2015), the authors analyzed the users’
The Russian network VKontakte.We: vk.com. profiles in order to verify demographic features, such as age,

F. T. Giuntini et al.

Fig. 6  Word Cloud representation of a the techniques employed in the studies and b additional sentiments and mental illnesses recognized in the
studies, besides depression

number of joined communities, gender, preferences and user behavior in the network.

The considered studies also employed specific metrics or techniques for each media type. Figure 6a
presents a word cloud representation, showing the most cited words in the studies. The most
recurrent word is “LIWC”, which stands for Linguistic Inquiry and Word Count, a text analysis
software that computes the degree of use for different words’ categories in a wide variety of
texts (Pennebaker et al. 2001). The core of LIWC software is a lexical resource, available in
multiple languages such as English and Portuguese (Balage Filho et al. 2013). The high frequency
of LIWC mainly occurred because text was the most used media in the considered studies, and LIWC
was used in several studies (De Choudhury et al. 2013a, b; Park and Conway 2017; Nguyen et al.
2017; Fatima et al. 2017). For instance, Guangyao Shen (2017) used LIWC, EMOJI, and the Naïve
Bayes classifier to determine the feelings of postings on multiple social networks.

Methodologies for text analysis and processing have also been employed. One example is
“SentiStrength”, a polarity verification system used in (Xu et al. 2013), alongside an econometric
model. Wang et al. (2013) used several algorithms such as feature extraction methods, Bayesian
networks, decision trees and tables of rules. Additionally, De Choudhury et al. (2013b) combined
the use of LIWC with the ANEW lexicon (Bradley and Lang 1999) and the Egocentric Social Graph,
aimed at verifying activation and dominance. According to the authors, “activation” refers to the
physical intensity degree of emotion (for instance, “terrified” is higher in activation than
“scared”). In turn, “dominance” refers to the degree of control in an emotion (for instance,
“anger” is dominant, while “fear” is submissive).

Besides the studies focused on the sentiment analysis with LIWC, Nguyen et al. (2017) proposed a
regularized regression model called Lasso, and Birjali et al. (2017) applied several traditional
machine learning models such as Instance-Based Learning (IBL), Sequential Minimal Optimization
(SMO), J48 and CART for classification. Karmen et al. (2015) used the Stanford CoreNLP (Manning
et al. 2014) for the lexical sentiment analysis and vocabulary frequency control to implement a
grammar-oriented process. Semenov et al. (2015) used a binary logistic regression and a clustering
coefficient, and Tung and Lu (2016) explored ontology, sentiment analysis, terminology, FAQs,
multiple nominal logistic regression, decision trees, and association rules to determine negative
terms. Similarly, Chomutare (2014) used bag-of-words and multiple traditional classifiers, among
them Naïve Bayes, Support Vector Machine (SVM) and Decision Trees (DT), to find negative terms.
Other examples of studies that worked with text are: (Vedula and Parthasarathy 2017), where the
authors employed ego-network metrics; (Leiva and Freire 2017), where the authors used VADER
Sentiment Analysis (Hutto and Gilbert 2014; Hassan et al. 2017), which employed Support Vector
Machine (SVM), Naïve Bayes (NB) and Maximum Entropy (ME) algorithms.

Park et al. (2013) considered users’ profiles and text information, and applied the Spearman
correlation and the Mann–Whitney U test by means of IBM SPSS6 software. In (Wee et al. 2017) the
authors used statistical analysis and a Poisson regression method to analyze users’ log
information.

6 IBM SPSS: ibm.com/br-pt/marketplace/spss-statistics/.
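To make the lexicon-based analysis discussed in this section more concrete, the following minimal sketch computes the share of words in a post that fall into each category of a word list, which is the general idea behind dictionary tools such as LIWC. The lexicon `TOY_LEXICON`, its category names, and the function `category_proportions` are hypothetical and chosen only for illustration; they are not the LIWC dictionary nor the procedure of any reviewed study.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (category -> words). Real LIWC dictionaries are
# far larger; these entries are assumptions made only for illustration.
TOY_LEXICON = {
    "negative_emotion": {"sad", "hopeless", "tired", "afraid", "worthless"},
    "positive_emotion": {"happy", "love", "good", "hope"},
    "first_person": {"i", "me", "my", "myself"},
}


def category_proportions(text, lexicon=TOY_LEXICON):
    """Return, for each category, the share of tokens that belong to it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {category: 0.0 for category in lexicon}
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    return {category: counts[category] / len(tokens) for category in lexicon}


example_post = "I feel sad and tired, and my days are hopeless"
for category, share in category_proportions(example_post).items():
    print(f"{category}: {share:.2f}")
```

The resulting proportions could then serve as input features for the classifiers mentioned above; how the reviewed studies actually built their feature vectors is not specified here.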


4.4 Sentiments, emotions and other explored disorders

The selected studies rely on sentiment analysis and emotion recognition to identify depressive
disorders. Thus, each work starts from a specific set of these feelings and emotions, aimed at
indicating a depressive disorder. The studies also explore other mental disorders and recognize
positive sentiments, mainly to allow the comparison to other techniques. Figure 6b shows a word
cloud of the sentiments and mental illnesses recognized in the primary studies, besides
depression. It can be seen that the feelings and mental illnesses found through the analysis of
social media are mostly described as symptoms, specifiers, or predecessors of depression, as
described in Sect. 2.

De Choudhury et al. (2013a) found happiness, sadness, and hate as the main emotions, and they were
capable of recognizing discomfort, pain, hope, and concern. De Choudhury et al. (2013b) found
positive affect (PA), negative affect (NA), activation, and dominance emotions.

Tung and Lu (2016) highlighted anorexia, concentration problems, difficulties of memorization,
sleep disorders, anxiety, and suicide-related terms in Web postings. Birjali et al. (2017) found
symptoms and traces of anorexia, cyberbullying, fear, heartache, insults, loneliness, and
punishment. Wee et al. (2017) found Neuroticism, a characteristic of emotional instability in
which a person tends to experience negative emotions such as anxiety, anger, or even depression.
Nguyen et al. (2017) found terms that are related to bipolarity disorders, self-mutilation,
distress, and suicide.

In works (Xu et al. 2013; Wang et al. 2013; Park et al. 2013; Park and Conway 2017; Reece and
Danforth 2017; Guangyao Shen 2017; Leiva and Freire 2017; Hassan et al. 2017; Vedula and
Parthasarathy 2017; Seabrook et al. 2018), the authors only considered the polarity of the
sentiment or emotion. In (Park et al. 2013; Chomutare 2014; Karmen et al. 2015; Semenov et al.
2015) the authors do not specify sentiments, emotions, and other disorders. Park et al. (2013) had
the support of specialists to manually relate the answers of a questionnaire to the questions of
the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) (American Psychiatric
Association and others 2013). Chomutare (2014) analyzed only positive and negative affects,
activation and dominance, and Karmen et al. (2015) started from the principle of finding the
recurrence of synonyms of the word depression in Psycho-Babble postings.

Vedula and Parthasarathy (2017) found a correlation between insomnia and depression. Fatima et al.
(2017) found aspects of rejection, frustration, aggravation, as well as being thirsty, scared,
listless, lazy, indifferent, sympathetic, touched, surprised, thoughtful, optimistic, and loved.
Polignano et al. (2017) recognized Ekman’s universal basic emotions (Ekman 1993) as happiness,
anger, sadness, fear, disgust, and surprise. Ophir et al. (2017) analyzed the valence of feelings,
but also found anxiety and substance abuse. Lachmar et al. (2017) found dysfunctional thoughts,
lifestyle challenges, social struggles, hiding behind a mask, apathy, and sadness, suicidal
thoughts, behaviors, and seeking relief. Cheng et al. (2017) recognized suicide risk, anxiety, and
stress. Jung et al. (2017) assessed self-harm, anxiety, change of appetite, sleep, hypersomnia,
irritability, self-esteem, lowered libido, pain, bullying, stress, personality, academic stresses,
and loneliness. Finally, Park and Conway (2018) recognized bipolarity, schizophrenia, loseit, and
bodybuilding.

The feelings, emotions, or disorders encountered by the presented studies and shown in Fig. 6b are
very similar to those indicated as part of a depressive disorder discussed in Sect. 2.

4.5 Identified patterns

Table 2 presents a summary of the different aspects considered in this paper, namely the social
networks, media, methods, sentiments, emotions, and other disorders identified in the primary
studies. By observing the distribution of items in the aspects being analyzed in this work, we can
identify the recurrence of patterns among the studies. Table 3 shows the most frequent patterns.

We identified Twitter as the most employed social network, appearing in ten studies. In all cases,
the type of media used was text, and in one of the studies the authors combined text with
emoticons. Most efforts employed lexical approaches to explore the information gathered from
Twitter. This fact is expected since the information consists of textual data. Also, three of the
studies explored Machine Learning (ML) algorithms to analyze the data.

When considering Facebook, the studies used text (four times), image, log information, and
emoticon (two times each) as the available media. They explored statistical techniques in two
works, mainly for dealing with log information; lexical approaches in two other works for the
textual information; and finally, ML approaches in the other three works. In the only study that
employed Instagram, the authors considered images as the media. In the three works with Reddit,
the authors focused only on textual media, employing lexical and ML approaches to explore the
data. Finally, eleven works relied on Live Journal, Blogs/Forum, and other social media. All of
them used text, and one of them also explored log information combined with text to detect
valence. There is no pattern in the methods employed to explore the information of these eleven
studies. They counted on a mixture of ML and lexical approaches, ontologies, among others.

The “valence” of sentiments and emotions was the focus of 20 studies. Most of them were based on
the analysis of the

Table 2  Summary of the studies in relation to each aspect analyzed in this work
ID | Social networks | Media | Methods | Sentiments and emotions | Other disorders

1 | Twitter | Text | SVM, LIWC | Valence | –
2 | Other | Text | SentiStrength, Econometric models | Valence | –
3 | Twitter | Text | Man-made rules, feature extraction, Bayesian Networks, J48; Rules Decision Table | Pretty, love, like, happy, good, ugly, sad, depressed, unhappy, bad |
4 | Twitter | Text | Egocentric Social Graph, LIWC, ANEW | Positive and negative affect, activation, dominance | –
5 | Facebook | Image, log | Spearman and Mann–Whitney U test, CES-D and BDI survey | Valence | –
6 | Other | Text | Bag-of-words; Bigrams, NB, SVM, DT | Valence | Distress, disturbed sleep, low self-confidence, poor concentration or indecisiveness, poor or increased appetite, suicidal, agitation, guilt
7 | Other | Log, text | Egocentric networks and binary logistic regression | Valence | –
8 | Blogs / Forum | Text | Stanford CoreNLP | Valence | –
9 | Blogs / Forum | Text | NLP processing, extraction of negative features | Boredom, disgust, loathing | Memory difficulties, loss of appetite and energy, sluggishness, mental fatigue, overeating, agitation, fainting, forgetfulness, anorexia, sleep disorder, irritability
10 | Reddit | Text | LIWC | Valence | Anxiety
11 | Twitter | Text | IBL, SMO, J48 and CART | Valence |
12 | Facebook | Emoticon, image, log, text | User frequency analysis, user activity | Valence | Anguish, emotional instability
13 | LiveJournal | Text | LIWC | Valence | Bipolar disorder, self-harm, grief/bereavement, suicide
14 | Instagram | Image | Image analysis, Bayesian estimation, RF, Valencia filter | Valence | –
15 | Other, Twitter | Text | Linguistic Content Analysis, Gradient Boosted Decision Trees | Valence | Insomnia
16 | Twitter | Emoticon, text | LIWC, EMOJI, NB, MSNL | Valence | –
17 | Reddit | Text | Logistic Regression, SVM, KNN, RF | Valence | –
18 | Twitter, Other | Text | SVM, NB, Maximum Entropy | Valence | –
19 | LiveJournal | Text | LIWC, Decision Forests, ANEW | Rejected, frustrated, aggravation, thirsty, scared, listless, lazy, indifferent, sympathetic, touched, surprise, thoughtful, optimistic, loved | Anxiety
20 | Facebook | Emoticon, text | NRC Word-Emotion Association Lexicon, WordNet-Affect, NB, MLP | Happiness, anger, sadness, fear, disgust, surprise | –
21 | Facebook | Text | Multiple Regression | Valence | Anxiety, substance abuse

Table 2  (continued)
ID | Social networks | Media
22 | Twitter |
23 | Other |
24 | Blogs / Forum, Other, Twitter | Text
25 | Facebook, Twitter |
26 | Reddit |
Methods: Qualitative content analysis, NCapture; Survey, LIWC, Logistic Regression; Ontology, sentiment analysis, FAQs, Multiple Nominal Logistic Regression, DT, Association Rules; LIWC, MoodPrism, Spearman rho, post hoc comparison, Mann–Whitney U tests, SPSS; Linear Least Squares Regression, Lexicon diversity
Sentiments and emotions: Anger, sadness, fear
Other disorders: Dysfunctional thoughts, lifestyle challenges, social struggles, apathy and sadness, suicidal thoughts, seeking relief; Suicide risk, anxiety, stress; Self-harm, anxiety, change of appetite, hypersomnia, irritability, self-esteem, lowered libido, pain, bullying, stress, academic stresses, loneliness; Bipolar, schizophrenia, loseit, bodybuilding

text (17), with image, log information, and emoticon appearing in three other studies. The studies
combined image with log, text with log, and text with emoticon, two times each. Also, one study
combined all available information: text, image, log, and emoticon.

LIWC was the most employed method, appearing in eight studies, all of which used text, and one of
which combined emoticon with textual information. In six of the studies, the authors focused on
detecting valence. LIWC was mainly explored to analyze data from Twitter (four works)

Table 3  Patterns identified in the different aspects considered regarding the primary studies
Antecedent ⇒ Consequent

Twitter [10] ⇒ Text [10]
    Text + Emoticon [1]
    Lexical approaches [7]
    ML approaches [3]
Facebook [7] ⇒ Text [4]
    Image [2]
    Log [2]
    Emoticon [2]
    Statistical Approaches [2]
    Lexical approaches [2]
    ML approaches [3]
Instagram [1] ⇒ Image [1]
Reddit [3] ⇒ Text [3]
    Lexical approaches [2]
    ML approaches [2]
LiveJournal + Others ⇒ Text [11]
    Log + Text [1]
    Diverse methods (ML, lexical approaches, ontology, etc.)
Valence [20] ⇒ Text [17]
    Image [3]
    Log [3]
    Emoticon [2]
    Image + Log [2]
    Text + Log [2]
    Text + Emoticon [2]
    Text + Image + Log + Emoticon
LIWC [8] ⇒ Text [8]
    Emoticon + Text [1]
    Valence [6]
    Twitter [4]
    LiveJournal [2]
    Facebook + Twitter [1]
    Reddit [1]
    Other (social media) [1]
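The antecedent ⇒ consequent counts reported in Table 3 can be obtained by tallying, for each social network, how many studies used each media type. The sketch below illustrates this kind of tally on a small, hypothetical list of study records; the records, the field names `network` and `media`, and the added confidence ratio are assumptions for illustration and do not reproduce the rows of Table 2.

```python
from collections import Counter

# Hypothetical study records, invented for illustration only.
STUDIES = [
    {"network": "Twitter", "media": {"text"}},
    {"network": "Twitter", "media": {"text", "emoticon"}},
    {"network": "Twitter", "media": {"text"}},
    {"network": "Facebook", "media": {"text", "log"}},
    {"network": "Facebook", "media": {"image", "log"}},
]


def pattern_counts(studies):
    """Count studies per network (antecedent) and per (network, medium) pair."""
    totals = Counter()
    pairs = Counter()
    for study in studies:
        totals[study["network"]] += 1
        for medium in study["media"]:
            pairs[(study["network"], medium)] += 1
    return totals, pairs


def format_rules(totals, pairs):
    """Render each pair in the 'Antecedent [n] => Consequent [m]' style."""
    lines = []
    for (network, medium), count in sorted(pairs.items()):
        confidence = count / totals[network]
        lines.append(
            f"{network} [{totals[network]}] => {medium} [{count}] "
            f"(confidence {confidence:.2f})"
        )
    return lines


totals, pairs = pattern_counts(STUDIES)
for line in format_rules(totals, pairs):
    print(line)
```

The confidence value (pair count divided by antecedent count) is the usual association-rule measure; Table 3 itself reports only the raw counts.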


[Figure 7 diagram labels: User or social network: provides access credentials; Access API: collects and provides data; Data is organized in a nonrelational database; Emotional Feature Extraction; Temporal Extraction of Behavior Patterns; panels A, B]

Fig. 7  High-level pipeline for data acquisition from social networks, data storing and organization, anonymization, feature extraction, and finally,
the recognition of depressive behavior patterns considering multi-modality and temporal approaches
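Two stages of the pipeline in Fig. 7 can be sketched with the standard library alone: replacing the user identifier with a hash, and grouping irregularly spaced posts into fixed time windows before temporal analysis. The sketch below is a minimal illustration under stated assumptions: the key `SECRET_KEY`, the function names, the weekly window, and the per-post valence scores (assumed to come from a prior feature extraction step) are all hypothetical and not part of the reviewed studies.

```python
import hashlib
import hmac
from collections import defaultdict
from datetime import date

# Hypothetical secret; in practice it would be stored apart from the dataset.
SECRET_KEY = b"replace-with-a-secret-kept-off-the-dataset"


def pseudonymize(user_id, key=SECRET_KEY):
    """Replace a user ID with a keyed SHA-256 digest.

    The same ID always maps to the same digest, so the user's history stays
    linked, while the original ID cannot be read back from the stored value.
    """
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def weekly_mean_valence(posts):
    """Group (day, valence) pairs by ISO (year, week) and average each group."""
    buckets = defaultdict(list)
    for day, valence in posts:
        iso = day.isocalendar()
        buckets[(iso[0], iso[1])].append(valence)
    return {week: sum(v) / len(v) for week, v in sorted(buckets.items())}


# Hypothetical posts of one user: (posting date, valence score in [-1, 1]).
posts = [
    (date(2020, 1, 6), -0.5),
    (date(2020, 1, 8), -0.1),
    (date(2020, 1, 20), 0.2),
]
print(pseudonymize("user-123")[:16])
print(weekly_mean_valence(posts))
```

In this example no post falls in the third ISO week of 2020, so that week is simply absent from the result; a temporal method applied afterwards has to decide how to treat such gaps.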

and Live Journal (two works). Facebook and Twitter together, Reddit and other social media were
the focus of one study each.

5 Challenges and opportunities

The main objective of this work is to present state-of-the-art studies regarding the
identification of depressive disorders in social networks. We considered both the analysis of
sentiments and emotions. Despite the advances in this field, the selected studies allowed us to
identify existing problems and opportunities for research. In particular, we observed the
following limitations:

1. Most studies focus only on the analysis of textual information from postings, log activities,
   or demographic characteristics of the users’ profile.
2. Most studies did not properly explore the associated media, such as images, videos, and emojis.
3. Most studies did not consider the users’ context, i.e. users’ history, or look for changes in
   the users’ behavior pattern over time. In fact, the temporal information is often underused.
4. Regarding postings, the studies did not consider the interaction and reactions from user
   friends in the social networks, like comments and other demonstrations of
   positivity/negativity.
5. The studies manually relate the available information to the Diagnostic and Statistical Manual
   of Mental Disorders (DSM) (American Psychiatric Association and others 2013).
6. The studies did not provide the computational solution online and did not make it available
   for testing with other users in real-time.

The aforementioned limitations leave open opportunities for multimodal approaches to identify
depressive disorders in social network users automatically. The information regarding long-term
user behavior is also relevant, and it was not explored in the studies we selected in this review.

Figure 7 shows a high-level pipeline we envision for future approaches focused on the recognition
of depression in social networks. Accordingly, the analysis of the data obtained from particular
users over time could lead to accurate results regarding the users’ behavior and the associated
temporal information. For instance, it could allow us to take advantage of frequent pattern mining
algorithms to extract relevant information, also relating it to the Diagnostic and Statistical
Manual of Mental Disorders (DSM-5) (American Psychiatric Association and others 2013), and
allowing real-time analysis.

Given that the gaps pointed out make it challenging to analyze human depressive behavior, the
proposed pipeline (Fig. 7) goes from collecting and storing data, through multi-modal extraction
of emotional characteristics on social media, to temporal recognition of frequent patterns of
depressive behavior.

Another gap that we were able to identify in the state-of-the-art studies is the lack of analysis
of the interaction of


[Figure 8 diagram: example post text “I suffer with a major physical health issue and that constantly affects my mental health. Similarly, my emotional health is just as bad and it deteriorates my motivation to do anything.”; stage labels: Text Analysis; Post predominant emotions]



Fig. 8  Multimodal extraction of predominant emotional features of the post
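One simple way to obtain the predominant emotions of a post from several media, as depicted in Fig. 8, is to average the emotion scores produced for each available modality. The sketch below uses a weighted average (late fusion) over Ekman's six basic emotions. The modality names, the weights, the score values, and the function names are hypothetical; the fusion rule is an assumption for illustration, not a method taken from the reviewed studies.

```python
EMOTIONS = ("happiness", "anger", "sadness", "fear", "disgust", "surprise")


def fuse_modalities(scores_by_modality, weights):
    """Weighted average of per-modality emotion scores.

    Only the modalities present in the post are used, and their weights are
    renormalized so that a missing image or reaction list does not bias the
    result toward zero.
    """
    total_weight = sum(weights[m] for m in scores_by_modality)
    if total_weight == 0:
        raise ValueError("no weighted modality available for this post")
    fused = {}
    for emotion in EMOTIONS:
        weighted_sum = sum(
            weights[modality] * scores.get(emotion, 0.0)
            for modality, scores in scores_by_modality.items()
        )
        fused[emotion] = weighted_sum / total_weight
    return fused


def predominant_emotions(fused, top_k=2):
    """Return the top_k emotions with the highest fused score."""
    return sorted(fused, key=fused.get, reverse=True)[:top_k]


# Hypothetical scores for a post that has text and reactions but no image.
post_scores = {
    "text": {"sadness": 0.7, "fear": 0.2, "anger": 0.1},
    "reactions": {"sadness": 0.6, "surprise": 0.3, "happiness": 0.1},
}
modality_weights = {"text": 0.5, "image": 0.3, "reactions": 0.2}

fused = fuse_modalities(post_scores, modality_weights)
print(predominant_emotions(fused))
```

Fixed weights keep the example short; in practice they would have to be chosen or learned for the data at hand.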

friends in the social network. The described studies focus on the user data only, without
considering the information provided by comments, likes, and other interactions of the user’s
connections in the network. Also, we hypothesize that the combination of different types of
information, such as text, images, videos, and emoticons, has the potential of improving the
overall analysis of the users’ behavior. Accordingly, such information could lead to more accurate
results regarding the detection of depressive disorders in social networks.

The most popular social networks commonly have an Application Programming Interface (API) for
sharing data with applications in various contexts. Examples are Facebook Graph API,7 Instagram
Basic Display,8 Twitter API9 and PRAW (Python Reddit API Wrapper).10 Each network has a different
privacy and data sharing policy, but it is usually required to register an application on the
platform. Facebook and other branded products (such as Instagram) can only collect data with
users’ permission. The user accesses the external platform, logs in, and decides which data they
agree to share. However, in the latest privacy policy, Facebook does not approve applications that
are intended for sentiment analysis, which makes Facebook less suitable for scientific research.
In Twitter and PRAW, the application can access any content but with limited data requests and
downloads.

APIs commonly return a single JSON format file. It is interesting to anonymize and store this data
in a well-structured way, although this file can be manipulated as the social network returns it.
Furthermore, although Generation and Noise Addition techniques are the most widely used to ensure
privacy, in our context, we could lose information that is relevant to pattern recognition, given
that user context is an important parameter. A straightforward solution would be to replace the
unique user identifier (ID) with a unique hash and randomize them. A more sophisticated solution
to ensure data privacy would be the use of Blockchain technologies (Zhang et al. 2019). With
Blockchain, it is possible to keep track of the researcher who manipulated the data.

In the context of sentiment analysis and multimedia information, the use of a non-relational
database such as MongoDB11 may bring better computational performance and reduce costs due to the
use of efficient and staggered architectures rather than a monolithic one. Also, non-relational
databases allow developers to execute queries without the need of navigating through the SQL data
architecture. Besides that, multimedia retrieval can be quick and easy.

After storing and anonymizing the data, emotional features can be extracted. As seen in Fig. 8, a
social network post may provide different types of media. A post contains the user identifier
(Username) and may contain text, an

https://developers.facebook.com/docs/graph-api.
10 https://praw.readthedocs.io/en/latest/.
11 https://www.mongodb.com/.


image as well as insider information. In this case, insider information refers to the content
that we may or may not have in a post, such as comments or reactions (emoticons). Such insider
information can significantly contribute to the process of extracting emotional features, as the
user’s network of friends can react to the post indicating the polarity, emotion, or feeling
involved. An example is the study (Giuntini et al. 2019), which evaluated the expression of basic
emotions (Ekman 1993) through Facebook reactions. Thus, with the multimedia extraction of the
different media involved in each post, and if privileged information is provided, this can be used
to ensure the accuracy of the emotional features.

Finally, emotional features can be analyzed over time to extract and evaluate the evolution of
frequent behavior patterns. Therefore, approaches such as Association Rules (e.g. Apriori and
FP-Growth algorithms), Classification (Instance-Based Learners (kNN), SVM, Decision Trees and
Naïve Bayes), Clustering (e.g., k-Means and Expectation-Maximization (EM) algorithms), Temporal
Patterns of Motifs extracted from time series, and temporal patterns (e.g. Allen’s Interval
Algebra and LSTM) can be explored and combined. However, it is worth noting that one of the most
significant difficulties of exploring temporality in social networks is the fact that there is no
regularity in the frequency of users’ posts.

Traditionally, mental health professionals have used clinical examination techniques based on
information provided by self-report of emotional, behavioral, and cognitive dimensions (e.g.,
questionnaire, interviews, standard inventories, among others) for the diagnosis and follow-up of
their interventions. The large volume of data regularly posted in the social media could be a
valuable new source of personal information that should bring new light to the analysis process.

The evaluation of the available information in big data has been shown to be an efficient and
effective tool in the study of dimensions of human behavior, particularly for
motivational-emotional ones. These data point to the promising direction of the use of social
networks as another instrument in the identification process of depressive moods; it could allow
quicker and more accurate diagnosis and follow-up. By working with user-related data, we deal with
sensitive information. Provided the proper ethical approval, these new sources of assessment could
lead to effective diagnosis and the rapid identification of mood changes. Considering the
implications of depression, the efficiency of timely information and consequent decision support
for mental health professionals can be a crucial element in the prevention of suicide.

The research roadmap indicates the strengthening of multidisciplinary research, which will
hopefully allow, shortly, substantial contributions to health and the computational development of
better algorithms. Such efforts will no longer be linked to the analysis of feelings and emotions
in social networking posts, which is currently not used in the medical practice, only in virtual
environments. The main goal of future works will be the employment of actual recognition of
depressive behavior patterns, as well as the identification of social and cultural aspects related
to risk factors in a non-virtual environment. In this way, social networks and their media can
give special meaning and be used as objects in the area of Psychology. Finally, future studies may
further contribute to the construction and evaluation of new public health policies.

6 Conclusion

This paper addressed a review of studies on the recognition of depression in social networks.
There are techniques that can automatically identify mental health disturbances, considering
indicators and symptoms of depressive disorders (American Psychiatric Association and others
2013). We focused on studies that automatically identify abnormal behavioral patterns in social
networks. However, their performance is considered sub-optimal. Such studies employ data mining
techniques, sentiment analysis, and emotion recognition approaches. Reliable models are very
important since they can eventually provide early detection of depressive mood disorders. This can
be the basis to lead to fast interventions by physicians, thus promoting relevant public health
solutions.

We investigated the state-of-the-art of studies on the identification of depressive disorders in
social networks, considering sentiment and emotion analysis, and observed the following points in
the selected studies:

1. The most employed social media resources for the identification of depressive mood disorders
   are text, followed by emoticons, log information, and images.
2. The most employed social networks for the identification of depressive mood disorders are,
   respectively, Twitter, Facebook, Blogs and Forums, Reddit, Live Journal and Instagram.
3. The most employed techniques for the identification and classification tasks are classic
   off-the-shelf classifiers, such as Naive Bayes (NB), Decision Trees (DT), Instance-Based
   Learning (IBL), Multilayer Perceptron (MLP), and Support-Vector Machines (SVM). In many
   studies, such approaches were combined with lexicons such as the NRC Word-Emotion Association
   Lexicon, WordNet-Affect, and ANEW; the Linguistic Inquiry and Word Count (LIWC) software was
   also widely employed in the selected studies for text analysis.


