
Association for Information Systems

AIS Electronic Library (AISeL)

ECIS 2023 Research-in-Progress Papers ECIS 2023 Proceedings

5-2-2023

MDSCAN: AN EXPLAINABLE ARTIFICIAL INTELLIGENCE


ARTIFACT FOR MENTAL HEALTH SCREENING
Salih Tutun
Washington University in St. Louis, [email protected]

Anol Bhattacherjee
University of South Florida, [email protected]

Kazim Topuz
The University of Tulsa, [email protected]

Ali Tosyali
Rochester Institute of Technology, [email protected]

Gorden Li
Bosch Center for Artificial Intelligence, [email protected]


Recommended Citation
Tutun, Salih; Bhattacherjee, Anol; Topuz, Kazim; Tosyali, Ali; and Li, Gorden, "MDSCAN: AN EXPLAINABLE
ARTIFICIAL INTELLIGENCE ARTIFACT FOR MENTAL HEALTH SCREENING" (2023). ECIS 2023 Research-in-
Progress Papers. 56.
https://aisel.aisnet.org/ecis2023_rip/56

MDSCAN: AN EXPLAINABLE ARTIFICIAL INTELLIGENCE
ARTIFACT FOR MENTAL HEALTH SCREENING

Research Paper

Salih Tutun, Washington University in St. Louis, St. Louis, Missouri 63130, USA,
[email protected]
Anol Bhattacherjee, University of South Florida, Tampa, Florida 33620, USA,
[email protected]
Kazim Topuz, The University of Tulsa, Tulsa, Oklahoma 74104, USA, kazim-
[email protected]
Ali Tosyali, Rochester Institute of Technology, Rochester, New York 14623, USA,
[email protected]
Gorden Li, Bosch Center for Artificial Intelligence, Sunnyvale, California 94085, USA,
[email protected]

Abstract
This paper presents a novel artifact called MDscan that can help mental health professionals quickly
screen a large number of patients for ten mental disorders. MDscan uses patient responses to the SCL-
90-R clinical questionnaire to create a full-color image, similar to radiological images, which identifies
which disorder or combination of disorders may afflict a patient, the severity of the disorder, and the
underlying logic of this prediction, using an explainable artificial intelligence (XAI) approach. While
prior artificial intelligence (AI) tools have seen limited acceptance in clinical practice because of the
lack of transparency and interpretability in their "black box" models, the XAI approach used in MDscan
is a "white box" model that elaborates which patient feature contributes to the predicted outcome and
to what extent. Using patient data from a mental health clinic, we demonstrate that MDscan outperforms
current (expert-based) clinical practice by an average of 20%.

Keywords: Mental health; Artificial intelligence; Deep learning; Explainable artificial intelligence.

1 Introduction
The world is facing a mental health crisis of unprecedented scale, with an estimated 970 million people,
or nearly one in eight people globally, having lived with mental disorders in the year 2019, a 28%
increase over the previous year (World Health Organization, 2022). This global crisis has worsened
significantly with recent pandemics (e.g., COVID-19) and epidemics (e.g., opioid overdose) (Johnson
et al., 2021). In the United States, one in five adults experience mental disorders today (National Alliance
on Mental Illness, 2022). A mental disorder is defined as a clinically significant disturbance in a person's
cognition, emotional regulation, or behavior that causes significant distress, impairment in functioning,
and self-harm tendencies (World Health Organization, 2022). This includes a wide range of disorders
such as anxiety, depression, obsessive-compulsive disorder, bipolar disorder, schizophrenia,
psychosomatic disorder, autism, paranoia, and post-traumatic stress disorder, each with its unique

Thirty-first European Conference on Information Systems (ECIS 2023), Kristiansand, Norway 1


MDscan for Mental Health Screening

configuration of symptoms and requiring a different treatment protocol. According to a study by the
Lancet Commission, the total cost of mental health disorders, in terms of lost productivity, disability,
social welfare, and law and order spending, will exceed $16 trillion worldwide between 2016 and 2030
(The Lancet, 2018).
Despite the growing demand for mental health services, there is an acute shortage of mental health
experts, with some countries reporting as little as one psychiatrist for every 100,000 people (World
Health Organization, 2018). This supply shortage results in long wait times for mental health
appointments, delayed diagnosis and treatment, and continued suffering, even in advanced nations
(Wainberg et al., 2017). Many mental health clinics also lack the tools and resources to screen large
volumes of patients (Kilbourne et al., 2018). Psychiatrists and counselors have therefore called for
innovative technological tools to help augment their capacity to screen and treat patients more efficiently
and effectively (Thieme et al., 2020). In 2019, Health Education England issued a report suggesting
using artificial intelligence (AI) to help augment mental health experts' capacity to screen mental health
patients (Foley and Woollard, 2019).
Modern web technologies make it feasible to administer clinical mental health screening instruments
privately and securely on a large scale, and machine learning (ML) techniques can help screen patients
for common mental disorders. However, most ML models are "black box" models that provide little to
no explanation of the rationale behind their prediction. Without explanations, experts cannot trust these
predictions for clinical use, where patients' lives and well-being are often at stake. Explainable artificial
intelligence (XAI), referring to a set of AI/ML tools and techniques that can help human users
understand the reasons for predictions made by AI models (Murdoch et al., 2019), can bring
transparency, explainability, and trust to these black-box models, and make them acceptable for use in
professional settings.
In this paper, we propose a novel XAI artifact called MDscan to help mental health experts screen
patients efficiently and accurately using a "white box" ML approach that explains the reasons for a
diagnosis. MDscan converts tabular patient data from the SCL-90-R mental health questionnaire into
interpretable images that experts can use to screen, classify, and prioritize patients at scale for ten
different mental disorders. Using a computational XAI approach, MDscan generates a full-color visual
representation of ten mental disorders, similar to that of radiological or neurological scans, which were
previously unavailable in mental health practice. We validate our proposed artifact using a field
experiment of 500 patients from a mental health clinic.

2 Related Literature

2.1 The Practice of Mental Health Screening


The two clinical guidelines used by mental health experts to classify and diagnose mental disorders are
the Diagnostic and Statistical Manual of Mental Disorders (DSM) (American Psychiatric Association,
2022) and the International Classification of Diseases (ICD) (World Health Organization, 2018). First
published in 1952, DSM is currently in its fifth revision (DSM-5-TR) and is specific to mental disorders.
In contrast, ICD was first proposed in 1860, is currently in its eleventh version (ICD-11), and has a
broader scope, covering overall health, with Chapter 5 of the ICD relating specifically to mental
disorders. Moreover, DSM focuses more on quantitative diagnostic criteria, while ICD descriptions of
psychiatric disorders tend to be more qualitative and, therefore, more reliant on clinician judgment
(Tyrer, 2014).
Mental health experts have used DSM to develop a comprehensive psychological assessment instrument
called Symptom Checklist 90 (SCL-90-R) to screen for ten common mental disorders (Schmitz et al.,
2000). This instrument consists of 90 questions, which ask patients to self-rate their mental state over
the past week, on five-point (0-4) scales, on issues such as "feeling lonely," "feeling nervousness or
shakiness," and "thoughts of ending your life." Patients’ self-ratings on these items are combined to
compute aggregate scores for ten mental disorders and their severity (functional or non-symptomatic,
moderately symptomatic, and severely symptomatic) based on the threshold values of these scores
(Schmitz et al., 2000). However, interpreting SCL-90-R questionnaire data is complex, given the
overlapping nature of symptoms across many mental disorders and the possibility of multiple concurrent
disorders, and is a job usually relegated to trained mental health experts.
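The aggregate-scoring logic described above can be sketched in a few lines. The item indices and threshold cut-offs below are illustrative placeholders, not the published SCL-90-R scoring key or clinical norms:

```python
import numpy as np

# Illustrative item indices for one SCL-90-R dimension; the real scoring key
# assigns each of the 90 items (rated 0-4) to one of ten symptom dimensions.
SOMATIZATION_ITEMS = [0, 3, 11, 26, 39, 41, 47, 48, 51, 52, 55, 57]

def subscale_score(responses, items):
    """Aggregate score for one dimension: the mean 0-4 rating of its items."""
    return float(np.mean([responses[i] for i in items]))

def severity_band(score, moderate=1.0, severe=2.0):
    """Map a dimension score to a severity band via threshold values.
    The cut-offs here are placeholders, not published clinical norms."""
    if score < moderate:
        return "functional"
    if score < severe:
        return "moderately symptomatic"
    return "severely symptomatic"
```

The complexity noted above arises because one item can plausibly load on several dimensions, so the per-dimension scores must still be interpreted jointly.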

2.2 Research on Mental Health Screening


In recent years, there have been several attempts to use artificial intelligence (AI) and machine learning
(ML) methods to automate the screening and diagnosis of specific mental disorders. This research can
be grouped into two broad categories. The first group employed non-clinical data (e.g., Twitter) to
analyze a specific mental disorder like depression. For example, Syarif et al. (2019) used computational
linguistics and deep learning algorithms on Twitter data to model three levels of depression (high,
moderate, low), using self-declared depression diagnosis as the “ground truth” to evaluate their
prediction accuracy. Similarly, Ware et al. (2020) used smartphone usage data to implement a family of
ML models to predict depressive symptoms among college students.
The second group employed clinical data for diagnosing specific mental disorders. Using resting-state
fMRI data from 163 Japanese patients suffering from major depressive disorders and 195 healthy
subjects (control group), Nakano et al. (2020) used ensemble learning (random forest and AdaBoost),
support vector machine, and sparse logistic regression models to classify patients as depressed versus
healthy. Schulte-Rüther et al. (2021) used a random forest model on a subsample of previously labelled
Autism Diagnostic Observation Schedule data (n=1,262) from a German data repository ASD-Net to
classify patients into attention deficit hyperactivity disorder, conduct disorder, anxiety disorders, and an
"other" category. Some of these diagnoses were partially overlapping. Mellem et al. (2021) applied two
analytical approaches (Personalized Advantage Index and Bayesian Rule Lists) to randomized, placebo-
controlled clinical trial data to identify a paliperidone-indicated subgroup of schizophrenia patients who
demonstrate a larger treatment effect compared to the placebo group.
Despite high prediction accuracy, these approaches have not been accepted by mental health experts for
clinical practice because of three reasons. First, the majority of the studies examined a single disorder,
such as depression, limiting their utility in clinical settings where a patient may suffer from multiple
concurrent disorders, with potentially overlapping symptoms. Second, many studies used Twitter or
smartphone data for predicting mental disorders, which is not an accepted clinical protocol for mental
disorder screening. Third, these approaches are typically “black box” technologies that provide little
visibility into the underlying process or logic of the prediction process and are, therefore, not trusted by
mental health experts. Lacking transparency, it is unlikely that mental health experts will ever adopt
black-box solutions in a professional setting where their diagnoses have severe implications for their
patients’ lives and mental health. It is, therefore, imperative to employ (1) established clinical protocols
(e.g., SCL-90-R) and (2) explainable AI (XAI) approaches to improve experts’ trust and acceptance of
AI-based solutions.

2.3 Explainable AI
XAI refers to a set of ML tools and techniques that can help human users understand the reasons behind
the predictions made by AI models (Murdoch et al., 2019). While traditional AI models are typically
“black box” models where even the designers of these models cannot explain why or how their models
arrive at a specific decision, XAI models are considered to be “white box” models that can explain the
key features that contributed to a prediction and the respective weights of those features toward the
prediction (Loyola-Gonzalez, 2019). Such “white box” models give decision-makers the underlying
basis for making decisions and help build transparency and trust in AI models.
The last few years have seen significant progress in XAI models, with the development of many new
packages and methods, such as Local Interpretable Model-agnostic Explanations (LIME), SHapley
Additive exPlanations (SHAP), Shapash, ExplainerDashboard, Dalex, Explainable Boosting Machines
(EBM), ELI5, and others. LIME, proposed by Ribeiro et al. (2016), is one of the earliest XAI techniques
that explains the predictions of a machine learning classifier by learning an interpretable model based
on local observations around the predicted observation. Lundberg and Lee (2017) introduced SHAP to
explain individual predictions globally by assigning each feature a theoretically optimal value for a
particular prediction. We employ these techniques, along with our own ShapRadiation technique and
clinical practice (SCL-90-R), to design an XAI framework for mental health screening.

3 Artifact Design
Our proposed XAI framework uses data from the SCL-90-R questionnaire as inputs and converts these
inputs into explainable two-dimensional images that can be used for mental disorder screening. This
framework consists of three phases: (1) creating a feature map, (2) predictive modeling for image
recognition, and (3) generating an explanatory canvas. Figure 1 depicts an overview of these three
phases.

Figure 1. An XAI framework for mental health screening.

3.1 Generating a Feature Map


Numeric data from the SCL-90-R questionnaire is entered into our design artifact in a tabular format.
Each row in this table represents an observation from a specific participant and each column represents
a participant’s response to a specific question on the questionnaire (i.e., a feature). The MDscan
algorithm first transforms each observation into an image by locating the 90 features of each observation
on a two-dimensional (2D) map using the t-distributed Stochastic Neighbor Embedding (t-SNE)
technique. t-SNE is an algorithm for visualizing high-dimensional data in a low-dimensional space,
allowing latent patterns to emerge visually on the image map (Maaten and Hinton 2008). While other
dimension reduction techniques, such as principal component analysis (PCA), are linear transformations
that work with a global data structure, t-SNE is markedly superior in cases of non-linear manifold
patterns by virtue of its ability to preserve both local and global data structures (Maaten and Hinton,
2008). However, while traditional t-SNE examines training data sample-by-sample, we deviate from
this approach by examining the data feature-by-feature, which is called DeepInsight in the ML literature
(Sharma et al., 2019). This feature-by-feature approach retains the original global and local structure
during the dimensionality reduction process by computing pairwise similarities (cosine similarity)
among features as an expression of relations between features, which we use to position similar features
close to each other and dissimilar features further apart on a 2D plane.
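A minimal sketch of this feature-by-feature (DeepInsight-style) embedding, assuming scikit-learn's TSNE implementation; the perplexity and initialization choices are illustrative, not the settings used in MDscan:

```python
import numpy as np
from sklearn.manifold import TSNE

def feature_map_coords(X, random_state=0):
    """DeepInsight-style embedding: run t-SNE on the *transposed* data so
    that each feature (questionnaire item), not each patient, becomes a
    point on the 2D plane, with cosine similarity relating features."""
    features = X.T  # shape (n_features, n_samples): one row per feature
    tsne = TSNE(n_components=2, metric="cosine", init="random",
                perplexity=5, random_state=random_state)
    return tsne.fit_transform(features)  # (n_features, 2) 2D coordinates
```

Similar features then land close together on the plane, so every patient's 90 responses can later be painted at the same fixed pixel positions.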
After locating each feature on a 2D plane, we minimize the feature space by eliminating empty pixels
or white spaces containing no information on the plane to minimize the image size for each observation.
This helps us reduce the memory and computational needs for processing each image. We accomplish
this objective using the Graham Scan algorithm (Graham, 1972) to identify a convex hull polygon that
extracts the minimum area containing all 90 features on the 2D plane. This polygon starts with one
feature point as the origin of polar coordinates and uses a counterclockwise rotating ray to meet the first
feature point, which is then set as the next origin of polar coordinates. The ray is rotated
counterclockwise from its new origin until it meets the next feature point, and this process is repeated
iteratively until it reaches back to the starting feature point. The resulting polygon (the inner shape with
red lines in Figure 2) contains all feature points in the least possible canvas space without altering the
relative location of features. Next, we determine the minimum bounding rectangle containing the convex
hull polygon and rotate the minimum bounding rectangle to obtain an orthogonal representation to
minimize the canvas size containing all feature information.
Lastly, we visualize each participant’s SCL-90-R responses using grey-scaled pixels on our minimized
image map by rescaling each response (a 0-4 score) to a pixel value between 0 and 255 using min-max
normalization. The higher this value, the darker the color of the associated pixel.
Darker pixels are more indicative of a positive prediction of a mental disorder, while lighter pixels reflect
a negative prediction.
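The pixel-intensity mapping itself is a one-line min-max rescale (a sketch; rendering high ratings as darker is a display choice):

```python
import numpy as np

def responses_to_pixels(responses, lo=0, hi=4):
    """Min-max rescale SCL-90-R ratings (0-4) to 8-bit intensities (0-255)."""
    r = np.asarray(responses, dtype=float)
    return np.round((r - lo) / (hi - lo) * 255).astype(np.uint8)
```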

Figure 2. Feature map generation.

3.2 Prediction Model


Feature maps (images) from Phase 1 are now used as inputs for predictive modeling, in our case, for
multiclass classification of ten mental health disorders. Convolutional neural networks (CNN), used
widely for computer vision or image recognition tasks, are best suited for detecting patterns in images
(Zhu et al., 2021). In the CNN approach, a mathematical operation called convolution is used to filter
an input image into a feature map of lower dimensionality. Multiple filters, also called kernels or feature
detectors, may be used to recognize different latent features of an image. The filtered features are
represented as numeric matrices, which are pooled or compressed by computing the average or
maximum values to create a summarized representation of features. Pooling improves the stability of
CNN, and without it, even the slightest fluctuations in pixels may cause the model to misclassify. The
output of the pooling process is flattened into a column vector and fed into a fully-connected, feed-
forward layer for multiclass classification. The best model architecture for CNN may be defined by a
human modeler or derived from Bayesian optimization. Since the CNN architecture is well known to
ML experts and is not unique to our artifact, we do not provide further details here to conserve space.
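To make the convolution and pooling operations concrete, here is the core arithmetic in plain NumPy (an actual MDscan-style classifier would be built in a deep learning framework; the kernel values and window sizes are illustrative):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution (cross-correlation, as in CNN layers):
    slide the kernel over the image, summing elementwise products."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the strongest response in each
    window, which stabilizes the map against small pixel fluctuations."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))
```

The pooled maps are then flattened and passed to a fully-connected layer for the ten-class prediction.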

3.3 Creating Explanatory Canvas


The last and most crucial phase of our artifact design is generating an explanatory canvas from the
prediction model in Phase 2 to understand which features are related to the predicted class for a specific
observation. This explanatory canvas overlays the feature map for each observation with distinct
background colors for each of the ten classes (mental disorders), with different intensity levels to depict
the severity of the disorder. We generate the canvas using the state-of-the-art SHAP and LIME
algorithms and our own custom-built ShapRadiation approach.
The SHapley Additive exPlanation (SHAP) algorithm, developed by Lundberg and Lee (2017), is an
XAI algorithm used to explain individual predictions in a prediction model by computing the relative
contribution of each feature toward the overall prediction as Shapley values. This algorithm is named
after and based on the work of Lloyd Shapley, the 2012 Nobel Prize winner in Economics, who
developed this approach to compute the relative contributions of multiple players in cooperative game
theory. In our problem, each feature represents a player, and the collective contribution of these features
determines the predicted class of a given observation.
MDscan employs the SHAP algorithm to compute Shapley values for each pixel on our feature map
(from Phase 1). The Shapley value of a pixel toward a given class is the difference in probability of
predicting this class using the image containing that pixel versus the image without that pixel (i.e., by
substituting the focal pixel with a random pixel). We estimate ten Shapley values for each pixel, one
corresponding to each of our ten classes. Shapley values may be positive or negative, shown in red or blue,
respectively, in Figure 3, with a darker color denoting a value far from zero. A negative Shapley value
indicates a decreased likelihood of categorizing a feature in a specific class, while a positive value
increases the chances of a positive class prediction.
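The per-pixel contribution described above can be approximated by simple Monte-Carlo sampling. This is a sketch of the idea, not the SHAP library's actual estimator; `model` stands for any callable mapping an image to a class probability:

```python
import numpy as np

def pixel_contribution(model, image, pixel, n_samples=50, rng=None):
    """Estimate one pixel's contribution to a class probability: the average
    difference between the model's output on the original image and on
    copies where the focal pixel is replaced by a random intensity."""
    if rng is None:
        rng = np.random.default_rng(0)
    i, j = pixel
    base = model(image)
    diffs = []
    for _ in range(n_samples):
        perturbed = image.copy()
        perturbed[i, j] = rng.integers(0, 256)  # substitute a random pixel
        diffs.append(base - model(perturbed))
    return float(np.mean(diffs))
```

Repeating this for each of the ten class probabilities yields the ten values per pixel mentioned above.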

Figure 3. Computing Shapley values for ten mental disorders.


The matrix of Shapley values is a sparse matrix, with only 90 pixels containing positive or negative
prediction information (indicated in red or blue in Figure 3) and all remaining pixels on the feature map
(white pixels in Figure 3) containing no information (missing values). The preponderance of missing
values makes sparse matrices quite deficient in explanatory ability. To paint the entire canvas, including
pixels with and without Shapley values, and define visual boundaries between the different classes, we
developed our own ShapRadiation approach to determine how pixels with Shapley values affect other
pixels with missing Shapley values. In this approach, we propose that pixels with Shapley values will
have a stronger effect on pixels in their near vicinity than on pixels far away, and define the influence
of a pixel’s explanatory power on another pixel by simulating a Gaussian distribution with the term
e^(-dist_ij-pq / 2), where dist_ij-pq is the Euclidean distance between two pixels in row and column locations ij
and pq. The influence of Shapley values for each pixel decreases with increasing distance from the
center of the Gaussian distribution. This ShapRadiation approach transforms the sparse feature matrix
into a dense matrix. Each pixel on this canvas has a ShapRadiation value for each of the ten classes, and
the highest ShapRadiation value is used to determine the predicted class for that pixel, as shown in the
second panel of Figure 4. Each class, within its ShapRadiation boundary, can be color-coded (for our
ten classes), as shown in the third panel of Figure 4.
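A minimal NumPy sketch of the ShapRadiation idea as described above (the exact kernel width and any normalization used in MDscan may differ):

```python
import numpy as np

def shap_radiation(shap_sparse, known_mask):
    """Densify a sparse Shapley map for one class: every pixel accumulates
    exp(-d/2)-weighted contributions from each pixel with a known Shapley
    value, where d is the Euclidean distance, so influence decays smoothly
    with distance from each known pixel."""
    H, W = shap_sparse.shape
    gy, gx = np.mgrid[0:H, 0:W]
    dense = np.zeros((H, W))
    for y, x in zip(*np.nonzero(known_mask)):
        d = np.sqrt((gy - y) ** 2 + (gx - x) ** 2)
        dense += shap_sparse[y, x] * np.exp(-d / 2.0)
    return dense
```

With one dense map per class, an argmax across the ten maps assigns each pixel its predicted class, giving the color-coded boundaries in Figure 4.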
We examine the robustness of MDscan’s explanatory canvas using Local Interpretable Model-Agnostic
Explanation (LIME), a powerful local explainer for ML models (Ribeiro et al. 2016). The LIME
algorithm explains a classifier's predictions by approximating it with a linear model that explains
individual predictions based on how the model behaves in the vicinity of the predicted observation.
Because LIME approaches the explanation process very differently from SHAP, it provides a good test
of the robustness of the SHAP-generated explanatory canvas.

Figure 4. Defining class boundaries using ShapRadiation.

Figure 4. Defining class boundaries using ShapRadiation.


LIME uses a permutation technique to generate several new images from an existing feature map with
randomly masked pixels (see the second panel in Figure 5). Using a trained CNN model, it then predicts
labels for the generated images (third panel in Figure 5). Lastly, it fits a simple linear regression
relating the binary pixel masks to these predicted labels. This process generates a coefficient or
weight for each pixel, representing its explainability toward the predicted label. This linear model
uses Least Absolute Shrinkage and Selection Operator (LASSO), a regularization technique commonly
used in ML models to reduce overfitting and noise in the model. The average of absolute coefficients of
pixels generated from LIME using training images defines each pixel value on the explanatory area,
where each pixel is assigned the class for which it has the largest value among all classes (fourth panel
in Figure 5). An average coefficient or weight greater than zero means that the pixel contributes
positively to the probability of a given class and is recorded as one on the explanatory canvas, while
a weight equal to or less than zero means that the pixel has no effect or a negative effect,
respectively, on the current class and is recorded as zero (i.e., dropped) on the explanatory canvas.
Our final explanatory canvas, shown in panel 4 in Figure 4, is the intersection of the explanatory areas
from ShapRadiation and LIME explanations. The matching sections of the two explanatory areas are
color-coded for the different classes. The 90 input features are overlaid on the canvas using the
grayscaled pixels from Phase 1, with the relative intensity of each pixel representing the severity of the
mental disorder for each patient, while the color around the darker pixels is indicative of the patient's
specific mental disorder.
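The intersection step is then a simple elementwise agreement check between the two per-pixel class maps (a sketch; pixels where the two explainers disagree are left unlabeled here):

```python
import numpy as np

def explanatory_intersection(shap_classes, lime_classes):
    """Keep a pixel's class only where the ShapRadiation and LIME argmax
    maps agree; disagreeing pixels are marked -1 (unlabeled)."""
    return np.where(shap_classes == lime_classes, shap_classes, -1)
```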

Figure 5. Explanatory canvas with (a) questions and (b) mental disorders.

4 Artifact Evaluation
We tested MDscan using clinical trial data from patients at a mental health clinic in Turkey. Since
September 2018, this clinic has administered the SCL-90-R questionnaire to its clients on a secure web
portal called Psikometrist.com. Counselors at the clinic send clients an encrypted link to the online
questionnaire by e-mail or text for them to complete the questionnaire from the privacy of their homes.
Since its inception, the portal has recorded responses from 15,760 participants. The portal also tracks
how long each participant took to complete each question. To ensure data quality, we removed responses
that were completed in less than two seconds per question or those determined by the clinic staff to
contain inconsistent responses to similar questions. This screening process led to 6,139 complete
observations, which was the initial data source for our artifact evaluation.
The clinic provided us with the actual diagnosis (class labels) for each of the 6,139 patients, which was
used as the “ground truth” for our artifact evaluation. This evaluation was made by clinical psychologists
and psychiatrists based on participants’ SCL-90-R responses, personal interviews, direct observations,
physiological evidence (blood tests, blood pressure, etc.), and medical history. Also available to us was
the General Severity Index (GSI), a measure of patients’ overall mental status, which was essentially
the mean of all 90 patient responses to the SCL-90-R instrument. GSI was used to determine whether a
patient was mentally ill in general and needed treatment.
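As stated, the GSI is simply the mean of the 90 item ratings:

```python
def general_severity_index(responses):
    """GSI: the mean of all 90 SCL-90-R item ratings (each on a 0-4 scale)."""
    return sum(responses) / len(responses)
```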

Labels                    | Positive Samples | Clinical Practice (Recall / Precision / F1) | MDscan (Recall / Precision / F1)
Somatization              | 248/500          | 1 / 0.50 / 0.66                             | 0.88 / 0.87 / 0.88
Obsessive-Compulsive      | 383/500          | 1 / 0.77 / 0.86                             | 0.87 / 0.98 / 0.92
Interpersonal sensitivity | 331/500          | 1 / 0.66 / 0.80                             | 0.86 / 0.92 / 0.89
Depression                | 386/500          | 1 / 0.46 / 0.87                             | 0.88 / 0.98 / 0.93
Anxiety                   | 267/500          | 1 / 0.28 / 0.70                             | 0.91 / 0.95 / 0.93
Hostility                 | 231/500          | 1 / 0.70 / 0.63                             | 0.85 / 0.96 / 0.91
Phobic anxiety            | 139/500          | 1 / 0.40 / 0.44                             | 0.90 / 0.97 / 0.94
Paranoid ideation         | 350/500          | 1 / 0.70 / 0.82                             | 0.70 / 0.85 / 0.77
Psychoticism              | 200/500          | 1 / 0.40 / 0.57                             | 0.87 / 0.83 / 0.85
Additional items          | 305/500          | 1 / 0.61 / 0.76                             | 0.89 / 0.93 / 0.91

Table 1. Comparison of MDscan versus clinical practice for functional to moderately symptomatic patients.

Labels                    | Positive Samples | Clinical Practice (Recall / Precision / F1) | MDscan (Recall / Precision / F1)
Somatization              | 248/500          | 1 / 0.59 / 0.75                             | 0.88 / 0.87 / 0.88
Obsessive-Compulsive      | 383/500          | 1 / 0.82 / 0.90                             | 0.87 / 0.98 / 0.92
Interpersonal sensitivity | 331/500          | 1 / 0.71 / 0.83                             | 0.86 / 0.92 / 0.89
Depression                | 386/500          | 1 / 0.84 / 0.92                             | 0.88 / 0.98 / 0.93
Anxiety                   | 267/500          | 1 / 0.64 / 0.78                             | 0.91 / 0.95 / 0.93
Hostility                 | 231/500          | 1 / 0.50 / 0.66                             | 0.85 / 0.96 / 0.91
Phobic anxiety            | 139/500          | 1 / 0.28 / 0.43                             | 0.90 / 0.97 / 0.94
Paranoid ideation         | 350/500          | 0.99 / 0.72 / 0.83                          | 0.70 / 0.85 / 0.77
Psychoticism              | 200/500          | 1 / 0.40 / 0.57                             | 0.87 / 0.83 / 0.85
Additional items          | 305/500          | 1 / 0.61 / 0.76                             | 0.89 / 0.93 / 0.91

Table 2. Comparison of MDscan versus clinical practice for moderately to severely symptomatic patients.


We selected a random anonymized sample of 500 observations from the 6,139 labeled observations to
evaluate our MDscan algorithm relative to current clinical practice (manual evaluation by mental health
experts based on SCL-90-R responses). We trained three mental health experts (two psychologists and
one psychiatrist) on reading and interpreting the MDscan images. We then asked these experts to
diagnose the 500 patients in our sample using our explainable MDscan images. We compared this
diagnosis against the patients’ original diagnosis reported to us by the clinic. Using two threshold values
for this evaluation (functional versus moderately symptomatic patients, and moderately symptomatic
versus severely symptomatic patients) we computed recall, precision, and F1 scores for our MDscan
approach against the original diagnosis (ground truth). The classification metrics are reported for
functional to moderately symptomatic patients and for moderately to severely symptomatic patients in
Tables 1 and 2, respectively. Clinical practice outperformed MDscan in recall (how well the model
correctly detected true cases) in both samples, but MDscan was superior in precision (how accurate were
the model’s positive predictions). Overall, MDscan consistently outperformed clinical practice on F1
score (the harmonic mean of recall and precision) for all mental disorders by an average of about 20%.
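The metrics in Tables 1 and 2 follow the standard definitions; for intuition, a screener that flags every patient as positive attains perfect recall but pays for it in precision, which is roughly the clinical-practice pattern visible in Table 1:

```python
def prf1(tp, fp, fn):
    """Recall, precision, and F1 (their harmonic mean) from confusion counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1
```

For example, recall 1 with precision 0.50 yields F1 of about 0.67, consistent with the somatization row of Table 1.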
However, the real value of our algorithm was not in outperforming clinical practice but in its ability to
explain the rationale for each diagnosis. Figure 6 illustrates an example of how an MDscan image
displayed features contributing to the prediction of mental disorders. In this figure, the responses to
questions C42, C27, C58, C49, and C12 (in red squares) are darker in color, indicating a high value,
which implies a “positive” prediction of somatization. In contrast, responses to questions C23, C72, and
C86 (in green squares) are lighter in color, indicating a low value and a “negative” prediction of
anxiety. It is worth noting that using MDscan, mental health experts can diagnose patients with multiple
disorders, which are sometimes missed, given information overload from 90 features and the complex
and overlapping symptomology of many mental disorders.

Figure 6. Example of an MDscan-based explanation.

5 Discussion
This study used an XAI approach to design a novel artifact for screening ten mental disorders, along
with their severity and an explanation of the prediction in the form of a full-color explanatory image. The
artifact presented here was informed and guided by clinical psychiatric practice (the DSM-based
SCL-90-R questionnaire) and the latest developments in XAI techniques to help secure user confidence, trust, and
acceptance of this artifact by mental health practitioners for clinical practice. A field test of this artifact
showed that it outperformed standard (manual) clinical practice for each of the ten mental disorders by
an average of 20%. The initial screening provided by MDscan may be complemented with patient
interviews, observations, physiological evidence, and prior medical history to generate a more
comprehensive diagnosis of mental disorders.

Thirty-first European Conference on Information Systems (ECIS 2023), Kristiansand, Norway 9


MDscan for Mental Health Screening

5.1 Contributions to Research


This paper contributes to the XAI literature in two ways. First, it provides an illustrative example of
using a "white box" XAI approach to design computational algorithms. We build a unique XAI pipeline
that combines and configures various component ML algorithms such as t-SNE, convex hull, minimum
bounding rectangle, CNN, SHAP, LIME, and ShapRadiation to address a target problem: screening of
mental disorders. White box approaches such as the MDscan algorithm represent a significant
advancement over black box approaches typically used in most AI/ML applications, and are essential
for transparency, interpretability, and acceptance of AI in mission-critical applications in businesses and
society. Although our XAI approach examined the specific case of mental health screening, a similar
approach may be used in other domains such as finance, cybersecurity, and public health.
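To make the pipeline's front end concrete, the following is a minimal illustrative sketch, not the authors' exact implementation, of the DeepInsight-style step that arranges tabular features on a 2D canvas: t-SNE embeds the features as points, a convex hull frames the layout (an axis-aligned bounding box stands in here for the minimum bounding rectangle), and each patient's responses are painted onto a pixel grid. The data, grid size, and parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
X = rng.random((200, 90))         # hypothetical data: 200 patients x 90 SCL-90-R items

# 1. Embed the 90 features (columns) as points in 2D; each feature gets a pixel location.
coords = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X.T)

# 2. Convex hull of the feature layout; its vertices frame the canvas (a simple
#    bounding box here stands in for the minimum bounding rectangle).
hull = ConvexHull(coords)
frame = coords[hull.vertices]
lo, hi = frame.min(axis=0), frame.max(axis=0)

# 3. Map each feature to a cell of a small pixel grid and paint one patient's
#    responses onto the canvas (darker = higher response value).
size = 16
ij = np.floor((coords - lo) / (hi - lo + 1e-9) * (size - 1)).astype(int)
canvas = np.zeros((size, size))
for f, (i, j) in enumerate(ij):
    canvas[j, i] = max(canvas[j, i], X[0, f])   # image for patient 0

print(canvas.shape)   # a 16x16 image, ready to feed a small CNN
```

Such per-patient images would then be classified by the CNN, with SHAP and LIME attributing the prediction back to individual pixels (and thus to questionnaire items).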
Second, this paper proposed a new technique called ShapRadiation to determine how to create a
multiclass explanatory visual canvas from a sparse matrix with thousands or millions of missing values
by employing pixels with known Shapley values to impute corresponding values for pixels with missing
Shapley values. This ShapRadiation technique is a unique contribution to the AI/ML literature and may
help create informative visual representations from sparse data across a wide range of application
domains.
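The intuition behind ShapRadiation can be sketched in a few lines: pixels with known Shapley values "radiate" those values to nearby pixels whose values are missing. The inverse-distance weighting below is an illustrative assumption, not the paper's exact scheme, as is the tiny 4x4 grid.

```python
import numpy as np

def radiate(grid):
    """Fill NaN cells of a 2D Shapley-value grid from its known cells,
    weighting each known pixel by the inverse square of its distance."""
    known = np.argwhere(~np.isnan(grid))
    vals = grid[~np.isnan(grid)]
    out = grid.copy()
    for i, j in np.argwhere(np.isnan(grid)):
        d2 = (known[:, 0] - i) ** 2 + (known[:, 1] - j) ** 2
        w = 1.0 / d2                     # closer known pixels contribute more
        out[i, j] = np.sum(w * vals) / np.sum(w)
    return out

sparse = np.full((4, 4), np.nan)
sparse[0, 0], sparse[3, 3] = 1.0, 0.0    # two known Shapley values
dense = radiate(sparse)
print(dense[1, 1])                       # closer to 1.0 than to 0.0
```

On a real MDscan canvas the same idea would fill the thousands of missing Shapley values from the sparse set of pixels for which attributions were computed.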

5.2 Contributions to Practice


The MDscan algorithm proposed in this paper serves the critical and underserved context of mental
health screening – a growing public health crisis that is often stigmatized by society and rarely addressed
within our public health care systems. Given the substantial global imbalance between supply and
demand of mental health services, our artifact can serve as a valuable and reliable screen for mental
health providers in managing their large and growing patient caseload by providing a preliminary
diagnosis and by categorizing patients based on their specific mental disorder and severity. Furthermore,
our artifact may help mental health clinics improve their patients’ clinical experiences by optimizing
patient diagnosis, monitoring, and treatment more efficiently. The explainability offered by our artifact
may help enhance the trustworthiness and acceptance of its predictions among patients and providers.
In addition to screening for specific health disorders, mental health professionals can also use MDscan
to diagnose subcategories within each of the ten mental disorders, such as distinguishing between different
types of psychoticism (e.g., paranoid psychosis and schizoaffective disorder), using
different combinations of responses in the SCL-90-R questionnaire. Even trained professionals find
complex mental disorders challenging to diagnose because of overlapping symptoms between mental
disorders and information overload from 90 responses for each patient. Because a picture is worth a
thousand words, MDscan’s visual images can help uncover rare mental disorders from hidden patterns
in patients’ responses while simultaneously watching out for specific symptoms such as tendencies for
self-harm or suicide.
Lastly, experts can compare MDscan images for a given patient at different points in time to track the
progress of a mental disorder over time or to evaluate the efficacy of their treatment protocol. Figure 7
depicts three MDscan images of an actual patient taken on her first day in the clinic, after 21 days of
treatment, and after 40 days of treatment. The change in the darkness of pixels from dark to light over
time indicates the patient’s recovery from her mental disorder. The lower panel in this figure shows the
change in pixel intensity across consecutive points in time, color-coded to demonstrate an increase
(blue), no change (white), and decrease (pink) in disease severity.
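The comparison shown in the lower panel reduces to a per-pixel difference between two MDscan images, bucketed into increase, no change, and decrease in severity. The sketch below illustrates this; the tolerance value and the sample intensities are assumptions for illustration.

```python
import numpy as np

def trend(before, after, tol=0.05):
    """Color-code the per-pixel change between two MDscan images:
    blue = severity increased, white = no meaningful change, pink = decreased."""
    delta = after - before
    codes = np.full(delta.shape, "white", dtype=object)
    codes[delta > tol] = "blue"
    codes[delta < -tol] = "pink"
    return codes

day0  = np.array([[0.9, 0.8], [0.2, 0.1]])   # hypothetical pixel intensities at intake
day40 = np.array([[0.3, 0.8], [0.2, 0.6]])   # after 40 days of treatment
print(trend(day0, day40))
```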

Figure 7. Mental health trend analysis.

6 Conclusion
As the demand for mental health services continues to explode around the globe, there is a growing need
for automated tools to support mental health screening and diagnosis. The MDscan algorithm presented
in this paper attempts to address this critical problem while also demonstrating how to improve user
trust, confidence, and acceptance of AI predictions by explaining them. We hope that our
research will inspire other researchers to develop their own XAI approaches for addressing critical
business and social problems in other domains, such as finance and cybersecurity, where visual and
explainable representations of large complex data sets are needed to detect essential and useful patterns.

7 Acknowledgments
The authors thank the Guven Private Health Laboratory, Ankara, Turkey, for providing the labeled
mental disorder data used in this study. This data was sourced with the help of DNB Analytics.

References
American Psychiatric Association (2022). Diagnostic and Statistical Manual of Mental Disorders, Fifth
Edition, Text Revision (DSM-5-TR™). Washington DC.
Foley, T. and Woollard, J. (2019). The digital future of mental healthcare and its workforce. London:
Health Education England.
Graham, R.L. (1972). “An efficient algorithm for determining the convex hull of a finite planar set,”
Information Processing Letters 1, 132-133.
Johnson, M., Albizri, A., and Harfouche, A. (2021). "Responsible artificial intelligence in healthcare:
Predicting and preventing insurance claim denials for economic and social well-being," Information
Systems Frontiers, https://doi.org/10.1007/s10796-021-10137-5.
Kilbourne, A. M., Beck, K., Spaeth-Rublee, B., Ramanuj, P., O’Brien, R. W., Tomoyasu, N., and Pincus,
H. A. (2018). “Measuring and improving the quality of mental health care: A global perspective,”
World Psychiatry 17 (1), 30–38.
Lundberg, S.M. and Lee, S.I. (2017). “A unified approach to interpreting model
predictions,” Proceedings of the 31st Conference on Neural Information Processing Systems, Long
Beach, USA, 4768–4777.


Loyola-González, O. (2019). "Black-Box vs. White-Box: Understanding Their Advantages and
Weaknesses from a Practical Point of View," IEEE Access 7, 154096–154113.
Maaten, L.V., and Hinton, G.E. (2008). “Visualizing Data Using t-SNE,” Journal of Machine Learning
Research 9, 2579-2605.
Mellem, M. S., Kollada, M., Tiller, J., and Lauritzen, T. (2021). “Explainable AI enables clinical trial
patient selection to retrospectively improve treatment effects in schizophrenia,” BMC Medical
Informatics and Decision Making 21(1), 1-10.
Murdoch, W. J., Singh, C., Kumbier, K., Abbasi-Asl, R., and Yu, B. (2019). “Interpretable machine
learning: definitions, methods, and applications,” Proceedings of the National Academy of Sciences
116 (44): 22071–22080.
Nakano, T., Takamura, M., Ichikawa, N., Okada, G., Okamoto, Y., Yamada, M., ... and Yoshimoto, J.
(2020). “Enhancing multi-center generalization of machine learning-based depression diagnosis
from resting-state fMRI,” Frontiers in Psychiatry, 11, 400.
National Alliance on Mental Illness (2020). Mental Health by the Numbers. URL:
https://www.nami.org/mhstats (visited on March 25, 2023).
Ribeiro, M., Singh, S., and Guestrin, C. (2016). “’Why should I trust you?’: Explaining the predictions
of any classifier,” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, arXiv:1602.04938.
Schulte-Rüther, M., Kulvicius, T., Stroth, S., Roessner, V., Marschik, P., Kamp-Becker, I., and Poustka,
L. (2021). “Using machine learning to improve diagnostic assessment of ASD in the light of specific
differential diagnosis,” medRxiv, 2021-10.
Schmitz, N., Hartkamp, N., and Franke, G. H. (2000). "Assessing clinically significant change:
Application to the SCL-90–R," Psychological Reports 86 (1), 263-274.
Sharma, A., Vans, E., Shigemizu, D., Boroevich, K.A., and Tsunoda, T. (2019). “DeepInsight: A
methodology to transform a non-image data to an image for convolution neural network
architecture,” Scientific Reports, 9, 11399.
Syarif, I., Ningtias, N., and Badriyah, T. (2019). "Study on mental disorder detection via social media
mining," 4th International Conference on Computing, Communications and Security, 1-6.
The Lancet (2018). "The Lancet Commission on Global Mental Health and Sustainable Development,"
The Lancet 392 (10157), 1553-1598.
Thieme A., Belgrave D., and Doherty G. (2020). “Machine learning in mental health,” ACM
Transactions on Computer-Human Interaction 27 (5), 34.
Tyrer, P. (2014). “A comparison of DSM and ICD classifications of mental disorder,” Advances in
Psychiatric Treatment 20 (4), 280–285.
Wainberg, M. L., Scorza, P., Shultz, J.M., Helpman, L., Mootz, J., Johnson, K. A., Neria, Y., Bradford,
J. M. E., Oquendo, M. A., and Arbuckle, M. R. (2017). “Challenges and opportunities in global
mental health: A research-to-practice perspective,” Current Psychiatry Reports 19 (5), 28.
Ware, S., Yue, C., Morillo, R., Lu, J., Shang, C., Bi, J., Kamath, J., Russell, A., Bamis, A., and Wang,
B. (2020). “Predicting depressive symptoms using smartphone data.” Smart Health 15, 100093.
World Health Organization (2018). Mental Health: Strengthening Our Response. URL:
https://www.who.int/news-room/fact-sheets/detail/mental-health-strengthening-our-response
(visited on March 20, 2023).
World Health Organization (2022). Mental Disorders. URL: https://www.who.int/news-room/fact-
sheets/detail/mental-disorders (visited on March 22, 2023).
Zhu, Y., Brettin, T. S., Xia, F., Partin, A., Shukla, M., Yoo, H. S., Evrard, Y. A., Doroshow, J. H., and
Stevens, R. L. (2021). “Converting tabular data into images for deep learning with convolutional
neural networks,” Scientific Reports 11, 11325.

