1. Introduction and Rationale
Health literacy can be defined as an individual’s ability to make health decisions based on a sound analysis of relevant data. Over the last few decades, health literacy has garnered attention across the world. This in part is due to research that suggests that health literacy is a key determinant of health. For instance, according to the American Medical Association, health literacy is a stronger predictor of a person’s health than age, income, employment status, education level, or race [
1]. A survey conducted across eight European countries notes that individuals with lower levels of health literacy tend to have worse health [
2]. In addition to the health implications, low health literacy has financial implications for individuals as well as governments [
3,
4,
5].
Health literacy is multifaceted and encompasses a person’s ability to access, understand, process, and apply health information relevant to disease prevention, healthcare, and health promotion [
6]. Disease prevention is an important aspect of public health [
7]. In 2000, 35% of deaths in the United States were linked to tobacco and alcohol use, poor diet, and physical inactivity [
8]. On a global scale, 10% of mortality is attributed to physical inactivity and dietary risk factors [
9]. From a disease prevention standpoint, individuals with low health literacy have been shown to make poor health choices, engage in risky behavior, and have low self-management [
5]. Though professionals are charged with educating the public about health risks, hazards, and issues, there is a need for personal empowerment as well [
10,
11,
12]. Improving health literacy is a nontrivial endeavor. Currently, individuals seeking to access and understand health data are confronted with a myriad of data-related challenges. For instance, health data is often voluminous and originates from heterogeneous sources [
13,
14,
15,
16]. As a result, people find themselves having to engage in a time-consuming traversal of multiple websites to access relevant data. In addition to access, presenting data to individuals in a dense and understandable fashion is crucial to improving health literacy for the public [
13]. Given the scale and complexity of the data related to disease prevention, visualizations have the potential to play a crucial role.
Interactive visualizations predominantly represent data in a visual format and allow users to manipulate how the data is shown. Simple visualizations such as bar charts, scatter plots, and pie charts have been used extensively over the last two centuries in the health domain. However, as the size of data increases, there is a need for visualizations that can mirror the complexity of the data and facilitate its understanding without straining the cognitive resources of users [
17]. While the development of elaborate non-trivial visualizations has increased in recent years, research on instructional materials for visualizations is sparse [
18,
19,
20]. As users’ understanding of the tool influences their ability to use the tool to complete tasks effectively, more research on visualization literacy—which is the ability of users to interpret and extract information from visualizations—is necessary. Borner et al. highlight the need for instruction so that individuals are better equipped to understand novel visualizations [
21]. While some may avoid using non-typical visualizations because of their complexity, it is important to investigate whether, with training, individuals can learn to use such visualizations. Therefore, before we can explore the use of non-typical visualizations for health literacy, it is important to first examine visualization literacy.
The purpose of this paper is twofold: first, to present research that investigates the ability of individuals to learn to use elaborate interactive visualizations; second, to examine the ability of non-trivial visualizations to improve health literacy. To this end, we have created a visualization tool, HealthConfection, that allows individuals to make sense of the causes and risk factors that contribute to mortality across the world. Using this tool, we have conducted two user studies. The results from the first study, on visualization literacy, inform the second study, which investigates health literacy. In this paper, we report our findings and discuss the implications for the visualization and health communities. The rest of the paper is organized as follows.
Section 2 provides some conceptual and terminological background.
Section 3 describes the visualization tool that we have created.
Section 4 presents the research methodology and results from the visualization literacy study.
Section 5 presents the health literacy study that we conducted. The final section,
Section 6, presents the general conclusions.
3. HealthConfection
HealthConfection is a visualization tool that allows users to explore and make sense of the risk factors and the causes of mortality. The tool incorporates selected datasets aggregated by IHME [
33]. The datasets include over 12 million records with estimates for 57 risk factors and over 235 causes that lead to death. Part of the challenge when working with large datasets is determining how users will explore the data. In visualizations, providing an overview is beneficial. When properly designed, overviews can provide users with an immediate appreciation for the size and extent of the data space, and support the navigation and exploration of the data space [
38]. Previous visualization tools have shown the importance of providing users with a high-level overview of the data [
30,
32]. In addition to creating an overview visualization, we have also developed visualizations that emphasize four different perspectives through which users can improve their health literacy: demography, geography, chronology, and sentiment.
When working with multiple visualizations, it is important to provide users with consistent structures and navigational cues and anchors [
38,
39]. As users navigate a data-centered tool, they find themselves confronted with familiar questions, including
“where am I?”, “where can I go?”, and “how do I get there?” Visual metaphors can help to provide consistent structures. When users internalize visual metaphors, they can navigate visualizations effectively [
40]. One technique to organize several representations is to use the visual confection metaphor. A visual confection is an assembly of visual representations, juxtaposed to tell a story, present visual comparisons, and show relationships and transitions [
41]. Confections focus on the organization of representations through compartments, which can then be used to zoom in on visual elements. The consistent structure and navigation allow users always to be aware of their current location. Based on the Gestalt principle of symmetry, one viable technique for juxtaposing visual confections is to have a central representation around which other representations are arranged [
42]. Placing a representation at the center implies that the representations surrounding it are conceptually related to it [
42]. The central representation, then, is where users begin their exploration of
the story of the data.
Figure 1 shows the visual organization of our tool.
HealthConfection provides cues that allow users to explore health data from different perspectives while at the same time minimizing visual discontinuity. By interacting with the ‘+’ anchor to the right of each compartment, users can explore a perspective, control which visualization is in the center, watch the tutorial, and hide other visualizations. The
Overview visualization in
Figure 1 shows the relationships between the causes of death and risk factors at a global level and allows users to select specific age groups, geographic locations, or points in time for investigation. The surrounding compartments allow users to explore
the story of the data from the four perspectives. In the IHME datasets, causes and risk factors are grouped at the level of clusters and groups. For causes, there are 21 clusters and three groups: communicable, non-communicable, and injury. For risk factors, there are ten clusters and three groups: metabolic, behavioral, and environmental and occupational risks. In our visualizations, we use a consistent color coding to emphasize the hierarchical structure of causes and risks.
Non-communicable, communicable, and injury causes are encoded with blue, red, and black, respectively. For risks, we use light shades of orange, green, and pink for metabolic, behavioral, and environmental and occupational risk groups, respectively.
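As a minimal sketch, the group-level color coding described above could be held in two lookup tables. Only the group names and hue names are taken from the text; the dictionary layout and the helper `color_for` are hypothetical and are not taken from the HealthConfection source.

```python
# Hypothetical lookup tables for the group-level color coding.
# Only the group and hue names come from the paper; everything else
# (names, structure) is an illustrative assumption.
CAUSE_GROUP_COLORS = {
    "non-communicable": "blue",
    "communicable": "red",
    "injury": "black",
}

# The paper states that risk groups use light shades of these hues.
RISK_GROUP_COLORS = {
    "metabolic": "orange",
    "behavioral": "green",
    "environmental and occupational": "pink",
}


def color_for(kind: str, group: str) -> str:
    """Return the hue name for a cause group or a risk group.

    kind  -- "cause" or "risk"
    group -- group name, matched case-insensitively
    """
    if kind == "cause":
        table = CAUSE_GROUP_COLORS
    elif kind == "risk":
        table = RISK_GROUP_COLORS
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    return table[group.lower()]
```

Keeping one table per hierarchy means that every compartment can resolve colors from the same source, which is one way to obtain the consistency the text calls for.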
The
Demography visualization allows users to explore which risks and causes affect different age groups. It also ranks the regions of the world based on their mortality rate for each age group. The visualization, enlarged in
Figure 2a, has five main components, four of which are arranged as tracks. The innermost track represents the age groups at which the data is aggregated (e.g., 1–4, 50–54). The second track depicts the ranking of cause-clusters for each age group. Clusters are arranged in descending order, with the cause-cluster with the highest rank on the outside. The third track depicts the ranking of risk-clusters. The gray circles in the cause and risk tracks depict clusters that do not contribute to mortality for the age group. The last track shows the ranking of location clusters. Risk, cause, and location clusters are ranked and arranged according to their mortality rate per 100,000 people. The sub-visualization placed in the center of the tracks depicts the relationship between causes and risks for specific locations for a specific age group. The Demography visualization is a dense visualization that encodes over 800 data items in its initial configuration. Through interaction, users can control the amount of data shown and perform a variety of tasks. For instance, users can filter to understand how a risk-cluster affects different age groups. Users can also search for a specific cluster and then drill to get more information on the causes or risk factors that make up that cluster.
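The per-age-group ranking used in the cause, risk, and location tracks can be illustrated with a short sketch. It assumes that each record is an `(age_group, cluster, rate_per_100k)` tuple and that clusters with a rate of zero correspond to the gray circles; the function name, record layout, and sample values are hypothetical and are not IHME figures.

```python
from collections import defaultdict


def rank_clusters(records):
    """Rank clusters within each age group by mortality rate per 100,000.

    records -- iterable of (age_group, cluster, rate_per_100k) tuples
    Returns two dicts keyed by age group:
      ranked   -- contributing clusters, highest rate first
      inactive -- clusters with no contribution (rate <= 0), sorted by name
    """
    by_age = defaultdict(list)
    for age_group, cluster, rate in records:
        by_age[age_group].append((cluster, rate))

    ranked, inactive = {}, {}
    for age_group, items in by_age.items():
        contributing = [(c, r) for c, r in items if r > 0]
        contributing.sort(key=lambda pair: pair[1], reverse=True)
        ranked[age_group] = [c for c, _ in contributing]
        inactive[age_group] = sorted(c for c, r in items if r <= 0)
    return ranked, inactive


# Illustrative values only; these are not IHME estimates.
sample = [
    ("1-4", "cluster A", 12.0),
    ("1-4", "cluster B", 30.5),
    ("1-4", "cluster C", 0.0),
    ("50-54", "cluster A", 80.0),
    ("50-54", "cluster B", 15.0),
]
ranked, inactive = rank_clusters(sample)
```

In this sketch, `ranked["1-4"]` lists cluster B before cluster A, and cluster C is reported separately as non-contributing for that age group.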
The
Geography visualization (
Figure 2b) allows users to explore the relationships between causes and risk factors at three levels of granularity: global, regional, and country. The top half of the visualization encodes the relationship between risk factors and causes at a global level and the regional distribution of mortality for a selected cause or risk factor. The circular sub-visualizations on either side of the map show the same relationships but from different perspectives. The left one shows risk factors as circles and the causes related to them as arcs, while the sub-visualization on the right shows causes as circles and risk factors as arcs. The map shows how a selected risk or cause affects different regions of the world. The bottom half of the visualization allows users to explore the cause-risk relationship for a specific region of the world. The oval track comprises 21 visual elements, each representing a region. When a user selects a region, cause- and risk-related mortality rates for the countries in that region are shown as heatmaps. Links connecting the risk and cause heatmap portions of the visualization emphasize the relationship between cause-clusters and risk-clusters for that specific region. By interacting with the Geography visualization, users can determine the regions of the world that are most affected by a cause, cause-cluster, risk, or risk-cluster. They can also compare the impact that certain diseases have on countries and make sense of the relationship between causes and risk factors at multiple levels of granularity.
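The two circular sub-visualizations present one set of risk-cause links from opposite directions. A minimal sketch of that idea, assuming the links are stored as a mapping from each risk factor to its related causes, is to invert the mapping; the function name and the placeholder labels below are hypothetical.

```python
from collections import defaultdict


def invert_links(risk_to_causes):
    """Turn a risk -> causes mapping into a cause -> risks mapping.

    Both mappings describe the same set of links; only the direction
    from which they are read differs.
    """
    cause_to_risks = defaultdict(set)
    for risk, causes in risk_to_causes.items():
        for cause in causes:
            cause_to_risks[cause].add(risk)
    # Sort for a stable, reproducible ordering of the arcs.
    return {cause: sorted(risks) for cause, risks in cause_to_risks.items()}


# Placeholder labels, not entries from the IHME datasets.
links = {
    "risk X": ["cause 1", "cause 2"],
    "risk Y": ["cause 2"],
}
by_cause = invert_links(links)
```

Here `links` would drive the left-hand view (risks as circles, causes as arcs) and `by_cause` the right-hand view (causes as circles, risks as arcs).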
The
Chronology visualization (
Figure 2c) allows users to explore how mortality has changed over time. This visualization has two main controls and three panels. The first control allows users to filter data by selecting a specific time period. The second control is part of the first panel and allows users to select a cause-cluster for further examination. The first panel depicts the ranking of cause-clusters at a global level over the specified time frame. Each cause-cluster is arranged based on its rank for a specific year, and links are drawn between each year’s placement to help users understand the temporal trend. The second panel depicts the proportion of mortality for causes in a selected cluster. The third panel portrays the temporal distribution of cause-cluster-specific mortality for each region of the world. With interaction, users can determine which cause-cluster results in the highest mortality at a global level and explore how mortality has changed over time. The
Sentiment visualization (
Figure 2d) allows users to explore the public’s perception of different health hazards. This visualization uses Twitter data (not from IHME) comprising over 400,000 health-related tweets. Using machine learning models, we classified each tweet by its user category and subject theme. The circular arcs at the top of the visualization represent the top 50 words for the dataset. The middle portion depicts the categorization of tweets by user groups and tweet themes. In its initial configuration (
Figure 1), the bottom of the sentiment visualization depicts the sentiment rate for cause-clusters. Users can drill to retrieve additional information for a selected cause-cluster. For instance, in
Figure 2d, when cancer is selected, the curved heatmaps depict the sentiment for each cause in the cluster for each user group and tweet theme.
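The year-by-year ranking in the first panel of the Chronology visualization can be sketched as follows. The sketch assumes that cause-clusters are ranked by a global mortality value per year, with rank 1 being the highest; the function name, input layout, and sample values are hypothetical.

```python
def yearly_ranks(rates):
    """Compute each cluster's rank per year.

    rates -- {year: {cluster: mortality_value}}
    Returns {cluster: [(year, rank), ...]} in chronological order, where
    rank 1 is the cluster with the highest mortality in that year.
    Consecutive (year, rank) pairs are the points a link would connect.
    """
    trajectories = {}
    for year in sorted(rates):
        per_cluster = rates[year]
        ordered = sorted(per_cluster, key=per_cluster.get, reverse=True)
        for rank, cluster in enumerate(ordered, start=1):
            trajectories.setdefault(cluster, []).append((year, rank))
    return trajectories


# Illustrative values only; these are not IHME estimates.
sample_rates = {
    2000: {"cluster A": 50.0, "cluster B": 40.0},
    2010: {"cluster A": 35.0, "cluster B": 45.0},
}
paths = yearly_ranks(sample_rates)
```

In this example the two clusters swap places between 2000 and 2010, which is the kind of crossing that the links between yearly placements make visible.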
Interaction plays a crucial role in the exploration of data. To facilitate the understanding of health patterns and trends, each visualization supports interactions, such as filtering, drilling, selecting, searching, and comparing, that are operationalized in a consistent manner. For an in-depth discussion of how the visualizations were designed, the interested reader is directed to [
17].