1 Introduction

While visualisations bolster the brain’s capacity to digest vast amounts of information, they must be designed to leverage human perception and accommodate our cognitive limitations (Few 2009, 2014). Data visualisation aims to tip the balance from relying on cognitive processing to leveraging our fast-paced visual perception, bringing patterns within data into clear view while supporting natural interactions with the visualisation (Qu et al. 2022). To take advantage of human perceptual processes, successful data visualisations must consider the way the visual system groups what it sees, for example by proximity, similarity, enclosure, and connection (Few 2014). Given this focus on supporting the ease of visual processing, it logically follows that visual distractions may pose a risk to the effective sense-making of large data sets.

Advancements in head-mounted extended reality displays have nurtured the rapid research and development of visualisation methods and systems on these platforms. Two immersive environments commonly used to view data three-dimensionally are virtual reality (VR) and augmented reality (AR). A VR headset can isolate users from real-world visual stimuli, although visual distractions can still occur within the virtual environment. However, this isolation makes VR less practical for notetaking, staying aware of the surrounding environment or avoiding objects in the room (McGill et al. 2015). There are also reports of virtual reality sickness, such as nausea and disorientation (Guna et al. 2019). AR, in contrast, overlays the display onto the user’s natural environment (Azofeifa et al. 2022), which reduces spatial vulnerability and enables notetaking or keyboard use. However, with a real-world backdrop, there is more scope for visual distractions. Thus, both methods come with theoretical and practical advantages and disadvantages for usability. As such, which headset provides the better user experience in immersive data visualisation is yet undetermined.

Immersion and presence have been separated theoretically throughout the extended reality literature, although the difference between the two is subtle (Bouchard et al. 2012; Silva et al. 2016). Immersion is an objective description of the sensory mediating technology, for instance, a VR headset. Its features include a wide field of view, high-resolution visual graphics, spatially appropriate sound effects, and spatial tracking of the head and limbs (Silva et al. 2016; Slater and Wilbur 1997; Takatalo et al. 2008). Sense of presence, on the other hand, refers to the user’s subjective experience while immersed in technologically simulated environments (Witmer and Singer 1998), that is, consciously existing in two places at once, one inside the virtual environment and the other within the physical, non-mediated natural world (Slater 2018). Through this blurring of consciousness between the natural and the virtual, human-computer interaction (HCI) is enhanced (Clemente et al. 2014). Presence provides the user with control over their actions through comparison between intentions and perceptions (Riva et al. 2014), and thus supports effective HCI and interactive visualisation.

What has not been evident in the literature is the impact of presence on task performance in interactive visual data analysis specifically, and on the resulting user experience, especially in oncology data analytics. The VR oncology data visualisation program allows the user to observe patient information within a cohort by comparing genomic similarities and treatment histories (Lau et al. 2022). When utilised by medical professionals, patterns within the data assist in deeper understanding of, and planning for, an individual patient’s specific condition and support hypothesis testing of gene similarity and survival rates (Lau et al. 2019, 2022). Our study aims to offer insights into optimising immersive extended reality experiences for performance and user experience in both medical and data visualisation tasks more broadly. The medical tasks illustrate a real scenario of how a clinician performs daily analytics in the VR/AR environment, such as identifying patients with specific biological and clinical properties, interpreting visual charts showing genomic information, and comparing one or more selected patients in the cohort.

We contribute a usability study that evaluates the impact of sense of presence on user experience and performance in virtual and augmented reality, in line with positive technology as suggested by Peters et al. (2018). Our work compares efficiency and effectiveness, as well as autonomy and competence, for users across AR and VR headsets using the VROOM system (Lau et al. 2022). VROOM supports interactive visual data analysis of oncology data, which lets us examine performance and the resulting user experience explicitly in that setting. The interactive visualisation includes simulated 3D environments (spherical objects and realistic models to represent patients) and popular 2D charts (such as heatmaps, box plots and scatterplots) to represent the traditional or physical analytical views. Autonomy and competence refer to subjective feelings and user satisfaction ratings of the visualisation and interaction experience.

In particular, we study three hypotheses:

  • H1. If visual immersion supports feelings of presence, then virtual reality use will predict higher scores of presence than augmented reality use when performing data visualisation tasks.

  • H2. If a higher level of presence supports data visualisation task performance, then VR will predict higher accuracy and faster task completion in data visualisation tasks.

  • H3. If higher feelings of presence increase human-computer interaction need satisfaction in data visualisation tasks, then presence scores will hold a positive relationship with scores of autonomy and competence.

2 Rationale

Extended reality has been widely adopted in healthcare and biomedical fields, with systems such as CellexalVR (Legetth et al. 2018), StarMap (Yang et al. 2018), MinOmics (Maes et al. 2018), BioVR (Zhang et al. 2019) and VROOM (Lau et al. 2022), and has been covered in recent reviews (Qu et al. 2019, 2022). However, there are limited usability studies on the impact of presence on user experience and performance in VR and AR environments, which is the focus of this paper.

Sense of presence is central to psychological research on human experience in virtual environments (Schubert et al. 2001). Presence has been compared between virtual scenes of differing levels of realism in a study in which 77 adult and child participants rode two virtual rollercoasters of different degrees of realism (Baumgartner et al. 2008). The study showed that higher ratings of presence were found in the realistic rollercoaster condition. Bouchard et al. (2012) used a between-subjects design with 31 participants, showing two groups the same scene of a room with a mouse cage. The researchers manipulated only the groups’ belief in the room’s realism: the experimental group was told that they were viewing a live feed, and the control group that the room was a virtual replica of a nearby room. Despite both groups in fact viewing the same animated mouse cage, the experimental group reported significantly higher scores of presence than the control. Further highlighting the importance of perceived immersion, Weber et al. (2021) reported that although presence could be experienced despite simultaneous awareness of two realities, presence scores decreased when attention was diverted away from the virtual environment by real-world distractions. Due to constant reminders of the real world through the blending of natural and extended reality visuals in augmented reality, presence is logically more at risk in AR (see H1). Whilst the sense of presence should be as high as possible, the view of the real world could still be useful for data visualisation and interaction with the overlays of extended reality environments.

Presence and virtual task performance held a consistently positive relationship; however, participants experiencing virtual reality sickness reported lower levels of presence (Weber et al. 2021). Thus, when directed attention is broken, presence is compromised. Presence also provides the user with control over their actions through comparison between intentions and perceptions (Riva et al. 2014). While holding differing viewpoints on presence, both groups of researchers assert that presence supports control and performance (see H2).

In Self-Determination Theory (SDT) (Ryan and Deci 2017), sustained motivation and well-being for technology users are most broadly measured by the fulfilment of the inherent needs for autonomy, competence and, if applicable, relatedness. Poor user experience can lead to disengagement, time-wasting, stress or, in the case of a clinician, poor judgement and poor patient treatment planning (Peters et al. 2018). This study therefore also employs measures of user experience in line with positive technology (Peters et al. 2018) (see H3).

3 Method

3.1 Participants

A total of 38 adults (21 females; 15 males; 2 non-binary), aged between 18 and 71 years old (M = 30.29; SD = 16.16), participated in the study. All participants had completed high school or a university degree and had normal or corrected vision. There was no requirement of previous knowledge in the field. As the visualisation shows the gene expression values via charts which can be interpreted with general knowledge, no prior experience or knowledge regarding extended reality or data analysis was required from the participants. Participants were recruited via the university’s SONA student portal (mostly first-year psychology students) in exchange for course credit, or externally in exchange for $30 online shopping gift vouchers.

3.2 Design

A within-subjects experimental design was employed for this study, with two conditions (augmented reality and virtual reality). An a priori power calculation for differences between two dependent means (Faul et al. 2007) showed that a total of 36 participants would be required for a large effect size, with power 0.9 at α = 0.05. Measurements were taken for task completion time in seconds (efficiency) over three task categories and for accuracy (effectiveness). Preference for headset type, subjective feelings of presence and need satisfaction ratings (autonomy and competence) were also measured. Open-ended responses were taken regarding reasons for preference, experiences of viewing two realities and headset discomfort.

3.3 Stimuli and measures

Tasks were designed to step through the typical use of the program’s breadth of features for an assessment of a patient-of-interest within a sample of patients. AR and VR tasks were kept as comparable as possible with slight adjustments to controller instructions as required. Three categories were designed for deeper insight into different aspects of usability, namely, navigation, reading and interpretation (see supplementary). Tasks under the navigation category required the participant to move through the program’s features or to interact with controls. For example, “Select a high-risk patient and drag it into the group panel”. Reading tasks simply required the participant to read or count data outcomes from the program (e.g., “What is the sex and risk category of the patient?”). Interpretation tasks went beyond reading, requiring some evaluation of the data (e.g., “Looking at the table, overall was the most common treatment type successful?”).

As recommended by Schubert et al. (2001), a self-report questionnaire was selected over a biological measure to measure the subjective experience of presence. Schubert et al.’s Igroup Presence Questionnaire (IPQ) was deemed most suitable as it focuses specifically on measurements of presence. Fourteen items comprising four subscales, with their Cronbach’s alphas, were used in our study: spatial awareness, involvement, realness, and presence (see details in supplementary).

To measure the participant’s psychological needs satisfaction when directly interacting with the interface, the TENS-Interface (Peters et al. 2018) was administered following each head-mounted display (HMD) session. The questionnaire was designed specifically to measure autonomy and competence. Ten items were delivered on a 5-point Likert scale from ‘do not agree’ to ‘strongly agree’.

3.4 Apparatus and materials

All participants used the VROOM program (Lau et al. 2022) on both the Oculus Quest 2 (VR) and Microsoft HoloLens (AR) HMDs. Times were taken in seconds using a stopwatch. Our experiments used a data set from https://www.cancer.gov/ccg/research/genome-sequencing. The raw data include patient biological attributes, patient history, and ribonucleic acid (RNA) sequencing data. To minimise environmental distraction, all experiments in both VR and AR were carried out in a large, quiet laboratory space with blinds to block excessive light from outside. The room had at least 3 × 3 m of clear space with no furniture or other surrounding objects. The experiments were carried out one-on-one between the researcher(s) and the participants.

3.5 Procedure

Following the introduction and consent, demographic and extended reality experience information was collected. Next, the sole participant was positioned, standing, in a quiet laboratory space with a 3 × 3 m clearance of any hazards. Before commencing, the comfort, fit and sound volume of the headset, along with the experimenter’s voice clarity, were confirmed with the participant.

Task directions were then given verbally by the researcher as read from the task list. The timing of each task was scored individually, in seconds, with a stopwatch and recorded. Accuracy was recorded dichotomously (i.e., correct/incorrect) and converted into an overall percentage for each participant. The distance between the researcher and participant was kept consistent throughout all experiments (i.e., approximately 3 m). The experimenter’s back was turned to observe the live onscreen headset casting and to allow the participant to complete tasks without feeling watched.

Figure 1 illustrates an example of a user’s view showing the entire patient population in AR and VR respectively. Each patient is represented as a sphere whose colour indicates high, medium, or low risk. In the 3D immersive space, patients with similar genomic properties are located closely together (Lau et al. 2022). Figure 2 shows visual analytic views of selected patients in AR and VR environments with multiple analytics panels. For example, the user selects one or more patients to review their biomedical details and compare their genomic information using classical 2D charts, such as boxplots, bar charts, heatmaps and scatterplots, in AR and VR environments.

Following each HMD experience, the corresponding survey was administered immediately, in line with the recommendations of Doherty and Doherty (2018) on self-reporting approaches. The order of the AR and VR devices was counterbalanced to limit order effects in the within-subjects design (i.e., learning effects or fatigue). Task times were summed for each participant into the three task categories, along with accuracy scores.
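The scoring described above (per-task stopwatch times summed into the three task categories, and dichotomous accuracy converted to a percentage) can be sketched as follows. This is a minimal illustration only: the record layout, the function name `summarise_participant` and the example values are hypothetical and are not taken from the study data, and the sketch assumes every timed task was also scored for accuracy.

```python
from collections import defaultdict


def summarise_participant(records):
    """Sum task times per category and convert correct/incorrect to a percentage.

    `records` is a hypothetical list of (category, seconds, correct) tuples
    for one participant in one condition (AR or VR).
    """
    time_by_category = defaultdict(float)
    for category, seconds, _ in records:
        time_by_category[category] += seconds
    n_correct = sum(1 for _, _, correct in records if correct)
    accuracy_pct = 100.0 * n_correct / len(records)
    return dict(time_by_category), accuracy_pct


# Made-up example values, not study data.
example = [
    ("navigation", 20.0, True),
    ("navigation", 35.5, True),
    ("reading", 12.0, True),
    ("interpretation", 40.0, False),
]
times, accuracy = summarise_participant(example)
total_time = sum(times.values())
```

Summing within categories before analysis gives one value per participant, category and headset, which is the shape required by the paired comparisons reported in the results.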

Fig. 1

Example of the VROOM visualisation of the entire patient population viewed through AR (top) and VR (bottom). Each patient is represented as a spherical object

Fig. 2

An example of visual analytic views showing multiple analytics panels of selected patients in AR (top) and VR (bottom) environments on two different data samples

4 Results

Descriptive statistics, as seen in Table 1, revealed that participants reported a higher presence in VR than in AR (H1). VR also produced faster task completion across all time measures along with higher accuracy rates than AR (H2). Feelings of autonomy and competence were also scored higher by participants in VR than in AR (H3). In cleaning the data, no out-of-range or missing data were identified. Five univariate outliers were found beyond the criterion of a Z score > 3.29. These comprised two for total completion time and one each for reading time, navigation time and accuracy. Consequently, the influence of those outliers was reduced by adjusting the raw scores to one unit beyond the next most extreme score, as recommended by Tabachnick et al. (2013).
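The outlier treatment described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure actually run: the function name `adjust_outliers`, the use of the sample standard deviation, the single screening pass and the step of one raw-score unit are all assumptions, and the example scores are made up.

```python
from statistics import mean, stdev


def adjust_outliers(scores, z_limit=3.29, step=1.0):
    """Pull scores with |z| > z_limit to one step beyond the next most extreme score."""
    m, s = mean(scores), stdev(scores)
    if s == 0:
        return list(scores)  # no spread, so nothing can be an outlier
    z_scores = [(x - m) / s for x in scores]
    inliers = [x for x, z in zip(scores, z_scores) if abs(z) <= z_limit]
    high_cap = max(inliers) + step  # one unit above the highest non-outlier
    low_cap = min(inliers) - step   # one unit below the lowest non-outlier
    adjusted = []
    for x, z in zip(scores, z_scores):
        if z > z_limit:
            adjusted.append(high_cap)
        elif z < -z_limit:
            adjusted.append(low_cap)
        else:
            adjusted.append(x)
    return adjusted


# Made-up completion times with one extreme value at the end.
scores = [100 + (i % 5) for i in range(30)] + [400]
cleaned = adjust_outliers(scores)
```

The adjusted score stays the most extreme value in its tail, so the participant’s rank order is preserved while the pull on the mean and standard deviation is reduced.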

To test the first hypothesis, a dependent t-test was conducted on the mean feeling of presence score under the two conditions, AR and VR. Assumptions of normality were met and the result indicated a statistically significant difference between the AR and VR conditions, t(37) = 5.86, p < .001. The mean presence score of 38.00 (SD = 10.48) within the AR condition was lower than the mean of 48.16 (SD = 6.93) for the VR condition (see Fig. 3a). A large effect size (r² = 0.48) was indicated by the mean difference of 10.16 between the two conditions, 95% CI [6.64, 13.67].
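As an illustration of the dependent t-test and its effect size, the sketch below computes t from paired scores and converts it to r² using r² = t² / (t² + df). The conversion formula is an assumption on our part, since the text does not state how r² was derived; it does reproduce the reported presence effect (t = 5.86, df = 37 gives r² ≈ 0.48). The function names and the small example arrays are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(ar_scores, vr_scores):
    """Dependent (paired) t statistic and degrees of freedom for VR minus AR."""
    diffs = [v - a for a, v in zip(ar_scores, vr_scores)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # needs n >= 2 and non-constant diffs
    return mean(diffs) / standard_error, n - 1


def r_squared_from_t(t, df):
    """Assumed effect-size conversion: r^2 = t^2 / (t^2 + df)."""
    return t * t / (t * t + df)


# Check against the reported presence result.
presence_r2 = r_squared_from_t(5.86, 37)

# Made-up paired scores for four participants.
t_stat, df = paired_t([1, 2, 3, 4], [2, 4, 5, 8])
```

The sketch omits the p value and confidence interval, which require the t distribution and are not available in the Python standard library.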

Table 1 Descriptive statistics for all measures across AR and VR

To test hypothesis two, three dependent t-tests were conducted on the mean accuracy as a percentage, total completion time and interpreting time under AR and VR. Assumptions of normality were violated; although visual inspection did not show extreme deviation and the sample was robust (N > 30), results should be interpreted with caution (Hills 2011). A statistically significant difference was found for total completion time between the AR and VR conditions, t(37) = 7.37, p < .001. The mean VR total completion time of 377.94 (SD = 140.83) was faster than the mean time of 588.39 (SD = 228.52) for the AR condition (see Fig. 3b). A large effect size (r² = 0.56) was indicated by the mean difference of 210.44 between the two conditions, 95% CI [152.61, 268.28]. The results also indicated a statistically significant difference in interpreting time between the AR and VR conditions, t(37) = 2.68, p < .01 (one-tailed test). The mean interpreting time of 103.12 (SD = 37.58) within the VR condition was faster than the mean of 126.89 (SD = 55.47) for the AR condition (see Fig. 3b). A large effect size (r² = 0.16) was indicated by the mean difference of 23.77 between the two conditions, 95% CI [5.77, 41.77]. Lastly, the results revealed a statistically significant difference in accuracy between the AR and VR conditions, t(37) = 4.10, p < .05. The mean VR accuracy score of 96.40 (SD = 4.13) was higher than the mean of 93.69 (SD = 4.94) for accuracy in the AR condition (see Fig. 3c). A large effect size (r² = 0.83) was indicated by the mean difference of 2.72 between the two conditions, 95% CI [1.37, 4.06].

Due to violations of normality, rankings of total time, interpreting time and accuracy across HMDs were compared using the Wilcoxon Signed-Rank test. As expected, the median total time in VR (Median (M) = 341.47, Range (R) = 505.15) was significantly faster than in AR (M = 529.94, R = 811.92), with a large effect size, z (N = 38) = 4.97, p < .001, r² = 0.33. For interpreting time, VR (M = 97.88, R = 165.63) was significantly faster than AR (M = 106.13, R = 234.62), with a small to medium effect size, z (N = 38) = 2.42, p = .016, r² = 0.08. Lastly, the median accuracy in VR (M = 96.67, R = 16.67) was significantly higher than in AR (M = 93.33, R = 16.67), with a large effect size, z (N = 38) = 3.31, p < .001, r² = 0.14.
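For readers less familiar with the Wilcoxon Signed-Rank test, the sketch below shows one way the z statistic and an accompanying r² can be obtained. It is a simplified illustration under assumptions, not the software procedure used in the study: it uses the normal approximation without a tie correction, and it assumes the effect-size convention r² = z² / (2N), where 2N is the total number of observations across both conditions. That convention is not stated in the text, although it does reproduce the reported values (for example, z = 4.97 with N = 38 gives r² ≈ 0.33). The function names are hypothetical.

```python
from math import sqrt


def signed_rank_z(ar_scores, vr_scores):
    """Normal-approximation z for the Wilcoxon signed-rank test (VR minus AR)."""
    # Zero differences carry no sign information and are dropped.
    diffs = [v - a for a, v in zip(ar_scores, vr_scores) if v != a]
    n = len(diffs)
    if n == 0:
        return 0.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the run while the next absolute difference is tied.
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    expected = n * (n + 1) / 4
    spread = sqrt(n * (n + 1) * (2 * n + 1) / 24)  # no tie correction applied
    return (w_plus - expected) / spread


def r_squared_from_z(z, n_pairs):
    """Assumed convention: r = z / sqrt(2N), so r^2 = z^2 / (2N)."""
    return z * z / (2 * n_pairs)


total_time_r2 = r_squared_from_z(4.97, 38)
z_demo = signed_rank_z([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

With this sign convention z is negative when VR scores are lower than AR scores, as for completion times; the text reports magnitudes.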

To test hypothesis three, two dependent t-tests were first conducted to determine whether a difference in mean autonomy and competence occurred between AR and VR. Assumptions of normality were not met; as such, results should be interpreted with caution. The results indicated a statistically significant difference in autonomy between the two conditions, t(37) = 1.13, p < .05 (one-tailed test). The mean autonomy score of 20.92 (SD = 3.81) within the VR condition was higher than the mean of 19.79 (SD = 3.56) for the AR condition (see Fig. 3d). A medium effect size (r² = 0.09) was indicated by the mean difference of 1.13 between the two conditions, 95% CI [0.20, 2.46]. The results also indicated a statistically significant difference in competence between the AR and VR conditions, t(37) = 3.32, p < .001 (one-tailed test). The mean competence score of 20.66 (SD = 5.07) within the VR condition was higher than the mean of 17.34 (SD = 5.07) for the AR condition (see Fig. 3d). A large effect size (r² = 0.48) was indicated by the mean difference of 3.32 between the two conditions, 95% CI [1.37, 5.26].

Fig. 3

Comparison between AR and VR of a) presence scores, b) total task completion time and interpreting time, c) accuracy, and d) satisfaction of autonomy and competence

Due to violations of normality, rankings of autonomy and competence across HMDs were compared using the Wilcoxon Signed-Rank test. Contrary to expectations, the median autonomy score in VR (M = 22.00, R = 12) was not significantly higher than in AR (M = 21, R = 12), z (N = 38) = 1.76, p = .076, r² = 0.04. For competence, VR (M = 21.00, R = 15) was significantly higher than AR (M = 18.00, R = 16), with a large effect size, z (N = 38) = 3.42, p < .001, r² = 0.15.

To examine whether presence supports need satisfaction, presence scores were correlated with autonomy and competence, across both VR and AR conditions. Pearson product-moment correlations were performed between AR presence and AR autonomy, AR presence and AR competence, VR presence and VR autonomy, and between VR presence and VR competence, using an alpha level of 0.05 (see Table 2).

Table 2 Correlations for presence and needs satisfaction variables disaggregated by HMD

A moderate, positive relationship between AR presence and AR autonomy was significant, r(36) = 0.40, p < .05 (higher presence predicted higher autonomy). There was also a moderate, positive relationship between AR presence and AR competence (higher presence also predicted higher competence), which was also significant, r(36) = 0.47, p < .05. A post hoc power analysis was conducted using the G*Power 3.1.9.7 software (Faul et al. 2007) with a sample size of 38, alpha set to α = 0.05 (two-tailed) and correlation coefficients for AR presence with AR autonomy r = .40 and AR competence r = .47. Both correlations had adequate power > 0.90 for a medium effect size. However, when inspecting the confidence intervals for both correlations, the lower ends indicated the possibility of only weak correlations, suggesting that caution should be taken in interpreting the results. A weak, positive relationship between VR presence and VR autonomy was not significant, r(36) = 0.03, p > .05 (higher presence did not reliably predict higher autonomy). There was also no relationship between VR presence and VR competence (presence did not predict competence), and the result was not significant, r(36) = 0.00, p > .05. Using VR autonomy r = .03 and VR competence r = 0, the same post hoc analysis (Faul et al. 2007) showed that both correlations held inadequate power of 0.18 and 0.09 respectively, indicating that the null hypothesis cannot be accepted unconditionally.
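To illustrate why the lower ends of the confidence intervals call for caution, the sketch below computes an approximate 95% confidence interval for a Pearson correlation using the Fisher z transformation. The text does not state how the intervals were obtained, so the method and the function name `fisher_ci` are assumptions made for illustration.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a Pearson r via the Fisher z transformation."""
    z = atanh(r)                    # transform r to an approximately normal scale
    standard_error = 1 / sqrt(n - 3)
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = tanh(z - critical * standard_error)
    upper = tanh(z + critical * standard_error)
    return lower, upper


# AR presence with AR autonomy (r = .40) and AR competence (r = .47), N = 38.
autonomy_low, autonomy_high = fisher_ci(0.40, 38)
competence_low, competence_high = fisher_ci(0.47, 38)
```

Under this approximation the lower bound for r = .40 with N = 38 falls just below 0.10, which is consistent with the caution expressed above that the true relationship may be only weak.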

5 Open-ended responses

Overall, 71% of participants favoured VR over AR, and additional open-ended feedback on HMD preference was provided by thirty-five participants. VR’s ease of use while completing tasks was the most common response. Specifically, a clearer view was mentioned by ten participants, with a wider field of view being raised by one participant. Additionally, user-friendly controls were mentioned as important by ten participants, who found grabbing and clicking easier to learn and more precise in VR. Eleven participants reported that VR allowed for a more immersive experience in which they were less distracted or more focused than in AR. However, there were positives raised for the AR device, with five stating that the AR was easier to use, two that the hand gestures felt more natural and two that they felt less spatially vulnerable in AR. One participant offered a negative comment about the VR as feeling clunky and less accurate than the AR. Conversely, one participant raised that the view was cut off at the edges in AR in comparison with a wider view in VR.

Feedback regarding splitting attention between the real and virtual world in AR saw thirteen participants reporting that they felt distracted by the real world. The main reasons were a constant switching of focus between realities and that the real world distracted or demanded their focus. Five expressed that their distraction, engagement, or disorientation improved with time. Five participants reported feeling uncomfortable, overwhelmed, or unnatural while viewing two realities. Four stated that they struggled to complete tasks or to determine features such as colour and text, or found the virtual world to be oversized. On the other hand, eleven participants enjoyed viewing both realities at once, reporting an easy transition between realities and that reality was enhanced. Three participants felt more comfortable using AR due to their spatial awareness.

Feedback on headset discomfort saw thirteen participants raise concerns for the AR headset and ten for the VR headset. Discomfort with the AR headset most commonly related to the eyes, for eleven participants. Strained, overstimulated or sore eyes were reported, along with difficulty using the headset with glasses. Three reports of headaches were made, two participants felt mental fatigue and one felt nausea. For the VR session, three participants described the headset as heavy or as applying pressure on the face, three reported blurry vision or lack of visual focus, two felt a headache and one experienced mental fatigue. Two participants experienced dizziness or disorientation within the virtual world.

6 Conclusion and discussion

This study aimed to determine the role of presence in performance and user experience in virtual and augmented reality data visualisation tasks. H1 and H2 anticipated that VR would predict higher scores of presence, efficiency, and effectiveness than AR. H1 was supported as VR was found to benefit presence more than AR. This outcome aligns with studies (Baumgartner et al. 2008; Bouchard et al. 2012; Witmer and Singer 1998) and participant feedback which recognised that upholding a belief in realism within the virtual world strengthens levels of presence. AR allowed for breaks in subjective realism through distraction while VR safeguarded the illusion of non-mediation. Efficiency and effectiveness were also supported in VR over AR through significantly faster task completion and accuracy measures (H2). Contrary to previous data visualisation studies (Nguyen et al. 2020; Schubert et al. 2001), a trade-off between time and accuracy was not apparent as both measures were improved in the VR condition.

Interestingly, the largest differences in task category timing between AR and VR were in data navigation and interpretation. Slower AR navigation is sufficiently explained by the observed difficulties participants had with controlling gestures and retrieving lost data windows in the AR space. However, interpreting data output, which was conditionally identical across AR and VR, highlights an intriguing difference. Evidently, VR supported the participants to process the data and articulate an answer more quickly. As VR also resulted in higher presence scores, a link between the focus afforded by the sense of presence and the ability to process data at a faster rate is plausible. These results align with findings of a positive relationship between presence and virtual task performance, as in Witmer and Singer (1998), while offering support to Riva et al.’s (2014) claim that presence supports a user’s intended actions.

Additionally, it was expected that VR would support higher need satisfaction outcomes due to a relationship with presence scores. Competence was significantly higher in VR than in AR; however, significance was not found for autonomy. These results indicate that conducting the data visualisation tasks within VR better satisfied participants’ need for competence in visualisation and interaction tasks. However, contrary to expectations, no reliable relationships were found between presence scores and need satisfaction scores in the current sample. Thus, the results did not adequately support a link between presence and well-being. The tasks may have been too structured to allow participants a true sense of autonomy within the data visualisation program, or the exposure to the technology may have been too short for competence to emerge. A study of medical professionals in the field, monitored over a long period of time for presence and needs satisfaction, may provide more realistic measures. Although our results showed positive correlations of autonomy and competence with presence in AR, we do not have enough evidence to identify whether autonomy and competence caused higher presence or vice versa. There is also a chance that another unknown variable could have produced the increase in both presence and autonomy or competence.

Taken together with open-ended responses, the results of the present study suggest that virtual reality may offer a better solution based on evoking higher presence and increased efficiency and effectiveness in immersive visualisation tasks. VR attracted higher preference scores and fewer and less severe reports of discomfort. However, AR was noted as a more practical solution for work settings. Thus, a blend of the two conditions may provide an alternative to either HMD alone, which is supported by McGill et al. (2015).

While the correlation analysis can provide good insight into the impact of presence on user experience and performance of data visualisation in VR and AR, presence may not be the sole contributor to the outcomes. Other factors, such as the immersion of VR compared to AR, could also influence presence and user performance and should be investigated in future work. Finally, cybersickness was also reported in our study due to strain, overstimulation and discomfort while wearing the devices. This limitation will be addressed and studied in our future design with the guidance of the well-known questionnaire for quantifying simulator sickness (Kennedy et al. 1993).