1. Introduction
Extended Reality (XR), sometimes also referred to as media-generated reality [1], is a term that generalizes concepts such as Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR) [2]. VR is the simulated experience of 3D virtual environments. A VR user is “isolated” to some extent from the surrounding ambient environment, and controls and interacts with computer-generated visual, auditory, and haptic components of a virtual environment, often in a first-person (endocentric) experience. Such experiences may entail the use of a head-mounted display (or HMD) [3]. In AR, a user is presented with simultaneous exposure to virtual elements and real-time perception of the physical environment, resulting in a “hybrid”, mediated experience [4]. MR-based applications may combine VR and AR techniques while including the anchoring of virtual elements within the real world [5].
Current applications of XR include gaming, audio-visual media, design, social media, training, and education. In the case of education, XR-based learning environments can surpass traditional instructional learning by enhancing, stimulating, or motivating student understanding [6]. A study involving an MR learning environment tested in a high school chemistry class showed that students can achieve significant learning gains when such educational tools are co-designed with educators [7]. Another study, based on Bloom’s cognitive complexity levels [8], evaluated the effectiveness of university students learning English as a second language via a 3D VR learning platform, and the results showed that the use of the tool assisted in the development of higher-level thinking [9]. Furthermore, Villena-Taranilla et al. observed that VR, especially immersive VR, promotes greater learning in comparison to control conditions in studies involving students in the Kindergarten to sixth-grade range [10].
One method that can potentially yield larger learning gains in XR-based learning is Action Observation (AO). AO refers to the process of learning or practicing a movement by observing and imitating another person’s or avatar’s actions. The act of observing movement can induce the same neural activity within the premotor, motor, and parietal cortices as when the movement is performed by oneself [11]. AO activates mirror neurons within the brain, which are inherent to neurocognitive functions relating to social behavior [12]. A complete neurological discussion of AO is outside the scope of this article, but interested readers are encouraged to consult the cited sources. In a study involving upper limb motor function measurements of children with cerebral palsy, a significant improvement was found in the functional score of a group before and after AO treatment relative to a control group [13]. Similarly, AO therapy with intensive repetitive practice of observed actions was found to provide a significant improvement of motor functions in chronic stroke patients [14].
In VR-based applications of AO, the actions of an exemplar, real or virtual, real-time or recorded, are often simultaneously displayed atop a learner’s avatar. Onebody is a system that allows students to receive remote real-time posture guidance, rendered in the first person in place of their body [15]. A study indicated that Onebody, relative to other modalities of movement instruction (including third-person VR), resulted in significantly higher matching accuracy for upper limb posture. In another study, tai chi moves were taught to subjects via either 2D video or a 3D immersive VR system, and subjects of the latter were demonstrated to have comparatively learned more [16]. In a further study involving subjects learning to use a prosthetic arm for the first time, it was found that subjects who were exposed to first-person VR-based AO were able to complete a bilateral manual task significantly faster than subjects who observed third-person VR-based AO or standard 2D video [17].
Conversely, the limitations of AO, including VR-based AO, have been noted. In the aforementioned prosthetic-limb training experiment, for example, results were less promising for the comparatively easier unimanual task of picking up and moving blocks using the prosthetic limb. Consequently, it was suggested that VR-based AO is more likely to exhibit efficacy in tasks involving relatively more complex coordination challenges. In a separate experiment involving a VR-based full-body-tracking tai chi training system [18], two immersion techniques that could be defined as examples of AO were significantly more difficult for subjects to follow. The average positional error of one of those AO-based modalities was statistically significantly higher relative to a condition based on a traditional teaching environment.
Virtual Co-embodiment (VC) is a relatively new concept and research field describing applications in which multiple users share control of a single avatar. In a recent study [19], a VR-based dual-task hand movement coaching application with a first-person perspective employed a VC mode that tracked and averaged two users’ hand positions to control a single avatar’s arms. Subjects who used this mode to practice showed improved motor skill learning efficiency relative to both a control group that practiced alone and a group that practiced via an AO mode. The AO mode, despite its subjects exhibiting higher learning than the control group, was thought to have a relatively low sense of agency, or the subjective feeling of initiating and controlling an action. Because of this, it is believed that AO-based practice, even first-person VR-based AO practice, does not excel at helping users retain learned motor skills. Results also alluded to the AO mode being less effective than VC in the short term. Similar conclusions were drawn when subjects learned movements via a first-person VR-based AO tool called Just Follow Me: subjects could accurately mimic movements immediately after learning, but the learning was not substantially retained.
As authors interested in exploring XR-based applications to teach drumming, we considered both AO and VC. While weighted-average-based VC had observable success in improving subjects’ ability to perform a dual task [19], this was for a relatively slow action with primarily smooth patterns of motion. Conversely, the trajectory of a drumstick in use is inherently full of sharp turns and direction changes that can occur almost instantaneously. If a similar VC approach were applied to a drumming tool, double hits in the performance audio or glitchy motions in the virtual scene seem unavoidable. Although some research has been conducted on musical applications of AO [20], studies entailing AO and drumming in particular are scarce. In a study involving subjects listening to music with groove, it was found that such music engages listeners’ motor system, an effect also induced by AO.
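The weighted-average scheme and its failure mode for percussive motion can be illustrated with a minimal sketch (the function and values are hypothetical illustrations, not taken from the cited system):

```python
def co_embodied(value_a, value_b, weight_a=0.5):
    """Blend two users' tracked hand states (position or velocity along
    one axis) into the shared avatar's state, as in weighted-average
    Virtual Co-embodiment."""
    return weight_a * value_a + (1.0 - weight_a) * value_b

# A drum stroke reverses direction almost instantaneously. If user A's
# stick tip is still descending (vertical velocity -1.0 m/s) while user
# B's has already rebounded (+1.0 m/s), the blended velocity collapses
# to zero: the avatar's stick hesitates at the drum head, which can
# surface as a double hit in the audio or a glitchy motion in the scene.
v_avatar = co_embodied(-1.0, 1.0)
```

For slow, smooth motions the two inputs rarely disagree this sharply, which is consistent with the smoother dual task for which the scheme succeeded.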
Conversely, VR-based musical tools are not uncommon, and past studies have suggested such tools have the potential both to support musical rehearsal [21] and to improve motor function and reduce reported feelings of anxiety in chronic stroke patients [22]. Despite the existence of VR-based drumming applications [23], research on their effectiveness is underrepresented. In the authors’ previous work, a VR-based system was designed to teach drumming exercises through first-person interaction with an exemplar’s demonstrations, with inconclusive results [24].
1.1. Research Question
As VR-based and, more generally, XR-based AO research has shown both positive and negative effects on subject learning, our research question is as follows: Do the affordances of a first-person MR-based AO tool designed to help non-musicians practice drumming result in different levels of improvement relative to simply practicing with first-person video demonstrations? Because such AO tools improve the learning of novices relative to a control group in some cases [15,17] and impair such learning in others [18], the goal of this work is to compare the improvement of novice drummers learning rhythms via XR-based AO against the improvement of drummers learning via video. Conclusions drawn from this comparison may strengthen understanding of AO-related concepts and influence the methods and tools for teaching music as a motor skill.
1.2. Background
To better contextualize our research, we review several related terms.
1.2.1. Video See-Through
Just as HMDs can be used for VR applications, they can also be used to enhance real environments with virtual elements in MR experiences. Video See-Through (or Passthrough, as the feature is referred to by Meta [25]) visually displays one’s ambient real environment via video feed. Although both VR and MR technology can be used effectively in fields such as education [6], in certain applications, users may be tasked with interacting with virtual components and real objects simultaneously. MR, and specifically video see-through capabilities, may help provide such experiences seamlessly. Potentially due in part to such capabilities and the ubiquity of MR technology, the global MR market size was valued at about 811 billion USD in 2021 and is projected to grow to 19,489 billion USD by 2030 [26].
1.2.2. Haptic Displays
The main functions of a haptic device include actuation, the display of forces from the virtual environment via actuators contacting a user’s body, and sometimes sensing, the tracking of movement or force of a user to control a virtual avatar [
27]. Current handheld haptic devices in
vr applications support large-scale body movement, are easy to use and put down, and provide vibrotactile feedback [
27]. Such a display can be controlled through the playback of audio signals or direct programming of signal frequencies, amplitudes, and durations. Common use cases of haptics include using vibration to reinforce a player’s actions and signify danger or urgency within the game.
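Direct programming of a vibrotactile pulse reduces to specifying a frequency, an amplitude, and a duration, which can then be rendered as a sample buffer for the actuator. The sketch below is illustrative only (the function and parameter names are our own, not any particular SDK’s API):

```python
import math

def vibration_samples(freq_hz, amplitude, duration_s, sample_rate=8000):
    """Render a vibrotactile pulse as a sine burst: one sample per tick
    of the actuator's drive clock, scaled to the requested amplitude."""
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

# A 100 ms burst at 170 Hz and half amplitude: a typical short game cue.
pulse = vibration_samples(170.0, 0.5, 0.1)
```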
1.3. Relevant Rhythmic Terms and Musical Practices and Technologies
1.3.1. Rudiments
Rudiments, two examples of which are shown in Figure 1a, are widely agreed-upon exercises deemed essential for drummers to practice [28]. These exercises may help musicians practice rhythm, dynamics, or sticking, the concept of “assigning” certain notes of an exercise to particular hands to increase fluidity. Rudiments help drummers hone important aspects of playing, such as control, coordination, and endurance. Because achieving complete mastery of a rudiment is an ongoing pursuit, drummers regularly practice rudiments even after achieving high proficiency.
1.3.2. Polyrhythms
Polyrhythms, in contrast, are not usually considered essential for beginner drummers’ practice routines, and are often regarded as a concept that requires substantial practice [29]. They are made up of simultaneously expressed musical lines based on different but mathematically related tempos [30]. A polyrhythm can be broken down into elements known as a basic pulse and counterrhythm(s) [31]. Although multiple inherent tempos can be realized through deep listening, Western notation is able to notate most polyrhythms in a single stave based on a single tempo, as shown in Figure 1b. Despite this, players often have difficulty learning to play, or even properly “feel”, polyrhythms. Apart from studying with a teacher, two common techniques for learning polyrhythms are subdividing the measure into the least common multiple of the divisions of the primary pulse and counterrhythm, and uttering mnemonic phrases that reflect the cadence of a polyrhythm [32]. Certain cultures, such as some in West Africa, pass down musical tradition and polyrhythms entirely through oral transmission [33].
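The subdivision technique can be made concrete: dividing one measure into lcm(pulse, counter) slots places every onset of both voices on a common grid. The sketch below is an illustration of that arithmetic, not code from any cited source:

```python
from math import lcm

def polyrhythm_grid(pulse, counter):
    """Subdivide one measure into lcm(pulse, counter) slots and mark
    which voice sounds on each slot: P = basic pulse onset only,
    C = counterrhythm onset only, B = both, . = neither."""
    n = lcm(pulse, counter)
    grid = []
    for slot in range(n):
        p = slot % (n // pulse) == 0    # pulse onsets, evenly spaced
        c = slot % (n // counter) == 0  # counterrhythm onsets
        grid.append("B" if p and c else "P" if p else "C" if c else ".")
    return "".join(grid)

# 3:2 resolves onto six slots; the voices coincide only on the downbeat.
grid_3_2 = polyrhythm_grid(3, 2)  # "B.PCP."
```

Reading the composite grid aloud slot by slot is effectively what the mnemonic-phrase technique encodes.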
2. Materials and Methods
2.1. Apparatus
Our system uses a Meta Quest 2 (Meta Platforms, Menlo Park, CA, USA) HMD connected via Quest Link to a workstation running the Unity 2023.1.0b6 game engine. The frontal cameras of the HMD are used for Meta Passthrough, displaying a real-time, gray-scale, stereoscopic view to the user. Within this view, users can see and strike Roland PD-7 electronic drum pads with a pair of standard-sized drumsticks. The Unity project augments the user’s view with virtual content, including 3D models of drums [34] and drumsticks, for observation and interaction. These accompany their physical counterparts placed within the user’s reach. Atop the Meta Quest 2 HMD, the user also wears HD380 Pro headphones (Sennheiser, Wedemark, Germany) to aurally monitor both the exemplar performance of the virtual drumming and the user’s own performance via a TD-25 drum sound module (Roland, Hamamatsu, Japan).
The audio of the virtual exemplar demonstration used a “floor tom” sound for the right virtual drum and a “snares-off” snare sound for the left virtual drum. This pairing was chosen for both auditory and practical reasons: the floor tom and snare occupy significantly different frequency ranges, and the set-up used in the experiment was similar to common placements of a snare and floor tom pair. Subjects monitored their own performance on the drum pads as a wood block sound, chosen for its transient-like sonic qualities. For the two rudiment exercises, the two pads shared the same sound; for the polyrhythmic exercises, the sound of the left pad was relatively higher in pitch than the right, because when practicing polyrhythms it is often recommended to use two different-sounding instruments or timbres to differentiate between hands [31].
The architecture of the full system, as seen in Figure 2, shows the user’s means of multimodal interaction with the system. The drum pads, played by the user’s hands (which receive vibrations from the controllers), send MIDI signals that Unity interprets in real-time, influencing the MR scene.
The virtual drums and sticks are instantiated as prefabs. They can be moved and re-instantiated about the scene to best fit the layout of the physical pads, improving the integration of the MR experience. The virtual drumstick pair uses keyframe-based programmed animations to position the sticks and strike the virtual drums, playing a variety of rhythms that the user is expected to observe and practice along with.
Real-time MIDI capability in Unity is handled by the MIDI Player Tool Kit Pro asset [35]. MIDI data are received from the TD-25 and used for concurrent feedback purposes. As a way of encouraging rhythmic adjustment while a user practices, their timing is compared against the timing of an exemplar. This timing difference is used to give positive or negative feedback depending on whether or not it falls within a predetermined window of 50 ms. This threshold was chosen because the precedence effect occurs for delay times between 2 and 50 ms [36].
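The feedback rule just described amounts to a single comparison per stroke. The actual system is implemented in Unity, so the Python sketch below (with a hypothetical function name) is only an illustration of the logic:

```python
def timing_feedback(user_time_ms, exemplar_time_ms, window_ms=50.0):
    """Compare a user's stroke time against the exemplar's ideal stroke.

    Returns (within_window, signed_error_ms): feedback is positive when
    the absolute difference falls inside the window (50 ms by default,
    matching the precedence-effect bound); the signed error could also
    drive "early"/"late" hints.
    """
    error = user_time_ms - exemplar_time_ms
    return abs(error) <= window_ms, error

ok, err = timing_feedback(1030.0, 1000.0)  # 30 ms late: within window
```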
In addition to a pair of standard drumsticks, users also held a Meta Quest 2 controller in each hand while practicing, as shown in Figure 3. This grasp is based on matched grip and allows the user to relatively easily hold the controller between the fingers that are not part of the drumstick fulcrum. Haptic feedback, expressed via the controllers, is synced with the virtual drumsticks’ exemplar animations and reinforces the timing and sticking of the rhythmic exercises. For every right or left stroke, the corresponding controller vibrated for 200 ms at half of its maximum amplitude, starting at the exact time of the ideal stroke.
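The vibration cues can be thought of as a schedule derived from the exemplar's stroke list: one event per stroke, routed to the matching hand's controller. The 200 ms duration and half amplitude mirror the values stated above; the event format itself is an illustrative stand-in, not the Meta SDK's API:

```python
def haptic_schedule(stroke_times_ms, stickings,
                    duration_ms=200, amplitude=0.5):
    """Build per-controller vibration events from an exemplar's strokes.

    Each stroke triggers the matching hand's controller at the ideal
    stroke time, for a fixed duration and amplitude."""
    return [{"hand": hand, "start_ms": t,
             "duration_ms": duration_ms, "amplitude": amplitude}
            for t, hand in zip(stroke_times_ms, stickings)]

# Alternating eighth notes at eighth note = 120 bpm: one stroke every
# 500 ms, hands alternating R, L, R, L.
events = haptic_schedule([0, 500, 1000, 1500], ["R", "L", "R", "L"])
```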
For portions of the experiment involving video demonstrations, a Dell E2010H monitor was used. Stereo loudspeakers were used for initial demonstrations, whereas the headphones were used elsewhere.
2.2. Participants
Twenty subjects (15 male, 5 female), students recruited from the University of Aizu, participated in this study (age: years). The experiment was conducted following the ethical guidelines of the University of Aizu. Before the start of the experiment, subjects self-reported their age, hearing issues, dominant hand, whether or not they were able to read Western musical notation, and any prior musical experience. As there was an aim to recruit primarily non-musicians and novices, 55% of subjects had no prior experience with musical instrument practice. Reports from subjects with musical experience ranged from having played bass guitar for a lifetime total of ten hours to having practiced trumpet for two years at the age of 10. No subjects reported hearing issues, and 95% were right-handed. Subjects were given information and instructions via a script read aloud before being given an informed consent form to sign. Ten subjects were randomly placed in the control group and ten in the experimental group.
2.3. Procedure
One of the authors met with each subject individually and facilitated the experiment. The procedure consisted of five sections in total: a tutorial section, two sections to teach two drumming rudiments, and two sections to teach two polyrhythms. The goal of each section was to teach and potentially improve a subject’s skill at the execution of a rhythmic exercise. The tutorial section was meant only to familiarize subjects with the flow of the experiment and their assigned modality for practice, and taught a simple exercise consisting of eighth notes played with alternating hands. After the tutorial, subjects were asked if they wanted to make any adjustments to their set-up, including headphone volume and seat or drum pad height. After this, no further adjustments were made to the apparatus during the experiment.
The following four sections sequentially consisted of doubles and paradiddles, the two rudiments shown in Figure 1a, and the 3:2 and 3:4 polyrhythms shown in Figure 1b. This succession was decided based on ascending rhythmic complexity. All five sections’ exercises were played at a tempo of eighth note = 120 beats per minute (bpm). Subject performances during these sections, or experimental blocks, were recorded as quantitative data. The experiment took about 35–45 min for subjects in the control group and 45–55 min for subjects in the experimental group. The difference in length was due to preparations of the MR apparatus that pertained only to the experimental group.
Each of the five aforementioned sections consisted of the same four-phase process to help each subject learn the rhythmic exercise corresponding to that section, as shown in Figure 4. In the first phase, subjects were shown a video demonstration of the rhythm via a computer monitor and stereo loudspeakers. The video was shot in third-person perspective via a camera placed behind and above the drummer’s performance, as shown in Figure 5. The video started with the in-tempo clicking of a metronome for one measure before the demonstration of the exercise began. The exercise was played for four measures, resulting in a 22-second-long video. The audio of each of the videos in this first phase was quantized and panned, to rectify timing imprecision and to achieve stereo separation, respectively.
After observing the video demonstration of the first phase, subjects were asked to complete a baseline recording of the observed rhythm for phase 2. These recordings for each exercise are also referred to as trial 1 recordings. Subjects were given headphones through which a click track was played for them to drum along with. They were asked to recall the demonstration to the best of their abilities and try to mimic the rhythm just observed.
After the baseline recording, subjects were asked to complete a short practice session for phase 3. This portion depended on the subject’s assigned mode of practice: Video or MR. Subjects practicing via standard video in the control group were asked to play along while watching a video shot in first-person, shown in Figure 6a. Subjects in the MR group were helped with the set-up and fitting of the HMD and headphones. Subjects then placed their hands on the drum pads as the virtual drum and drumstick objects were instantiated into the scene based on hand-tracking. After this set-up, subjects grabbed the Meta Quest 2 controllers with some combination of their ring, index, and pinky fingers, and the MR scene (shown in Figure 6b) was launched. In addition to a monochrome representation of their surroundings, a sticking diagram, the same as that used for the video-based practice content, was displayed on a virtual screen in front of the subject. Within the scene, subjects were asked to focus on and try to mimic the movements of the virtual drumsticks playing virtual drums atop the physical pads. Vibrational cues expressed via the controllers also reinforced the pattern of each exercise’s rhythm and sticking. Subjects of both groups used headphones in this phase. Each practice session, regardless of rhythmic exercise or subject group, consisted of four repetitions of the exercise, a duration of about two and a half minutes.
The fourth and final phase of each section was a final recording of the rhythmic exercise. Once again, subjects wore headphones and listened to a metronome while recording a four-measure performance of the previously practiced exercise. The data from these recordings are referred to as trial 2.
There was no verbal instruction to subjects within the practice media prepared for the Video or MR groups. However, while control group subjects were getting situated with headphones and MR group subjects with the MR apparatus, there was some communication between the subject and the author facilitating the experiment. After finishing the four phases, participants volunteered their thoughts about the whole experience. These data were not analyzed.
3. Results
Two metrics were used to analyze the data: the difference in the maximum number of consecutive correct strokes between the post-practice recording (trial 2) and the pre-practice recording (trial 1), and the absolute timing error of each recording’s strikes. For the second metric, we compared the timing and sticking of each stroke against an ideal performance. These data were analyzed through a series of linear mixed models in R [37], eased by the library lme4 [38]. The goodness of fit of the final models was confirmed with diagnostics available in the DHARMa library [39]. Starting with a simple model that included the random effect of each participant, we added potentially relevant factors (group and block) and compared the nested models via ANOVA. When necessary, non-nested models were compared using the Bayesian Information Criterion (BIC).
A correct hit was defined as one that used the intended hand and occurred within ms of an ideal stroke. We subtracted the number of correct hits of the trial 1 recording from that of the trial 2 recording for both groups to gauge performance improvement.
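A minimal sketch of the consecutive-correct-stroke metric follows. It is a simplification (our own illustrative code, not the analysis scripts): it assumes played strokes are already aligned one-to-one with ideal strokes, and it parameterizes the timing window, defaulting to the 50 ms window used by the system's concurrent feedback:

```python
def longest_correct_run(strokes, ideal, window_ms=50.0):
    """Maximum number of consecutive correct strokes in one trial.

    A stroke is correct when it uses the intended hand and lands within
    the timing window of its ideal counterpart; `strokes` and `ideal`
    are matched lists of (time_ms, hand) pairs.
    """
    best = run = 0
    for (t, hand), (t_ideal, hand_ideal) in zip(strokes, ideal):
        if hand == hand_ideal and abs(t - t_ideal) <= window_ms:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best

ideal = [(0, "R"), (500, "L"), (1000, "R"), (1500, "L")]
played = [(20, "R"), (480, "L"), (1100, "R"), (1490, "L")]
# The third stroke is 100 ms late and breaks the run: the maximum is 2.
```

The per-exercise improvement metric is then this value for trial 2 minus the value for trial 1.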
We found no significant effect of group [ ] or block [ ] on the number of correct consecutive strokes, as shown in Figure 7.
For the absolute timing error, we only analyzed the data when the hand used by the participant corresponded to the intended hand. This was the case % of the time. We found significant effects on the absolute timing error of the interactions between Block and Trial [ ], Block and Group [ ], and Trial and Group [ ]. Post hoc analyses based on Tukey’s honest significant difference between estimated least-square means were computed for the significant interactions with the library means.
As illustrated in Figure 8, all the blocks yielded significant differences within the same trial, except in the case of Doubles–Paradiddles, with an estimated difference of ms [z-ratio ]. Differences across trials for a given block are summarized in Table 1. According to this table, both the Video and MR treatments improved the timing with which participants could perform the exercises for all the blocks except “Doubles”. In this case, we observed a floor effect, suggesting that participants found it too easy to perform with correct timing.
The effect of the interaction between block and group on the absolute timing error is illustrated in Figure 9. Within the same group, the absolute timing errors between blocks were significantly different, except between Doubles and Paradiddles in the MR group (difference = ms, z-ratio ). No significant differences between the two groups were found in any of the blocks, as summarized in Table 2.
Perhaps the interaction between trial and group is the most interesting for the purposes of our study. These results are shown in Figure 10 and summarized in Table 3. According to these, there are no significant differences between groups for a given trial; however, the absolute timing errors were significantly lower in the second trial relative to the first. Crucially, the absolute timing error difference between the first and second trials for the MR group was 239 ms larger than that of the Video group (z-ratio ). The latter finding indicates that participants benefited more from the MR treatment than the Video treatment.
4. Discussion
The displayed results, particularly the interaction between trial and group shown in Figure 10, suggest there is a timing error-related benefit for novices employing at least one of the affordances of the multimodal MR tool for practice involving rudiments and polyrhythms. The difference of 239 ms in the improvement of absolute timing error between groups is statistically significant. Moreover, it signifies that subjects practicing via MR achieved twice as great a reduction in timing error as subjects practicing via video. In time-sensitive activities such as rhythm, this reflects a meaningful difference. Based on the experimental findings, we suspect the overlaying of the exemplar avatar within the MR scene helped users perform with more accuracy. We also believe the multimodal expression via vibration in the controllers helped subjects internalize the rhythm during practice.
At first, it seemed contradictory that MR-based AO had no observable effect on the improvement of subject performance with respect to the maximum number of consecutive correct strikes. Our current interpretation relates to the scale of novice drummers’ timing inaccuracies. While MR-based practice may have helped subjects improve their timing error in ways video practice did not, the improvement was not enough to have a noticeable effect on the maximum number of consecutive correct strikes. It seems possible that such subjects’ timing improved, but not to the point where a correct hit was detected using a window of 50 ms. Using a wider window (for example, 75 ms) for correct-strike detection in a reanalysis of the data could yield observable differences between the two methods; however, 50 ms is considered the point at which two auditory events start to be perceived independently rather than fused, as per the precedence effect.
The aim to recruit novice musicians, and thereby avoid ceiling effects from experienced drummers playing exercises they might already be familiar with, was mostly fulfilled. This also helped equalize the initial capabilities of subjects. As shown in Figure 10, the absolute timing errors of both groups before the practice sessions were at comparable levels. However, this potentially results in the study not being representative of more experienced drummers and musicians.
As expected, subjects were generally not able to perform the two polyrhythmic exercises as proficiently as the rudiments. Surprisingly, two subjects (both of the control group) were able to perform all rudiments and polyrhythms correctly in both the phase 2 and phase 4 recordings. Another relatively rare case was a subject performing all exercises except the 3:4 polyrhythm correctly for both the pre- and post-recordings, which was observed twice in the MR group and once in the control group.
4.1. Limitations of the Study
The experiment was limited in that only 20 subjects could be recruited. Whether our findings also apply to a general population needs to be validated with further studies involving more subjects.
Another limitation was the diversity of subjects’ language abilities. Native Japanese, English, and Chinese speakers were recruited for the experiment, but instructions were given in English and Japanese only. Despite the care taken by the authors to prepare scripts in both languages, there was potential for some shortcomings in understanding due to subjects’ backgrounds.
The experiment was also limited to four rhythmic exercises. While the two rudiments, doubles and paradiddles, are almost universally used across many genres and percussion instruments, polyrhythms are less common and often considered more advanced.
In the same vein, whether the benefits observed in our experiment persist over time and not just in a short period needs to be investigated in longitudinal studies.
4.2. Future Lines of Research
4.2.1. MR Virtual Co-embodiment Extension
It is of interest to extend the current MR experience to incorporate VC techniques. This may include a weighted-average-based VC application for drumming with brushes, implements used instead of drumsticks that allow drummers to express rhythm via smooth lateral movement as opposed to striking.
4.2.2. Exploration of Multimodal Interfaces
The haptic interface used in this study could be considered a limitation, as it may have weighed down the MR-group subjects’ hands during practice sessions. A lighter, more unobtrusive solution would be easier to implement, and it is of interest to conduct a study comparing subjects using different interfaces for haptic feedback. In addition, experimenting with the qualities of the programmed vibration, including frequency, duration, and intensity, may yield interesting findings.
4.2.3. Extensions for More Drums and Other Instruments
We are also interested in extending development to explore training for a full drum set. This would include triggers on foot-operated pedals in addition to the hands. Beyond that, piano and keyboard instruments (which involve coordination of fingers as opposed to wrists and feet) as well as brush technique on snare drum may also be effective extensions of this application. Significant pedagogical benefits may be found with training for brushes in particular, as such technique is focused on continuous positioning of the hands, as opposed to just the moments a drum is to be struck.
4.2.4. Integration of HMDs with Higher Specifications
Extending the experience with a Meta Quest 3 or Apple Vision Pro HMD is an obvious next step. Due to their higher display resolution, wider field of view, and full-color video see-through capabilities, utilizing such devices would increase immersion for users of this system [40,41].