1 Introduction

Early work on Virtual Reality (VR) speculated that the high-fidelity immersive environments generated by VR systems can foster a sense of “presence,” such that users feel as though they are physically present in the virtual environment (VE) [1, 2]. At the time, it was generally assumed that a sense of presence would be an unconditional boon to learning [2,3,4,5], though later studies found evidence that presence is not particularly predictive of conceptual knowledge acquisition [6], procedural knowledge acquisition [7], or spatial knowledge acquisition [8]. Others have indicated that presence contributed to improvements in learning, cognitive task performance, or therapeutic outcomes [9]. Importantly, previous work has highlighted that presence and usability factors may be interrelated, such that one’s sense of control over a VE may foster a sense of presence [2, 10, 11]. It is also possible that one’s sense of control (i.e., usability or level of interactivity) over a VE may first require presence to be fostered [12, 13]. Before the relationship between presence and performance can be accurately understood, it is critical to disentangle usability factors from presence.

Thus, the goal of this experiment was to determine the extent to which presence influences learning in VR environments. To address this, we compared the effectiveness of training a maintenance procedure in 2D desktop and 3D VR environments to test whether greater presence was perceived in more immersive systems and to observe the effect of presence on recall and training. We also examined the impact of different interaction methods in VR (gesture- vs. voice-based interaction), as well as perceptions of usability, to investigate what factors may influence perceptions of presence. Based on the literature predicting that more immersive environments (e.g., 3D vs. 2D) will lead to increased perceived presence and positive effects on learning, we predicted that people in the VR conditions would recall more on a procedural task than participants trained in the desktop environment. Additionally, we predicted that more natural system interaction (gesture-based versus voice-based interaction) would also lead to increased presence and subsequent learning.

2 Methodology

2.1 Equipment

Testbed.

The Unity 3D game engine was used to develop both desktop and VR trainers for this study, which taught participants to complete a procedural maintenance task on a virtual shore-based E-28 arresting gear. An arresting gear is a machine that stops an aircraft at the end of a short runway or aircraft carrier. In the present study, participants performed a procedure to remove and replace a virtual alternator on the arresting gear, which also served as the training task for this experiment (Fig. 1). Participants learned the maintenance task using either a desktop-based trainer with a mouse, keyboard, and monitor, or a VR-based trainer using a Microsoft Kinect V2 motion tracker and Oculus Rift DK2 Head-Mounted Display (HMD). Conditions were identical except for the method of user interaction.

Fig. 1. E-28 arresting gear alternator from the maintenance training task

Interaction.

There were three conditions that differed based on method of computer interaction: a desktop-based training group (desktop), a VR-based training with gesture group (gesture), or a VR-based training without gesture group (voice). In the desktop group, participants used a mouse and keyboard to interact with the environment, which was displayed on a computer monitor. In both VR groups, participants viewed the VE through the Oculus HMD. Participants were able to move around in the VE by walking and they could interact with virtual objects using their right hand. We utilized Kinect’s body, head, and gesture motion-tracking capabilities to enable participants to interact with the VE. In the gesture group, participants were trained during a tutorial and practice session to use five gesture-based actions to interact with the system (Table 1).

Table 1. List of gestures, actions, and descriptions for interacting with the system

In the voice group, participants were trained by completing a tutorial and practice session to use the same five actions as voice commands to interact with the system. Although the testbed could recognize gestures, it was not technologically feasible to develop an accurate voice recognition system. Thus, the researcher manually triggered participants’ voiced actions on a computer system linked to the HMD. In every condition, the researcher monitored the experiment on a linked computer to ensure smooth system operation. For the VR conditions, we calibrated the VE for every person’s height, such that the same visual scene was displayed for all participants.

2.2 Participants

Seventy-five students from a large southeastern United States university participated in this study (55% female, M age = 21.4 years, SD age = 3.6 years). Students received $15 an hour for up to three hours of participation. Participants were assigned randomly to one of three conditions: VR training with gesture (n = 25), VR training without gesture (n = 25), or desktop training (n = 25).

2.3 Materials

A subset of the materials administered during the experiment is included in this section and the following analyses. Additional measures were collected that are outside the scope of the current research question, such as demographic questions.

Mental Effort Rating Scale.

After each training scenario, participants were asked to indicate their level of mental effort on the task they just performed on a scale of 1 (“Low”) to 9 (“High”) [14].

Presence Questionnaire (PQ).

The PQ (α = .76) [15] is a 19-item measure that assesses participants’ experience within the VE on four subscales: involvement (α = .85), sensory fidelity (α = .73), adaptation/immersion (α = .60), and interface quality (α = .50). Participants responded on a scale of 1 (“Not at All”) to 7 (“Completely”) to each item.
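The subscale reliabilities reported here are Cronbach's alpha values, computed as α = k/(k − 1) · (1 − Σ item variances / total variance), where k is the number of items. A minimal sketch of that formula in Python; the response matrix below is a tiny hypothetical example, not the study's data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Perfectly consistent responses across two items yield alpha = 1.0
perfect = np.array([[1, 1], [4, 4], [7, 7]], dtype=float)
print(round(cronbach_alpha(perfect), 2))  # 1.0
```

Lower alphas, like those of the adaptation/immersion and interface quality subscales, indicate that item scores vary less consistently together relative to the total score variance.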

System Usability Scale (SUS).

The SUS (α = .69) [16] has 10 items with statements relating to usability rated on a scale of 1 (“Strongly disagree”) to 5 (“Strongly agree”).

Recall Measure.

Participants were asked to list the steps for the maintenance procedure, including tools and parts where appropriate, within five minutes.

2.4 Procedure

Participants were recruited through an online research participation system. After participants consented, they completed a demographic questionnaire. Participants then read a PowerPoint tutorial that familiarized them with the VE and the interaction method for their condition. In the tutorial, all participants received general information about the task environment, interface, and the E-28 arresting gear. The tutorial also provided different instructions for the interaction method depending on which condition was randomly assigned (gesture, voice, or desktop). Once participants completed the tutorial, their understanding of the tutorial content was assessed and any questions about the task and interaction instructions were clarified.

Next, participants in the desktop condition were seated at the computer and instructed to complete the practice phase in the experimental testbed. Participants in the gesture and voice conditions practiced the five gesture or voice commands, respectively. Participants in the VR conditions were then directed to a mark on the floor and were told how to adjust the HMD, and the researcher calibrated the VE. In the practice phase (i.e., replacing an engine cage), participants were told they would receive narrated instructions with relevant arresting gear parts highlighted in green to guide them. For example, to remove the exhaust pipe, participants would first hear a verbal narration to equip the pipe wrench tool, and then the exhaust pipe was highlighted in green to direct the participant to the part requiring interaction. Depending on the assigned group, participants interacted by clicking a mouse, speaking voice commands, or enacting gestures to select the appropriate tool and then perform the “remove” action. Throughout the practice phase, participants were permitted to ask questions about interacting with the task, but questions were not permitted during subsequent scenarios.

Participants completed three training phases involving the task of replacing the alternator. The training scenarios provided scaffolding (e.g., narration and highlighting part location) that was reduced in each subsequent scenario. The first training scenario provided participants with narrated instructions and green highlights to guide them. In the second training scenario, only narrated instructions were provided. In the final recall scenario, neither narrated instructions nor green highlights were provided, such that participants were required to perform the maintenance task without guidance. At the end of each scenario, participants were asked to rate their level of mental effort on the scenario. Following the training scenarios, participants were asked to complete several measures, a subset of which included the PQ, SUS, and a five-minute free recall measure of the procedural steps in replacing the alternator. Participants were debriefed upon conclusion of the study.

3 Results

Prior to examining the relationship among presence, usability, mental effort, and procedural recall performance, we examined preliminary group differences and correlations among our variables. We performed several one-way ANOVAs to examine the differences among groups (gesture, voice, desktop) for presence, usability, mental effort, and recall performance, but there were no significant group differences for any of these variables (all ps > .50). However, bivariate correlation analyses revealed that usability was positively related to performance (r = .29, p = .01), and mental effort was negatively related to performance (r = −.40, p < .001). Presence was not related to performance (r = −.01, p = .96), but it was correlated with usability (r = .60, p < .001).
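These preliminary analyses (one-way ANOVAs across the three conditions and bivariate Pearson correlations) can be sketched as follows; the arrays are hypothetical stand-ins for the study's scores, so the resulting statistics are illustrative only:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical recall scores: three groups of 25 with similar means,
# mirroring the null group-difference result reported above.
gesture = rng.normal(10, 2, 25)
voice = rng.normal(10, 2, 25)
desktop = rng.normal(10, 2, 25)

# One-way ANOVA across the three interaction conditions
f_stat, p_val = stats.f_oneway(gesture, voice, desktop)

# Bivariate (Pearson) correlation, e.g., usability vs. recall performance
usability = rng.normal(3.5, 0.5, 75)
recall = 0.3 * usability + rng.normal(0, 0.5, 75)
r, p = stats.pearsonr(usability, recall)
print(f"F = {f_stat:.2f}, p = {p_val:.2f}; r = {r:.2f}, p = {p:.3f}")
```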

Because several of our variables of interest were significantly related to performance, and due to the potential confounding nature of presence and usability (as indicated by their high correlation with one another), we conducted a more comprehensive post-hoc analysis to examine the simultaneous effects of these variables on procedural recall performance for each condition. We tested a moderated mediation model where presence score was the predictor variable, usability was the mediating variable, condition (gesture, voice, or desktop) was the moderating variable, and recall performance was the outcome variable. Usability mediated the relationship between presence and performance, and condition moderated the relationship between usability and performance. Additionally, participants’ subjective ratings of mental effort were included as a control variable (see Fig. 2 for a conceptual diagram of this model). It should be noted that we tested this same model with usability as the predictor and presence as the mediator, but it was not significant.
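The moderated mediation logic can be sketched with ordinary least squares: a mediator model (usability regressed on presence) yields path a, an outcome model with a usability × condition interaction yields the moderated b path, and their product gives the conditional indirect effect at each level of condition. The sketch below uses hypothetical data and a simplified numeric coding of condition; the actual analysis would use proper categorical coding, control variables entered as reported, and bootstrapped confidence intervals:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 75

# Hypothetical stand-in data (illustrative only, not the study's data)
presence = rng.normal(5.0, 1.0, n)
condition = np.repeat([0, 1, 2], 25)           # desktop, voice, gesture
usability = 0.6 * presence + rng.normal(0, 0.8, n)
effort = rng.normal(5.0, 1.5, n)               # mental effort (control)
recall = 0.4 * usability - 0.3 * effort + rng.normal(0, 1.0, n)

def ols(y, X):
    """Least-squares coefficients with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Mediator model: usability ~ presence  ->  path a
a = ols(usability, presence[:, None])[1]

# Outcome model: recall ~ presence + usability + condition
#                + usability*condition + effort
X = np.column_stack([presence, usability, condition,
                     usability * condition, effort])
b = ols(recall, X)
b_usab, b_inter = b[2], b[4]

# Conditional indirect effect of presence at each condition code
for cond in (0, 1, 2):
    print(cond, round(a * (b_usab + b_inter * cond), 3))
```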

Fig. 2. Conceptual diagram for moderated mediation model predicting procedural recall performance. Standardized regression coefficient (β) values are provided on each line; the estimated indirect effect is provided in parentheses. aβ value for the condition by usability interaction. *p < .05

The overall model was significant (F[5, 69] = 6.89, p < .001, R² = .33; see Table 2), accounting for 33% of the variance in procedural recall performance, with usability and the interaction between usability and condition as significant positive predictors, and mental effort as a significant negative predictor (such that lower subjective mental effort corresponded to greater recall performance). The direct effect of presence on performance was not statistically significant (p = .09), but the indirect effect of presence on performance through usability was statistically significant for only the desktop condition (β = 0.41, 95% CI = [0.21, 0.68]; index of moderated mediation = 0.16, 95% CI = [0.06, 0.27]). Specifically, any benefit to recall that could have been conferred by presence was better explained by usability, but only for the desktop condition. Although presence and usability were highly correlated, participants in the desktop condition who reported poor usability recalled the fewest procedural steps, regardless of their feelings of presence.

Table 2. Standardized regression coefficients predicting recall performance.

4 Discussion

At the group level, it is notable that there were no differences among any of our variables, particularly presence. Theory would suggest that our most immersive condition (i.e., gesture-based VR) should have fostered a significantly greater sense of presence than our desktop condition, but our analyses indicated that presence was equivalent across conditions. Usability was highly related to presence in our analyses, but it was a stronger predictor of recall performance than presence. The moderated mediation model indicated that the presence and usability questionnaires may be assessing the same cognitive mechanism that relates to performance; however, usability appears to measure it more accurately. Although the overall reliability of the PQ was acceptable, the reliability scores of two subscales were unsatisfactory. It is possible that some aspects of the PQ may not accurately assess sub-constructs of presence. In short, we contend that any claims of learning or training benefits due to presence may actually be better explained by usability.

5 Conclusion

Contrary to theory, our results suggest that presence is not predictive of learning outcomes in more immersive simulations. We recommend that developers and researchers consider prioritizing usability before fostering immersion in simulation-based training, as presence may not be the underlying cognitive mechanism by which simulation-based training is effective. Although the current study used gestures that were representative of real-world interactions, system limitations required the gestures to be gross and discrete, rather than fluid, natural gestures. Future researchers may want to consider investigating the role of other design features that may influence recall, such as interactivity (e.g., natural gestures) or sensory feedback (e.g., haptic feedback).