1 Introduction

Voice recognition systems provide a powerful potential method of control for robotic systems. In law enforcement, communication between team members is verbal and gestural. By providing a verbal interface for a small unmanned ground vehicle (sUGV) for special weapons and tactics (SWAT) operations, team members can operate the sUGV hands-free and maintain situation awareness [1]. However, the constraints of the operational environment limit the network connectivity and on-board computational power available to the voice recognition system, restricting it to a keyword-based command system. In a keyword-based command system, officers must learn the available commands and how to pronounce them to ensure proper recognition. They must then accurately recall the commands and say them correctly in highly stressful and dynamic situations. In an early test of our voice recognition system, officers failed to recall the commands [1]. To address recall failures and to assist with recognizable pronunciation of commands, we developed training tools that allow officers to practice issuing verbal commands to the voice recognition system [2, 3]. Our most recent virtual environment training tool includes an operating environment and a simulated sUGV, and supports both virtual reality and desktop computer-based training [3].

Virtual reality (VR) provides a more immersive training environment that increases engagement and retention [4, 5]. However, VR requires users to wear a head-mounted display, isolate themselves from their surroundings, and set aside time dedicated to VR training. The desktop training system has lower computational requirements, requires less start-up time, and better supports drop-in/drop-out training. It is not known whether the benefits of VR training outweigh these limitations.

2 Related Work

Virtual reality using head-mounted displays (HMDs) is an emerging technology with many potential applications [5,6,7,8,9,10]. The most recent generation of VR HMDs has made significant progress in addressing the technological issues that previously limited adoption. Improved tracking, reduced latency, high-resolution displays, and advanced graphics capability have converged to provide powerful, immersive simulations. The technology is effective not only for gaming and data visualization; it has been rapidly adopted for education and training by the military, industry, and sports [4, 8]. VR training can increase retention of knowledge and improve task performance.

Despite its advantages, there are drawbacks to VR HMDs that may limit use of the technology. HMDs can cause user discomfort in many forms: eyestrain, heat, neck and head pain, fatigue, and simulator sickness [11,12,13,14,15]. Many of the advances in HMD technology have helped to address factors known to contribute to simulator sickness: low frame rates, low-quality displays, high latency, and poor tracking. However, movement through virtual spaces can still lead to simulator sickness, and no available solution fully addresses the issue. Often, the user has limited space to move, either because of the limitations of the physical space (room size, obstacles) or because of the limitations of the VR system. Many of the methods that allow the user to move through a virtual space larger than the physical space contribute to simulator sickness [15, 16]. Steering movement, with a joystick in VR or with a keyboard and mouse in games, often leads to simulator sickness [14]. Some methods, like teleportation or portaling [17,18,19], modify how the user moves through space. Some apply visual effects to reduce simulator sickness [20, 21]. Others use physical motions to drive virtual motions and provide physical cues to the user's sensory system [22, 23]. Another popular technique, redirected walking, takes advantage of control over all visual inputs to manipulate the user into walking in circles while believing they are walking straight [24]. Each method has strengths and weaknesses that may depend on the context and the tasks to be performed in the virtual environment.

VR HMDs are also not always convenient. The HMD is not an integral part of the computer; it is an optional add-on purchased for special applications. Typically, a user works with a keyboard and monitor for most tasks, then starts a specific VR application, puts on the HMD, and interacts with the virtual environment. While wearing the HMD, the user is often blind to the outside environment and may have difficulty communicating with those in their physical space [25]. For any task outside of the specific application, the user may have to remove the HMD, perform the task with the keyboard and mouse, put the HMD back on, and return to the VR application. This switching cost may also reduce the perceived usability of a VR training tool or application.

The differences between VR and desktop modes for training and learning have been explored by many researchers, but the results have been inconsistent. In some cases, no difference is found in quantitative assessments, but participants self-report benefits specific to VR (improved spatial insight, greater realism) as well as increased difficulties [26]. Other studies show only slight improvements for VR in quantitative metrics [27]. In a navigation task, users reported preferring the VR mode, but performance measures favored the desktop mode [28]. These results suggest that the strengths of VR may be offset by its weaknesses. The advantages of VR may be context dependent and limited to specific aspects of the training task.

The current study compares VR and desktop modes for a training tool to evaluate potential differences in simulator sickness, sense of presence, usability, and user preferences for the two modes.

3 Apparatus

We developed the desktop and VR training tool using Unity 2017. The tool was designed to provide more realistic and immersive training with the voice recognition system. In the training tool, participants were directed to search virtual environments for boxes containing contraband (e.g., drugs) and find and disarm a small bomb. Participants interacted with the simulation in VR using an HMD and on a desktop system using a standard display.

3.1 Robot and Environment

We imported a virtual sUGV model based on Dr. Robot’s Jaguar V4 Mobile Robotic Platform [29]. A physical robot of the same design is used in our laboratory and in training activities with local law enforcement officers. Four virtual environments were used in the study. We acquired two complete virtual environments from the Unity Asset Store: a desert city environment [30] and a shooting range [31]. We developed two additional environments for the project: a school environment consisting of a single hallway lined with lockers and two classrooms and an office space with three rows of cubicle desks. See Fig. 1 for top-down renderings of the four virtual environments.

Fig. 1. Renderings of the four virtual environments used in the study: desert city, shooting range, school, and offices.

For this study, participants were told to search for boxes of contraband and a bomb (see Fig. 2). We placed two boxes of contraband and a single bomb in each of the environments. In each environment, the items were placed in two configurations: one for the VR mode and one for the desktop mode.

Fig. 2. Renderings of the contraband box (left) and bomb (right).

3.2 Command and Control

The basic functions of the robot (move forward, move backward, turn left, turn right, activate lights, activate sirens, etc.) were implemented using both physical controls and voice commands. Participants used voice commands to activate systems on the robot. Table 1 lists the voice commands available to participants during the study. Participants used a 'push-to-confirm' model: the recognizer was always running and attempting to interpret utterances made by the participant, and participants used a keyword, 'Apple', to indicate to the recognizer that a command was being issued to the robot. The word(s) following the keyword were interpreted as a command to the robot. If no command was recognized in an utterance, the utterance was ignored. If a command was recognized, it was displayed to the participant via the voice command user interface. The participant then had to confirm the command, and only then was the action performed. The 'push-to-confirm' model reduced the chance of accidental activation of one of the robot systems.

Table 1. Voice commands available to participants.
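To make the interaction model concrete, the sketch below shows one way the 'push-to-confirm' flow could be implemented. This is a minimal illustration under stated assumptions, not the tool's actual code: the class, method names, and the command subset are assumptions based on the description above and the robot functions mentioned elsewhere in this paper.

```python
# Minimal sketch of the 'push-to-confirm' voice command flow described above.
# The recognizer interface and command subset are illustrative assumptions,
# not the training tool's actual API.

KEYWORD = "apple"
KNOWN_COMMANDS = {"scan", "photo", "disarm", "lights", "siren"}  # illustrative subset

class PushToConfirm:
    def __init__(self):
        self.pending = None  # recognized command awaiting confirmation

    def on_utterance(self, text: str) -> None:
        """Called for every utterance the always-running recognizer produces."""
        words = text.lower().split()
        if KEYWORD not in words:
            return  # no keyword: the utterance is ignored
        candidate = " ".join(words[words.index(KEYWORD) + 1:])
        if candidate in KNOWN_COMMANDS:
            self.pending = candidate  # displayed to the user, awaiting confirmation

    def on_confirm(self):
        """Called when the participant confirms; only now is the action performed."""
        command, self.pending = self.pending, None
        return command

# Example: ptc = PushToConfirm(); ptc.on_utterance("Apple scan"); ptc.on_confirm() -> "scan"
```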

Participants used physical controls to select menu items, to drive the robot, and to activate special commands. To accommodate differences in the VR and desktop systems, controls were varied slightly between the two modes.

In VR, participants wore the HMD (Oculus Rift virtual reality headset) and held two controllers (Oculus Touch controllers). Participants used the built-in microphone on the HMD to issue voice commands. Three Oculus cameras provided full 360-degree tracking of the participants. Participants selected menu commands by pointing at the menu items and pressing the left controller joystick. Once in the environment, participants directed the movement of the robot using the joystick on the controller held in their left hand. Locomotion in VR can lead to simulator sickness, but the study environments were large and included multiple rooms, so some form of movement was required. For this study, we chose a common method of movement in VR: teleportation. Participants pressed down on the right controller's joystick, pointed to where they wanted to move, and released the joystick. Upon release, the participant's camera was instantly repositioned above the target position. This method allowed participants to control their view and minimized simulator sickness. At times, the robot could become stuck in the environment. Participants could reset the robot's position by pressing and holding the right controller's grip button.
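The teleport repositioning described above amounts to a ray-plane intersection: the controller's pointing ray is intersected with the floor, and the camera is placed at eye height above the hit point. The sketch below illustrates the underlying math only; the names and the fixed eye height are assumptions, not the tool's actual Unity implementation.

```python
# Illustrative teleport math: intersect the controller's pointing ray with the
# floor plane (y = 0) and stand the camera at a fixed eye height above it.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Vec3:
    x: float
    y: float
    z: float

EYE_HEIGHT = 1.7  # meters; an assumed value, not taken from the tool

def teleport_target(origin: Vec3, direction: Vec3) -> Optional[Vec3]:
    """Return the new camera position, or None if the ray never hits the floor."""
    if direction.y >= 0:
        return None  # pointing at or above the horizon: no floor hit
    t = -origin.y / direction.y  # ray parameter where the ray crosses y = 0
    return Vec3(origin.x + t * direction.x,  # land at the pointed-to spot...
                EYE_HEIGHT,                  # ...and stand at eye height
                origin.z + t * direction.z)
```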

In the desktop mode, participants used a keyboard and touchpad for the physical controls. Participants wore a headset microphone (Logitech Wireless Gaming Headset G930) to issue voice commands and used the touchpad only at the start of a scenario to make selections from menus. They directed the movements of the robot using the 'W-A-S-D' keys, a common configuration for gaming. A significant difference from the VR mode was that the participant's point of view was always locked to the robot's position. Participants could select between two views: a first-person view, as if viewing the scene through the robot's camera, and a third-person view, as if viewing from a chase camera just behind and above the robot. Participants used the 'Z' key to switch between the views. On the desktop system, participants reset the robot using the 'R' key (see Table 2).

Table 2. Physical controls.
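For illustration, the desktop control scheme described above can be summarized as a simple key-to-action mapping; the action names below are hypothetical labels for the behaviors listed in Table 2, not identifiers from the tool.

```python
# Illustrative desktop key bindings; action names are hypothetical labels.
DESKTOP_KEYS = {
    "w": "drive_forward",
    "a": "turn_left",
    "s": "drive_backward",
    "d": "turn_right",
    "z": "toggle_view",   # first-person <-> chase camera
    "r": "reset_robot",   # recover a stuck robot
}

def action_for_key(key: str):
    """Return the action bound to a key press, or None if the key is unbound."""
    return DESKTOP_KEYS.get(key.lower())
```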

4 Method

4.1 Participants

Participants were recruited from the general population in and around Starkville, MS. Five participants completed the preliminary study (3 men, 2 women). The average age of participants was 27.4 years (SD: 7.16). All of the participants reported familiarity with virtual reality and at least some experience playing video games (80% sometimes play and 20% often play). Two participants wore corrective lenses. With regard to frequency of simulator or motion sickness, one participant reported that it occurred often, two sometimes, and two never.

4.2 Procedure

All procedures were reviewed and approved by the Mississippi State University Institutional Review Board. We observed participants as they completed training in both modes (desktop and VR) to evaluate user preferences and the usability of VR training compared to desktop training for learning voice controls for an sUGV in a law enforcement domain.

Participants completed a short demographics survey and an initial simulator sickness questionnaire (SSQ) [15]. The initial SSQ score provided a baseline for comparison. Participants were randomly assigned to start with the VR mode or the desktop mode. Participants opened the training tool using a shortcut on the desktop. In VR mode, participants put on the HMD and picked up the controllers. In desktop mode, participants put on the headset microphone. In both modes, participants began by completing an unscored trial in the desert city environment to familiarize themselves with the display and the controls used in the current mode. The remaining three environments were presented in random order. In each trial, participants searched the environment for two boxes of contraband and a single bomb. We instructed participants to perform the following tasks: (1) find the items, (2) use the robot's 'scan' function to verify that the object was contraband or a bomb, (3) take a photo using the robot's 'photo' function, and (4) in the case of a bomb, use the robot's 'disarm' command to disable the bomb. We further instructed participants that the highest priority was to find and disarm the bomb. Participants were given up to eight minutes to search the environment. When participants disarmed the bomb, the trial ended, whether they had discovered the contraband boxes or not.

After each trial, participants removed the HMD or the headset microphone and completely closed the training tool application. Participants then completed an SSQ and a System Usability Scale (SUS) survey [32]. After completing all four trials in the VR mode or the desktop mode, participants completed a 30-question presence survey [33, 34] and then switched to the other mode. After completing all trials for both modes, participants indicated their preferred mode (VR, desktop, or both) on 10 usability items adapted from the SUS [32].

5 Results

Survey data were collected on-site using a Qualtrics web-based survey. Overall, the results revealed no significant differences between the desktop and VR modes for simulator sickness, sense of presence, or perceived usability. When participants were asked to choose between the desktop mode and the VR mode, results indicated that, overall, participants preferred the head-mounted display. However, participants also reported that the head-mounted display was more complex, less consistent, and more difficult to learn to use. The desktop mode was perceived as easier to use, and participants reported being more confident when using it.

5.1 Simulator Sickness

The SSQ consists of 16 items that describe symptoms associated with simulator sickness (e.g., headache, eyestrain) [15]. Participants indicated their current feelings with respect to each symptom on a scale of None (0), Slight (1), Moderate (2), and Severe (3). We calculated the total simulator sickness score according to [15] for each trial. The average and maximum total scores for the VR and desktop modes are listed in Table 3. There was no significant difference in simulator sickness symptoms between baseline, VR mode, and desktop mode, F(2, 8) = 0.942, p = .48.

Table 3. Descriptive statistics for the SSQ at baseline and in the VR and desktop modes.
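For reference, the total SSQ score used above is computed from the 16 item ratings using the published scoring key. The sketch below follows the standard item-to-subscale mapping and total-score weight from Kennedy et al. [15]; the zero-based indices reflect the standard item ordering and should be verified against the original scoring key before reuse.

```python
# Sketch of the total SSQ score per Kennedy et al. [15]: items are rated 0-3
# and load onto three overlapping subscales; the total score is 3.74 times the
# sum of the three raw subscale sums. Zero-based indices into the 16 items.
NAUSEA = [0, 5, 6, 7, 8, 14, 15]            # e.g., general discomfort, nausea, burping
OCULOMOTOR = [0, 1, 2, 3, 4, 8, 10]         # e.g., fatigue, headache, eyestrain
DISORIENTATION = [4, 7, 9, 10, 11, 12, 13]  # e.g., dizziness, vertigo, fullness of head

def ssq_total(ratings):
    """ratings: list of 16 responses, None=0 ... Severe=3."""
    assert len(ratings) == 16
    raw = (sum(ratings[i] for i in NAUSEA)
           + sum(ratings[i] for i in OCULOMOTOR)
           + sum(ratings[i] for i in DISORIENTATION))
    return raw * 3.74

# The repeated-measures ANOVA reported above could be run with, e.g.,
# statsmodels' AnovaRM(df, depvar="ssq", subject="participant",
# within=["condition"]).fit() -- an assumed analysis choice, not necessarily
# the authors' exact procedure.
```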

5.2 Presence

The presence survey consisted of 30 items that together assess the level of immersion in the virtual environment. Our survey was based on [33], with two questions related to haptic interaction removed. Participants completed a presence survey after finishing all trials in the VR mode and again after the desktop mode. Table 4 lists descriptive statistics for the presence survey. As with the SSQ, there was no significant difference between the VR and desktop modes.

Table 4. Descriptive statistics for presence in the VR and desktop modes.

5.3 Usability

Participants were asked about the usability of the VR mode and the desktop mode in two ways. First, participants completed the SUS [32] after each trial; the SUS is a 10-item survey designed to evaluate the usability of a system. Second, after all trials were completed, participants selected their preferred mode for 10 items based on the SUS items. We scored the SUS for each trial and then combined the VR and desktop scores to compare the overall means. As with the SSQ results, there was no significant difference between the mean reported usability for the VR and desktop systems, t(4) = −1.793, p = .147. In Table 5, however, there did appear to be a large difference in minimum reported usability. In our preliminary data set, a single outlier participant particularly disliked the VR system (M = 11.67 SUS) but appeared to find the desktop more usable (M = 31.67 SUS). This was the only participant with a large difference in SUS scores for the two modes, and the difference was observed only for the SUS; this participant's SSQ and presence results did not differ greatly between the two modes. For all other participants, the mean SUS scores for VR and desktop were roughly the same.

Table 5. Descriptive statistics for the SUS in the VR and desktop modes.
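For reference, standard SUS scoring per Brooke [32] converts the ten 1-5 item ratings to a 0-100 score: odd items contribute (rating − 1), even items contribute (5 − rating), and the sum is multiplied by 2.5. A minimal sketch:

```python
# Standard SUS scoring per Brooke [32]: ten items rated 1-5; odd items score
# (rating - 1), even items score (5 - rating); the sum is scaled by 2.5.
def sus_score(ratings):
    assert len(ratings) == 10 and all(1 <= r <= 5 for r in ratings)
    total = sum((r - 1) if i % 2 == 0 else (5 - r)   # i = 0 is item 1 (odd)
                for i, r in enumerate(ratings))
    return total * 2.5

# The paired comparison reported above could be run with, e.g.,
# scipy.stats.ttest_rel(vr_means, desktop_means) -- an assumed analysis
# choice, not necessarily the authors' exact procedure.
```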

After completing all trials, participants were asked a series of questions based on the SUS items. For each of the 10 items, participants chose between the VR mode, the desktop mode, or both. Table 6 lists the item text and the proportion of responses for each item.

Table 6. User selections for usability items.

6 Discussion

This preliminary study revealed clear differences in user perception of the VR and desktop modes of the training tool. Neither mode of training (VR or desktop) produced signs of simulator sickness, despite requiring participants to explore four separate scenarios, each lasting up to 8 min. Because neither mode imparted significant symptoms of simulator sickness, simulator sickness was likely not a factor in participants' perceived usability of the system.

When comparing the VR and desktop versions, the majority of participants preferred the VR mode or both modes. Only one participant preferred the desktop version alone. The increased complexity of the VR mode reported by participants was likely due to the navigation system used to move through the VR environment. VR was also perceived to be inconsistent and poorly integrated into the system; again, the added complexity of the movement system in VR likely contributed to this perception. In addition, the mapping of actions to controller buttons could be improved. There was some inconsistency in the use of the joystick button for menu selection (push button + pull trigger) and for movement (push button + release button), which may also have contributed to the perception of complexity in the VR mode.

The increased complexity of the VR training tool likely contributed to participants' expectation that additional support and learning would be required to use the VR system for training. The combination of these factors likely contributed to the overall sense that, in comparison to VR training, the desktop mode was easier to use and imparted a higher sense of user confidence.

Overall, users were able to use both modes to interact with the voice recognition system, and the training tool appears to have potential regardless of which mode users prefer.

7 Conclusions and Future Work

This small pilot study compared participant experience in two modes: VR and desktop. We believed that the VR mode would provide additional immersion and sense of presence but would also be more difficult to use and could cause participants to suffer symptoms of simulator sickness. Participants' responses indicated that the two modes provided a similar sense of presence and usability. When asked to select between the systems, participants' responses indicated a preference for the VR mode but also identified challenges that may limit its use. Overall, the training tool scored well on usability. Future work should expand the sample size: the single participant who reported a poor experience could be a true outlier or could represent a minority group that would strongly prefer the desktop mode. Future research should also evaluate participant performance with the voice recognition system, progress throughout training, and long-term retention and transfer from the training tool to the real world.