1 Introduction

Virtual environments (VEs) provide people with opportunities to experience places and situations remote from their actual physical surroundings. Virtual reality systems potentially allow people to learn about an environment which, for reasons of time, distance, expense, and safety, would not otherwise be available. This work focuses on head-mounted display (HMD) technology because it is relatively inexpensive and readily available as compared to other immersive technologies. Our work examines exploring an HMD-based VE by physically walking. By using physical locomotion, we seek to leverage the natural ability of people to maintain spatial orientation. This modality is natural for the HMD since HMD technology often uses a head tracker that measures changes in the orientation and the position of the user’s head within the physical environment. Unfortunately, the finite range of the HMD tracking system, or, more importantly, the limited amount of space a commodity level user may have to devote to an HMD system, constrains the size of space that can be freely explored using bipedal locomotion. Of course, using some other type of locomotor interface such as a joystick to translate in an environment might be a solution, but bipedal locomotion results in much better spatial orientation [4].

Williams et al. [23] investigate increasing the translational gain of walking (where each step in physical space moves the user a longer distance through virtual space) as a viable method to explore a large VE. They present two experiments that show the translational gain of bipedal walking can be scaled, and this type of locomotion results in better spatial orientation compared to using a joystick. However, their experiments limit the scale of translational gain to a factor of ten, since head movements and other small movements become distracting at higher gains. This paper expands the findings of Williams et al. [23] and examines how far translational gain can be increased with the aid of engineering solutions to improve the problems of small head movements. At high translational gains small locomotive movements become disorienting, making it difficult to position oneself near stationary objects in the VE.

Fig. 1.

This figure shows a top-down view of a VE that is \(\approx \)5 m \(\times \) 5 m, with the head motion of a user “looking around” shown in yellow. (Color figure online)

Fig. 2.

This example shows the same physical movement as Fig. 1, with virtual movement scaled by a factor of twenty.

There are two potential issues to address when designing an algorithm to alleviate distracting head movements [7, 23]. First, when people locomote at high rates of gain, a strategy must be employed to allow users to move locally in a natural way. That is, small head movements when the user is not locomoting to a new position need to be filtered or somehow minimized. For example, it is difficult and unnatural to maintain a fixed head position and rotate about that axis with the HMD. Consider the head movement of a user examining the contents of a VE from a center location as in Fig. 1 where locomotion in the physical space matches locomotion in the virtual space. In Fig. 2, this same physical movement is replicated, yet the translational gain is scaled by a factor of 20. In this example, simply turning to view the contents of the room amounts to considerable visual motion in the VE. Motions such as gazing around the room (Fig. 1) should not be scaled. Second, when users walk in an environment their heads may bob from side to side as they shift their weight. This side–to–side movement should not be scaled. Locomotion should only be scaled in the direction of intended travel. Although we did investigate this issue, we did not find it to be a problem at high rates of gain. Thus, our algorithm does not address side–to–side bob.

This work uses a nonlinear method of scaling gain to minimize the distracting effects of small head movements. The basic idea of the algorithm involves ramping to high gain when the user’s speed reaches above a certain threshold. Experiments 1 and 2 aid in the creation and testing of our nonlinear scaling technique. As a second focus to this paper, we examine whether it may be advantageous to also scale eyeheight. We reasoned that scaling eyeheight could allow gain to be scaled even higher and allow us to gain more explorable space from our HMD system. Therefore, this work investigates whether a person’s spatial orientation is improved when eyeheight is increased while locomoting through a virtual world at high rates of translational gain. We reasoned that increasing the eyeheight to explore a large VE could be useful when exploring an outdoor environment like a large city. Such a strategy would allow users to develop spatial orientation based on a map-like overview yet unlike virtual flying still give users the proprioceptive feedback of walking.

The main experiment of this work is found in Experiment 3 of Sect. 6. Here we directly compare linearly scaled translation gain (no correction of small head movements), nonlinearly scaled translational gain (minimizing the effects of small distracting head movements), and gain scaled proportionally to eyeheight. Specifically, we compare these three locomotion methods using four different scaling factors of translational gain: 10, 25, 50, and 100. We show that scaling gain nonlinearly is significantly superior to scaling gain linearly. Additionally, we find that people can maintain good spatial orientation with translational gains up to 50 using the nonlinear scaling technique presented in this paper. We find no significant advantage with respect to the user’s spatial orientation when eyeheight is scaled.

2 Background

Much like in the real world, humans update their spatial knowledge or spatial awareness with respect to a VE as their relationship to objects in the environment changes [25]. However, humans are more disoriented in VEs [4, 8]. Thus, navigation, the most common way people interact with a VE [2], causes people to feel disoriented. Exploring a VE by physically walking seems to result in the best spatial awareness [4], but the size of the space that can be explored using a tracking system is limited without alternate interventions such as teleporting. Much work has looked at how best to explore a VE larger than the tracked space while maintaining spatial awareness [6, 9, 15, 19, 27, 30]. These locomotion methods involve real walking, redirected walking, walking–in–place, teleporting, joystick, swimming, arm–swinging and more (for a literature review see [1]). Much of the more recent navigation work has focused on engaging the user in physical movement, as it seems to result in better spatial awareness of the VE than using a joystick [6, 9, 19, 20, 27].

The current work adds to the body of work on redirected walking [3, 5, 13, 14, 19] as we are manipulating walking. Rieser et al. [16] and Mohler et al. [12] show that people can quickly recalibrate to a new mapping between their own physical translation and visual input. However, the scaling factor of the translational gain in these recalibration studies was significantly smaller than that which is investigated in this work. Kuhl [10] reports that people can also recalibrate rotations. A compelling reason to manipulate translations instead of rotations is that research shows that physical rotations are more disorienting than physical translations with respect to spatial orientation [16]. Moreover, rotations are not a problem; a user can turn through any distance of rotation in any space that is large enough to stand in. Williams et al. [23] show that the translational gain of walking can be scaled by a factor of ten and that there is no significant difference in spatial orientation when compared to exploring an environment using normal bipedal locomotion.

This work also examines the role of eyeheight when experiencing a VE. More specifically, eyeheight refers to the distance from the viewer’s visual horizon to the ground (for more information, see [18]). An observer’s eyeheight influences perception and action in the physical world; it is used to scale the distances of objects and to scale the height and width of apertures. In our everyday lives, we humans constantly change our viewing perspective by sitting, standing, etc., yet the perceived relative size of objects remains the same. This may be because of familiar size or previous knowledge about size and shape [28]. Additionally, the angle of declination from the horizon line to the ground also provides another source of information. People use this information to recalibrate the relative sizes of objects at different eyeheights [28]. Wraga et al. [28] compare seated, standing and ground–level prone observations and find that seated and standing observations are similar, but prone observations are significantly less accurate. Warren [21] finds that people judged whether they could sit on a surface according to whether the surface height exceeded 88% of their leg length. Moreover, people choose to climb or sit on a surface according to the relationship between the surface’s height and their eyeheight [11].

3 Method for Minimizing Disorienting Head Movements

To investigate how high gain can be scaled, a method of scaling gain while minimizing these disorienting movements was devised. Informal user studies of participants at unfiltered high gain (100:1 and 50:1) revealed that small head movements were disorienting. More specifically, disorientation seemed to occur when the user’s locomotion was minimal and they were simply trying to either perform a local task such as move a few feet, or observe the environment. Participants also reported that large gain factors seem more natural and much less disorienting if their own physical locomotion was above a certain rate. Thus, we sought a method to minimize this effect by targeting the problem of disorientation when gain is scaled by large factors at slow speeds.

Fig. 3.

This figure shows the cubic ramping function used in Experiment 1. For speeds above 0.5 m/s gain was scaled by 100.

Fig. 4.

This shows all three ramping functions evaluated in Experiment 2. For speeds above 0.5 m/s gain was scaled by 100.

In the experiments presented in this paper, users “ramp up” to high gain based on the magnitude of their velocity, or speed. When users are not moving, but simply observing an environment, their speed is low and the translational gain is also low. As they begin to locomote, their speed is increasingly scaled up to the desired gain. We refer to this method as nonlinear translational gain. In this nonlinear condition, once users reach a critical speed threshold all movements are scaled linearly by a scaling factor (that is, a simple linearly scaled translational gain). Speeds below the critical threshold are scaled nonlinearly according to a pre-specified function. Thus, for physical speeds between zero and the critical threshold speed, virtual speed is obtained by scaling physical speed according to this function. Suitable functions should be strictly monotonically increasing, with an initial value equal to zero (for zero speed) and a value at the threshold equal to the threshold multiplied by the high gain scaling factor. An example of such a function is seen in Fig. 3, where speeds above the critical threshold of 0.5 m/s are scaled by a factor of 100 and speeds below 0.5 m/s are scaled according to a cubic function. User speed is calculated every time the graphics are updated, which occurred at 60 Hz. Speed is defined as the distance between the user’s position at the time of the graphics refresh (\(p_x\), \(p_z\)) and the position at the preceding graphics refresh (\(p_x'\), \(p_z'\)) divided by the refresh rate, refreshRate. To calculate the distance traveled we simply use the user’s position in the x and z directions and ignore the y direction, which indicates the user’s viewing height. Thus, speed is calculated as follows: \(speed= \frac{\sqrt{(p_x-p_x')^2 +(p_z-p_z')^2} }{refreshRate }\). In “high gain mode”, when gain is linearly scaled, calculating the new virtual position involves scaling the speed by the gain amount, scale. Thus, in high gain mode the new x and z positions in virtual space, \(v_x\) and \(v_z\), can be obtained from the user’s position at the previous and current frames: \(v_x= v_x'+(p_x-p_x')*scale\) and \(v_z= v_z'+(p_z-p_z')*scale\), where \(v_x'\) and \(v_z'\) represent the virtual position from the previous frame.
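The per-frame computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names `frame_speed` and `high_gain_step` and the parameter `dt` are hypothetical, and `dt` is assumed to be the time between refreshes in seconds (1/60 s at 60 Hz) so that the result is in m/s.

```python
import math

def frame_speed(p, p_prev, dt):
    """Horizontal speed between two tracked (x, z) positions.

    The y component (viewing height) is deliberately ignored.
    `dt` is an assumed parameter: seconds between graphics refreshes.
    """
    dx = p[0] - p_prev[0]
    dz = p[1] - p_prev[1]
    return math.hypot(dx, dz) / dt

def high_gain_step(v_prev, p, p_prev, scale):
    """New virtual (x, z) position in high gain mode.

    Each physical displacement since the last frame is multiplied by `scale`
    and added to the previous virtual position.
    """
    vx = v_prev[0] + (p[0] - p_prev[0]) * scale
    vz = v_prev[1] + (p[1] - p_prev[1]) * scale
    return (vx, vz)

# Hypothetical example: 1 cm of physical movement within one 60 Hz frame.
dt = 1.0 / 60.0
p_prev, p = (0.0, 0.0), (0.006, 0.008)
speed = frame_speed(p, p_prev, dt)                       # about 0.6 m/s
v = high_gain_step((10.0, 20.0), p, p_prev, scale=100)   # moves 1 m virtually
```

At a gain of 100, the 1 cm physical step in this example becomes a 1 m virtual step, which is why unfiltered small head movements are so noticeable.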

There are many functions that meet the requirements for a ramping function, and beyond these requirements our goal was to select one which was pleasing from a user’s perspective. Additionally, the value of the critical threshold itself needs to be determined. We evaluated the different functions using user studies. Thus, two experiments were designed to validate engineering choices for both the threshold and ramping functions. First, Experiment 1 examines the critical speed threshold at which a user should enter into linearly scaled high gain or linear gain. Experiment 2 evaluates three plausible functions used to scale speeds smaller than the critical threshold: an exponential, a cubic polynomial, and a quadratic polynomial. In the experiments there is a chicken-and-egg problem in that a ramping function cannot be derived without knowing the critical threshold, and determining a critical threshold assumes the use of some form of ramping function. In this work we do not examine this question exhaustively. Rather, we assume a cubic ramping function to determine the critical threshold, then assume this threshold is the best value for testing different ramping functions.

The mathematical details below describe the simple cubic function (Fig. 3). Below the critical threshold, the virtual speed, \(s_v\), is described in terms of physical speed, \(s_p\) as follows: \(s_v=s_p+c_1(s_p)^3\), where \(c_1\) is a constant whose value depends on the gain level. Thus, the value of \(c_1\) changes with each gain level. Above the critical threshold gain is scaled directly by the high gain amount. We use this simplistic form of the cubic because it has a desirable slope and has one solution. The function we use passes through (0,0). Thus, at a physical speed of 0 m/s, virtual speed is also 0 m/s. Subjects ramp up to high gain according to a function whose first derivative is not continuous. This discontinuity represents the boundary between normal walking and high gain. Interestingly, the discontinuity does not produce a noticeable artifact. We explore this idea further in Sect. 7.
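A small sketch may help make the piecewise mapping and its slope discontinuity concrete. It assumes only the cubic form stated above and the requirement that the ramp meet the linear branch at the threshold; the function names are hypothetical and speeds are taken in m/s.

```python
def cubic_constant(threshold, scale):
    """c1 chosen so that s_p + c1 * s_p**3 equals scale * threshold at the threshold."""
    return (scale - 1.0) * threshold / threshold ** 3

def virtual_speed(s_p, threshold, scale):
    """Piecewise mapping: cubic ramp below the threshold, linear gain at or above it."""
    if s_p >= threshold:
        return scale * s_p
    return s_p + cubic_constant(threshold, scale) * s_p ** 3

def ramp_slope_at_threshold(threshold, scale):
    """Derivative of the cubic branch, 1 + 3 * c1 * s_p**2, evaluated at the threshold."""
    return 1.0 + 3.0 * cubic_constant(threshold, scale) * threshold ** 2

threshold, scale = 0.5, 100
slope_below = ramp_slope_at_threshold(threshold, scale)  # slope of the cubic branch
slope_above = float(scale)                               # slope of the linear branch
```

Under these assumptions the two branches agree in value at the threshold, but the slope of the cubic branch there differs from the slope of the linear branch, which is the discontinuity in the first derivative noted above.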

Table 1. Values of the constants

As an example we solve for \(c_1\) at 100:1 gain and a critical threshold value of 0.5 m/s. The refresh rate of the graphics and tracking system has a direct impact on the values of the constants found in the above equation. For purposes of this example, let us first assume that tracking updates once every 1 s. At 0.5 m/s speed should be scaled by 100, and speeds under 0.5 m/s should be scaled according to the cubic function. We know that at a physical speed of 0.5 m/s the virtual speed should be 50 m/s (0.5 m/s \(\times \) 100). Thus, plugging in the two known values, \(s_p=0.5\) and \(s_v=50\), gives us \(50=0.5+c_1(0.5)^3\), so \(c_1=396\). Thus, we scale speeds lower than 0.5 m/s according to the following function: \(s_v=s_p+396(s_p)^3\), which, again, is plotted in Fig. 3. In our system, however, the graphics are refreshed at 60 Hz, and the constants change accordingly. Let us look again at the cubic function at 100:1 gain. Since we are updating the graphics every \(\frac{1}{60}\) of a second, we would like a per-frame speed of \(\frac{1}{60} \times 0.5\) (or 0.0083) to map to \(\frac{1}{60} \times 50\) (or 0.8333). Thus we solve for \(c_1\) with the values \(s_p=0.0083\) and \(s_v=0.8333\) and find that the value of \(c_1\) at 100:1 gain, a critical threshold of 0.5 m/s, and a refresh rate of 60 Hz is \(1.4256 \times 10^{6}\). The constants for the quadratic and exponential ramping functions at each of the gain levels are found in a similar manner. The quadratic function we evaluated was \(s_v=s_p+c_1(s_p)^2\), and the exponential had the form \(s_v=s_p+c_1e^{c_2 s_p}-c_1\). We wanted the exponential function to be flat or have a small slope at small speeds so that gain would be scaled by a minimal amount. The values of the constants for a 60 Hz refresh rate are shown in Table 1. The three functions are plotted in Fig. 4.
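The constants in this worked example can be checked with a few lines of code. The sketch below solves each ramping form for \(c_1\) from the single condition that the ramp reach the high-gain value at the threshold. The function names are hypothetical. For the exponential, \(c_2\) is a free shape parameter whose value is not given in the text, so the value used here is an arbitrary illustration and not a value from Table 1.

```python
import math

def cubic_c1(s_p, s_v):
    """Solve s_v = s_p + c1 * s_p**3 for c1."""
    return (s_v - s_p) / s_p ** 3

def quadratic_c1(s_p, s_v):
    """Solve s_v = s_p + c1 * s_p**2 for c1."""
    return (s_v - s_p) / s_p ** 2

def exponential_c1(s_p, s_v, c2):
    """Solve s_v = s_p + c1 * exp(c2 * s_p) - c1 for c1, given an assumed c2."""
    return (s_v - s_p) / math.expm1(c2 * s_p)

threshold, gain = 0.5, 100
frame = 1.0 / 60.0  # seconds per refresh at 60 Hz

# One update per second: speeds are in m/s.
c1_per_second = cubic_c1(threshold, gain * threshold)

# 60 Hz updates: threshold and target are expressed per frame.
c1_per_frame = cubic_c1(threshold * frame, gain * threshold * frame)

# Quadratic form under the same per-frame convention.
q1_per_frame = quadratic_c1(threshold * frame, gain * threshold * frame)

# Exponential form with an arbitrary, illustrative c2 (an assumption).
c2_assumed = 10.0
e1_per_second = exponential_c1(threshold, gain * threshold, c2_assumed)
```

Here `c1_per_second` evaluates to 396 and `c1_per_frame` to approximately \(1.4256 \times 10^{6}\), matching the values derived in the text.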

4 Experiment 1: Finding the Critical Threshold

The purpose of this experiment was two-fold. First, this within–subject experiment investigates how rapidly users can switch from speed scaled by a function to the linearly scaled high-gain speed. This experiment examines two critical speed threshold values: 0.5 m/s and 1 m/s and compares these results to linearly scaled translation gain where there are no critical values and gain is simply scaled by the high-gain amount. Thus, the second objective of this experiment is to formally evaluate the use of this “ramp-up” function and investigate whether users feel that problems with disorienting small head movements have become negligible with the proposed method. In this experiment the high gain value or the highest scaled value of translational gain was fixed at 100:1. The scaling function used to scale speeds lower than the critical threshold speed value was a cubic polynomial (Fig. 3).

Six subjects participated in the experiment for compensation. Subjects were unfamiliar with the experiment and the VE. Subjects were asked to find and read three different Snellen eye charts which were arranged on the sides of buildings in a large outdoor VE. They were allowed to get as close as they liked and could readjust their position at any time. The ease of reading these charts allowed subjects to report a subjective measurement of the ease of localized movements or local locomotion in each condition. Subjects read three different charts for each condition because we wanted the subjects to get a feel for making small position changes in the VE in each condition. To understand the goal of the Snellen chart task, it is important to note the difficulty of controlling small movements when no “ramp up” function is used. When gain is simply scaled by 100, 1 cm of movement corresponds to 100 cm of virtual movement. Therefore, it is challenging for users to position themselves in a precise location and hold their heads steady enough to read the small letters on the chart. They were also asked to find and walk to a series of seven objects in the VE that were a considerable distance apart. This task allowed subjects to report the ease of large-scale locomotion through the entire environment, which is referred to as global locomotion.

4.1 Materials

The virtual world was viewed through a full color stereo NVIS nVisor Head Mounted Display with 1280 \(\times \) 1024 resolution per eye, a nominal field of view of \(60^\circ \) diagonally, and a frame rate of 60 Hz. The HMD weighs approximately 1 kg. An InterSense IS-900 tracker was used to update the participant’s rotational movements around all three axes. Position was updated using two optical tracking cameras with an accuracy of <0.5 cm over a 3 m \(\times \) 3 m \(\times \) 3 m volume and an update rate of 60 Hz. The size of the physical room in which the experiments were performed was approximately 5 m \(\times \) 6 m, and within the room the limits of the tracking system were approximately 5 m \(\times \) 5 m. The same large 650 m \(\times \) 650 m outdoor environment was used in each of the conditions. The Snellen eye charts that participants were instructed to read were approximately 0.7 m \(\times \) 0.7 m, and the charts were randomly located on the sides of buildings that appeared in the environment. A ten-line Snellen eye chart was randomly generated for each trial using freely available software [17]. The environment is pictured in Fig. 5. Buildings and other objects were scattered throughout the environment. These objects were of natural shape and size and were items that one would expect to see outdoors. Larger objects were positioned further from the center of the environment and smaller objects closer to the center, enabling all objects to be viewed from the center of the environment. The seven target objects that the subjects had to walk to varied by trial but included such things as the front door of the cathedral, the water tower, the swing set, the entrance to the Panera building, the front of the hotel, the parking meter, and the police car.

4.2 Procedure

There were three conditions in this experiment: critical threshold speeds of 0 m/s (linear gain scaled by 100), 0.5 m/s, and of 1 m/s. Two conditions use a cubic polynomial to scale gain until a critical threshold speed is reached, then gain is simply scaled by 100. If speed drops below the critical value, then gain is again scaled according to the cubic function. Each of the six participants explored each environment under the three different critical thresholds (0.5 m/s, 1 m/s, and 0 m/s). Since there were six orders of three different critical threshold speeds, one subject was tested in each order in a counter-balanced fashion. The experimental procedure was explained to the participant prior to viewing the VE. Subjects were told what condition they were experiencing and were instructed to walk freely around the environment to familiarize themselves with the gain and the critical threshold of that condition. When the subject indicated to the experimenter that they felt comfortable with the environment, they were instructed to find the first Snellen eye chart and read as many lines down the Snellen chart as they felt comfortable. The subjects were allowed to position themselves as close to the Snellen chart as possible, and reading the smallest rows generally required subjects to be about two virtual feet away from the Snellen chart. After they had read as many rows as possible, they were instructed to find the second Snellen chart and read that set of letters, and continue on to the third Snellen chart.
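The counterbalancing described above, with six possible orders of three conditions and one subject per order, can be sketched as follows. The subject labels are hypothetical, and the sketch does not reproduce which participant actually received which order.

```python
from itertools import permutations

# Critical threshold speeds in m/s; 0.0 denotes the linearly scaled condition.
thresholds = [0.0, 0.5, 1.0]

# All 3! = 6 possible presentation orders.
orders = list(permutations(thresholds))

# One (hypothetical) subject label per order.
assignment = {f"subject_{i + 1}": order for i, order in enumerate(orders)}

for subject, order in assignment.items():
    print(subject, order)
```

With one subject per order, each threshold appears exactly twice in each serial position, so order effects are balanced across the six participants.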

After they had read as much of the charts as possible, participants were asked to find and locomote to seven different objects in the environment. The objects were far enough apart so that subjects were required to exceed the critical threshold speed and locomote at high gain to reach the objects. If subjects walked too slowly in the environment to reach an object, a situation could occur where they could not reach that object because they reached the limits of the tracking system first (or reached a physical wall). We refer to this error as an out-of-range target error. When this error occurred, the experimenter would slowly lead the subject backward in the physical environment so that they were moving at low gains backward in the VE. This was done until the experimenter felt that the subject had enough tracking space to reach the target object. This issue only had the potential to occur in the nonlinear conditions (or when there was a critical value equal to 0.5 m/s or 1 m/s). The frequency of this occurrence was recorded. The speed and accuracy of reading the Snellen chart was also recorded. The subject indicated to the experimenter that they were ready to read the chart. The experimenter then began timing the subject reading the Snellen chart and stopped the timer when the subject was finished reading the chart or when they indicated that they could no longer read the rest of the chart. Time was recorded using a stopwatch. After completing each condition, subjects were asked to rate the following on a scale from 1 to 10: local control, global control, sense of sickness, and sense of balance. Upon completing all three trials and the post-trial questions, subjects were asked to indicate what condition they preferred. They were also asked specifically if they found the scaling of side-to-side movement at high gain disorienting.

4.3 Results

The results of the post-condition tests are shown in Table 2. In each condition differentiated by critical threshold value, subjects were asked to rate the local control of their movement, the global control of their movement, their feeling of sickness, and their feeling of unbalancedness on a scale from 1 to 10. In the 0.5 m/s critical threshold condition, subjects felt the highest global control or sense of being able to control traveling around the environments for greater distances. They also felt control over local movements or locomotion needed to travel short distances. Participants felt the highest control over local movements with a 1 m/s critical threshold speed, yet their sense of global control was considerably less than when using the 0.5 m/s critical threshold. The linearly scaled gain (or 0 m/s critical threshold speed) provided very little local control and reasonable global control. The linearly scaled gain condition made people feel nauseated and altered their sense of balance. People rarely felt these effects in the other two nonlinear gain conditions.

Table 2. Mean ratings of the post-condition test of Experiment 1

When asked which method they preferred, four of the six participants chose a critical threshold of 0.5 m/s, while the other two chose the 1 m/s critical threshold. One of the subjects who preferred the 1 m/s condition over the 0.5 m/s condition found reading the Snellen charts easier in the 1 m/s condition yet preferred 0.5 m/s for walking long distances. Overall, subjects found that the 0.5 m/s threshold felt “most natural” for both local and global locomotion. Interestingly, four of the six subjects in the 1 m/s condition had problems reaching their target objects in a few of their trials because they did not travel fast enough and ran out of tracking space. This out-of-range target error occurred only once in the 0.5 m/s critical threshold condition across all of the subjects. As for reading the Snellen charts, in the 0.5 m/s condition it took participants an average of 105 s to read a chart, with an average of 0.3 mistakes per chart. That is, subjects typically read a chart without error, making roughly one mistake for every three charts read. Similarly, for the 1 m/s critical threshold value, Snellen charts were read in an average of 111 s with an average of 0.28 mistakes per chart. In the linearly scaled gain condition, no subject was able to read the last three lines of the Snellen chart. On average, they could discern a few letters on the fourth-to-last line, but usually stopped because they felt uncomfortable. At the end of the experiment subjects indicated whether they felt that side-to-side movements while walking at high gain were disorienting. None of the subjects found this disorienting or thought any method of filtering needed to be employed.

We find that a critical value of 0.5 m/s is best since it provides a nice compromise between global and local control. Users can travel longer distances with little physical space, yet small head movements are not as distracting and disorienting as the linearly scaled gain. We also found that the 0.5 m/s threshold resulted in little or no sickness. Users also had the best sense of balance as compared to the other conditions. Thus, we use a critical value of 0.5 m/s for the remainder of this paper. Future work might involve using a more exhaustive experiment to find a more precise value of the critical threshold. However, given the good user evaluations of this method, we feel that 0.5 m/s represents a reasonable critical threshold. Some of the user comments about the method were: “stepping on the gas in a car”, “felt in control of their locomotion even though they were really moving fast”, “Wow, this is cool.” With no filtering several subjects noted that positioning themselves in front of the Snellen chart was “particularly difficult.”

5 Experiment 2: Finding the “Ramping” Function

Six subjects participated in this experiment and were given compensation for their participation. The subjects were unfamiliar with the experiment and the VE. The materials used in this condition were the same as Experiment 1. The procedure for this experiment was almost the same as Experiment 1. However, the difference was that participants experienced different ramping functions in each of the three conditions. The critical threshold speed was fixed at 0.5 m/s. Additionally, in this experiment they were not told which condition they were experiencing. They were again asked to read three Snellen charts and locomote to seven target objects. After each condition, subjects rated their experiences. After completion of all three conditions, subjects indicated which condition they preferred best.

Table 3. The mean ratings of the post-condition test of Experiment 2

The results of the post-condition questionnaire are presented in Table 3. In all of the conditions, subjects felt a high amount of global control and local control. The quadratic function had the lowest local control. From observing the three functions in Fig. 4, we can see that gain is scaled higher at smaller speeds for the quadratic function than the other two functions. People felt a slight sense of sickness in the quadratic condition as well, an effect that was not observed with the cubic and exponential functions. Since subjects were not told what condition they were experiencing, they were asked which condition they like best by the order of experience. Four of the six participants preferred the exponential function, while the other two preferred the cubic function. The average time to completely read the Snellen chart in the exponential condition was 112 s and the average time to read the cubic was 109 s. On average participants were unable to completely read the last line of the chart in the quadratic condition. Again, subjects were asked about the side-to-side movement when speed is linearly scaled in high gain and it was also not an issue in this experiment. Overall, the exponential function performs best; compared to the other two methods, it seems to give the user the highest amount of global and local control. Upon examining the functions in Fig. 4, the exponential has a smaller slope at small speeds which gives it an increased local control. Thus, our nonlinear scaling method involves an exponential “ramping” function with a 0.5 m/s critical threshold.

Fig. 5.

This figure shows a view of the VE used in the experiments at normal eyeheight (\(\approx \)1.67 m).

Fig. 6.

This figure represents 10 times normal eyeheight (\(\approx \)16.7 m). Gaze is directed downward by \(20^\circ \).

Fig. 7.

This figure represents 25 times normal eyeheight (\(\approx \)41.7 m). Gaze is directed downward by \(30^\circ \).

Fig. 8.

This figure represents 50 times normal eyeheight. Gaze is directed downward by \(35^\circ \).

Fig. 9.

This figure represents 100 times normal eyeheight. Gaze is directed downward by \(40^\circ \).

6 Experiment 3

Having selected the ramping function and threshold, we are now in a position to examine the limits of scaling translational gain. Thus, in this experiment, the goal was to assess how well subjects could maintain spatial orientation when the gain of translation in the virtual environment was varied relative to translation in the physical environment. More specifically, we wanted to find the limit to which gain can be scaled under three different conditions: linearly scaled gain, nonlinearly scaled gain, and linearly scaled gain with eyeheight scaled. The subjects’ spatial orientation was tested in each of the five translational gain conditions: 1:1, 10:1, 25:1, 50:1, and 100:1. To test orientation, subjects were asked to remember the location of five objects in the environment, then to move themselves to a new point of observation and instructed to turn to face the targets from memory without vision. Each subject performed the task in each of the five gain scales under one of three conditions: linearly scaled gain, nonlinearly scaled gain, and linear gain scaled proportionally to eyeheight.

Forty-five subjects participated in the experiment. Subjects were unfamiliar with the experiment and the VE. Subjects were given compensation for their participation.

6.1 Materials

The same HMD system that was used in Experiments 1 and 2 was used in this experiment. The same 650 m \(\times \) 650 m outdoor VE was also used for all of the gain conditions. Figures 5, 6, 7, 8 and 9 show the VE at each of the scaled eyeheights. The explorable region of the VE changed according to the size of the gain in each condition. The explorable region in the 10:1 condition was 50 m \(\times \) 50 m, 10 times the side length of the explorable region in the 1:1 condition. Similarly, the virtually explorable region for the 25:1, 50:1, and 100:1 conditions was 125 m \(\times \) 125 m, 250 m \(\times \) 250 m, and 500 m \(\times \) 500 m, respectively. In each environment, subjects were asked to memorize the locations of five objects differing in shape and size. An example of one of the five objects in the 1:1 environment was a fire hydrant. Example objects in the 10:1, 25:1, 50:1, and 100:1 environments were a picnic table, an 18-wheel truck, a church, and a tall hotel, respectively. The five target objects were arranged in a particular configuration, such that the configurations in the 1:1, 10:1, 25:1, 50:1, and 100:1 conditions differed only in scale (1, 10, 25, 50, and 100, respectively) and by a rotation about the center axis. In this manner, the five objects were arranged similarly in all of the environments, so that the angles between the target objects were preserved.
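The geometry described above can be checked with a short sketch: scaling a layout uniformly and rotating it about the center leaves the angles between targets unchanged, and the explorable side length grows linearly with gain. The 5 m base side is inferred from the 50 m region at 10:1, and the target coordinates, the rotation angle, and all function names are hypothetical, chosen only for illustration.

```python
import math

BASE_SIDE_M = 5.0  # side of the 1:1 region, inferred from 50 m at 10:1
GAINS = [1, 10, 25, 50, 100]


def explorable_side(gain, base_side=BASE_SIDE_M):
    """Side length (m) of the square explorable region at a given gain."""
    return base_side * gain


def transform(points, scale, angle_deg):
    """Scale a 2D layout about the origin, then rotate it by angle_deg."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(scale * (c * x - s * y), scale * (s * x + c * y)) for x, y in points]


def angle_at(vertex, p, q):
    """Unsigned angle (degrees) between rays vertex->p and vertex->q."""
    a1 = math.atan2(p[1] - vertex[1], p[0] - vertex[0])
    a2 = math.atan2(q[1] - vertex[1], q[0] - vertex[0])
    d = abs(math.degrees(a1 - a2)) % 360.0
    return min(d, 360.0 - d)


# Hypothetical target layout in metres (not the layout used in the experiment).
layout = [(1.0, 2.0), (-2.0, 1.5), (0.5, -1.0), (2.0, 0.0), (-1.0, -2.0)]
scaled = transform(layout, scale=25, angle_deg=40.0)
```

Comparing `angle_at` on corresponding triples of points in `layout` and `scaled` gives the same value up to floating-point error, which is the property that lets the same correct-response angles be reused across gain levels.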

6.2 Procedure

One-third of the subjects performed the experiment in the linearly scaled gain condition, one-third performed the experiment in the nonlinearly scaled gain condition, and the last third performed the experiment with linear gain and eyeheight scaled proportionally. Translational gain was defined as the rate of translational flow in the VE that mapped onto a given amount of motor activity. In all three conditions, rotation in the VE matched rotation in the physical environment. In the 1:1, 10:1, 25:1, 50:1, and 100:1 conditions, the translational gain of the tracker was scaled by 1, 10, 25, 50, and 100, respectively. Since there are 120 possible orders of the five gain conditions, subjects were tested in a pseudo-balanced fashion using a Latin square design. Given the five gain conditions and 15 subjects per group, we used three Latin squares to counterbalance our testing. Full details can be found in [22].
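One simple way to produce such a counterbalanced assignment is sketched below: each 5 \(\times \) 5 Latin square supplies five presentation orders, so three squares cover the 15 subjects in a group. The cyclic construction and the shuffled base orders are assumptions for illustration; the squares actually used are described in [22].

```python
import random

GAINS = ["1:1", "10:1", "25:1", "50:1", "100:1"]


def cyclic_latin_square(items):
    """Return an n x n Latin square built by cyclically shifting items."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_orders(items, n_squares, seed=0):
    """One presentation order per subject, drawn from n_squares Latin squares.

    Each square starts from a differently shuffled base order (an assumed
    construction, not necessarily the one used in the experiment).
    """
    rng = random.Random(seed)
    orders = []
    for _ in range(n_squares):
        base = items[:]
        rng.shuffle(base)
        orders.extend(cyclic_latin_square(base))
    return orders


orders = assign_orders(GAINS, n_squares=3)  # 15 subjects in one group
```

With this construction every gain level appears exactly three times in each serial position within a group, although only 15 of the 120 possible orders are used.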

The experimental procedure was fully explained to the subjects prior to seeing the VEs. After subjects had studied the target configuration for about three minutes, the experimenter tested them by having them walk to various targets, close their eyes, and point to randomly selected targets. This testing and learning procedure was repeated until the subject felt confident that the configuration had been learned and the experimenter agreed.

Participants’ spatial orientation was tested from five different locations. A given testing position and orientation were indicated to the subject by the appearance of a tall red rod and an avatar in the environment. Subjects were instructed to locomote to the red rod, position themselves near it and face the avatar. At each testing location, the subject completed three trials by turning to face three different target objects in the environment, making 15 trials per condition. Specifically, subjects were instructed, “Close your eyes and turn to face the \(\langle \)target name\(\rangle \).” After each trial, subjects were instructed to rotate back to their starting position facing the avatar. To compare the angles of correct responses across conditions, the same trials were used for each condition. The testing location and target locations were analogous in all conditions. The trials were designed so that the angle of correct response was evenly distributed in the range of 20–180\(^\circ \). Once the subject reached a testing location (the red rod), they were not allowed to look at the target objects as the objects were made invisible. They were, however, encouraged to re-orient themselves after finishing each testing position and locomoting to the next test position.

In the eyeheight condition, eyeheight was scaled proportionally to gain, so that in the 10:1, 25:1, 50:1, and 100:1 conditions users experienced the environment from a new viewing height. The target objects appeared smaller to the user since their eyeheight was elevated, and targets were observed by looking down. Eyeheight and gain were thus coupled in this condition. We considered several alternative designs. We could have held gain constant and varied eyeheight, but the findings would then have been specific to a particular gain, and running several such conditions at different gains was considered too cumbersome. Scaling eyeheight proportionally to gain avoids limiting the findings to a particular gain. Another possible design was to fix eyeheight and vary the gain, but the linear gain condition of Experiment 3 already gives us results for eyeheight fixed at natural eye level. Thus, we felt that we could gain the most knowledge in a practical experiment by scaling eyeheight proportionally to gain. The disadvantage of this design is that eyeheight and gain are confounded.

To assess the degree of difficulty of updating orientation relative to objects in the VE, latencies and errors were recorded. Latencies were measured from the time when the target was identified until subjects said they had completed their turning movement and were facing the target. Turning errors were measured as the absolute value of the difference between the subject’s actual facing direction and the correct facing direction. Subjects told the experimenter verbally when they were facing the target, and the experimenter recorded their time and rotational position. The time was recorded using a stopwatch, and the rotational position was recorded using the InterSense tracker. Subjects were encouraged to respond as rapidly as possible while maintaining accuracy.
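The turning-error measure can be sketched as follows. The wrap-around step, which treats headings of 350\(^\circ \) and 10\(^\circ \) as 20\(^\circ \) apart rather than 340\(^\circ \), is an assumption about how headings were compared; the text specifies only the absolute difference. The function names are hypothetical.

```python
def turning_error(actual_deg, correct_deg):
    """Absolute angular difference in degrees, wrapped into [0, 180]."""
    diff = (actual_deg - correct_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_turning_error(trials):
    """Mean turning error over (actual_deg, correct_deg) pairs."""
    errors = [turning_error(a, c) for a, c in trials]
    return sum(errors) / len(errors)
```

For example, a subject facing 90\(^\circ \) when the correct direction is 45\(^\circ \) would be scored with an error of 45\(^\circ \), regardless of the direction in which they turned.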

Fig. 10. This figure represents the mean turning error of each condition: Linear, Nonlinear, and scaled Eyeheight. Error bars show standard errors of the mean.

Fig. 11. This figure represents the mean latency of each condition: Linear, Nonlinear, and scaled Eyeheight. Error bars show standard errors of the mean.

6.3 Results

Figures 10 and 11 show the mean turning errors and latencies, collapsed across gain, in the linearly scaled gain, nonlinearly scaled gain, and eyeheight conditions. Figures 12, 13, 14, 15, 16 and 17 show the mean turning errors and latencies in each experimental condition (linear, nonlinear, and eyeheight) at each level of translational gain (1:1, 10:1, 25:1, 50:1, and 100:1).

The linear and nonlinear gain data of this experiment were analyzed with five gain conditions. We first examine the effects of the levels of translational gain in the two experimental conditions of linear and nonlinear gain. All subjects were tested at every level of translational gain, hence gain was a within-subjects factor; each subject was tested in only one experimental condition, hence experimental condition was a between-subjects factor. Separate analyses were done for each of the two dependent variables, turning error and latency. A multivariate repeated measures analysis on mean turning error showed main effects of gain, \(F(4, 112) = 10.6\), \(p < .001\), and experimental condition, \(F(1, 28) = 13.3\), \(p = .001\), and a significant interaction of the two, \(F(4, 112) = 2.6\), \(p = .05\). Participants’ errors were greater at the 1:1 and 100:1 gain levels than at the other gain levels, and greater in the linear gain condition than in the nonlinear gain condition. Planned comparisons revealed that in the nonlinear gain condition, turning errors at the 1:1 gain level were significantly different from errors at the 10:1, 25:1, and 50:1 levels, but not from errors at the 100:1 level. Interestingly, in the linear gain condition, errors at the 1:1, 10:1, 25:1, and 50:1 levels were all significantly different from errors at the 100:1 gain level. A similar analysis on mean latency showed a main effect of gain, \(F(4, 112) = 3.7\), \(p < .05\), a marginal effect of experimental condition, \(F(1, 28) = 3.9\), \(p = .06\), and no significant interaction. In both the linear and nonlinear gain conditions, participants were faster at the 10:1, 25:1, and 50:1 gain levels and slower at the 1:1 and 100:1 levels. These differences were significant in the nonlinear gain condition but not in the linear gain condition.
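The degrees of freedom reported above follow from the design, and can be checked with a small sketch. It assumes 15 subjects in each experimental condition (45 subjects split into thirds) and the standard formulas for a mixed design with one between-subjects and one within-subjects factor; the function name is hypothetical.

```python
def mixed_design_df(n_subjects, n_groups, n_levels):
    """Degrees of freedom for a between (groups) x within (levels) design."""
    error_between = n_subjects - n_groups
    within_effect = n_levels - 1
    error_within = within_effect * error_between
    return {
        "group": (n_groups - 1, error_between),
        "gain": (within_effect, error_within),
        "interaction": ((n_groups - 1) * within_effect, error_within),
    }


# Two conditions (linear, nonlinear), 30 subjects, five gain levels.
df = mixed_design_df(n_subjects=30, n_groups=2, n_levels=5)
```

Here `df` gives (1, 28) for experimental condition and (4, 112) for both gain and the interaction, in agreement with the reported \(F\) statistics.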

Fig. 12. This figure shows the mean turning errors in the Linear Gain condition for each of the translational gains. Error bars represent standard errors of the mean.

Fig. 13. This figure shows the mean latencies in the Linear Gain condition for each of the translational gains. Error bars represent standard errors of the mean.

Analyses with order, experiment condition, and gain levels follow. We used three Latin squares to complete a counterbalanced array for 15 subjects at 5 different gain levels. Thus, three subjects from each group began the experiment with a given gain level. A mixed model analysis on the dependent variable turning error, with translational gain levels (1:1, 10:1, 25:1, 50:1, and 100:1) and order (1:1 first, 10:1 first, 25:1 first, 50:1 first, 100:1 first) within group, and experiment condition (eyeheight, linear, nonlinear) between groups, showed a main effect of gain, \(F(4, 120) = 9.7\), \(p < .001\), a main effect of order, \(F(4, 30) = 2.6\), \(p = .05\), and a main effect of condition, \(F(2, 30) = 7.4\), \(p < .005\). Only the gain by condition interaction was significant, \(F(8, 120) = 2.9\), \(p < .05\). Participants made more errors at the 1:1 and 100:1 gain levels, and more errors when they had the 10:1 gain level first in the eyeheight condition (one-way \(F(4, 10) = 4.1\), \(p < .05\)) or the 50:1 gain level first in the linear gain condition (one-way \(F(4, 10) = 5.5\), \(p < .05\)). Overall, participants made the fewest errors in the nonlinear gain condition. When we repeated the analyses without the 1:1 gain level (i.e., with only four gain levels), we obtained similar main effects of gain, order, and condition, but no interactions were significant. A similar analysis with latency as the dependent variable showed a main effect of gain, \(F(4, 120) = 4.1\), \(p = .02\), but no effect of order or condition. The gain by order interaction was significant, \(F(16, 120) = 3.6\), \(p = .001\). There were no other significant interactions. In general, participants were slower to respond at the gain level they performed first; overall, however, most participants took longer to respond when they started with the 100:1 or 10:1 gain level. These results did not change when we removed the 1:1 gain level from the analyses.

Fig. 14. This figure shows the mean turning errors in the Nonlinear Gain condition for each of the translational gains. Error bars represent standard errors of the mean.

Fig. 15. This figure shows the mean latencies in the Nonlinear Gain condition for each of the translational gains. Error bars represent standard errors of the mean.

We now report the effects of the three experimental conditions (linear, nonlinear, and eyeheight), analyzed without the 1:1 data in all of the conditions. We started by testing for effects of the four remaining levels of translational gain in the three experimental conditions. All subjects were tested at every level of translational gain, hence gain was a within-subjects factor; each subject was tested in only one of the three experimental conditions, hence experimental condition was a between-subjects factor. Separate analyses were done for each of the two dependent variables, turning error and latency. A multivariate repeated measures analysis on mean turning error showed main effects of gain, \(F(3, 126) = 11.4\), \(p < .001\), and experimental condition, \(F(2, 42) = 7.6\), \(p = .002\), but no significant interaction. Participants’ errors were lowest at the 10:1 gain level and increased as gain increased; errors were also lower in the nonlinear gain condition than in the other two experimental groups. Planned comparisons revealed that errors at the 10:1 gain level were significantly lower than errors at the 50:1 gain level (\(t(44) = -2.4\), \(p < .05\)), and errors at the 10:1, 25:1, and 50:1 gain levels were all lower than errors at the 100:1 gain level (all \(t > 3\), \(p < .001\)). A similar analysis on mean latency showed a main effect of gain, \(F(3, 126) = 3.9\), \(p < .05\), no significant effect of experimental condition, and no significant interaction. As with error, planned comparisons revealed that participants were faster to respond at the 10:1, 25:1, and 50:1 gain levels than at the 100:1 gain level, all \(t > 2\), \(p < .05\).

7 Discussion

This paper looks at how high gain can be scaled. Increasing the user’s eyeheight proportional to gain was added as an extra factor in the experimental design. Eyeheight could potentially aid in spatial orientation and this warranted further investigation. The results of this work suggest further techniques on how best to build a virtual HMD system when the size of the tracking space is small.

Three experiments are presented in this paper. The first two experiments investigate a method of minimizing small head movements when gain is scaled higher than ten. A user study indicates two movements that were particularly distracting in high gain: simply looking around the environment and localized movements. Thus the method of ramping up to high gain discussed in this work minimizes these effects. Experiment 1 reports that subjects preferred a 0.5 m/s critical threshold because they were able to control local and global movements. This critical speed threshold is found using a cubic function to move into a linearly scaled translational gain. In Experiment 2, the critical threshold value is fixed at 0.5 m/s, and we find that subjects preferred an exponential ramping function.

Fig. 16. This figure shows the mean turning errors in the scaled Eyeheight condition for each of the translational gains. Error bars represent standard errors of the mean.

Fig. 17. This figure shows the mean latencies in the scaled Eyeheight condition for each of the translational gains. Error bars represent standard errors of the mean.

The results of Experiment 1 suggest that using this ramping function is an effective method of minimizing the visible effects of small head movements. We test this more closely in Experiment 3 using four different gain values (10:1, 25:1, 50:1, 100:1). Experiment 3 further reveals that using the ramping function results in better spatial orientation than simply scaling gain linearly. Turning errors in the nonlinear gain condition are significantly lower than in the linearly scaled gain condition. There is also a marginal effect of nonlinearly scaling gain on latency. This marginal effect of faster responses in the nonlinear gain condition could suggest that people are more spatially oriented, and it does show that people were not making speed-accuracy trade-offs. Experiment 3 also shows that scaling eyeheight proportionally to gain did not aid spatial orientation compared to linearly scaling gain.

This work shows that scaling gain nonlinearly is an effective method of exploring large VEs for gains up to 50. According to the results of Experiment 3, turning errors and latencies get significantly worse at 100:1, making 100:1 an unreasonable choice for allowing users to explore a VE and expecting them to maintain spatial orientation. At 50:1, turning errors and latencies are statistically the same as at the 10:1 and 25:1 levels. Performance is better at the 50:1 gain than at the 1:1 gain. Thus, with a tracked HMD system, one can expect to explore a virtual space 50 times the size of the tracked space in each dimension. For example, a 5 m by 5 m tracked HMD space allows users to explore a virtual space that is 250 m by 250 m, a 2,500-fold increase in explorable area.

In Experiment 3, we also looked at spatial orientation when eyeheight was scaled proportionally to gain. Our motivation for doing this was that virtual reality allows users to experience environments in ways that they could not normally in the real world. Thus, we hypothesized that manipulating eyeheight could give the user an advantage when exploring a large city, where the user would have a map-like overview of the environment. However, we found that scaling eyeheight proportionally to gain does not result in better spatial orientation than scaling gain at the user’s normal eyeheight. Raising the eyeheight did bring up an interesting issue about viewing angle with HMDs and its role in our ability to be spatially oriented in an environment.

We conjectured that the high errors in the 1:1 condition of Experiment 3 occurred because the objects appeared on the ground and users had to look downward to view and memorize the locations of the objects in this condition as opposed to more naturally viewing them in the other conditions. Williams et al. [26] looked specifically at people’s ability to learn the spatial layout of objects at different viewing angles by having subjects memorize objects of different heights across conditions. They found no effect of viewing angle. Attempting to replicate this result with more controlled factors is a subject for future work.

Simply scaling translation gain is not the final answer to the problem of exploring large VEs, however. Inevitably, the physical limits of the tracking system will be reached. Our related research presents methods that were developed to intervene with users when they reach the end of their physical space by changing their location in physical space while maintaining their spatial orientation and location in the virtual environment [23, 24]. This system of interventions, called resets, can be combined with the system of scaled translational gain described here [29]. Xie et al. [29] used such a system to navigate in a VE that measured 750 m by 750 m with turning errors close to those in this paper. Several factors remain to be engineered before this becomes a practical system, but this work and Xie et al. [29] may form the basis for a system that can allow users to freely explore vast VEs.

Finally, although our results regarding eyeheight were disappointing, we feel it is too early to dismiss it as a modality for navigating in a VE. Experiment 3 raises some interesting questions regarding the role of eyeheight in spatial orientation in a VE. We would like to revisit this topic in future work. Specifically, we would like to fix eyeheight relative to different gains. We feel that increasing eyeheight proportionally to gain in our experiments resulted in participants being too high in the VE.