1 Introduction

Advancements in virtual reality (VR) and augmented reality (AR), along with the ever-increasing graphical capabilities of modern computers, emphasize the need for improved three-dimensional (3D) interaction. Given that objects in real 3D space can be described and controlled in six degrees of freedom (DOF), it is reasonable to assert that similar DOFs are needed for an input control to achieve natural interaction in virtual 3D space.

There has been substantial research on methods of input with DOFs higher than the typical two DOF common in traditional inputs like the mouse and capacitive touch. Ortega et al. [15] provide an extensive overview of the technologies and devices supporting higher DOF inputs.

Despite emerging methods of input, the traditional computer mouse remains the primary device for 3D tasks in areas such as 3D modeling, computer-aided design (CAD), and game development. Inputs with only two DOF are also ubiquitous in the video game industry, as joysticks are designed into nearly every controller for Nintendo, Xbox, and PlayStation consoles.

Whether lower DOF devices are better, or even adequate, for 3D tasks seems inconclusive. While there is some evidence that two-DOF inputs are fast and accurate for 3D selection [18] and 3D object translation [1], other evaluations found that input devices with three DOF are faster for 3D object rotation [e.g., 6].

The research described herein is the first empirical study of 3D object rotation using Fitts’ law and the performance measure throughput, as described in ISO 9241-9 [7]. In the following section, we review related work and the calculation of throughput, as per the ISO standard, with extensions to our technique for 3D rotation. Then, we discuss the methodology and results of our user study. This is followed by our conclusions on user performance with the devices tested and the task employed.

2 Related Work

Table 1 gives an overview of research where user studies evaluated devices for 2D and 3D tasks. The table is organized by task type, with target selection tasks on top and object translation and rotation tasks on the bottom. The table also identifies the performance measurements and the types of inputs.

Table 1. Overview of user studies evaluating methods of 2D and 3D input.

Additional discussion on the studies in Table 1 is provided in the following sections.

2.1 Degrees of Freedom in Input Devices for 3D Tasks

The number of DOF in an input control determines the mappings to an output display. Interaction where an input device has the same DOFs as an object on an output display can achieve spatial congruence, resulting in natural interactions that are easy to learn [9, p. 78]. When there are fewer DOF in the device than in the display, modes or mappings are required. These must be learned, however, and this may impact usability.

Table 1 highlights evaluations of devices with varying DOF. Both mice and joysticks have been evaluated using Fitts’ law tasks for target selection in 2D [11, 14]. Since these devices have two DOF, they have a near-congruent relationship with a desktop display, where only the input z-axis requires mapping to the output y-axis. As an example, the 2D control display mappings for the mouse are shown in Fig. 1.

Fig. 1. Mapping of mouse control space to cursor display space [9, p. 76]. The plus (+) symbol indicates that positive motion of the control yields positive motion in the display.

The mapping from the z-axis to the y-axis is simple and is easily learned. However, these two-DOF devices require complex mappings to control 3D displays. Consider the translation and rotation of an object in 3D, a task that requires six DOF. A mouse or joystick can only control two DOF at once without mappings or modes. Common practices for 3D rotation with two-DOF devices are to introduce additional graphical controls [2] or to implement a virtual sphere mapping [2, 4, 5, 6]. The virtual sphere maps each axis in the two-DOF device to an axis of rotation in the three-DOF display. This is shown in Fig. 2. Note that rotation around the z-axis cannot be controlled.

Fig. 2. Virtual sphere mapping of two-DOF controls.
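As a rough illustration of the axis-to-axis mapping described above, the following Python sketch rotates a direction vector using two-DOF input. The function names, the gain of one degree per unit of input, and the sign conventions are assumptions made for the example; they are not taken from any of the cited implementations.

```python
import math

def rotate_x(v, angle):
    """Rotate vector v = (x, y, z) about the x-axis by angle (radians)."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (x, y * c - z * s, y * s + z * c)

def rotate_y(v, angle):
    """Rotate vector v = (x, y, z) about the y-axis by angle (radians)."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (x * c + z * s, y, -x * s + z * c)

def apply_two_dof_input(v, dx, dy, gain=math.radians(1.0)):
    """Map horizontal input dx to rotation about y and vertical input dy
    to rotation about x. The gain (radians per input unit) is hypothetical.
    No input axis is mapped to rotation about z."""
    return rotate_y(rotate_x(v, dy * gain), dx * gain)

# 90 units of horizontal input swing a vector on +z round to +x.
result = apply_two_dof_input((0.0, 0.0, 1.0), dx=90.0, dy=0.0)
```

With these conventions, purely horizontal input leaves the y-component of the vector unchanged, and purely vertical input leaves the x-component unchanged.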

The translation of an object in 3D can be accomplished with two DOF using multiple viewpoint cameras [1] or ray-casting techniques [18].

In contrast, devices with six DOF can achieve spatial congruence in 3D translation and rotation with one-to-one control-display mappings. Typically, the six DOF are implemented via an accelerometer and magnetic sensor [1, 20, 22]. Such mapping is shown in Fig. 3.

Fig. 3. 3D spatially congruent mapping with a six-DOF input control.

Prior work found that spatially congruent mappings with higher DOF devices do not necessarily make 3D target selection faster or more accurate than with two-DOF devices like the mouse [1, 18]. The robust user performance measure known as throughput has been used for some of these evaluations [18, 21].

Other empirical evidence, such as the work of Hinckley et al. [6], suggests that for the three-DOF task of object rotation, spatial congruency reduces task time compared to the mouse. However, to our knowledge, no such study exists using throughput as a performance measure for 3D rotation.

2.2 Fitts’ Law and Throughput for 2D Target Selection

Fitts’ law quantifies the relationship between distance, movement time, and accuracy for rapid aimed movements [3]. The usual formulation of this relationship is

$$ MT = a + b\,ID $$
(1)

where MT is the movement time to complete a target-selection task, a and b are linear regression coefficients, and ID is the index of difficulty, with units “bits”. ID was originally defined as

$$ ID = \log_{2} \left( {\frac{2A}{W}} \right) . $$
(2)

The A variable is the amplitude of the movement, or the distance from the start of an initial location to a final target of width W.

Fitts also defined the term IP, called the index of performance, which quantifies the human information capacity of the motor system. IP has units “bits per second”, or “bps”, and is defined as

$$ IP = ID/MT . $$
(3)

MacKenzie proposed a variation of ID according to Shannon’s information theory [16], modifying ID to

$$ ID = \log_{2} \left( {\frac{A}{W} + 1} \right) . $$
(4)

This variation has been incorporated into an ISO standard for performance measurements of pointing devices. ISO 9241-9 proposes the throughput (TP) measurement [7], which has been refined to

$$ TP = \frac{{\log_{2} \left( {\frac{{A_{\text{e}} }}{{4.133 \times SD_{x} }} + 1} \right)}}{MT} . $$
(5)

Throughput is the current equivalent of Fitts’ index of performance. The \( A_{\text{e}} \) term in Eq. 5 is the effective amplitude, which is the amount a participant or cursor moved, rather than what the task specified. The \( SD_{x} \) term is the standard deviation of the selection endpoints, as projected on the task axis.

Throughput’s usefulness comes from its robustness and inclusion of both speed and accuracy [10]. Furthermore, if calculated consistently, it provides a basis for between-study comparisons, where device evaluations and findings are directly compared in different studies [16].
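The formulas in this section can be sketched in a few lines of Python. The sketch below is illustrative only: the function names are ours, the use of the sample standard deviation is an assumption, and the trial data in the example are invented.

```python
import math
import statistics

def fitts_id(amplitude, width):
    """Fitts' original index of difficulty (Eq. 2), in bits."""
    return math.log2(2 * amplitude / width)

def shannon_id(amplitude, width):
    """Shannon formulation of the index of difficulty (Eq. 4), in bits."""
    return math.log2(amplitude / width + 1)

def throughput(effective_amplitudes, endpoint_deviations, movement_times):
    """Throughput (Eq. 5) in bits per second for one sequence of trials.

    effective_amplitudes: distance actually moved in each trial
    endpoint_deviations:  signed endpoint offsets along the task axis
    movement_times:       movement time of each trial, in seconds
    """
    a_e = statistics.mean(effective_amplitudes)
    sd_x = statistics.stdev(endpoint_deviations)  # sample SD (assumption)
    mt = statistics.mean(movement_times)
    return math.log2(a_e / (4.133 * sd_x) + 1) / mt

# Invented example: four trials with a nominal amplitude of 100 units.
example_tp = throughput([100, 100, 100, 100], [-1, 1, -1, 1], [1.0, 1.0, 1.0, 1.0])
```

Note that the two formulations of ID differ most for easy tasks: for A = W, Eq. 2 gives 1 bit and Eq. 4 also gives 1 bit, but as A/W falls below 1/2, Eq. 2 becomes negative while Eq. 4 remains positive.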

2.3 3D Target Selection

Since the emergence of ISO 9241-9, throughput has been the standard user performance measure for evaluating 2D pointing devices, and thus extensive comparisons can be made with existing literature. However, although throughput is robust enough to describe movements in 3D as well as 2D, the calculation of throughput in 3D tasks is much less common.

Some exceptions are the work of Young et al. [21], where throughput is a performance measure for a 3D arm-mounted inertial controller, and the work of Teather and Stuerzlinger [18], where the throughputs of various 3D pen and cursor inputs are compared with throughputs typical of two-DOF mice and pen devices. Teather et al. [19] designed a system to extend the ISO 9241-9 Fitts’ law pointing task into 3D environments, adding a depth component to targets.

2.4 Throughput Calculation for Rotation Tasks

Fitts’ law has also been examined in rotation tasks. Meyer et al. [13] conducted 1D rotation experiments using Fitts’ original definition of ID, shown in Eq. 2. Using an apparatus for measuring wrist rotation, they studied rotation about the wrist joint from an initial angle to a target angle. A was defined as the specified rotation, in degrees, from the starting position to the target, and W was defined as the specified target range, in degrees. From their experiments, Meyer et al. presented Fitts’ law-derived descriptive models for total, primary-submovement, and secondary-submovement endpoint distributions and movement times [13]. Since this work predates the introduction of throughput, index of performance is used instead of throughput in their quantitative analysis.

In more recent work, Stoelen and Akin conducted a 1D rotation Fitts’ law experiment using the modified definition of ID, shown in Eq. 4 [17]. Throughput was used in their quantitative analysis. They defined the rotational index of difficulty ID as

$$ ID = \log_{2} \left( {\frac{\alpha }{\omega } + 1} \right) $$
(6)

where \( \alpha \) is the rotation amplitude between a cursor’s start and target angle, and \( \omega \) is the target angle width. These parameters are shown in Fig. 4.

Fig. 4. Definition of \( \alpha \) and \( \omega \) in a rotational index of difficulty (ID) [17].

2.5 Defining a Fitts’ Law Task for 3D Rotation

The experimental task presented herein was designed to be a spatially congruent interpretation of the 2D Fitts’ law task described in the ISO 9241-9 standard. A standard 2D implementation is provided in the FittsTaskTwo software, as shown in Fig. 5.

Fig. 5. ISO 9241-9 2D task showing a sequence of 13 targets.

Some challenges were faced in designing a suitable task to test throughput for object rotation in 3D. Firstly, because of the 3D nature of the task, both the cursor and the target become warped when oriented at certain angles in 3D space. For example, when oriented along the x-axis, all depth in the z-direction of the target range of angles is lost. This removes the user’s ability to place the cursor in the center of the target angle range. The image on the left in Fig. 6 illustrates this. It was therefore decided that the target should not be fixed at any orientation other than along the z-axis, perpendicular to the screen. This prevents the true target angle range from becoming distorted.

Fig. 6. A single discrete rotation task.

Furthermore, the target can be represented by a cone because it encompasses a range of angles; however, the cursor can only be represented by a single line. Thus, the cursor becomes foreshortened when oriented in the z-direction, becoming a single point when parallel to it. For this reason, it was decided that the cursor should be fixed in the positive z-direction, facing towards the user. Although this restricts the cursor to a single point, it can be represented by a crosshair and becomes consistent throughout the trials. These issues can be mitigated somewhat if the apparatus includes 3D glasses and head tracking; however, our apparatus was limited to 2D rendering of 3D space.

Combining the above restrictions, the task was designed so that the target, represented by a cone, must be rotated to the cursor, represented by a crosshair and fixed along the z-axis in the positive direction. The target becomes undistorted and circular in shape after being rotated to the cursor. The circular surface of the target cone informs the user if the target is facing in the positive or negative z-direction; the cone is red when on the same side of the z-axis as the cursor, and grey when on the opposing side of the z-axis. Once the target is on the cursor, the user presses Enter to end the task. Figure 6 shows a single example trial. Like the ISO 9241-9 2D task, subsequent targets alternate in a rotating fashion around the center point. Unlike the standard 2D task, the designed 3D rotation task is a series of discrete tasks, and thus a reaction time component exists. Reaction time was accounted for by starting the task timer only when the cursor leaves a negligible dead zone.
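The dead-zone rule for excluding reaction time can be sketched as follows. This is a minimal sketch under assumptions: the sample format, the function name, and the dead-zone size are hypothetical, since the text states only that the dead zone was negligible.

```python
def movement_time(samples, dead_zone):
    """Return the movement time of one trial, excluding reaction time.

    samples:   list of (timestamp_s, angular_offset_rad) pairs, where the
               offset is measured from the orientation at the start of the
               trial and the last sample coincides with the Enter key press
    dead_zone: angular offset (radians) that must be exceeded before the
               timer starts (hypothetical value supplied by the caller)
    """
    end_time = samples[-1][0]
    for timestamp, offset in samples:
        if abs(offset) > dead_zone:
            # The timer starts at the first sample outside the dead zone.
            return end_time - timestamp
    return 0.0  # the dead zone was never left

# Invented trial: the user reacts for about 0.45 s, then rotates for 0.75 s.
trial = [(0.0, 0.0), (0.3, 0.001), (0.45, 0.02), (1.2, 0.8)]
mt = movement_time(trial, dead_zone=0.01)
```

The sketch also shows why the rule is sensitive to sensor noise: any spurious sample that exceeds the dead zone starts the timer early, so the reaction time is then counted as movement time.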

Extending Eq. 6 to 3D, and considering adjustments for the effective index of difficulty, the amplitude \( \alpha \) and the effective amplitude \( \alpha_{\text{e}} \) are defined as

$$ \alpha = \cos^{ - 1} ({\mathbf{A}} \cdot {\mathbf{z}}) $$
(7)

and

$$ \alpha_{\text{e}} = \alpha + dx . $$
(8)

The quantities in Eqs. 7 and 8 are illustrated in Fig. 7.

Fig. 7. Amplitude (\( \alpha \)), width (\( \omega \)), and axes definitions.

Recommendations outlined by Soukoreff and MacKenzie [16] were used for Fitts’ law model construction. For calculating the effective index of difficulty \( ID_{\text{e}} \), amplitude \( \alpha \) was adjusted to the effective amplitude \( \alpha_{\text{e}} \) using the angle difference between the cursor and the center of the target at the end of the task, as projected onto the task axis. The target width \( \omega \) was adjusted to \( \omega_{\text{e}} \) using the standard deviation \( SD_{x} \) of the task endpoint differences dx. Equations 7 through 11 were used for throughput calculation:

$$ \omega_{\text{e}} = 4.133 \times SD_{x} $$
(9)
$$ ID_{\text{e}} = \log_{2} \left( {\frac{{\alpha_{\text{e}} }}{{\omega_{\text{e}} }} + 1} \right) $$
(10)
$$ TP = \frac{{ID_{\text{e}} }}{MT} $$
(11)
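A minimal Python sketch of Eqs. 7 through 11 for one sequence of trials is given below. Several details are assumptions made for the example: \( {\mathbf{A}} \) is taken to be the direction of the target axis at the start of a trial, \( {\mathbf{z}} \) is the unit vector along the positive z-axis, dx is a signed angle in radians, means are taken over the sequence, and the sample standard deviation is used. The function names and data are ours.

```python
import math
import statistics

Z_AXIS = (0.0, 0.0, 1.0)

def angle_to_z(direction):
    """Eq. 7: angle (radians) between a direction vector and the +z axis."""
    norm = math.sqrt(sum(c * c for c in direction))
    cos_angle = sum(d * z for d, z in zip(direction, Z_AXIS)) / norm
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def rotation_throughput(start_directions, endpoint_errors, movement_times):
    """Throughput (bits per second) for one sequence of rotation trials.

    start_directions: target direction at the start of each trial
    endpoint_errors:  signed angular error dx at the end of each trial (rad)
    movement_times:   movement time of each trial (seconds)
    """
    alpha_e = [angle_to_z(a) + dx                           # Eq. 8
               for a, dx in zip(start_directions, endpoint_errors)]
    omega_e = 4.133 * statistics.stdev(endpoint_errors)     # Eq. 9
    id_e = math.log2(statistics.mean(alpha_e) / omega_e + 1)  # Eq. 10
    return id_e / statistics.mean(movement_times)           # Eq. 11

# Invented sequence: four trials starting 90 degrees away from the cursor.
tp = rotation_throughput([(1.0, 0.0, 0.0)] * 4,
                         [-0.1, 0.1, -0.1, 0.1],
                         [1.0, 1.0, 1.0, 1.0])
```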

This is a three-DOF task, but it requires only two DOF to complete. This has the benefit of removing the need for modes in the two-DOF input conditions, so that the results are more comparable to those obtained in the standard 2D ISO 9241-9 task.

3 Methodology

3.1 Participants

Twelve unpaid participants were recruited from local universities. The participants were a mixture of graduate and undergraduate students. Two of the participants were female, eight were male. Ages ranged from 21 to 28 years. All participants were right-handed, though not by experimental design. Furthermore, all participants use computers daily.

The participants’ median response to the number of hours of video games played per week was two to four hours. Two of the participants do not play video games on a regular basis. Only two participants use CAD software, for two to four hours a week. Seven participants had more than two hours of experience using AR or VR technology.

3.2 Apparatus

Hardware.

A Microsoft Surface Pro 4 tablet running Windows 10 ran the experiment for all conditions (Fig. 8a). The tablet configuration included the optional keyboard. Only the Enter key was used on the keyboard.

Fig. 8. The host system and input devices. (a) Microsoft Surface Pro 4 tablet with optional keyboard, (b) Logitech M-U0026 mouse, (c) Microsoft Xbox One game controller, and (d) LG Nexus 5 mobile phone.

A Logitech M-U0026 optical USB mouse was used for the mouse condition (Fig. 8b). The default Windows 10 mouse pointer speed was used, with enhanced pointer precision enabled.

A Microsoft Xbox One controller was used for the joystick condition (Fig. 8c). Either the left or right thumb stick could be used for object rotation, depending on a participant’s preference. Joystick control-display gain was measured as the velocity of sphere rotation, in degrees per second. The gain was linear from 0°/s to 150°/s, corresponding to the joystick’s rest position and maximum displacement, respectively.

An LG Nexus 5 was used in the mobile phone condition for its on-board magnetic gyroscope (Fig. 8d). The gyroscope data were recorded at 50 Hz and streamed over an ad-hoc Wi-Fi network to the remote application on the tablet. The data streaming introduced a 46-ms lag in the system, but this was not expected to increase error rates by more than about 5% [12]. Cursor movement was position-control with 1:1 gain (1° of device rotation caused 1° of cursor rotation on the sphere).
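The difference between the joystick's rate control and the phone's position control can be sketched as follows. The function names, the normalized displacement range of -1 to 1, and the update loop are assumptions for illustration; only the 150°/s maximum rate, the linear gain, and the 1:1 position gain come from the description above.

```python
MAX_RATE_DEG_PER_S = 150.0  # rotation velocity at maximum stick displacement

def joystick_rate(displacement):
    """Rate control: linear gain from 0 deg/s at rest to 150 deg/s at full
    displacement. displacement is assumed normalized to [-1, 1]."""
    clamped = max(-1.0, min(1.0, displacement))
    return MAX_RATE_DEG_PER_S * clamped

def joystick_update(angle_deg, displacement, dt):
    """Advance the sphere angle by one time step of dt seconds."""
    return angle_deg + joystick_rate(displacement) * dt

def phone_angle(device_angle_deg, gain=1.0):
    """Position control: the sphere angle follows the device angle directly."""
    return gain * device_angle_deg

# Holding the stick at full displacement for 0.6 s rotates the sphere 90 deg;
# the phone must itself be rotated 90 deg to produce the same result.
stick_result = joystick_update(0.0, displacement=1.0, dt=0.6)
phone_result = phone_angle(90.0)
```

Under rate control, the time needed for a given rotation is bounded below by the maximum velocity, whereas under position control it is bounded by how quickly and how far the hand can rotate the device.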

Software.

The experimental software was developed primarily in JavaScript and compiled using Electron. A Node.js server was hosted on the tablet for all conditions and was responsible for data logging, as well as streaming gyroscope data in the mobile phone tasks. The 3D orientation task was implemented with the three.js library.

For both the mouse and joystick input methods, the Arcball method of rotation for two-DOF inputs was implemented because of its intuitiveness and ease of implementation [6].
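For readers unfamiliar with Arcball rotation, a compact Python sketch of the commonly described formulation (due to Shoemake) follows. It is not the experimental software, which was written in JavaScript; the function names and the normalized screen coordinates are assumptions for the example.

```python
import math

def to_sphere(x, y):
    """Project a point in normalized screen coordinates ([-1, 1] on each
    axis) onto the unit sphere; points outside map to the sphere's rim."""
    d2 = x * x + y * y
    if d2 <= 1.0:
        return (x, y, math.sqrt(1.0 - d2))
    d = math.sqrt(d2)
    return (x / d, y / d, 0.0)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def arcball_quaternion(p0, p1):
    """Unit quaternion (w, x, y, z) for a drag from p0 to p1 on the sphere.
    It rotates by twice the angle of the arc between the two points."""
    w = sum(a * b for a, b in zip(p0, p1))
    return (w,) + cross(p0, p1)

def rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z)."""
    w, u = q[0], q[1:]
    t = cross(u, v)
    tt = cross(u, t)
    return tuple(v[i] + 2 * w * t[i] + 2 * tt[i] for i in range(3))

# Dragging from the center of the sphere to its right edge covers a 90-degree
# arc, so the object turns 180 degrees about the y-axis.
q = arcball_quaternion(to_sphere(0.0, 0.0), to_sphere(1.0, 0.0))
flipped = rotate(q, (0.0, 0.0, 1.0))
```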

Task.

The software implemented the experiment task described in the preceding section. For each input method, the participant performed eight sequences of 19 trials. In each case, the participant was instructed to hold the device in their preferred hand and place their other hand on the Enter key. This was meant to reduce homing time.

3.3 Procedure

When participants arrived, they were seated at the system and familiarized with the task. For each input method, they performed a practice sequence of at least 19 trials to get used to the input method. These practice trials were meant to nullify any learning effects during the timed trials.

After the practice trials, participants were asked to perform all eight sequences continuously. They were instructed to select the targets at a comfortable pace, while proceeding as quickly and accurately as possible. The participants were permitted a rest break after each method of input. An example of a participant performing the experiment task with each input method is shown in Fig. 9.

Fig. 9. Input methods: mobile phone (left), mouse (middle), and joystick (right).

A questionnaire was given after the experiment to gather information on how much time participants spent playing video games, working with CAD software, and working with VR or AR technology. The questionnaire also inquired about participants’ impressions of the input methods on a 7-point Likert scale.

3.4 Design

The experiment was a fully within-subjects \( 3 \times 4 \times 2 \) design, with the following independent variables and levels:

Input Method: mobile phone, mouse, joystick

Amplitude (\( \alpha \)): \( \frac{\pi }{4}, \frac{\pi }{3}, \frac{3\pi }{4}, \frac{5\pi }{6} \)

Width (\( \omega \)): \( \frac{\pi }{8}, \frac{\pi }{16} \)

The independent variable of primary interest was input method. The amplitude and width independent variables were necessary to ensure the computation of throughput covered a range of task difficulties. The primary dependent variables were movement time, error rate, and throughput.

To offset learning effects, the order of presenting the three input methods was counterbalanced. Two participants were assigned to each of the six possible orders.
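The counterbalancing scheme can be reproduced with a short Python sketch. The round-robin assignment of participants to orders is an assumption; the text specifies only that each of the six orders was used by two participants.

```python
from collections import Counter
from itertools import permutations

INPUT_METHODS = ("mobile phone", "mouse", "joystick")
NUM_PARTICIPANTS = 12

# All 3! = 6 possible presentation orders of the three input methods.
orders = list(permutations(INPUT_METHODS))

# Hypothetical round-robin assignment: participant index -> order.
assignment = {p: orders[p % len(orders)] for p in range(NUM_PARTICIPANTS)}

# Each order should be used by exactly two participants.
participants_per_order = Counter(assignment.values())
```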

The amplitude and width variations give rise to eight ID values ranging from 1.58 bits to 3.84 bits. See Table 2. These were presented in each condition in a random sequence. For each of the eight sequences, 19 trials were performed. Throughput and error rate were calculated for each of the eight sequences.

Table 2. Task amplitudes, widths, and indices of difficulty.

In all, there were 12 Participants × 3 Input Methods × 4 Amplitudes × 2 Widths × 19 Trials = 5472 trials.
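The range of ID values and the total trial count follow directly from the design and Eq. 6, as the following Python sketch shows. The variable and function names are ours.

```python
import math
from itertools import product

AMPLITUDES = [math.pi / 4, math.pi / 3, 3 * math.pi / 4, 5 * math.pi / 6]
WIDTHS = [math.pi / 8, math.pi / 16]

def rotational_id(alpha, omega):
    """Rotational index of difficulty (Eq. 6), in bits."""
    return math.log2(alpha / omega + 1)

# One ID per amplitude-width combination, sorted from easiest to hardest.
ids = sorted(rotational_id(a, w) for a, w in product(AMPLITUDES, WIDTHS))

# 12 participants x 3 input methods x 4 amplitudes x 2 widths x 19 trials.
total_trials = 12 * 3 * len(AMPLITUDES) * len(WIDTHS) * 19
```

The easiest condition (amplitude π/4, width π/8) gives log2(3) ≈ 1.58 bits and the hardest (amplitude 5π/6, width π/16) gives log2(43/3) ≈ 3.84 bits, matching the range stated above.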

4 Results and Discussion

After the experiment was finished, the data were imported into a Microsoft Excel spreadsheet where summaries of various measures were calculated and charts were created. The statistical tests were performed using the GoStats application.

4.1 Data Adjustment

During many of the trials for the mobile phone condition, participants had to adjust their grip on the device or had trouble mapping the device’s orientation to the virtual object’s orientation. This usually happened when the target amplitude changed from one sequence to the next, resulting in a task that required a new, drastic rotation. These trials, which typically had movement times greater than two standard deviations from the 19-trial mean, were considered outliers and removed. Using this criterion, 369 of 5472 trials (6.7%) were excluded from analysis. After outlier removal, input method throughput and error rate were calculated by first calculating throughput and error rate on each 19-trial sequence, then averaging these measures to produce values across participants and conditions.
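A minimal Python sketch of the outlier criterion is given below. It assumes a two-sided test and the sample standard deviation, neither of which is specified above; the function name and the data are invented for the example.

```python
import statistics

def remove_outliers(movement_times, k=2.0):
    """Drop trials whose movement time lies more than k standard deviations
    from the mean of the sequence (two-sided test, sample SD: assumptions)."""
    mean = statistics.mean(movement_times)
    sd = statistics.stdev(movement_times)
    return [mt for mt in movement_times if abs(mt - mean) <= k * sd]

# Invented 19-trial sequence with one very slow trial (e.g., a grip change).
sequence = [1.0] * 18 + [10.0]
kept = remove_outliers(sequence)

# Proportion of trials excluded in the study, as reported above.
excluded_percent = round(369 / 5472 * 100, 1)
```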

4.2 Throughput

The grand mean for throughput was 2.86 bps. Figure 10 shows the throughputs for each input method. The mouse throughput of 4.09 bps was about 100% greater than the mobile phone throughput at 2.05 bps and 70% greater than the joystick throughput at 2.42 bps. An ANOVA revealed that input method had a significant effect on throughput (\( F_{2,10} = 45.04 \), p < .0001). A Scheffé post hoc analysis revealed that the difference was significant between the mouse and the mobile phone and the mouse and the joystick, but not between the mobile phone and the joystick.

Fig. 10. Throughput by input method. Error bars show ± 1 SE.

The throughputs for the mouse and joystick were similar to values reported in other work [11, 14, 18]. Interestingly, the joystick in this rotation task yielded a higher throughput (2.42 bps) than it did for 2D selection experiments performed by MacKenzie et al. [11] (1.8 bps) and Natapov and MacKenzie [14] (2.01 bps). This suggests that the thumb-controlled joystick is better suited to the 3D rotation task than it is to the 2D target-selection task. However, a true assessment of this would require a within-study comparison of these conditions, to eliminate confounding influences.

Despite the spatially congruent mapping, the mobile phone condition still performed worse than the input methods with fewer DOF. The possible reasons are discussed below.

4.3 Error Rate

The grand mean for error rate was 2.03%. Figure 11 shows the error rate for each input method. The mouse condition had the lowest error rate of only 0.88%, and the mobile phone condition had the highest error rate of 3.46%. The joystick error rate was about half the mobile phone error rate (1.76%). There was a significant effect of input method on error rate (\( F_{2,10} = 10.63 \), p < .001). A Scheffé post hoc analysis revealed that the error rate only differed significantly between the mouse and mobile phone.

Fig. 11. Error rate by input method. Error bars show ± 1 SE.

The error rates for each input method were relatively low compared to other research. The mouse in the work of MacKenzie et al. [11] had a throughput and error rate of 4.9 bps and 9.4%, whereas we observed 4.09 bps and 0.88% in the 3D rotation task. This discrepancy might be caused by our participants emphasizing accuracy over speed in the rotation task, along with the relatively large target widths.

4.4 Effective Index of Difficulty

Since participants rarely performed the task exactly as specified, the effective index of difficulty \( ID_{\text{e}} \) varies. Table 3 shows the specified and effective indices of difficulty averaged across participants, along with the standard deviation of endpoints for each condition.

Table 3. Expected index of difficulty (ID) and endpoint variation (\( SD_{x} \)) and, for each condition, the observed effective index of difficulty (\( ID_{\text{e}} \)) and standard deviation (\( SD_{x} \)).

The specified, or “expected”, endpoint variation is normalized to a 4% error rate [16], and is calculated as

$$ SD_{x} = \frac{\omega }{4.133}. $$
(12)

As for the observed participant behavior, there was a consistent discrepancy between the difficulty of the task specified and the difficulty of the task performed. For all conditions, \( ID_{\text{e}} > ID \). This is a natural consequence of participants focusing on accuracy and achieving low error rates. With a low error rate, participants’ endpoint variation was lower than the variation expected for a nominal 4% error rate. This tends to push \( ID_{\text{e}} \) up, as seen in Table 3.
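The effect of endpoint variation on the effective index of difficulty can be checked numerically. In the Python sketch below, the halving of the endpoint variation is a hypothetical value chosen for illustration, not an observed result, and the function names are ours.

```python
import math

def expected_sd(omega):
    """Endpoint variation expected at a nominal 4% error rate (Eq. 12)."""
    return omega / 4.133

def effective_id(alpha_e, sd_x):
    """Effective index of difficulty (Eqs. 9 and 10), in bits."""
    return math.log2(alpha_e / (4.133 * sd_x) + 1)

alpha, omega = math.pi / 4, math.pi / 8  # easiest condition in the design

# With the expected endpoint variation, ID_e equals the specified ID.
nominal_id = effective_id(alpha, expected_sd(omega))

# Hypothetical participant who is twice as precise as the nominal case.
tighter_id = effective_id(alpha, 0.5 * expected_sd(omega))
```

In this example the specified ID is log2(3) ≈ 1.58 bits, and halving the endpoint variation raises the effective value to log2(5) ≈ 2.32 bits.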

4.5 Fitts’ Law Linear Regression

By averaging the eight sequence conditions separately over all the participants, a linear regression model was created for each input method. The models are shown in Fig. 12 in the plots of movement time vs. IDe.

Fig. 12. Movement time (ms) vs. effective index of difficulty (bits) with regression models.

For both the mouse and joystick conditions, the linear regression model fits the data well (\( R^{2} > .9 \)), indicating that both conform to Fitts’ law. Furthermore, the intercepts are relatively small, within the range outlined by Soukoreff and MacKenzie [16].
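A least-squares fit of the Fitts' law model in Eq. 1, with the effective index of difficulty as the predictor, can be sketched in Python as follows. The function name is ours and the data points are invented to give a perfect fit; they are not the values plotted in Fig. 12.

```python
def fit_fitts_model(ids, movement_times):
    """Fit MT = a + b * ID by ordinary least squares.

    Returns the intercept a, the slope b, and the coefficient of
    determination R^2."""
    n = len(ids)
    mean_id = sum(ids) / n
    mean_mt = sum(movement_times) / n
    sxx = sum((x - mean_id) ** 2 for x in ids)
    sxy = sum((x - mean_id) * (y - mean_mt)
              for x, y in zip(ids, movement_times))
    b = sxy / sxx
    a = mean_mt - b * mean_id
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(ids, movement_times))
    ss_tot = sum((y - mean_mt) ** 2 for y in movement_times)
    return a, b, 1.0 - ss_res / ss_tot

# Invented data: movement time (ms) grows by 200 ms per bit from 100 ms.
a, b, r_squared = fit_fitts_model([1.0, 2.0, 3.0, 4.0],
                                  [300.0, 500.0, 700.0, 900.0])
```

An \( R^{2} \) near zero, in contrast, means that the fitted line explains almost none of the variation in movement time.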

These results, along with the resemblance of the 3D rotation throughput to 2D target selection throughput in other work, validate the methodology and task design. The 3D rotation task can be representative of how a device with two DOF might perform in the 2D ISO 9241-9 task.

However, there is no overlooking the exceptionally poor fit of the mobile phone data to the regression model (\( R^{2} = .0126 \)). The effective index of difficulty for the mobile phone condition had no apparent impact on the movement time. There are a couple of explanations.

First, the mobile phone condition was not performed single-handed for this task, as originally expected. In many cases, participants had to use both hands to manipulate the mobile phone, taking their second hand from the Enter key. This doubly impacted homing time, since time was spent moving from the Enter key both before and after movement to the target.

Reaction time was also not properly eliminated in the mobile phone condition. A negligible dead zone was created at the start of each trial, inside which the movement timer would not start. For the mouse and joystick conditions, this effectively removed reaction time because the dead zone was never left before the participant purposely moved the cursor. However, because the mobile phone operated in position-control and is inherently affected by noise in the magnetic sensor, it frequently triggered a false start, causing the system to record reaction time as well.

4.6 Qualitative Results

Participants provided their impressions on the input methods at the end of the experiment. On a 7-point Likert scale, they were asked to rate how well they thought each input method was implemented (1 = not well, 7 = very well). The results are seen in Fig. 13. From best to worst, the results favored the mouse (6.8), then the joystick (6.1), then the mobile phone (4.2). A Friedman non-parametric test deemed the differences statistically significant (\( \chi^{2} = 20.4 \), df = 2, p < .0001). All pairwise differences were also significant, as indicated by Conover’s F (p < .05). The mobile phone rating is quite poor compared to the mouse and joystick ratings. Clearly, there is room for improvement in the mobile phone interaction.

Fig. 13. Seven-point Likert-scale responses on participants’ impressions on the implementation of each input method. Higher scores are better. Error bars show ± 1 SD.

Many participants also described the mobile phone condition as fatiguing, possibly negatively impacting their performance. One participant commented on arm pain when using the mobile phone condition. Another noted, “I had to mind [the] spatial mapping between several things: what I see, hand movements (and its limit), [the] phone, [and the] phone’s shape”.

5 Conclusion

A novel task for testing two- and three-DOF devices was designed and evaluated. The task used throughput as a performance measure and was intended to support the comparison of two-DOF devices used for 2D selection tasks with higher-DOF devices used for 3D rotation tasks.

The throughput for the mouse and joystick and the corresponding linear Fitts’ law equations suggest that, with proper control-display mapping, a two-DOF device can perform about as well in a 3D rotation task as it does in a 2D target-selection task.

The mobile phone input, however, did not work well, producing a low value of throughput, a poor Fitts’ law model, and an overall negative impression on participants.

6 Future Work

Though the 3D rotation task performed favorably for the standard two-DOF inputs, it requires more testing and validation for three-DOF inputs. Homing time can be further reduced by introducing a method of rotation confirmation that does not require a free hand, such as a foot pedal or a button on each input device. Reaction time can be reduced by implementing a larger dead zone, and then adjusting for its impact on effective amplitude.