1 Introduction

Recent developments in low-cost tracking systems such as the Wii Remote, Leap Motion and Microsoft Kinect [1], together with advances in gesture recognition algorithms, have made gesture-based interfaces increasingly popular, given their relevance for a growing number of applications, namely in gaming, Virtual and Augmented Reality [2, 3], and in other scenarios [4]. However, the development of gesture interfaces poses several usability and technical challenges, related to the lack of universal consensus regarding gesture-function associations and to the need to cope with a variety of environments and technical limitations, as noted by Wachs et al. [5] and Norman and Nielsen [6]. 3D user interfaces are seen as the natural choice for large-display contexts, as pointed out by Bowman [7]. According to these authors, traditional mouse and keyboard setups are difficult to use with public displays, not least because such displays are meant to invite multiple users to interact. Moreover, users want some freedom of movement in front of the display. Touch screens can solve some of these problems, but impose limitations of their own, e.g. the user must stand at arm's reach from the display, limiting how much of it can be seen. Interfaces that do not require the user to wear any additional input device, allowing unconstrained interaction with the user's own body through gestures, are therefore essential in this public large-display context, fostering alternatives in terms of design and interaction [7].

The work presented in this paper was developed as part of a public display interactive system we have installed at the entrance hall of our Department (shown in Fig. 1). The system consists of a large screen and a Kinect sensor, and is meant to provide relevant information concerning our department and to showcase demos or games [8]. Several contents and interaction methods have been developed for this interactive system, although most of the content used so far was 2D. Beyond these existing applications, it was deemed useful to allow virtual walkthroughs as well as the presentation of 3D models of prototypes developed at the department, hence the need to visualize and interact through gestures with the 3D content available on our system. This would allow passing-by users to navigate through virtual environments, manipulate 3D models or simply have fun with 3D games. The remainder of this paper describes related work and the design, implementation, and evaluation of suitable methods for 3D object manipulation in such a scenario.

Fig. 1. Public display interactive system installed at the entrance hall of our Department.

2 Related Work

Jankowski and Hachet [9] define three universal interaction tasks commonly used throughout the literature: Navigation – the process of moving around in a virtual environment; Selection and Manipulation – the ability to choose an object and perform translation, rotation and scaling operations; System Control – the communication between the user and the system that is not part of the virtual environment. The scope of this paper is only the manipulation task, since the work focuses on the development of gesture-based methods that enable the visualization and manipulation of 3D models. This problem is not new: Bowman and Hodges [10] already mention this capability as a desirable feature in many VR applications, typically accomplished using a real-world metaphor. Manipulation of 3D virtual objects using hand gestures appeared in the eighties through instrumented gloves [11]. In the following years several other approaches were used, such as the vision-based gesture tracking of [12] and the ones described in [5, 13, 14]; it is, however, the advent of affordable depth cameras that has been boosting and encouraging numerous applications, since these devices have helped overcome many of the technical challenges of gesture tracking.

Freehand gestures have been used in a variety of situations, such as computer-aided design, medical systems and assistive technologies, computer-supported collaborative work, mobile, tangible and wearable computing, as well as entertainment and human-robot interaction; virtual reality systems, however, have been a particularly interesting application for spatial freehand gestures, as in [2, 16]. More recently, gestures have also appeared as an attractive alternative in ubiquitous computing and for interaction with large public displays, which create the opportunity for passing-by users to access and interact with public or private content. These scenarios require interaction at a distance and benefit significantly from not needing any input device, as is the case with gesture interaction [2, 17].

Real world metaphors are commonly used to accomplish such desirable properties of 3D object visualization and manipulation in interactive systems [15]. In this vein, the work of Song et al. [18] significantly contributed to the development of one of our manipulation methods. They proposed a handle bar metaphor as an effective control between the user’s hand gestures and the corresponding virtual object manipulation operations. The main strength of this metaphor is the physical familiarity that is presented to the users, as they mentally map their bi-manual hand gestures to manipulation operations such as translation and rotation in a 3D virtual environment. One of the features that proved to be effective was the visual representation of a virtual handle bar corresponding to the ever-changing positions of the user’s hands since it provided a strong sense of control to the user during the interactive visual manipulation.

In our case, previous experience with the system [8] and the literature, namely [18, 19], provided hints on a set of gestures that could be used as a starting point for a refinement process. This process evolved iteratively, based on preliminary experiments and on the analysis of qualitative and quantitative data collected by observing users interacting with the system, logging their interaction, and asking for their opinions and suggestions. The results of this analysis were used as formative evaluation to improve the alternative interaction methods until they were usable enough to be integrated in our system. According to [2], the majority of evaluations of freehand gesture systems have been of an exploratory nature, used as formative evaluation [20]; however, we consider summative evaluation important to guarantee that the methods are usable enough, and thus a final user study was performed to compare the alternatives and select the best fit for the purpose of 3D object manipulation.

3 Proposed Manipulation Methods

According to Bowman et al. [21], manipulation is the act of handling physical objects with one's hands. In this paper we consider the manipulation tasks of rotating and scaling, which consist, respectively, in changing the orientation of an object and in increasing or decreasing its size. These operations were selected as basic for our system, as they allow the user to view and explore 3D models of prototypes by modifying the zoom and viewpoint.

Two methods for 3D object manipulation, dubbed "OneHand" and "HandleBar", were developed. "OneHand" allows the user to rotate the object while grabbing it with the dominant hand (closing the hand), while "HandleBar" uses the positions of both of the user's hands to determine the rotation angle of the object; a 3D handle-bar model is placed at the center of the object to visually map the positions of the hands. Both methods also provide scaling. For "OneHand" we resorted to the GUI, using two additional buttons to increase or decrease the scaling factor of the object, whereas for "HandleBar" we were inspired by the pinch gesture used to zoom on mobile phones and multi-touch applications but, instead of the fingers, we map the distance between the user's hands to the scaling factor.

3.1 OneHand

Our first approach was to manipulate a 3D virtual object with a single hand, using a cursor-based metaphor of grabbing the object and manipulating it through hand movement. We implemented the "OneHand" method (Fig. 2), in which the rotation is determined by the offset between the grabbing point (where the grab gesture was first detected) and the current position of the moving hand. The Microsoft Kinect SDK provided the grab and release events for the user's dominant hand. The scaling manipulation is implemented using two GUI buttons (Fig. 2 (b)): by continuously pushing either button with the dominant hand it is possible to increase or decrease the scaling factor of the 3D object.
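As an illustration only, the sketch below outlines the per-frame logic we describe for "OneHand"; the state layout, helper names and the scaling step are our assumptions, not the actual Kinect SDK or GUI code.

```python
# Illustrative sketch only: the way grab/release and the GUI buttons are exposed
# here is an assumption, not the Kinect SDK API.

SCALE_STEP = 0.01  # scaling increment applied while a GUI button is held (assumed value)

class OneHandState:
    def __init__(self):
        self.grab_point = None  # 2D cursor position where the grab gesture started
        self.scale = 1.0        # current scaling factor of the object

    def update(self, cursor_xy, grabbing, plus_pressed, minus_pressed):
        """Per-frame update: returns the cursor offset driving the rotation and the scale."""
        if grabbing:
            if self.grab_point is None:
                self.grab_point = cursor_xy  # grab just started: remember the grabbing point
            offset = (cursor_xy[0] - self.grab_point[0],
                      cursor_xy[1] - self.grab_point[1])
        else:
            self.grab_point = None           # release: next grab starts a new offset
            offset = (0.0, 0.0)

        # continuously pushing either button increases or decreases the scaling factor
        if plus_pressed:
            self.scale += SCALE_STEP
        elif minus_pressed:
            self.scale -= SCALE_STEP
        return offset, self.scale
```

The returned offset is what drives the rotation computation described below.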

Fig. 2. "OneHand" method. (a) Rotation around the Y and Z axes. (b) Scaling.

The rotation of the 3D object (Fig. 2 (a)) is computed about the Y and Z axes from the components of this offset, determined using the 2D coordinates of the hand cursor (screen space) provided by the KinectRegion control of the SDK. Since the rotation of every object in the 3D engine is described by a quaternion, we implemented a function to generate a quaternion from an angle and a direction. The current object rotation is given by the multiplication of the quaternions describing the rotations around Y and Z with the previously accumulated rotation. This accumulated rotation encodes all previous transformations and is updated each time a hand release event occurs.
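The quaternion construction and accumulation just described can be sketched as follows. This is a generic reconstruction from the text, not the authors' code; the gain K mapping cursor offsets to angles and the assignment of each offset component to the Y or Z axis are our assumptions.

```python
import math

def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    ax, ay, az = axis
    s = math.sin(angle / 2.0)
    return (math.cos(angle / 2.0), ax * s, ay * s, az * s)

def quat_mul(a, b):
    """Hamilton product of two quaternions (composition of rotations)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

K = 0.01  # gain mapping screen-space offset to radians -- a tuning assumption

def one_hand_rotation(offset_x, offset_y, accumulated):
    """Current rotation: incremental Y and Z rotations multiplied with the accumulated one."""
    q_y = quat_from_axis_angle((0.0, 1.0, 0.0), K * offset_x)  # assumed: horizontal offset -> Y
    q_z = quat_from_axis_angle((0.0, 0.0, 1.0), K * offset_y)  # assumed: vertical offset -> Z
    return quat_mul(quat_mul(q_y, q_z), accumulated)

# On a hand release event, `accumulated` is overwritten with the current rotation.
```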

3.2 HandleBar

After preliminary tests it became clear that the "OneHand" method had some limitations due to its cursor-based movement, which implied mapping the 2D coordinate space of the cursor onto the 3D space of the object. As an additional constraint, the Kinect sensor could not detect the wrist orientation, which prevented the implementation of all 3 rotational degrees of freedom (DOF). In contrast to "OneHand", in the "HandleBar" method (Fig. 3) manipulation occurs in 3D space, with the 3D coordinates of the user's hands mapped directly to the manipulation of the object. This method implements a handle bar metaphor based on the experiments of Song et al. [18]. The metaphor consists of a bi-manual interaction to manipulate a single object, using grab and release gestures. A virtual handle bar positioned at the center of the 3D object represents the relative orientation of the user's hands and provides helpful visual feedback. In this method, the KinectRegion provided by the Kinect SDK to obtain the hand position and state was not sufficient: to obtain the 3D positions of the hands it was necessary to access the skeleton data provided by the Kinect SDK, while the hand state (closed or open) was obtained through the KinectInteraction API.

Fig. 3. "HandleBar" method. (a) Top view, rotation around the Z axis. (b) Front view, rotation around the X axis. (c) Scaling. (Color figure online)

The rotation and scaling were implemented with 2 DOF, as in "OneHand". For each axis, the rotation of the object is based on the relative offset of the handle bar rotation (i.e. of the hand positions). When both hands are closed (grab), the current rotation of the object is temporarily stored, and the offset is determined from the relative position of each hand. Then, similarly to "OneHand", we generate the quaternion from the angle and direction; in this case the vectors encoding the direction are different, since the rotation DOF differ from the previous method. Figure 3(a, b) shows the two implemented rotations: to rotate the object around the Z axis, the user moves one hand forward and the other backwards, and we map the angle between the imaginary line connecting the hands and the X axis of the object to the rotation of the object; to rotate around the X axis, one hand moves up and the other down, and the rotation angle is the angle between the line defined by the hands and the Y axis of the object. No absolute angular mapping is needed, since each time the user opens at least one hand the rotation is stored, and the user may re-initiate a bi-manual grab gesture at a new position and perform a further rotation. This allows the user to make large angular changes to the 3D virtual object around both axes without getting into an undesirable situation where, for example, the front hand occludes the back hand, making it impossible to determine the rotation angle. The object scaling is computed from the distance between the left and right hands: the farther apart the hands, the larger the object. Again, we use an accumulation strategy, enabling the user to perform successive scaling operations: when the user reaches maximum arm stretch or the hands are close together, he/she can release the manipulation by opening at least one hand; this stores the current scaling, and the user may place the hands in another position and start another manipulation. A color scheme gives feedback to the user: the handle bar is white when both hands are open and green when both are in the grabbing state.
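To make the geometry concrete, the sketch below shows one way to derive the rotation offsets and the scaling factor from the two 3D hand positions. The coordinate convention (x: left-right, y: up-down, z: depth) and the helper names are assumptions on our part, and in the method above the tilt angles are used relative to their values at grab time.

```python
import math

def handlebar_rotation_offsets(left, right):
    """Tilt angles of the hand-to-hand line, used as relative rotation offsets.

    `left` and `right` are (x, y, z) hand positions; x: left-right, y: up-down,
    z: depth is an assumed convention. The object receives the difference between
    these angles and the ones stored when the bi-manual grab started.
    """
    dx = right[0] - left[0]
    dy = right[1] - left[1]
    dz = right[2] - left[2]
    tilt_top_view = math.atan2(dz, dx)    # one hand forward, the other backwards (Fig. 3a)
    tilt_front_view = math.atan2(dy, dx)  # one hand up, the other down (Fig. 3b)
    return tilt_top_view, tilt_front_view

def handlebar_scale(left, right, grab_distance, grab_scale):
    """Accumulated scaling: current hand distance relative to the distance at grab time."""
    return grab_scale * (math.dist(left, right) / grab_distance)
```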

3.3 Improved HandleBar

During the preliminary evaluation tests, some users proposed an improvement to the "HandleBar" method (Fig. 4): moving both hands in the same direction (up or down) could introduce the missing DOF (rotation around the Y axis). This improvement was implemented by first checking whether both hands are in the grab state and then whether they are level with each other, i.e. at the same height. If these two conditions hold during the interaction, the angle between the Y axis and the imaginary line going from a reference position (the hip center joint) to the midpoint between the hands is determined, and this angle is used to compute the rotation. This implies retrieving the hip center joint position from the skeleton data in each frame. Apart from that, the algorithm is similar to the one used in the original "HandleBar" method. This improvement enabled the full set of rotation DOFs (roll, pitch and yaw). To give visual feedback about the different rotations, we added an extra state to the color scheme of the handle bar (orange), shown when the new gesture corresponding to a rotation around the Y axis is detected.
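A sketch of the additional check and of the angle computation for this third rotation is given below; the height tolerance, the coordinate convention and the function name are our assumptions.

```python
import math

HEIGHT_TOLERANCE = 0.05  # hands considered level within 5 cm -- an assumed threshold

def improved_handlebar_rotation(left, right, hip_center, both_grabbing):
    """Rotation offset for the added DOF when both closed hands move up or down together.

    Returns None when the gesture does not apply (hands not grabbing or not level).
    Coordinates are (x, y, z) with y up and z as depth -- an assumed convention.
    """
    if not both_grabbing:
        return None
    if abs(left[1] - right[1]) > HEIGHT_TOLERANCE:
        return None                                   # hands are not at the same height
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    # angle between the Y axis and the line from the hip center joint to the hand midpoint
    dy = mid[1] - hip_center[1]
    dz = mid[2] - hip_center[2]
    return math.atan2(dz, dy)
```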

Fig. 4. Improved "HandleBar" method.

4 User Studies

We conducted a preliminary test with 8 students to establish the experimental protocol to be used. This test also served as a testbed to determine which performance measures should be logged in order to evaluate the usability of the methods, and it provided a way to improve the "HandleBar" method, as mentioned above.

In order to evaluate the new version of “HandleBar”, comparing it with the “OneHand” method, a second user study was performed with the collaboration of 40 participants. This section presents the protocol used in these studies, as well as the main results.

4.1 Preliminary Test

The first test comprised two tasks meant to assess the usability of "OneHand" and "HandleBar", as well as the accuracy attained by users with both methods while manipulating an object. The first task consisted of manipulating a sphere bearing a marker (represented by a multi-colored cross) using rotation only (Fig. 5); the goal was to rotate the sphere so that the marker was aligned with a target (represented by a similar but transparent cross), thus matching the colors of both crosses. The task ended when users considered that the marker best overlapped the target and did not interact with the system for 15 s. During the manipulation, several events were automatically logged by the system in order to obtain the solid angular difference between the models, the elapsed time, and the scaling.

Fig. 5. 3D Manipulation – model used in the rotation test.

The second task introduced the scaling manipulation; it consisted of manipulating a 3D model in order to match it to a target represented by the same model rendered with a degree of transparency and in a different position (Fig. 6). The above-mentioned termination condition was also used for this task; in addition to the other performance variables, the scaling value of the model was also recorded while the user was performing the task.

Fig. 6. 3D Manipulation with "OneHand" – model and GUI buttons used in the test with rotation and scaling.

Each participant completed the two tasks using both methods and answered a post-task questionnaire including a few questions regarding the methods.

As mentioned, this preliminary test allowed fine-tuning the experimental protocol, namely the variables to be logged. We observed that the total task time was not the best performance measure: some users were perfectionists, trying to leave the model as aligned as possible with the target and thus taking more time, while others quickly performed the manipulation, leaving the model fairly misaligned. Instead, we also logged the time users took to bring the angular distance between the two models below 5°. The underlying idea of this measure is to evaluate which method allows users to reach a small angular difference between models more quickly, indicating that it might be better suited for coarse manipulation.
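The angular distance used for this 5° threshold can be obtained from the relative rotation between the current and target orientations; a small sketch of our formulation (assuming unit quaternions, as in the rotation code above):

```python
import math

def angular_distance_deg(q_current, q_target):
    """Smallest rotation angle (in degrees) taking orientation q_current to q_target."""
    cw, cx, cy, cz = q_current
    tw, tx, ty, tz = q_target
    # for unit quaternions, |dot| = cos(theta / 2) of the relative rotation
    dot = min(1.0, abs(cw*tw + cx*tx + cy*ty + cz*tz))  # clamp against rounding error
    return math.degrees(2.0 * math.acos(dot))

# Example of the logging rule: record the first moment the difference drops below 5 degrees.
# if time_below_5 is None and angular_distance_deg(model_rot, target_rot) < 5.0:
#     time_below_5 = elapsed_time
```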

4.2 Controlled Experiment

The second study was a controlled experiment designed to test the (equality) hypothesis that both methods provide the same level of usability in the given context. The independent (input) variable was the manipulation method, with two levels: "OneHand" and "Improved HandleBar". Participants' performance and satisfaction, as well as their opinion of the methods, were the dependent (output) variables. Performance was assessed through the measures mentioned above: angular distance between the final position of the manipulated object and the target position, scaling factor (a scaling factor of 1 means exactly the same scale for source and target models), and the time to accomplish the various tasks. Participants' satisfaction and preferences were obtained from a post-experiment questionnaire, which included ten questions answered on a 5-level Likert-type scale, as well as space for comments or suggestions concerning the methods. The questions evaluated intuitiveness, need for training, ease of obtaining the desired position, annoying characteristics, and overall satisfaction.

As a within-subject experimental design was used, we counterbalanced possible learning or boredom effects by asking half of the participants to start with one method and the other half with the other. The protocol for the experiment is illustrated in Fig. 7 and involved 40 volunteer students (34 male and 6 female), aged between 20 and 27 years. Most participants stated in the questionnaire that they did not have significant experience with 3D user interfaces.

Fig. 7. Experimental protocol: within-group experimental design; input variable (with two levels): manipulation method ("OneHand" and "Improved HandleBar"); output variables: participants' performance, satisfaction and opinion.

5 Results and Discussion

Figure 8 and Table 1 present the main results of the controlled experiment. The average times and final angular distances in degrees (between source and target models) are presented for both methods, "Improved HandleBar" and "OneHand", and for both experiments (rotation without and with scaling). The better performance of the "Improved HandleBar" method is clearly visible from the average times and angular distances, as these values were always lower than the ones obtained with "OneHand". For all variables (total time, time to 5°, and angular distance), Wilcoxon tests rejected the equality hypothesis (with p < 0.05 in all cases: 0.00015 for total time, < 0.0000001 for time to 5°, and 0.000312 for angular distance), meaning that the difference between "Improved HandleBar" and "OneHand" is significant in both experiments. It is clear from the box plot analysis that users spent more time aligning the 3D objects with "OneHand"; in particular, regarding the time below 5°, users took almost twice as long with "OneHand" to reach an acceptable accuracy (5° error) as with the "Improved HandleBar" method.
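For reference, a paired Wilcoxon (signed-rank) test such as the ones reported here can be computed with SciPy; the sketch below uses placeholder values, not the study data.

```python
from scipy.stats import wilcoxon

# Paired per-participant measurements (placeholder values, not the study data),
# e.g. time-to-5-degrees for the same participants with each method.
time_onehand   = [12.3, 15.1,  9.8, 20.4, 11.7, 14.2]
time_handlebar = [ 7.1,  8.4,  6.2, 10.9,  6.8,  7.9]

stat, p_value = wilcoxon(time_onehand, time_handlebar)
print(f"Wilcoxon statistic = {stat}, p = {p_value:.5f}")
# p < 0.05 would reject the hypothesis that both methods perform equally well.
```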

Fig. 8. Times (total time and time below 5°, in seconds) with "OneHand" (OH) and "Improved HandleBar" (HB) methods for the rotation only and rotation with scaling tests.

Table 1. Average performance values obtained with “OneHand” (OH) and “Improved HandleBar” (HB) methods for the rotation only and rotation with scaling tests.

The data collected during the experiment, especially during the rotation-only test, suggest that the "Improved HandleBar" method attained a much better acceptance among participants. Accordingly, the questionnaire results for the first test (Fig. 9) show that, in terms of overall satisfaction, participants evaluated the "Improved HandleBar" method more positively than "OneHand". In particular, the ratings for ease of obtaining the desired position and for intuitiveness were much higher for "Improved HandleBar" than for "OneHand". Moreover, when asked directly which method was more satisfactory and which had fewer annoying features, participants preferred the "Improved HandleBar" method. These differences were validated by a Wilcoxon matched-pairs test (p < 0.05), which shows significant differences between methods concerning "Easy to obtain position", "Intuitive manipulation", "Annoying characteristics", and "Overall satisfaction".

Fig. 9. Questionnaire results for the rotation only test obtained with "OneHand" (OH) and "Improved HandleBar" (HB) (median values on a 5-point Likert-type scale: 1 – completely disagree, 5 – completely agree).

Similar results were obtained for the rotation with scaling test (Fig. 10). Despite a smaller difference between methods, we can still observe a slight user preference for the "Improved HandleBar" over the "OneHand" method, namely regarding "Easy to obtain position", "Easy with more training" and "Overall satisfaction". The differences for these three questions were validated with a Wilcoxon matched-pairs test (p < 0.05).

Fig. 10. Questionnaire results for the rotation with scaling test obtained with "OneHand" (OH) and "Improved HandleBar" (HB) (median values on a 5-point Likert-type scale: 1 – completely disagree, 5 – completely agree).

From these results, it is clear that the "Improved HandleBar" method, which introduces the missing DOF, was significantly better in terms of usability than "OneHand". Furthermore, by observing the participants during the experiment, we noticed that, after using the "Improved HandleBar" method, participants did not like the "OneHand" alternative at all.

6 Conclusion and Future Work

In this work we studied methods for 3D object manipulation using freehand gestures. Our goal was to integrate these methods into a public display system in the entrance hall of our department. We proposed and implemented two gesture-based methods for the manipulation of 3D objects: "OneHand", a cursor-based interaction method, and "HandleBar", a bi-manual interaction method. A preliminary study showed that both methods suffered from the lack of one DOF, leading to several cumulative rotations that confused users. Following this preliminary test, users suggested a possible way to address this issue: a gesture in which both hands move up or down simultaneously to enable rotation about the missing DOF. This improvement led to a 3D object manipulation method that allows users to rotate objects around all three axes. An experiment involving 40 participants was conducted; the results revealed significant differences in several usability dimensions (both from the data logged during the experiment and from the questionnaire answers), suggesting that the "Improved HandleBar" provides a more efficient method for rotation and scaling of 3D objects, clearly outperforming "OneHand".

As future work we plan to test these methods with the Kinect ONE sensor, which enables the detection of wrist orientation, allowing not only the improvement of our methods but also the implementation of new manipulation approaches. Expanding our manipulation methods to support object translation is also an envisaged direction for future developments; this would make it possible to manipulate objects in virtual worlds in a natural way by grabbing, moving, rotating and scaling them simply using the hands as the manipulation device. Some additional work on user representation in the virtual world may also be relevant: we have performed preliminary tests replacing the handle bar with two virtual hands moving in space according to the user's hands, yet this did not lead to better performance, potentially due to occlusion problems.