1 Introduction

In 1991, Howard Rheingold [1] defined virtual reality (VR) as an experience in which a person is “surrounded by a three-dimensional computer-generated representation, and is able to move around in the virtual world and see it from different angles, to reach into it, grab it, and reshape it.” Virtual experience has since become a reality for the ordinary consumer: a smartphone and a headset (Oculus, HTC Vive or PlayStation VR) are now consumer terminals available at affordable prices. Tomorrow, our environment will become virtual. Today, omnidirectional systems are developing in all areas (science, medicine, street views, data mining, etc.). The problem is no longer creating an environment but interacting with it. The CAVE [2] was certainly the first omnidirectional interface, but it did not produce any significant new control paradigms. At the beginning of the 2000s, Hua, Brown and Gao [3] proposed an HMPD (Head-Mounted Projective Display) technology named SCAPE. SCAPE lies on the boundary between conventional HMDs and CAVE-like projective displays. It was a room-display system enabling collaborative applications by improving the perception of the real world, by providing the capability to create an arbitrary number of individual viewpoints, and by retaining natural face-to-face communication. The 240° curved-screen i-Cone [4] went one step further in immersive environments (Fig. 1). Its authors developed an interaction paradigm allowing multiple users to share a virtual environment in a conventional single-view stereoscopic setup. They used spatially tracked PDAs as a common interaction device for every user of the system, combining ray-casting selection and direct object motion in the virtual environment with system control for menus, tools, and modes on the “private” interface of each user’s PDA.

Fig. 1. 240° i-Cone display

In 2010, Magnor et al. [5] presented results from computer graphics research offering solutions to contemporary challenges in digital planetarium rendering and modeling. The user is immersed in a virtual world, but the possibilities of interaction are narrow or nonexistent. Also in 2010, Benko and Wilson [6] used a dome display but introduced new interactions with an immersive omnidirectional environment. The grammar for exchanging with the information shown on the dome screen is nevertheless very limited; the interaction vocabulary consists of five primitives: hand pinch, two-hand circle, one-hand clasp, speech recognition, and interaction with an IR laser pointer.

2 Mid-Air Interaction, Related Work

To create a total impression of immersion, the five senses must perceive the digital environment as real. Immersive technology can stimulate the senses through 3D panoramic displays, surround sound, force feedback, movement-recognition devices, and the artificial creation of tastes and odors. Today, only a small number of systems bring together all of these vectors of interaction. Augmented reality systems strive to reproduce the visual display more or less realistically; the main difficulty is to design interactions from the operator to the system. The principal approach is to factor the six degrees of freedom (6 DOF) of 3D manipulation into 2D subspaces that are mapped to the x, y and sometimes z axes of a mouse. This metaphor is inherently modal, because one needs to switch between subspaces, and it disconnects the input space from the modeling space. Wang [7] proposes a bimanual hand-tracking system that provides physically motivated 6-DOF control for 3D assembly. The system is dedicated to CAD/CAM, which typically involves tasks such as manipulating the camera perspective and assembling parts. It builds 3D interactions based on the recognition of the positions of the two hands in space and on the recognition of simple gestures based on a metaphor of the real world. Technically, the principle is simple: two overhead cameras film both of the operator’s hands from above (Fig. 2).

Fig. 2. Wang’s 6 DOF interaction

The advantage of this solution is that it detects hand movements without any additional artifact and without being intrusive. The system uses a small set of gestures that are comfortable to use, precise, and easy to remember.

With a more “immersive” preoccupation, Hilliges et al. developed the HoloDesk system [8]. The system combines a transparent screen and a Kinect to create the illusion that the user interacts directly with the virtual world (Fig. 3). The interaction space is located below the glass surface, and the image is produced by a screen placed above it, so the operator has the illusion that the objects are above his hands. A Kinect analyses the operator’s hand movements and maps them onto the scene.

Fig. 3. HoloDesk: direct interaction with virtual objects

Mockup Builder [9] is a semi-immersive system dedicated to modeling and manipulating objects in three dimensions (Fig. 4). The authors argue that modeling in immersive environments provides three major benefits in the design process. The first is the possibility of interacting with objects in real time. The second is that operators work with various notions of scale of representation in the construction and interaction spaces. The last is that immersive environments allow a stronger match between designers’ subjective ideas and the principles of intuitive conception.

Fig. 4. Mockup Builder and its interaction environment

The main issue with freehand gestural interaction is the problem of gesture delimitation (begin/end): how can the application know when a movement is intended as a gesture or action and not simply a human movement through space? More precisely, it is often difficult to know the exact moment a gesture started or ended. With tactile interaction, touch contacts provide straightforward delimiters: when the user touches the surface, they are engaged, and lift-off usually signals the end of the action. In mid-air, however, we must deal with the 3D environment in which we live.

In 2012, Song et al. [10] proposed the handle bar metaphor as an effective visual control metaphor between the user’s hand gestures and the corresponding virtual object manipulation operations. It mimics the familiar situation of handling objects skewered on a bimanual handle bar (Fig. 5).

Fig. 5. (a) The metaphor of two remote gripping hands projected into the 3D virtual space; (b) the metaphor of a handle bar extending from two clasped hands, used to pierce through the teapot for rotation and translation manipulations.

This method is concerned with enabling a single user to interactively manipulate single or multiple 3D objects in a virtual environment. The system recognizes three basic single-handed gestures: POINT, OPEN, and CLOSE. The user executes different manipulation operations by moving, closing or opening one or both hands freely within the physical space. Homogeneous bimanual gestures perform basic rotation-translation-scaling (RTS). The handle bar metaphor provides 7-DOF manipulation (3D translation, 3D rotation, and 1D scaling) of virtual objects and supports continuous transitions between operations. The results are rather interesting, and the simplicity of the movements makes the learning time almost zero (https://www.youtube.com/watch?v=p0EM9Ejv0r0). However, the system has some limitations: for some translations and rotations, the authors found it difficult to interact with the system continuously. The work carried out by Rodrigues et al. [11] compares Song’s approach with 3D-gesture manipulation techniques for virtual objects in semi-immersive and fully immersive systems using a virtual reality headset. The system developed by Rodrigues has five different modules (Fig. 6).

Fig. 6. Five manipulation modules

The authors observe that mid-air interactions in immersive systems are the most efficient and satisfying for all users. The main reason is the possibility of manipulating 6 DOF directly and the use of natural gestures to interact with objects.
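Both approaches ultimately derive a rigid transformation from tracked hand positions. As a minimal sketch of how the 7 DOF of the handle bar metaphor can be computed frame by frame (our own illustration under simplifying assumptions, not the implementation of [10] or [11]), the midpoint of the two hands drives translation, the direction of the inter-hand “bar” drives rotation, and the change in inter-hand distance drives uniform scaling:

```python
import numpy as np

def handle_bar_update(prev_l, prev_r, cur_l, cur_r):
    """Derive incremental translation, rotation and uniform scale (7 DOF)
    from the left/right hand positions of two consecutive tracking frames."""
    prev_mid = (prev_l + prev_r) / 2.0
    cur_mid = (cur_l + cur_r) / 2.0
    translation = cur_mid - prev_mid                                   # 3 DOF

    prev_bar = prev_r - prev_l
    cur_bar = cur_r - cur_l
    scale = np.linalg.norm(cur_bar) / max(np.linalg.norm(prev_bar), 1e-6)  # 1 DOF

    # Rotation aligning the previous bar direction with the current one.
    # The twist about the bar axis would need an extra cue (e.g. palm normals).
    a = prev_bar / max(np.linalg.norm(prev_bar), 1e-6)
    b = cur_bar / max(np.linalg.norm(cur_bar), 1e-6)
    v, c = np.cross(a, b), float(np.dot(a, b))
    if np.linalg.norm(v) < 1e-8 or c < -0.999999:
        rotation = np.eye(3)                      # (anti)parallel bars: no update
    else:
        vx = np.array([[0, -v[2], v[1]],
                       [v[2], 0, -v[0]],
                       [-v[1], v[0], 0]])
        rotation = np.eye(3) + vx + vx @ vx / (1.0 + c)   # Rodrigues' formula
    return translation, rotation, scale
```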

3 Hyve-3D

3.1 Presentation

The difficulty of representing and understanding complex 3D shapes with only planar displays, the interpretation of proportions, the human scale, and the observer’s fixed angle of vision have been described by Landsdown [12]. Creating sketches directly in 3D in VR opens up a new dimension in the application of sketching to architectural co-design. The main complexity with GUIs arises in 3D interaction, because 3D data must be supplied via abstract 2D interfaces, which makes creative thinking more difficult. The structured interaction of the mouse with menus forces the user to make premature decisions, demanding more accuracy than pen-on-paper techniques. These difficulties distance the architect from the creative task. The importance of sketching has been shown in several studies suggesting that characteristics like ambiguity, abstraction or inaccuracy help architects in conceptual design [13,14,15]. Sketches provide a medium of freedom with a flexible degree of abstraction, allowing multiple readings and interpretations.

Hyve-3D is an omnidirectional immersive concept for co-design in architecture [16,17,18]. Users are situated inside a 360° screen and watch a display showing a 3D virtual environment. Hyve-3D is designed for local and remote collaboration: users can be in the same room or interconnected across the globe, at full scale and in real time (Fig. 7).

Fig. 7. Hyve-3D screen and remote control

3.2 Hyve-3D Cursor and Navigation

The tablet’s gyroscope and accelerometer are used to determine its physical position and orientation, and this information is used to manipulate the 3D cursor in virtual space. The 3D cursor is projected into the virtual world as a rectangular frame with the same aspect ratio as the tablet display. A 3D tracker (magnetic tracking system) is used to reproduce the device position from the real world in the virtual world. Users can move and rotate the displayed 3D scene with multi-touch gestures (Fig. 8). Sliding up on the screen while the tablet is horizontal produces a forward movement; the same gesture results in an upward movement when the tablet is held vertically and a climbing movement when it is held diagonally. The user can change the view by combining a navigation button with a single-finger drag.

Fig. 8. Gestures and different 3D orientations for displacement in the virtual world
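A minimal sketch of this orientation-dependent mapping (our own illustration, not the actual Hyve-3D code) re-expresses the drag performed on the touch surface in the tablet’s local frame, so that the same upward drag yields forward, climbing or purely vertical motion depending on the device pitch reported by the gyroscope and accelerometer:

```python
import numpy as np

def drag_to_world_motion(drag_dx, drag_dy, pitch_rad, speed=1.0):
    """Map a touch drag (screen units) to a 3D displacement in the virtual world.

    drag_dx, drag_dy : finger displacement on the tablet (x right, y up the screen).
    pitch_rad        : tablet pitch (0 = flat/horizontal, pi/2 = held vertically).
    Returns a displacement vector (lateral, forward, up).
    """
    # Flat tablet -> forward motion, vertical tablet -> upward motion,
    # diagonal    -> climbing (a mix of forward and up).
    forward = np.cos(pitch_rad) * drag_dy
    up = np.sin(pitch_rad) * drag_dy
    lateral = drag_dx
    return speed * np.array([lateral, forward, up])

# Example: the same upward drag with three tablet orientations.
for pitch in (0.0, np.pi / 4, np.pi / 2):
    print(round(pitch, 2), drag_to_world_motion(0.0, 1.0, pitch))
```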

When the user does not press a constraint or navigation button, the 3D cursor is fixed in space and acts as a drawing area. The tactile display becomes a drawing tablet, and users can create freehand sketches with either a finger or a pressure-sensitive stylus (Fig. 9). The sketches created on the tablet are replicated on the drawing area in the virtual world. The user can zoom the drawing area with a pinch gesture. For non-planar sketches (free 3D sketches), the user can sketch while using one of the four constraint modes that allow the 3D cursor to move during sketching.

Fig. 9. Sketching in Hyve-3D

Objects can be selected using the 3D cursor via the Butterfly-Net metaphor: the 3D objects intersected by the 3D cursor are alternately selected/deselected. Once an object is selected, affine transformations, such as moving, rotating, scaling, and duplicating, can be applied.
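The selection behaviour can be sketched as follows (an illustrative toggle assuming a simple per-object intersection test, not the actual Hyve-3D implementation): each object is toggled on the frame where the 3D cursor starts intersecting it, so sweeping the cursor through an object once selects it and sweeping through it again deselects it.

```python
def butterfly_net_update(selected, previously_inside, objects, cursor_hits):
    """Toggle selection when the 3D cursor starts intersecting an object.

    selected          : set of currently selected object ids.
    previously_inside : set of object ids intersected by the cursor last frame.
    objects           : iterable of object ids in the scene.
    cursor_hits       : function(obj) -> bool, True if the cursor frame
                        intersects that object's bounds this frame.
    Returns (selected, currently_inside) to carry into the next frame.
    """
    currently_inside = {obj for obj in objects if cursor_hits(obj)}
    # Toggle only on the frame where the cursor enters the object, so the
    # selection does not flicker while the cursor stays inside it.
    for obj in currently_inside - previously_inside:
        if obj in selected:
            selected.discard(obj)
        else:
            selected.add(obj)
    return selected, currently_inside
```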

4 Hyve-3D Interface Utilization

Different experiments show that the current interface is only used 28% of the time during an activity [18]; for the remaining 72% of the time, the system is on standby. By observing log files from the Hyve-3D application, the authors found that navigation, 3D cursor placement, and sketching were all efficient (Fig. 10). In total, 38% of the active time was used for navigation, followed by 34% for 3D cursor placement and 28% for sketching.

Fig. 10. Utilization of the interface during observation

5 Leap-Motion

The Leap Motion is a USB sensor device released in July 2013 by Leap Motion Inc. It is an interactive device mainly aimed at hand gesture and finger position detection: it detects palm and finger movements above it (Fig. 11) and is designed to provide real-time tracking of hands and fingers in three-dimensional space with 0.01-millimeter accuracy. It allows a user to get information about objects located in the device’s field of view (about 150 degrees, at a distance not exceeding 1 m). Details of how the Leap Motion performs 3D scene capture have not been revealed by Leap Motion, Inc. The hardware consists of three infrared LEDs used for scene illumination and two cameras, spaced 4 cm apart, that capture images at 50–200 fps depending on whether USB 2.0 or 3.0 is used.

Fig. 11. Leap Motion device

The information sent by the LeapMotion includes the position of a hand, but also physical properties such as the width and length of the hand and arm, as well as the width and length of each digit and the four bones associated with each digit. In addition to these properties, the Leap recognizes certain movement patterns as “gestures” (Fig. 12). Four gestures are currently recognized: a circle, a swipe, a key tap, and a screen tap [19]. A circle gesture is simply a single finger drawing a circle; a swipe is a long linear movement of a finger; a key tap is a finger rotating slightly downwards and back up; and a screen tap is a finger moving forward and backward quickly. Each of these four gestures has its own properties, such as speed.

Fig. 12. Basic gestures recognized by the Leap Motion
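As a minimal polling sketch against the legacy Leap Motion v2 Python bindings (module Leap; names may differ in other SDK versions, so this is indicative rather than authoritative), the built-in gesture recognizers are enabled explicitly and each frame exposes both hand properties and recognized gestures:

```python
import Leap  # legacy Leap Motion SDK v2 Python bindings

controller = Leap.Controller()
# The built-in gesture recognizers must be enabled explicitly.
for gtype in (Leap.Gesture.TYPE_CIRCLE, Leap.Gesture.TYPE_SWIPE,
              Leap.Gesture.TYPE_KEY_TAP, Leap.Gesture.TYPE_SCREEN_TAP):
    controller.enable_gesture(gtype)

def poll_once(controller):
    """Read one frame: hand positions plus any recognized built-in gestures."""
    frame = controller.frame()
    for hand in frame.hands:
        print("palm position (mm):", hand.palm_position)
    for gesture in frame.gestures():
        if gesture.type == Leap.Gesture.TYPE_SWIPE:
            swipe = Leap.SwipeGesture(gesture)
            print("swipe direction", swipe.direction, "speed", swipe.speed)
        elif gesture.type == Leap.Gesture.TYPE_CIRCLE:
            circle = Leap.CircleGesture(gesture)
            print("circle radius", circle.radius, "progress", circle.progress)
        elif gesture.type == Leap.Gesture.TYPE_KEY_TAP:
            print("key tap")
        elif gesture.type == Leap.Gesture.TYPE_SCREEN_TAP:
            print("screen tap")
```

In practice a Leap.Listener subclass registered on the controller would receive frames asynchronously; polling is shown here only for brevity.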

Hand gesture interfaces provide an intuitive and natural way of interacting with a wide range of applications, and the Leap Motion controller has been specifically designed for this kind of interaction. It is well suited to freehand drawing, with recognition algorithms interpreting the tracking data of hand and finger movements. The device can detect four different hands and thus permits collaborative work.

6 Grammar of Air Gestures in Omnidirectional Immersive Environment

Today, many studies demonstrate the interest of mid-air gestures in unidirectional environments [7, 8, 10, 11]. Studies in omnidirectional immersive environments are much rarer, mainly because such infrastructures are not easy to access. Omnidirectional interfaces, such as CAVE displays [20], room displays [3], cone displays [4], or dome displays [5], offer an interesting solution, but the research associated with these displays is focused on rendering problems. Interaction in immersive virtual environments has been an important research area, with most solutions requiring tracked/connected gloves or styli. One of the most complete solutions is the one proposed by Benko and Wilson [21], which combines speech commands with freehand pinch and clasping gestures and infrared laser pointers. To interact with these omnidirectional environments through a gestural interface, an intuitive grammar of gestures needs to be built. Rovelo et al. [22] showed, for an OVD system, that it is very difficult to find a consensual gesture for a given interaction: they obtained different proposals for actions such as play, stop, pause, and forward. Their evaluation also shows that gestures differ depending on whether users work alone or collaboratively (Fig. 13).

Fig. 13. Gestures for play, pause, stop, skip scene, fast forward, go backward, pan and zoom, and an example of how participants mirrored a gesture

Our working hypothesis is that interactions in omnidirectional immersive systems like Hyve-3D cannot be performed efficiently with a single type of device. We believe that a mid-air gesture device can be a complementary mode of interaction for a set of specific tasks such as sketching or moving in the virtual world. All Hyve-3D interactions can be grouped into four categories:

  • Moving interactions (forward, backward, up, down, position in space, go back, turn, etc.).

  • Drawing interactions (sketching, erasing, coloring, etc.).

  • Working interactions (working zone, working stats, collaborative exchanges, etc.).

  • Manipulation interactions (single-object selection, multi-object selection, rotation, position, scale, assembly, etc.).

Today, moving inside a virtual Hyve-3D environment is somewhat constraining. For example, navigating in the virtual world requires activating the moving mode (navigation button), setting the orientation of the tactile tablet, and performing a dragging gesture in the desired 3D direction [18]. For a long displacement, the same dragging gesture has to be repeated many times; the movement is jerky and the time needed to complete an action can be long. Sketches are built on the tablet: on a reduced screen, users see the projection of only one part of the whole virtual world. The architect quickly sketches with a pen on the tablet screen and observes the result on the hemispheric display. We note two problems. The first is that the user has only a restricted representation of the world on the tablet; if the sketch overflows the 3D cursor projection, producing a large sketch becomes complicated. The second is that the user needs to go back and forth visually between the tablet and the large display, and loses the complete perception of the environment.

A mid-air gesture interface, and more specifically the Leap Motion device, could be used for at least two of these categories: drawing (sketching) in space and fluid displacement in the scene (by showing the direction).

  • Drawing in space is:

    • Natural

    • Efficient

    • Not limited to tablet screen

    • Accurate

  • Moving in the virtual world by showing a direction is:

    • Based on intuitive gestures (forward, up, scale, turn, etc.) that require no learning

    • Continuity of movement as long as the gesture is performed.

Most mid-air gesture frameworks provide standard gestures that are easy to use for interactions linked to system or device functionalities. However, more complex gestures are often difficult to implement and to describe. For tactile interactions, Kammer et al. [23, 24] contributed to formalizing gesture interactions, complex or not. They described a formalization of multi-touch gestures based on semiotics, which covers all phenomena associated with the production and interpretation of signs and symbols. In this context, they created a syntax based on atomic gestures, able to describe gestures or sequences of gestures (Fig. 14).

Fig. 14. Rotate gesture described by the GeForm grammar and the resulting gesture

Based on this approach, we propose a pseudo-formalization of LeapMotion interactions. Derived from the extended Backus-Naur form (EBNF), we define below the language of interaction. In this EBNF, the following characters represent operators (in order of increasing precedence):

  • + concatenation

  • | choice

  • = definition

  • ; termination

For sketching and moving in the virtual world, we defined six atomic gestures, all of them recognized by the LeapMotion (Table 1):

Table 1. Different gestures used for our application.

Moreover, the LeapMotion device can detect that the user is grasping a pencil (Fig. 15).

Fig. 15. Pencil detection and line drawing with the LeapMotion©

LeapMotion interactions can be described by the following pseudo-expressions:

  • LeapInteraction ::= Mov* | Sketch*

  • Sketch ::= Tap + Draw* + Tap

  • Draw ::= Tool + Line*

  • Tool ::= True | False

  • Mov ::= Gesture*

  • Gesture ::= Forward | Up | Down | Turn Right | Turn Left;

Note that backward is simply a Forward in the opposite direction (i.e., a Forward performed after turning around with Turn Right/Left).
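A minimal interpreter for this pseudo-grammar (our own sketch with hypothetical token names, not part of the Hyve-3D SDK) consumes a stream of atomic gesture tokens and dispatches them either to navigation or to sketching, with Tap acting as the sketch delimiter:

```python
# Hypothetical token names following the pseudo-grammar above.
MOVE_GESTURES = {"Forward", "Up", "Down", "TurnRight", "TurnLeft"}

def interpret(tokens):
    """Consume a stream of atomic gesture tokens.

    LeapInteraction ::= Mov* | Sketch*
    Sketch          ::= Tap + Draw* + Tap   (Tap delimits a sketch)
    Mov             ::= Gesture*
    The Tool (pencil) flag of Draw is omitted here for brevity.
    Yields ('move', gesture) or ('stroke', line) events.
    """
    sketching = False
    for token in tokens:
        if token == "Tap":
            sketching = not sketching        # Tap opens or closes a sketch
        elif sketching and token == "Line":
            yield ("stroke", token)          # drawn segment inside a sketch
        elif not sketching and token in MOVE_GESTURES:
            yield ("move", token)            # navigation gesture
        # anything else is ignored: unintended hand movement

# Example: a forward move, a sketch of two lines, then a left turn.
events = list(interpret(["Forward", "Tap", "Line", "Line", "Tap", "TurnLeft"]))
```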

7 LeapMotion and Hyve-3D

A technical difficulty is that the LeapMotion device must be integrated into the Hyve-3D environment. The first possibility is to mount the device on the central console; to perform a mid-air interaction, users would then have to return to the middle of the working space. This solution is the easiest to develop but the least suited to free interaction across the whole space. The second is to couple the current interface (iPad) with the Leap Motion (like the additional trackers used today). For this we have designed a base onto which we fit the Leap Motion (Fig. 16).

Fig. 16. iPad-Leap assembly

However, it is by no means a perfect solution. To access mid-air gestures, the user can, for example, turn the iPad over; the system can detect the iPad orientation and suspend the current action to activate the sketching and moving mode. The disadvantage of this solution is that it still requires a wired connection (to the server computer) and separates the interaction modes. The last solution is to couple the LeapMotion with an Arduino system over USB and create another, separate entry point. This system would be self-powered, and exchanges with the server would be done over WiFi. The last two solutions are currently under investigation.

8 First Evaluation

We conducted a first comparative investigation of the satisfaction rate when users manipulate both interfaces to move in a virtual world. It was impossible for us to develop an application inside Hyve-3D (the SDK is still under development), so we simulated a similar progression with the basic movements available in Hyve-3D: Forward, Backward, Up, Down, Turn Right, Turn Left. The same dragging gesture was repeated for long displacements with the iPad. The focus group consisted of 15 members (8 men, 7 women) aged from 27 to 50 (avg. 32.6). Everybody had used a tactile tablet, but nobody had used the LeapMotion device (4 had used a Kinect to play a tennis game). Users had to move through a pseudo-labyrinth with the two devices: iPad and LeapMotion. The time to complete the progression was not controlled. Users were asked to indicate their impression on a Likert scale (5 levels, from 0: unhappy to 5: happy). We did not indicate the gesture to perform for the backward move: the only gestures presented were those in Table 1 (without Tap), so users had to turn around to go back.

The results show that for the majority of movements, mid-air gestures obtain the better satisfaction score in this area of use (Table 2).

Table 2. Average scores

The only gesture for which the score is better with the tablet is the backward move. At first, users tried to turn the wrist with the index finger pointing towards themselves. This was not a comfortable position, and it did not produce the intended result (this gesture is not defined in our application). In addition, the wrist was sometimes occluded by the arm (one user tried to walk around the device).

We also found that, to indicate the right or left direction, users preferred to use the opposite hand.

9 Conclusion and Future Works

This work in progress aims to show that mid-air gesture is an intuitive solution for interacting with an omnidirectional immersive system. Building on our previous studies of tactile interaction, we propose a pseudo-grammar and a set of gestures dedicated to sketching and moving in the virtual world. The first results show that mid-air gestures are well suited to actions such as displacement in 3D space; the satisfaction rate is promising, scoring above the iPad. The next step is to integrate the mid-air device into the Hyve-3D environment. Once all technical problems are solved, we hope to show that such interfaces are more efficient than the tactile interactions used today. We aim to quickly resolve these technical issues, integrate the Leap Motion device into an omnidirectional immersive environment, and carry out further evaluations in situ. We also hope to show that drawing lines in 3D, by reproducing the natural gesture identically, will be more efficient and easier, and, in the medium term, to generalize the LeapMotion to a larger set of interactions such as object manipulation, modeling and environment management.