Experiencing real 3D gestural interaction with mobile devices

https://doi.org/10.1016/j.patrec.2013.02.004

Abstract

The number of mobile devices such as smartphones and tablet PCs has increased dramatically over recent years. New mobile devices are equipped with integrated cameras and large displays, which make interaction with the device more efficient. While most previous work on interaction between humans and mobile devices is based on 2D touch-screen displays, camera-based interaction opens a new way to manipulate in the 3D space behind the device, in the camera’s field of view. In this paper, our gestural interaction relies on particular patterns of local image orientation called rotational symmetries. The approach is based on finding, from a large set of rotational symmetries of different orders, the most suitable pattern, which ensures a reliable detector for the hand gesture. Gesture detection and tracking can consequently be employed as an efficient tool for 3D manipulation in various computer vision and augmented reality applications. The final output is rendered as color anaglyphs for 3D visualization; depending on the coding technology, viewers can use different kinds of low-cost 3D glasses.

Highlights

► We present a novel approach for 3D gestural interaction with mobile devices.
► 3D manipulation of virtual objects on the screen is no longer limited as in 2D interaction.
► The algorithm is computationally efficient and works well in real time.

Introduction

Gesture detection, recognition, and tracking are terms frequently encountered in discussions of human–computer interaction. Gesture recognition enables humans to interact with computers and can make input devices such as keyboards, joysticks, or touch-screen panels redundant. The desire for more effective interaction with mobile devices is arguably the most significant reason why devices with larger screens have been manufactured in recent years. Although larger touch-screen displays help users interact better with the device, their restriction to manipulation in 2D remains an unsolved issue. A novel way around the limitations of 2D touch-screen displays is to take advantage of the 3D space behind the camera. Manipulating in the camera’s field of view allows users to work with any mobile device regardless of screen size or touch sensitivity. As shown in Fig. 1, the farther the user’s hand is from the camera, the smaller the area it occupies on the screen, which compensates for the limited space available to fingers on 2D displays. Moreover, behind the camera users are able to move in depth, which removes a number of difficulties in various applications. Our experiments with mobile applications reveal that in 2D on-screen interaction users are limited when moving in depth, zooming in to observe the details of an image, map, or text, or zooming out to skim through data. In more complex applications, rotations around different axes are also unavoidable.

The question, then, is whether these limitations can be resolved by introducing a new interaction environment. Our experiments show that during 3D interaction under the mobile phone’s camera, fingertips occupy only about 10–20% of the screen area that they cover in direct touch interaction, which suggests that the interaction resolution in 3D space is at least 5–10 times higher than on 2D displays. Moreover, given the higher number of degrees of freedom in 3D space, the limitations on rotation can be handled. This new interaction environment can therefore significantly improve the efficiency and effectiveness of human–mobile device interaction. Our gesture recognition method is based on detecting rotational symmetries in the video input from the camera; it finds particular patterns in the local orientation of the image. Rotational symmetries can represent different groups of basic low-level features such as lines, curvatures, and circular and star patterns. The implemented operator searches for these features in local image patches and detects the patterns associated with the user’s gesture. Tracking the detected gesture enables users to interact with the mobile phone in 3D space and to manipulate content in various applications. A reliable gesture detection algorithm also makes it possible to extract the gesture motion: by finding corresponding keypoints in consecutive frames, the relative rotation and translation between two image frames can be computed. Our method is built on very low-level operations with a minimum of built-in intelligence, which makes the gesture detection and tracking stage reliable and efficient. To convey the illusion of depth to viewers, stereoscopic techniques are used for visualization: the system adjusts the output stereo channels and renders them as different types of color anaglyphs. The 3D output can be displayed on the mobile device, and users simply need low-cost anaglyph eyeglasses to view it in 3D.
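
To make the detection step concrete, the following sketch shows one standard way of computing rotational-symmetry responses from the double-angle representation of local orientation. It is a minimal illustration under our own assumptions, not the paper’s implementation: the filter radius, the box-shaped radial window, and the use of OpenCV and NumPy are choices made here for brevity.

    import cv2
    import numpy as np

    def rotational_symmetry_response(gray, order, radius=15):
        """Response map of an order-n rotational symmetry filter.
        Illustrative sketch: radius and radial window are assumptions."""
        # Double-angle orientation image, written out in real arithmetic:
        # (f_x + i*f_y)^2 = (f_x^2 - f_y^2) + i*(2*f_x*f_y). Squaring
        # doubles the gradient angle, so opposite directions reinforce.
        fx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
        fy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
        zr = fx * fx - fy * fy
        zi = 2.0 * fx * fy

        # Order-n symmetry filter b(r, phi) = w(r) * exp(i*n*phi):
        # order 0 responds to lines, order 1 to curvature (parabolic)
        # patterns, order 2 to circular and star patterns.
        ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        r, phi = np.hypot(xs, ys), np.arctan2(ys, xs)
        w = (r <= radius).astype(np.float64)
        br, bi = w * np.cos(order * phi), w * np.sin(order * phi)

        # Complex correlation of z with b, computed on the real and
        # imaginary parts separately since OpenCV filters real arrays.
        real = cv2.filter2D(zr, -1, br) + cv2.filter2D(zi, -1, bi)
        imag = cv2.filter2D(zi, -1, br) - cv2.filter2D(zr, -1, bi)
        return np.hypot(real, imag)  # peaks mark candidate gesture points

    # Usage on a single camera frame (the file name is a placeholder):
    gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE).astype(np.float64)
    response = rotational_symmetry_response(gray, order=2)
    y, x = np.unravel_index(np.argmax(response), response.shape)

For the visualization step, the simplest red-cyan anaglyph takes the red channel from the left view and the green and blue channels from the right view, as sketched below. This is only the most naive coding; the paper’s system supports several anaglyph types, and the cited work of Dubois (2001) describes a more careful projection-based mixing.

    def red_cyan_anaglyph(left_bgr, right_bgr):
        """Naive red-cyan anaglyph from a stereo pair.
        OpenCV stores channels in B, G, R order."""
        anaglyph = right_bgr.copy()
        anaglyph[:, :, 2] = left_bgr[:, :, 2]  # red from the left view
        return anaglyph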

Section snippets

Related work

Designing a robust gesture detection system that uses a single camera and is independent of lighting conditions and camera quality is still a challenging problem in computer vision. A common approach to gesture detection is marker-based: most augmented reality applications rely on marked gloves for accurate and reliable fingertip tracking (Dorfmueller-Ulhaas and Schmalstieg, 2001; Maggioni, 1993). However, in marker-based methods users have to wear special inconvenient

System description

In this section the proposed 3D camera-based interaction approach is presented. As the user moves his/her hand in the camera’s field of view behind the mobile device, the device captures a sequence of images. This input is processed by the gesture detection block, so that the user’s gesture is detected and localized. Afterwards, stable features in each frame are extracted to compute the relative rotation and translation of the hand gesture between two frames. Finally, this
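
A hedged sketch of this pose-from-correspondences step follows, assuming a calibrated camera with a known 3x3 intrinsic matrix K. ORB features stand in here for the Harris/SURF-style keypoints discussed in the references; RANSAC (Fischler and Bolles, 1987) rejects mismatched correspondences, and the essential matrix is decomposed into the relative rotation R and translation direction t following standard two-view geometry (Hartley et al., 2004).

    import cv2
    import numpy as np

    def relative_motion(prev_gray, curr_gray, K):
        """Relative rotation R and translation direction t between two
        consecutive frames. Sketch only: ORB replaces the paper's
        feature detector, and K is an assumed camera intrinsic matrix."""
        orb = cv2.ORB_create(nfeatures=500)
        kp1, des1 = orb.detectAndCompute(prev_gray, None)
        kp2, des2 = orb.detectAndCompute(curr_gray, None)

        # Match binary descriptors between the consecutive frames.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(des1, des2)
        pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
        pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

        # RANSAC fits the essential matrix while discarding outliers;
        # recoverPose decomposes it into R and t (t only up to scale).
        E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                       prob=0.999, threshold=1.0)
        _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
        return R, t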

Experimental results

For a particular gesture behind the mobile device’s camera, users have the freedom to move within a reasonable range of distances and, depending on the application, to rotate through different angles. Our observations indicate that effective interaction happens in the space between 15 and 25 cm from the camera. Interaction beyond 25 cm does not seem convenient for users. Clearly, at distances below 15 cm the gesture occupies a large area of the screen and

Conclusion

In this paper we presented a novel approach for 3D camera-based gestural interaction with mobile devices. Rotation, translation, and manipulation of virtual objects on the screen are no longer limited as they are in 2D interaction. Our detection algorithm can estimate the position of the user’s gesture in consecutive frames, and the relative pose estimation method can accurately extract the relative rotation and translation of the gesture between two consecutive frames, which can be used to facilitate the

References (42)

  • A. Erol et al., Vision-based hand pose estimation: A review. Computer Vision and Image Understanding (2007)
  • Agrawal, S., Constandache, I., Gaonkar, S., Choudhury, R.R., Caves, K., DeRuyter, F., 2011. Using mobile phones to...
  • F. Arce et al., Accelerometer-based hand gesture recognition using artificial neural networks. In: Soft Computing for Intelligent Control and Mobile Robotics (2011)
  • M. Baldauf et al., Markerless visual fingertip detection for natural mobile device interaction. Mobile HCI (2011)
  • H. Bay et al., SURF: Speeded up robust features. In: ECCV (2006)
  • M. Bencheikh et al., A new method of finger tracking applied to the Magic Board (2004)
  • Bretzner, L., Laptev, I., Lindeberg, T., 2002. Hand gesture recognition using multi-scale colour features, hierarchical...
  • A. Bulbul et al., A color-based face tracking algorithm for enhancing interaction with mobile devices. International Journal of Computer Graphics - Special Issue on Cyberworlds (2010)
  • Choe, B.W., Min, J.K., Cho, S.B., 2010. Online gesture recognition for user interface on accelerometer built-in mobile...
  • J. Choi et al., Enabling a gesture-based numeric input on mobile phones. IEEE International Conference on Consumer Electronics (ICCE) (2011)
  • Chum, O., 2005. Two-view geometry estimation by random sample and consensus. PhD Dissertation, Center for Machine...
  • Dorfmueller-Ulhaas, K., Schmalstieg, D., 2001. Finger tracking for interaction in augmented environments. In: 2nd...
  • Dubois, E., 2001. A projection method to generate anaglyph stereo images. In: Proc. IEEE Int. Conf. Acoustics Speech...
  • Fischler, M., Bolles, R., 1987. Random sample consensus: a paradigm for model fitting with applications to image...
  • Ha, T., Woo, W., 2010. An empirical evaluation of virtual hand techniques for 3D object manipulation in a tangible...
  • Hagbi, N., Bergig, O., El-Sana, J., Billinghurst, M., 2009. Shape recognition and pose estimation for mobile augmented...
  • Hannuksela, J., Barnard, M., Sangi, P., Heikkilä, J., 2011. Camera-based motion recognition for mobile interaction. ISRN...
  • Hardenberg, C.V., Berard, F., 2001. Bare-hand human-computer interaction. ACM International Conference Proceeding...
  • Harris, C., Stephens, M., 1988. A combined corner and edge detector. Proceedings of the 4th Alvey Vision Conference,...
  • R. Hartley et al., Multiple View Geometry (2004)
  • Holliman, N., 2004. Mapping perceived depth to regions of interest in stereoscopic images. In: Proc. SPIE Vol. 5291,...
Cited by (13)

  • AR Peephole Interface: Extending the workspace of a mobile device using real-space information. Pervasive and Mobile Computing (2021)

    Citation excerpt: “User interfaces that allow users to directly manipulate displayed virtual objects by hand have been proposed. Hurst et al. proposed a method that allows users to manipulate virtual objects in 3D space, such as by scaling, rotation, and transformation, by moving their fingers, to which markers were attached, in the space behind the mobile device [18]. However, they mentioned that performing operations in the air was not stable and was less accurate than the conventional touchscreen-based method.”

  • SlidAR+: Gravity-aware 3D object manipulation for handheld augmented reality. Computers and Graphics (Pergamon) (2021)

    Citation excerpt: “Gesture-based manipulation methods use the device’s camera to detect and track hand gestures performed in front of the camera to manipulate 6 DoFs of the virtual object. Users can perform a variety of gestures, such as pushing, grabbing, or twisting [11,43–46] to manipulate the virtual object. Alternatively, these gestures can also be performed by other devices, like a pen [47].”

  • Direct hand pose estimation for immersive gestural interaction. Pattern Recognition Letters (2015)

    Citation excerpt: “In this experiment, grab gesture is used for object manipulation. The grab gesture is one of the most natural and frequently used gestures to manipulate objects in 3D space, and has been used to perform gesture-based interaction for mobile applications [15,33]. As shown in Fig. 12, system accurately estimates the user hand motion, and updates the model accordingly.”

  • Intelligent learning systems design for self-defense education. Proceedings - 3rd IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2017 (2017)