Untethered gesture acquisition and recognition for virtual world manipulation

Published in: Virtual Reality

Abstract

Humans combine gesture and speech when interacting with objects, and usually do so more naturally when they are not holding a device or pointer. We present a system that integrates body-pose estimation, gesture recognition and speech recognition for interaction in virtual reality environments. We describe a vision-based method for tracking the pose of a user in real time and introduce a technique for parameterized gesture recognition. More precisely, we train a support vector classifier to model the boundary of the space of possible gestures, and train hidden Markov models (HMMs) on specific gestures. Given an input sequence, we find the start and end of each gesture using the support vector classifier, and estimate gesture likelihoods and parameters with an HMM. Multimodal recognition is then performed using rank-order fusion to merge the speech and vision hypotheses. Finally, we describe the use of our multimodal framework in a virtual world application that allows users to interact using gestures and speech.
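As a rough illustration of the rank-order fusion step mentioned above, the following minimal sketch merges two ranked hypothesis lists by summing each hypothesis's rank in the speech and vision lists. The abstract does not specify the exact fusion rule, so the rank-sum scoring, the penalty for missing hypotheses, the function name `rank_order_fusion` and the example hypotheses are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List


def rank_order_fusion(speech: List[str], vision: List[str]) -> List[str]:
    """Fuse two ranked hypothesis lists by summed rank (lower is better).

    Assumption: a simple rank-sum rule; the paper may use a different one.
    """
    candidates = set(speech) | set(vision)

    def rank(hyps: List[str], h: str) -> int:
        # A hypothesis missing from a list gets the worst rank,
        # one position past the end of that list.
        return hyps.index(h) if h in hyps else len(hyps)

    scores = {h: rank(speech, h) + rank(vision, h) for h in candidates}
    # Sort by summed rank; break ties alphabetically so the result is deterministic.
    return sorted(candidates, key=lambda h: (scores[h], h))


# Hypothetical ranked outputs from the two recognizers (best first).
speech_hyps = ["move", "remove", "rotate"]
vision_hyps = ["rotate", "move", "resize"]

fused = rank_order_fusion(speech_hyps, vision_hyps)
print(fused)
```

Here "move" ranks first because it is placed highly by both modalities, even though the vision recognizer alone prefers "rotate"; this is the kind of mutual disambiguation that fusing ranked lists is meant to provide.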

About this article

Cite this article

Demirdjian, D., Ko, T. & Darrell, T. Untethered gesture acquisition and recognition for virtual world manipulation. Virtual Reality 8, 222–230 (2005). https://doi.org/10.1007/s10055-005-0155-3

  • DOI: https://doi.org/10.1007/s10055-005-0155-3
