Untethered gesture acquisition and recognition for virtual world manipulation

Demirdjian, David; Ko, Teresa; Darrell, Trevor

doi:10.1007/s10055-005-0155-3


Published: 12 July 2005

Virtual Reality, Volume 8, pages 222–230 (2005)


Abstract

Humans use a combination of gesture and speech to interact with objects, and usually do so more naturally without holding a device or pointer. We present a system that incorporates user body-pose estimation, gesture recognition and speech recognition for interaction in virtual reality environments. We describe a vision-based method for tracking the pose of a user in real time and introduce a technique that provides parameterized gesture recognition. More precisely, we train a support vector classifier to model the boundary of the space of possible gestures, and train hidden Markov models (HMMs) on specific gestures. Given a sequence, we can find the start and end of various gestures using the support vector classifier, and find gesture likelihoods and parameters with an HMM. A multimodal recognition process is performed using rank-order fusion to merge speech and vision hypotheses. Finally, we describe the use of our multimodal framework in a virtual world application that allows users to interact using gestures and speech.
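The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one common form of rank-order fusion, not the authors' implementation. It assumes each recognizer returns a best-first list of hypothesis labels, that the fused score is the sum of a hypothesis's ranks in the two lists, and that a hypothesis missing from a list receives a rank one past that list's end. The function name `rank_order_fusion` and the example labels are hypothetical.

```python
from typing import Dict, List


def rank_order_fusion(speech: List[str], vision: List[str]) -> List[str]:
    """Fuse two best-first hypothesis lists by summing per-modality ranks.

    Assumption: a hypothesis absent from a list gets rank len(list),
    i.e. one position worse than that list's last entry.
    """
    candidates = sorted(set(speech) | set(vision))

    def rank(hyp: str, ranked: List[str]) -> int:
        # Zero-based position in the list, or a penalty rank if missing.
        return ranked.index(hyp) if hyp in ranked else len(ranked)

    scores: Dict[str, int] = {
        hyp: rank(hyp, speech) + rank(hyp, vision) for hyp in candidates
    }
    # Lower summed rank is better; ties are broken alphabetically.
    return sorted(candidates, key=lambda hyp: (scores[hyp], hyp))


if __name__ == "__main__":
    speech_hyps = ["move", "remove", "rotate"]   # hypothetical speech output
    vision_hyps = ["rotate", "move", "select"]   # hypothetical gesture output
    print(rank_order_fusion(speech_hyps, vision_hyps))
```

Summing ranks rather than raw scores avoids having to calibrate speech and vision likelihoods against each other, which is one plausible reason to prefer a rank-based scheme when the two recognizers produce scores on different scales.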



Author information

Authors and Affiliations

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
David Demirdjian, Teresa Ko & Trevor Darrell


About this article

Cite this article

Demirdjian, D., Ko, T. & Darrell, T. Untethered gesture acquisition and recognition for virtual world manipulation. Virtual Reality 8, 222–230 (2005). https://doi.org/10.1007/s10055-005-0155-3

Issue Date: September 2005
DOI: https://doi.org/10.1007/s10055-005-0155-3
