Analysis of composite gestures with a coherent probabilistic graphical model

Published in: Virtual Reality

An Erratum to this article was published on 20 October 2005

Abstract

Traditionally, gesture-based interaction in virtual environments is composed of either static, posture-based gesture primitives or temporally analyzed dynamic primitives. However, it would be ideal to incorporate both static and dynamic gestures to fully utilize the potential of gesture-based interaction. To that end, we propose a probabilistic framework that incorporates both static and dynamic gesture primitives. We call these primitives Gesture Words (GWords). Using a probabilistic graphical model (PGM), we integrate these heterogeneous GWords and a high-level language model in a coherent fashion. Composite gestures are represented as stochastic paths through the PGM. A gesture is analyzed by finding the path that maximizes the likelihood on the PGM with respect to the video sequence. To facilitate online computation, we propose a greedy algorithm for performing inference on the PGM. The parameters of the PGM can be learned via three different methods: supervised, unsupervised, and hybrid. We have implemented the PGM model for a gesture set of ten GWords with six composite gestures. The experimental results show that the PGM can accurately recognize composite gestures.
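The decoding idea in the abstract — a composite gesture as a path through a graph of GWords, with the high-level language model supplying transition probabilities and low-level recognizers supplying per-segment likelihoods, decoded greedily for online use — can be sketched as follows. This is an illustrative toy, not the paper's implementation: the GWord names, transition table, and log-likelihood numbers are all invented for the example.

```python
# Hypothetical sketch: greedy path decoding over a graph of gesture words
# (GWords). At each video segment we extend the path with the GWord that
# locally maximizes log P(next | current) + log P(segment | next).
import math

# Toy GWord graph; transition probabilities play the role of the
# high-level language model over gesture primitives.
TRANSITIONS = {
    "START":   {"point": 0.6, "grab": 0.4},
    "point":   {"drag": 0.7, "END": 0.3},
    "grab":    {"drag": 0.5, "release": 0.5},
    "drag":    {"release": 0.8, "END": 0.2},
    "release": {"END": 1.0},
}

def greedy_decode(observation_loglik, transitions=TRANSITIONS):
    """Greedily follow the locally most likely path through the GWord graph.

    observation_loglik: one dict per video segment mapping GWord -> log-
    likelihood of that segment under the GWord's recognizer.
    Returns the decoded path and its accumulated log score.
    """
    path, current, total = [], "START", 0.0
    for seg_scores in observation_loglik:
        candidates = transitions.get(current, {})
        best, best_score = None, -math.inf
        for gword, p in candidates.items():
            if gword == "END":
                continue
            score = math.log(p) + seg_scores.get(gword, -math.inf)
            if score > best_score:
                best, best_score = gword, score
        if best is None:
            break
        path.append(best)
        total += best_score
        current = best
    return path, total

# Toy per-segment log-likelihoods, as if emitted by posture/motion classifiers.
obs = [
    {"point": -0.2, "grab": -2.0},
    {"drag": -0.5, "release": -3.0},
    {"release": -0.3, "drag": -2.5},
]
path, score = greedy_decode(obs)
print(path)  # ['point', 'drag', 'release'] under these toy numbers
```

A full-likelihood search (e.g. Viterbi-style dynamic programming) would consider all paths jointly; the greedy variant trades that optimality for the constant per-segment cost that online interaction requires, which is the motivation the abstract gives for the greedy inference algorithm.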





Acknowledgements

We thank Darius Burschka for his help with the Visual Interaction Cues project. This work was in part funded by a Link Foundation Fellowship in Simulation and Training and by the National Science Foundation under Grant No. 0112882.

Author information


Correspondence to Jason J. Corso.

Additional information

An erratum to this article is available at http://dx.doi.org/10.1007/s10055-005-0007-1.


Cite this article

Corso, J.J., Ye, G. & Hager, G.D. Analysis of composite gestures with a coherent probabilistic graphical model. Virtual Reality 8, 242–252 (2005). https://doi.org/10.1007/s10055-005-0157-1
