Abstract
Traditionally, gesture-based interaction in virtual environments is composed of either static, posture-based gesture primitives or temporally analyzed dynamic primitives. However, it would be ideal to incorporate both static and dynamic gestures to fully utilize the potential of gesture-based interaction. To that end, we propose a probabilistic framework that incorporates both static and dynamic gesture primitives. We call these primitives Gesture Words (GWords). Using a probabilistic graphical model (PGM), we integrate these heterogeneous GWords and a high-level language model in a coherent fashion. Composite gestures are represented as stochastic paths through the PGM. A gesture is analyzed by finding the path that maximizes the likelihood on the PGM with respect to the video sequence. To facilitate online computation, we propose a greedy algorithm for performing inference on the PGM. The parameters of the PGM can be learned via three different methods: supervised, unsupervised, and hybrid. We have implemented the framework for a gesture set of ten GWords and six composite gestures. The experimental results show that the PGM can accurately recognize composite gestures.
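The core idea of the abstract — composite gestures as stochastic paths through a graph of GWords, decoded greedily rather than by full likelihood maximization — can be illustrated with a minimal sketch. The gesture vocabulary, transition probabilities, and observation scores below are hypothetical stand-ins, not the model from the paper; the real framework scores observations per video segment with learned GWord detectors.

```python
import math

# Hypothetical GWord transition model (a toy high-level "language model").
# States and probabilities are illustrative only.
TRANSITIONS = {
    "start":   {"point": 0.6, "grab": 0.4},
    "point":   {"drag": 0.7, "release": 0.3},
    "grab":    {"drag": 0.5, "release": 0.5},
    "drag":    {"release": 1.0},
    "release": {},  # terminal GWord
}

def greedy_gesture_path(obs_loglik, start="start"):
    """Greedily extend the most likely path through the GWord graph.

    obs_loglik maps each GWord to an observation log-likelihood
    (a stand-in for per-segment detector scores). At each step we pick
    the successor maximizing transition probability times observation
    likelihood, instead of searching all paths -- this is the online,
    greedy alternative to exact inference.
    """
    path, total = [start], 0.0
    state = start
    while TRANSITIONS[state]:
        best, best_score = None, -math.inf
        for nxt, p in TRANSITIONS[state].items():
            score = math.log(p) + obs_loglik.get(nxt, -math.inf)
            if score > best_score:
                best, best_score = nxt, score
        path.append(best)
        total += best_score
        state = best
    return path, total

# Example: observations favor "point" then "drag" then "release".
path, loglik = greedy_gesture_path(
    {"point": -0.2, "grab": -1.5, "drag": -0.3, "release": -0.1}
)
# path -> ["start", "point", "drag", "release"]
```

The greedy choice commits to one GWord per step, which keeps the computation constant-time per video segment but can miss the globally optimal path that exact (Viterbi-style) decoding would find.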
Acknowledgements
We thank Darius Burschka for his help with the Visual Interaction Cues project. This work was funded in part by a Link Foundation Fellowship in Simulation and Training and by the National Science Foundation under Grant No. 0112882.
Additional information
An erratum to this article is available at http://dx.doi.org/10.1007/s10055-005-0007-1.
Corso, J.J., Ye, G. & Hager, G.D. Analysis of composite gestures with a coherent probabilistic graphical model. Virtual Reality 8, 242–252 (2005). https://doi.org/10.1007/s10055-005-0157-1