Abstract
In recent times, there have been significant efforts to develop intelligent and natural interfaces for interaction between human users and computer systems by means of a variety of modes of information (visual, audio, pen, etc.). These modes can be used either individually or in combination with other modes. One of the most promising interaction modes for these interfaces is the human user’s natural gesture.
In this work, we apply computer vision techniques to analyze real-time video streams of a user’s freehand gestures drawn from a predefined vocabulary. We propose a set of hybrid recognizers, each of which accounts for a single gesture and consists of one hidden Markov model (HMM) whose state emission probabilities are computed by partially recurrent artificial neural networks (ANNs).
The underlying idea is to take advantage of the strengths of ANNs to capture the nonlinear local dependencies of a gesture, while handling its temporal structure within the HMM formalism. The recognition engine’s accuracy outperforms that of HMM- and ANN-based recognizers used individually.
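The hybrid scheme described above can be illustrated with a minimal sketch. It is not the author's implementation: the network weights are random and untrained, the left-to-right topology, the layer sizes, the gesture names, and the use of raw softmax outputs as emission scores are all assumptions made for illustration. Each gesture gets its own model, an Elman-style network with a context layer supplies per-state emission scores, and a scaled forward pass accumulates a sequence score. The gesture whose model scores highest is selected.

```python
import numpy as np


class HybridGestureModel:
    """One recognizer per gesture: an HMM whose emission scores come from a
    small Elman-style (partially recurrent) network. Hypothetical sketch."""

    def __init__(self, n_states, n_features, n_hidden, rng):
        # Assumed left-to-right topology: stay in a state or advance by one.
        self.A = np.zeros((n_states, n_states))
        for i in range(n_states):
            self.A[i, i] += 0.5
            self.A[i, min(i + 1, n_states - 1)] += 0.5
        self.pi = np.zeros(n_states)
        self.pi[0] = 1.0  # always start in the first state
        # Random, untrained weights stand in for a trained network.
        self.W_in = rng.normal(scale=0.5, size=(n_hidden, n_features))
        self.W_ctx = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
        self.W_out = rng.normal(scale=0.5, size=(n_states, n_hidden))

    def emission_scores(self, frames):
        """Return a (T, n_states) array of softmax outputs, one row per frame."""
        context = np.zeros(self.W_ctx.shape[0])
        rows = []
        for x in frames:
            hidden = np.tanh(self.W_in @ x + self.W_ctx @ context)
            logits = self.W_out @ hidden
            p = np.exp(logits - logits.max())
            rows.append(p / p.sum())
            context = hidden  # context layer copies the previous hidden state
        return np.array(rows)

    def log_score(self, frames):
        """Scaled forward algorithm; returns the log of the sequence score."""
        B = self.emission_scores(frames)
        alpha = self.pi * B[0]
        total = 0.0
        for t in range(len(frames)):
            if t > 0:
                alpha = (alpha @ self.A) * B[t]
            scale = alpha.sum()
            total += np.log(scale)
            alpha = alpha / scale  # rescale to avoid numerical underflow
        return total


def classify(models, frames):
    """Pick the gesture whose model assigns the highest score to the frames."""
    scores = {name: m.log_score(frames) for name, m in models.items()}
    return max(scores, key=scores.get), scores


rng = np.random.default_rng(0)
# Hypothetical two-gesture vocabulary with 4-dimensional feature frames.
models = {name: HybridGestureModel(3, 4, 5, rng) for name in ("wave", "point")}
frames = rng.normal(size=(12, 4))
label, scores = classify(models, frames)
print(label, scores)
```

In the hybrid HMM/ANN literature, network outputs are often divided by state priors before being used as emission terms. That step is omitted here for brevity, so the values returned by `log_score` are comparison scores rather than true log-likelihoods.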
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Corradini, A. (2002). Real-Time Gesture Recognition by Means of Hybrid Recognizers. In: Wachsmuth, I., Sowa, T. (eds) Gesture and Sign Language in Human-Computer Interaction. GW 2001. Lecture Notes in Computer Science, vol 2298. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47873-6_4
DOI: https://doi.org/10.1007/3-540-47873-6_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43678-2
Online ISBN: 978-3-540-47873-7
eBook Packages: Springer Book Archive