Real-Time Gesture Recognition by Means of Hybrid Recognizers

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2298)

Abstract

In recent times, there have been significant efforts to develop intelligent and natural interfaces for interaction between human users and computer systems by means of a variety of modes of information (visual, audio, pen, etc.). These modes can be used either individually or in combination with other modes. One of the most promising interaction modes for these interfaces is the human user’s natural gesture.

In this work, we apply computer vision techniques to analyze real-time video streams of a user’s freehand gestures from a predefined vocabulary. We propose the use of a set of hybrid recognizers, each of which accounts for a single gesture and consists of one hidden Markov model (HMM) whose state emission probabilities are computed by partially recurrent artificial neural networks (ANNs).

The underlying idea is to take advantage of the strengths of ANNs to capture the nonlinear local dependencies of a gesture, while handling its temporal structure within the HMM formalism. The recognition engine’s accuracy outperforms that of HMM- and ANN-based recognizers used individually.
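
The one-recognizer-per-gesture scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `make_toy_net` is a hypothetical, stateless stand-in for the trained partially recurrent networks, its outputs are treated directly as state emission probabilities, and the one-dimensional frames, the gesture names `rise` and `fall`, and all transition values are invented for the example. Each recognizer scores a frame sequence with the standard HMM forward algorithm in the log domain, and the gesture whose recognizer yields the highest score is selected.

```python
import math
from typing import Callable, List, Sequence

LOG_ZERO = float("-inf")


def safe_log(p: float) -> float:
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else LOG_ZERO


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over log-domain values."""
    peak = max(values)
    if peak == LOG_ZERO:
        return LOG_ZERO
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def make_toy_net(prototypes: Sequence[float]) -> Callable[[float], List[float]]:
    """Hypothetical stand-in for a trained network (assumption): a softmax
    over the negative squared distance between a 1-D frame and one
    prototype value per HMM state."""
    def net(frame: float) -> List[float]:
        scores = [math.exp(-(frame - p) ** 2) for p in prototypes]
        total = sum(scores)
        return [s / total for s in scores]
    return net


class HybridRecognizer:
    """One HMM per gesture; emissions come from a network-like callable."""

    def __init__(self, name: str, initial: Sequence[float],
                 transitions: Sequence[Sequence[float]],
                 emission_net: Callable[[float], List[float]]) -> None:
        self.name = name
        self.initial = initial
        self.transitions = transitions
        self.emission_net = emission_net

    def log_likelihood(self, frames: Sequence[float]) -> float:
        """Forward algorithm in the log domain."""
        if not frames:
            raise ValueError("frames must not be empty")
        n = len(self.initial)
        emis = self.emission_net(frames[0])
        alpha = [safe_log(self.initial[j]) + safe_log(emis[j]) for j in range(n)]
        for frame in frames[1:]:
            emis = self.emission_net(frame)
            alpha = [
                log_sum_exp([alpha[i] + safe_log(self.transitions[i][j])
                             for i in range(n)]) + safe_log(emis[j])
                for j in range(n)
            ]
        return log_sum_exp(alpha)


def classify(recognizers: Sequence[HybridRecognizer],
             frames: Sequence[float]) -> str:
    """Return the name of the recognizer with the highest log-likelihood."""
    return max(recognizers, key=lambda r: r.log_likelihood(frames)).name


# Two-state left-to-right topology shared by both toy gestures (assumption).
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
RECOGNIZERS = [
    HybridRecognizer("rise", [1.0, 0.0], LEFT_TO_RIGHT, make_toy_net([0.0, 1.0])),
    HybridRecognizer("fall", [1.0, 0.0], LEFT_TO_RIGHT, make_toy_net([1.0, 0.0])),
]

if __name__ == "__main__":
    print(classify(RECOGNIZERS, [0.0, 0.1, 0.9, 1.0]))
```

Working in the log domain keeps the forward recursion from underflowing on long sequences, which matters when scoring a continuous stream of video frames.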

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Corradini, A. (2002). Real-Time Gesture Recognition by Means of Hybrid Recognizers. In: Wachsmuth, I., Sowa, T. (eds) Gesture and Sign Language in Human-Computer Interaction. GW 2001. Lecture Notes in Computer Science, vol 2298. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47873-6_4

  • DOI: https://doi.org/10.1007/3-540-47873-6_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43678-2

  • Online ISBN: 978-3-540-47873-7

  • eBook Packages: Springer Book Archive
