Skip to main content

Visual Speech Recognition Using Motion Features and Hidden Markov Models

  • Conference paper
Computer Analysis of Images and Patterns (CAIP 2007)

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 4673))

Included in the following conference series:

Abstract

This paper presents a novel visual speech recognition approach based on motion segmentation and hidden Markov models (HMM). The proposed method identifies utterances from mouth video, without evaluating voice signals. The facial movements in the video data are represented using 2D spatial-temporal templates (STT). The proposed technique combines discrete stationary wavelet transform (SWT) and Zernike moments to extract rotation invariant features from the STTs. HMMs are used as speech classifier to model English phonemes. The preliminary results demonstrate that the proposed technique is suitable for phoneme classification with a high accuracy.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Arjunan, S.P., Kumar, D.K., Yau, W.C., Weghorn, H.: Unspoken Vowel Recognition Using Facial Electromyogram. IEEE EMBC, New York (2006)

    Google Scholar 

  2. Bobick, A.F., Davis, J.W.: The Recognition of Human Movement Using Temporal Templates. IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 257–267 (2001)

    Article  Google Scholar 

  3. Foo, S.W., Dong, L.: Recognition of Visual Speech Elements Using Hidden Markov Models. In: Chen, Y.-C., Chang, L.-W., Hsu, C.-T. (eds.) PCM 2002. LNCS, vol. 2532, pp. 607–614. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  4. Goldschen, A.J., Garcia, O.N., Petajan, E.: Continuous Optical Automatic Speech Recognition by Lipreading. In: 28th Annual Asilomar Conf. on Signal Systems and Computer (1994)

    Google Scholar 

  5. Hazen, T.J.: Visual Model Structures and Synchrony Constraints for Audio-Visual Speech Recognition. IEEE Transactions on Audio, Speech and Language Processing 14(3), 1082–1089 (2006)

    Article  Google Scholar 

  6. Kaynak, M.N., Qi, Z., Cheok, A.D., Sengupta, K., Chung, K.C.: Audio-visual modeling for bimodal speech recognition. IEEE Transactions on Systems, Man and Cybernetics 34, 564–570 (2001)

    Google Scholar 

  7. Khontazad, A., Hong, Y.H.: Invariant Image Recognition by Zernike Moments. IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 489–497 (1990)

    Article  Google Scholar 

  8. Liang, L., Liu, X., Zhao, Y., Pi, X., Nefian, A.V.: Speaker Independent Audio-Visual Continuous Speech Recognition. In: IEEE Int. Conf. on Multimedia and Expo (2002)

    Google Scholar 

  9. Petajan, E.D.: Automatic Lip-reading to Enhance Speech Recognition. In: GLOBECOM 1984 (1984)

    Google Scholar 

  10. Potamianos, G., Neti, C., Gravier, G., Senior, A.W.: Recent Advances in Automatic Recognition of Audio-Visual Speech. Proc. of IEEE 91 (2003)

    Google Scholar 

  11. Potamianos, G., Neti, C., Huang, J., Connell, J.H., Chu, S., Libal, V., Marcheret, E., Haas, N., Jiang, J.: Towards Practical Deployment of Audio-Visual Speech Recognition. In: ICASSP, IEEE (2004)

    Google Scholar 

  12. Rabiner, L.R.: A tutorial on HMM and selected applications in speech recognition. Proc. IEEE 77(2-2), 257–286 (1989)

    Article  Google Scholar 

  13. Teh, C.H., Chin, R.T.: On Image Analysis by the Methods of Moments. IEEE Transactions on Pattern Analysis and Machine Intelligence 10, 496–513 (1988)

    Article  MATH  Google Scholar 

  14. Yau, W.C., Kumar, D.K., Arjunan, S.P.: Visual Speech Recognition Method Using Translation, Scale and Rotation Invariant Features. In: IEEE International Conference on Advanced Video and Signal based Surveillance, Sydney, Australia (2006)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Walter G. Kropatsch Martin Kampel Allan Hanbury

Rights and permissions

Reprints and permissions

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yau, W.C., Kumar, D.K., Weghorn, H. (2007). Visual Speech Recognition Using Motion Features and Hidden Markov Models. In: Kropatsch, W.G., Kampel, M., Hanbury, A. (eds) Computer Analysis of Images and Patterns. CAIP 2007. Lecture Notes in Computer Science, vol 4673. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74272-2_103

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-74272-2_103

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74271-5

  • Online ISBN: 978-3-540-74272-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics