Visual Speech Recognition Using Motion Features and Hidden Markov Models

Yau, Wai Chee; Kumar, Dinesh Kant; Weghorn, Hans

doi:10.1007/978-3-540-74272-2_103

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4673)

Included in the following conference series:

International Conference on Computer Analysis of Images and Patterns

Abstract

This paper presents a novel visual speech recognition approach based on motion segmentation and hidden Markov models (HMMs). The proposed method identifies utterances from video of the mouth alone, without using the voice signal. Facial movements in the video data are represented as 2D spatial-temporal templates (STTs). The technique combines the discrete stationary wavelet transform (SWT) with Zernike moments to extract rotation-invariant features from the STTs. HMMs serve as the speech classifier and model English phonemes. Preliminary results indicate that the technique is suitable for phoneme classification with high accuracy.
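
To make the feature pipeline more concrete, the following is a minimal sketch of two of its stages: collapsing a frame sequence into a single 2D motion template, and computing Zernike moment magnitudes, which are unchanged when the template is rotated. It is not the authors' implementation. The function names, the frame-difference threshold, the linear temporal weighting, and the simplified normalisation are all assumptions made for illustration, and the SWT and HMM stages are omitted. It uses only the Python standard library so that it runs without dependencies; a practical version would use an array library.

```python
import cmath
import math


def motion_template(frames, threshold=10):
    """Collapse grayscale frames (lists of rows) into one 2D template.

    Assumed scheme: a pixel is 'moving' when its absolute frame
    difference reaches `threshold`; later motion gets a larger weight.
    """
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    template = [[0.0] * cols for _ in range(rows)]
    for t in range(1, n):
        weight = t / (n - 1)  # linear ramp, 1.0 for the last frame pair
        for r in range(rows):
            for c in range(cols):
                if abs(frames[t][r][c] - frames[t - 1][r][c]) >= threshold:
                    template[r][c] = max(template[r][c], weight)
    return template


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho)."""
    m = abs(m)
    if m > n or (n - m) % 2:
        raise ValueError("need |m| <= n and n - |m| even")
    total = 0.0
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * math.factorial(n - s)
                 / (math.factorial(s)
                    * math.factorial((n + m) // 2 - s)
                    * math.factorial((n - m) // 2 - s)))
        total += coeff * rho ** (n - 2 * s)
    return total


def zernike_magnitude(image, n, m):
    """|Z_nm| of a square image mapped onto the unit disk.

    Pixel-area scaling is left out, so values are only comparable
    between images of the same size.
    """
    size = len(image)
    centre = (size - 1) / 2
    radius = size / 2
    moment = 0j
    for r in range(size):
        for c in range(size):
            x = (c - centre) / radius
            y = (centre - r) / radius
            rho = math.hypot(x, y)
            if rho > 1.0:
                continue  # pixels outside the unit disk are ignored
            theta = math.atan2(y, x)
            moment += (image[r][c] * radial_poly(n, m, rho)
                       * cmath.exp(-1j * m * theta))
    return abs(moment) * (n + 1) / math.pi


def rotate90(image):
    """Rotate a square image by 90 degrees."""
    size = len(image)
    return [[image[c][size - 1 - r] for c in range(size)]
            for r in range(size)]


# Synthetic 8x8 sequence: one pixel changes early, another changes late.
blank = [[0] * 8 for _ in range(8)]
frame1 = [row[:] for row in blank]
frame1[2][3] = 255
frame2 = [row[:] for row in frame1]
frame2[5][5] = 255
frames = [blank, frame1, frame2]

template = motion_template(frames)
rotated = rotate90(template)
for n, m in [(2, 0), (2, 2), (3, 1)]:
    print(n, m, zernike_magnitude(template, n, m),
          zernike_magnitude(rotated, n, m))
```

Each printed pair of magnitudes should agree up to floating-point error, because rotating the template only multiplies a Zernike moment by a unit-modulus phase factor. That is the property that makes the moment magnitudes usable as rotation-invariant features.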

Author information

Authors and Affiliations

School of Electrical and Computer Engineering, RMIT University, GPO Box 2476V Melbourne, Victoria 3001, Australia
Wai Chee Yau & Dinesh Kant Kumar
Information Technology, BA-University of Cooperative Education, Stuttgart, Germany
Hans Weghorn

Editor information

Walter G. Kropatsch, Martin Kampel, Allan Hanbury

About this paper

Cite this paper

Yau, W.C., Kumar, D.K., Weghorn, H. (2007). Visual Speech Recognition Using Motion Features and Hidden Markov Models. In: Kropatsch, W.G., Kampel, M., Hanbury, A. (eds) Computer Analysis of Images and Patterns. CAIP 2007. Lecture Notes in Computer Science, vol 4673. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74272-2_103

DOI: https://doi.org/10.1007/978-3-540-74272-2_103
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74271-5
Online ISBN: 978-3-540-74272-2
eBook Packages: Computer Science, Computer Science (R0)
