Static and Dynamic Features for Improved HMM based Visual Speech Recognition

Rajavel, R.; Sathidevi, P. S.

doi:10.1007/978-81-8489-203-1_17



Abstract

Visual speech recognition refers to the identification of utterances through the movements of the lips, tongue, teeth, and other facial muscles of the speaker, without using the acoustic signal. This work shows the relative benefits of static and dynamic visual speech features for improved visual speech recognition. Two approaches to visual feature extraction are considered: (1) an image-transform-based static feature approach, in which the Discrete Cosine Transform (DCT) is applied to each video frame and the 6×6 triangle region coefficients are taken as features. Principal Component Analysis (PCA) is applied over all 60 features corresponding to the video frame to reduce redundancy; the resulting 21 coefficients are taken as the static visual features. (2) A motion-segmentation-based dynamic feature approach, in which the facial movements are segmented from the video file using motion history images (MHI). DCT is applied to the MHI, and the triangle region coefficients are taken as the dynamic visual features. Two types of experiments were performed to identify the utterances: one with the concatenated features and another with features whose dimension was reduced using PCA. Left-right continuous HMMs are used as the visual speech classifier to classify nine MPEG-4 standard viseme consonants. The experimental results show that both the concatenated and the dimension-reduced features improve visual speech recognition, with high accuracies of 92.45% and 92.15%, respectively.
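
The feature pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the triangle mask (row + column < 6, which gives 21 coefficients), the frame-differencing threshold, and the way per-frame static vectors are joined with the clip-level MHI vector are all assumptions, since the abstract does not specify them. The clips are synthetic random data, and the HMM classifier is left out.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]            # frequency index (rows)
    i = np.arange(n)[None, :]            # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)           # DC row has its own scale factor
    return c

def dct2(image):
    """2-D DCT-II of a grayscale image (transform rows, then columns)."""
    rows, cols = image.shape
    return dct_matrix(rows) @ image @ dct_matrix(cols).T

def triangle_coefficients(coeffs, size=6):
    """Low-frequency coefficients with row + col < size (assumed mask)."""
    r, c = np.indices((size, size))
    return coeffs[:size, :size][r + c < size]

def motion_history_image(frames, threshold=0.1, duration=None):
    """MHI: pixels that just moved are set to `duration`, others decay by 1."""
    if duration is None:
        duration = max(len(frames) - 1, 1)
    mhi = np.zeros_like(frames[0], dtype=float)
    for prev, curr in zip(frames[:-1], frames[1:]):
        moving = np.abs(curr - prev) > threshold   # simple frame differencing
        mhi = np.where(moving, float(duration), np.maximum(mhi - 1.0, 0.0))
    return mhi / duration                # scale to [0, 1]

def pca_reduce(features, n_components):
    """Project row-vector features onto their top principal components."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Synthetic stand-in for mouth-region video: 4 clips, 8 frames, 32x32 pixels.
rng = np.random.default_rng(0)
clips = rng.random((4, 8, 32, 32))

rows = []
for clip in clips:
    # Dynamic features: DCT triangle of the clip's motion history image.
    dynamic = triangle_coefficients(dct2(motion_history_image(clip)))
    for frame in clip:
        # Static features: DCT triangle of each individual frame.
        static = triangle_coefficients(dct2(frame))
        # Assumed concatenation: per-frame static + clip-level dynamic.
        rows.append(np.concatenate([static, dynamic]))

concatenated = np.stack(rows)            # one row per frame
reduced = pca_reduce(concatenated, n_components=21)
print(concatenated.shape, reduced.shape)
```

In a complete system, the rows of `concatenated` or `reduced` would form the observation sequences for one left-right continuous HMM per viseme class; that training step is omitted here.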



Author information

Authors and Affiliations

National Institute of Technology Calicut, India
R. Rajavel & P. S. Sathidevi


Editor information

Editors and Affiliations

Indian Institute of Information Technology, Allahabad, India
U. S. Tiwary (Professor), Tanveer J. Siddiqui (Assistant Professor), M. Radhakrishna (Professor) & M. D. Tiwari (Director)


About this paper

Cite this paper

Rajavel, R., Sathidevi, P.S. (2009). Static and Dynamic Features for Improved HMM based Visual Speech Recognition. In: Tiwary, U.S., Siddiqui, T.J., Radhakrishna, M., Tiwari, M.D. (eds) Proceedings of the First International Conference on Intelligent Human Computer Interaction. Springer, New Delhi. https://doi.org/10.1007/978-81-8489-203-1_17


DOI: https://doi.org/10.1007/978-81-8489-203-1_17
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-8489-404-2
Online ISBN: 978-81-8489-203-1
eBook Packages: Computer Science, Computer Science (R0)
