Abstract
Visual speech recognition is the identification of utterances from the movements of the speaker's lips, tongue, teeth, and other facial muscles, without using the acoustic signal. This work shows the relative benefits of static and dynamic visual speech features for improved visual speech recognition. Two approaches to visual feature extraction are considered: (1) An image-transform-based static feature approach, in which the Discrete Cosine Transform (DCT) is applied to each video frame and the 6×6 triangle region coefficients are taken as features. Principal Component Analysis (PCA) is applied over all 60 features corresponding to the video frame to reduce redundancy; the resulting 21 coefficients are taken as the static visual features. (2) A motion-segmentation-based dynamic feature approach, in which the facial movements are segmented from the video file using motion history images (MHI). DCT is applied to the MHI and the triangle region coefficients are taken as the dynamic visual features. Two types of experiments were conducted to identify the utterances: one with the concatenated features and another with features whose dimension was reduced using PCA. Left-right continuous HMMs are used as the visual speech classifier to classify nine MPEG-4 standard viseme consonants. The experimental results show that both the concatenated and the dimension-reduced features improve visual speech recognition, with high accuracies of 92.45% and 92.15%, respectively.
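The feature extraction steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the motion threshold, the timestamp-style MHI update, and the rule for selecting the triangle region (coefficients with u + v < size) are all assumptions made for illustration. A triangle selected this way holds size(size + 1)/2 values, so the exact region behind the feature counts quoted in the abstract cannot be recovered from the abstract alone.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]   # frequency index
    i = np.arange(n)[None, :]   # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return m

def dct2(image):
    """Separable 2D DCT-II of a 2D array."""
    rows, cols = image.shape
    return dct_matrix(rows) @ image @ dct_matrix(cols).T

def triangle_coefficients(coeffs, size):
    """Low-frequency upper-left triangle: coeffs[u, v] with u + v < size (assumed rule)."""
    return np.array([coeffs[u, v] for u in range(size) for v in range(size - u)])

def motion_history_image(frames, threshold):
    """Assumed MHI variant: each pixel stores the normalized index of the
    most recent frame in which it changed by at least `threshold`."""
    frames = np.asarray(frames, dtype=float)
    n = len(frames)
    mhi = np.zeros(frames.shape[1:])
    for t in range(1, n):
        moving = np.abs(frames[t] - frames[t - 1]) >= threshold
        mhi[moving] = t / (n - 1)  # later motion is brighter
    return mhi

def pca_reduce(features, n_components):
    """Project row-wise feature vectors onto their top principal components."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Toy usage on synthetic frames (not real video data).
rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(12, 32, 32))

# Static features: one triangle of DCT coefficients per frame.
static = np.array([triangle_coefficients(dct2(f.astype(float)), 6) for f in video])

# Dynamic features: triangle of DCT coefficients of a single MHI.
dynamic = triangle_coefficients(dct2(motion_history_image(video, 30)), 6)

# Dimension reduction of the per-frame static features.
static_reduced = pca_reduce(static, 5)
```

In this sketch the static features form one vector per frame, while the MHI collapses the whole clip into a single image before the DCT is taken, which is what makes the second feature set "dynamic". How the two sets were concatenated and fed to the HMMs is not specified in the abstract and is left out here.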
Copyright information
© 2009 Indian Institute of Information Technology, India
About this paper
Cite this paper
Rajavel, R., Sathidevi, P.S. (2009). Static and Dynamic Features for Improved HMM based Visual Speech Recognition. In: Tiwary, U.S., Siddiqui, T.J., Radhakrishna, M., Tiwari, M.D. (eds) Proceedings of the First International Conference on Intelligent Human Computer Interaction. Springer, New Delhi. https://doi.org/10.1007/978-81-8489-203-1_17
DOI: https://doi.org/10.1007/978-81-8489-203-1_17
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-8489-404-2
Online ISBN: 978-81-8489-203-1
eBook Packages: Computer Science (R0)