Abstract
This paper presents a novel visual speech recognition approach based on motion segmentation and hidden Markov models (HMMs). The proposed method identifies utterances from video of the mouth, without evaluating voice signals. The facial movements in the video data are represented using 2D spatial-temporal templates (STTs). The proposed technique combines the discrete stationary wavelet transform (SWT) and Zernike moments to extract rotation-invariant features from the STTs. HMMs are used as the speech classifier to model English phonemes. Preliminary results indicate that the proposed technique is suitable for phoneme classification with high accuracy.
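The rotation invariance attributed to the Zernike moment features can be illustrated with a minimal sketch. It uses the standard textbook definition of Zernike moments, not the authors' implementation, and it leaves out the SWT and HMM stages entirely. The function names `radial_poly` and `zernike_moment`, the 32 × 32 template size, and the random array standing in for an STT are all assumptions made for illustration.

```python
import numpy as np
from math import factorial


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho); requires |m| <= n and n - |m| even."""
    m = abs(m)
    out = np.zeros_like(rho, dtype=float)
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * factorial(n - s)
                 / (factorial(s)
                    * factorial((n + m) // 2 - s)
                    * factorial((n - m) // 2 - s)))
        out += coeff * rho ** (n - 2 * s)
    return out


def zernike_moment(image, n, m):
    """Discrete approximation of the Zernike moment Z_nm of a square image.

    Pixels are mapped onto [-1, 1] x [-1, 1]; only pixels inside the
    unit disc contribute to the sum.
    """
    size = image.shape[0]
    coords = (2.0 * np.arange(size) - (size - 1)) / (size - 1)
    x, y = np.meshgrid(coords, coords)
    rho_sq = x * x + y * y
    inside = rho_sq <= 1.0
    rho = np.sqrt(rho_sq)
    theta = np.arctan2(y, x)
    # Complex conjugate of the basis function V_nm = R_nm(rho) * exp(j*m*theta)
    basis_conj = radial_poly(n, m, rho) * np.exp(-1j * m * theta)
    return (n + 1) / np.pi * np.sum(image[inside] * basis_conj[inside])


# Hypothetical stand-in for a spatial-temporal template (random values, not real data)
rng = np.random.default_rng(0)
template = rng.random((32, 32))
rotated = np.rot90(template)

# Rotation changes only the phase of Z_nm, so the magnitudes should agree
for n, m in [(2, 0), (2, 2), (3, 1), (4, 2)]:
    print(n, m, abs(zernike_moment(template, n, m)), abs(zernike_moment(rotated, n, m)))
```

A 90-degree rotation maps the pixel grid onto itself, so the magnitudes agree up to floating-point error here. For arbitrary angles, resampling the image introduces interpolation error and the invariance holds only approximately.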
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yau, W.C., Kumar, D.K., Weghorn, H. (2007). Visual Speech Recognition Using Motion Features and Hidden Markov Models. In: Kropatsch, W.G., Kampel, M., Hanbury, A. (eds) Computer Analysis of Images and Patterns. CAIP 2007. Lecture Notes in Computer Science, vol 4673. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74272-2_103
DOI: https://doi.org/10.1007/978-3-540-74272-2_103
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74271-5
Online ISBN: 978-3-540-74272-2
eBook Packages: Computer Science, Computer Science (R0)