Abstract
Lip reading has attracted considerable research interest as a means of improving automatic speech recognition (Rabiner and Juang 1993). The key issue in visual speech recognition is representing the information from the speech articulators as a feature vector. In this paper, we describe the lips by the spatial coordinates of the lip contour, used as boundary descriptors. Traditionally, Principal Component Analysis (PCA), the Discrete Cosine Transform (DCT) and the Discrete Fourier Transform (DFT) are applied to pixels from images of the mouth. Here, we instead apply PCA to the spatial points for data reduction, and apply the DCT and DFT directly to the boundary descriptors to transform the spatial coordinates into the frequency domain. The resulting spatial- and frequency-domain feature vectors are used to classify the spoken word. We obtain an accuracy of 53.4% in the spatial domain and 54.3% in the frequency domain, which is comparable to results reported in the literature.
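The feature-extraction pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the number of contour points (20), the number of retained components (8), the random data, and the use of NumPy/SciPy are all assumptions made for the example.

```python
# Hedged sketch of the abstract's pipeline: lip-contour boundary
# descriptors reduced with PCA (spatial domain) and transformed with
# DCT/DFT (frequency domain). All sizes are illustrative assumptions.
import numpy as np
from scipy.fft import dct, fft

rng = np.random.default_rng(0)

# Assume each video frame's lip contour is sampled at 20 (x, y) points.
n_frames, n_points = 50, 20
contours = rng.random((n_frames, n_points, 2))

# Spatial-domain descriptor: flatten the (x, y) coordinates per frame.
spatial = contours.reshape(n_frames, -1)            # shape (50, 40)

# PCA on the spatial points for data reduction (keep 8 components),
# computed via SVD of the mean-centered data.
centered = spatial - spatial.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pca_features = centered @ vt[:8].T                  # shape (50, 8)

# Frequency-domain descriptors: apply the DCT and DFT directly to the
# boundary coordinates of each frame, keeping low-order coefficients.
dct_features = dct(spatial, axis=1, norm="ortho")[:, :8]
dft_features = np.abs(fft(spatial, axis=1))[:, :8]

print(pca_features.shape, dct_features.shape, dft_features.shape)
```

Either feature set (spatial or frequency domain) would then be fed to a classifier to label the spoken word; the paper evaluates both variants separately.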
References
Arsic, I., Thiran, J.P.: Mutual information eigenlips for audio-visual speech recognition. 14th European Signal Processing Conference (EUSIPCO) (2006)
Cai, D., He, X., Zhou, K.: Locality sensitive discriminant analysis. International Joint Conference on Artificial Intelligence. pp. 708–713 (2007)
Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 681–685 (2001)
Feng, X., Wang, W.: DTCWT-based dynamic texture features for visual speech recognition. IEEE Asia Pacific Conference on Circuits and Systems (APCCAS 2008). pp. 497–500 (2008)
Gupta, D., Singh, P., Laxmi, V., Gaur, M.S.: Comparison of parametric visual features for speech recognition. Proceedings of the IEEE International Conference on Network Communication and Computer (ICNCC, 2011). pp. 432–435 (2011)
Hong, X., Yao, H., Wan, Y., Chen, R.: A PCA based visual DCT feature extraction method for lip-reading. International Conference on Intelligent Information Hiding and Multimedia Signal Processing, (IIH-MSP ’06). pp. 321–326 (2006)
Matthews, I., Potamianos, G., Neti, C., Luettin, J.: A comparison of model and transform-based visual features for audio-visual LVCSR. Proceedings of IEEE International Conference on Multimedia and Expo (ICME 2001). pp. 825–828 (2001)
Matthews, I., Cootes, T.F., Bangham, J.A., Cox, S., Harvey, R.: Extraction of Visual Features for Lipreading. IEEE Trans. Pattern Anal. Mach. Intell. 24(2), 198–213 (2002)
Nefian, A.V., Liang, L., Pi, X., Liu, X., Murphy, K.: Dynamic Bayesian networks for audio-visual speech recognition. EURASIP J. Appl. Signal Process. 1274–1288 (2002)
Potamianos, G., Neti, C., Huang, J., Connell, J.H., Chu, S., Libal, V., Marcheret, E., Haas, N., Jiang, J.: Towards practical deployment of audio-visual speech recognition. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004), vol. 3, pp. 777–780 (2004)
Rabiner, L., Juang, B.: Fundamentals of speech recognition. Prentice Hall, New Jersey (1993)
Wang, X., Hao, Y., Fu, D., Yuan, C.: Audio-visual automatic speech recognition for connected digits. Proceedings of 2nd International Symposium on Intelligent Information Technology Application. pp. 328–332 (2008)
University of Waikato: WEKA, Open Source Machine Learning Software. http://www.cs.waikato.ac.nz/ml/weka/
Yau, W.C., Kumar, D.K., Arjunan, S.P., Kumar, S.: Visual speech recognition using image moments and multiresolution wavelet images. International Conference on Computer Graphics, Imaging and Visualisation. pp. 194–199 (2006)
Acknowledgment
The authors are grateful to the Department of Science & Technology, Government of India, for supporting and funding this project.
© 2011 Springer-Verlag London Limited

Cite this paper
Gupta, D., Singh, P., Laxmi, V., Gaur, M.S. (2011). Boundary Descriptors for Visual Speech Recognition. In: Gelenbe, E., Lent, R., Sakellari, G. (eds) Computer and Information Sciences II. Springer, London. https://doi.org/10.1007/978-1-4471-2155-8_39
Publisher Name: Springer, London
Print ISBN: 978-1-4471-2154-1
Online ISBN: 978-1-4471-2155-8