Abstract
Lip reading has attracted considerable research interest as a means of improving automatic speech recognition (Rabiner and Juang 1993). The key issue in visual speech recognition is representing the information from the speech articulators as a feature vector. In this paper, we describe the lips by the spatial coordinates of the lip contour, used as boundary descriptors. Traditionally, Principal Component Analysis (PCA), the Discrete Cosine Transform (DCT) and the Discrete Fourier Transform (DFT) are applied to pixels from images of the mouth. Here, we instead apply PCA to the spatial points for data reduction, and apply the DCT and DFT directly to the boundary descriptors to transform the spatial coordinates into the frequency domain. The resulting spatial- and frequency-domain feature vectors are used to classify the spoken word. We obtain an accuracy of 53.4% in the spatial domain and 54.3% in the frequency domain, which is comparable to results reported in the literature.
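The feature-extraction pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the number of contour points (20), the number of retained components (8), the random data, and the use of NumPy/SciPy are all assumptions made for the example.

```python
# Hedged sketch of the abstract's pipeline: lip-contour boundary
# descriptors reduced with PCA (spatial domain) and transformed with
# DCT/DFT (frequency domain). All sizes are illustrative assumptions.
import numpy as np
from scipy.fft import dct, fft

rng = np.random.default_rng(0)

# Assume each video frame's lip contour is sampled at 20 (x, y) points.
n_frames, n_points = 50, 20
contours = rng.random((n_frames, n_points, 2))

# Spatial-domain descriptor: flatten the (x, y) coordinates per frame.
spatial = contours.reshape(n_frames, -1)            # shape (50, 40)

# PCA on the spatial points for data reduction (keep 8 components),
# computed via SVD of the mean-centered data.
centered = spatial - spatial.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pca_features = centered @ vt[:8].T                  # shape (50, 8)

# Frequency-domain descriptors: apply the DCT and DFT directly to the
# boundary coordinates of each frame, keeping low-order coefficients.
dct_features = dct(spatial, axis=1, norm="ortho")[:, :8]
dft_features = np.abs(fft(spatial, axis=1))[:, :8]

print(pca_features.shape, dct_features.shape, dft_features.shape)
```

Either feature set (spatial or frequency domain) would then be fed to a classifier to label the spoken word; the paper evaluates both variants separately.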
References
Arsic, I., Thiran, J.P.: Mutual information eigenlips for audio-visual speech recognition. 14th European Signal Processing Conference (EUSIPCO) (2006)
Cai, D., He, X., Zhou, K.: Locality sensitive discriminant analysis. International Joint Conference on Artificial Intelligence. pp. 708–713 (2007)
Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 681–685 (2001)
Feng, X., Wang, W.: DTCWT-based dynamic texture features for visual speech recognition. IEEE Asia Pacific Conference on Circuits and Systems (APCCAS 2008). pp. 497–500 (2008)
Gupta, D., Singh, P., Laxmi, V., Gaur, M.S.: Comparison of parametric visual features for speech recognition. Proceedings of the IEEE International Conference on Network Communication and Computer (ICNCC, 2011). pp. 432–435 (2011)
Hong, X., Yao, H., Wan, Y., Chen, R.: A PCA based visual DCT feature extraction method for lip-reading. International Conference on Intelligent Information Hiding and Multimedia Signal Processing, (IIH-MSP ’06). pp. 321–326 (2006)
Matthews, I., Potamianos, G., Neti, C., Luettin, J.: A comparison of model and transform-based visual features for audio-visual LVCSR. Proceedings of IEEE International Conference on Multimedia and Expo (ICME 2001). pp. 825–828 (2001)
Matthews, I., Cootes, T.F., Bangham, J.A., Cox, S., Harvey, R.: Extraction of Visual Features for Lipreading. IEEE Trans. Pattern Anal. Mach. Intell. 24(2), 198–213 (2002)
Nefian, A.V., Liang, L., Pi, X., Liu, X., Murphy, K.: Dynamic Bayesian networks for audio-visual speech recognition. EURASIP J. Appl. Signal Process. 1274–1288 (2002)
Potamianos, G., Neti, C., Huang, J., Connell, J.H., Chu, S., Libal, V., Marcheret, E., Haas, N., Jiang, J.: Towards practical deployment of audio-visual speech recognition. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004), vol. 3, pp. 777–780 (2004)
Rabiner, L., Juang, B.: Fundamentals of speech recognition. Prentice Hall, New Jersey (1993)
Wang, X., Hao, Y., Fu, D., Yuan, C.: Audio-visual automatic speech recognition for connected digits. Proceedings of 2nd International Symposium on Intelligent Information Technology Application. pp. 328–332 (2008)
University of Waikato: WEKA, Open Source Machine Learning Software. http://www.cs.waikato.ac.nz/ml/weka/
Yau, W.C., Kumar, D.K., Arjunan, S.P., Kumar, S.: Visual speech recognition using image moments and multiresolution wavelet images. International Conference on Computer Graphics, Imaging and Visualisation. pp. 194–199 (2006)
Acknowledgment
The authors are grateful to the Department of Science & Technology, Government of India, for supporting and funding this project.
© 2011 Springer-Verlag London Limited

Cite this paper
Gupta, D., Singh, P., Laxmi, V., Gaur, M.S. (2011). Boundary Descriptors for Visual Speech Recognition. In: Gelenbe, E., Lent, R., Sakellari, G. (eds) Computer and Information Sciences II. Springer, London. https://doi.org/10.1007/978-1-4471-2155-8_39
Publisher Name: Springer, London
Print ISBN: 978-1-4471-2154-1
Online ISBN: 978-1-4471-2155-8