Boundary Descriptors for Visual Speech Recognition

  • Conference paper
Abstract

Lip reading has attracted considerable research interest as a means of improving automatic speech recognition [11]. The key issue in visual speech recognition is representing the information from the speech articulators as a feature vector. In this paper, we describe the lips by the spatial coordinates of the lip contour, used as boundary descriptors. Traditionally, Principal Component Analysis (PCA), the Discrete Cosine Transform (DCT) and the Discrete Fourier Transform (DFT) are applied to pixels from images of the mouth. Here, we instead apply PCA to the spatial points for data reduction, and apply the DCT and DFT directly to the boundary descriptors to transform the spatial coordinates into the frequency domain. The resulting spatial- and frequency-domain feature vectors are used to classify the spoken word. An accuracy of 53.4% is obtained in the spatial domain and 54.3% in the frequency domain, which is comparable to results reported in the literature.
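To illustrate the frequency-domain descriptors described above, the sketch below applies a DCT and a DFT directly to the (x, y) spatial coordinates of a lip contour and keeps the low-frequency coefficients. The contour data, the number of boundary points, and the coefficient count `k` are all hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np

def dct2(signal, k):
    """First k DCT-II coefficients of a 1-D signal (unnormalised)."""
    n = len(signal)
    t = np.arange(n) + 0.5
    return np.array([np.sum(signal * np.cos(np.pi * u * t / n)) for u in range(k)])

# Hypothetical lip contour: 40 (x, y) boundary points traced around the mouth.
rng = np.random.default_rng(0)
contour = rng.random((40, 2))

k = 8  # number of low-frequency coefficients kept per axis (assumption)

# DCT applied directly to the x- and y-coordinate sequences.
dct_x = dct2(contour[:, 0], k)
dct_y = dct2(contour[:, 1], k)

# DFT variant of the same idea: magnitudes of the first k real-FFT terms.
dft_x = np.abs(np.fft.rfft(contour[:, 0]))[:k]
dft_y = np.abs(np.fft.rfft(contour[:, 1]))[:k]

feature_vector = np.concatenate([dct_x, dct_y])  # length-2k frequency descriptor
print(feature_vector.shape)  # (16,)
```

Because most of the contour's shape information concentrates in the low-frequency terms, truncating to the first k coefficients per axis yields a compact feature vector while discarding high-frequency contour noise.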


References

  1. Arsic, I., Thiran, J.P.: Mutual information eigenlips for audio-visual speech recognition. 14th European Signal Processing Conference (EUSIPCO) (2006)

  2. Cai, D., He, X., Zhou, K.: Locality sensitive discriminant analysis. International Joint Conference on Artificial Intelligence, pp. 708–713 (2007)

  3. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 681–685 (2001)

  4. Feng, X., Wang, W.: DTCWT-based dynamic texture features for visual speech recognition. IEEE Asia Pacific Conference on Circuits and Systems (APCCAS 2008), pp. 497–500 (2008)

  5. Gupta, D., Singh, P., Laxmi, V., Gaur, M.S.: Comparison of parametric visual features for speech recognition. Proceedings of the IEEE International Conference on Network Communication and Computer (ICNCC 2011), pp. 432–435 (2011)

  6. Hong, X., Yao, H., Wan, Y., Chen, R.: A PCA based visual DCT feature extraction method for lip-reading. International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP '06), pp. 321–326 (2006)

  7. Matthews, I., Potamianos, G., Neti, C., Luettin, J.: A comparison of model and transform-based visual features for audio-visual LVCSR. Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2001), pp. 825–828 (2001)

  8. Matthews, I., Cootes, T.F., Bangham, J.A., Cox, S., Harvey, R.: Extraction of visual features for lipreading. IEEE Trans. Pattern Anal. Mach. Intell. 24(2), 198–213 (2002)

  9. Nefian, A.V., Liang, L., Pi, X., Liu, X., Murphy, K.: Dynamic Bayesian networks for audio-visual speech recognition. EURASIP J. Appl. Signal Process. 2002, 1274–1288 (2002)

  10. Potamianos, G., Neti, C., Huang, J., Connell, J.H., Chu, S., Libal, V., Marcheret, E., Haas, N., Jiang, J.: Towards practical deployment of audio-visual speech recognition. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004), vol. 3, pp. 777–780 (2004)

  11. Rabiner, L., Juang, B.: Fundamentals of Speech Recognition. Prentice Hall, New Jersey (1993)

  12. Wang, X., Hao, Y., Fu, D., Yuan, C.: Audio-visual automatic speech recognition for connected digits. Proceedings of the 2nd International Symposium on Intelligent Information Technology Application, pp. 328–332 (2008)

  13. University of Waikato: Open Source Machine Learning Software WEKA. http://www.cs.waikato.ac.nz/ml/weka/

  14. Yau, W.C., Kumar, D.K., Arjunan, S.P., Kumar, S.: Visual speech recognition using image moments and multiresolution wavelet images. International Conference on Computer Graphics, Imaging and Visualisation, pp. 194–199 (2006)

Acknowledgment

The authors are grateful to the Department of Science & Technology, Government of India, for supporting and funding this project.

Author information


Correspondence to Manoj S. Gaur.

Copyright information

© 2011 Springer-Verlag London Limited

About this paper

Cite this paper

Gupta, D., Singh, P., Laxmi, V., Gaur, M.S. (2011). Boundary Descriptors for Visual Speech Recognition. In: Gelenbe, E., Lent, R., Sakellari, G. (eds) Computer and Information Sciences II. Springer, London. https://doi.org/10.1007/978-1-4471-2155-8_39

  • DOI: https://doi.org/10.1007/978-1-4471-2155-8_39

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-2154-1

  • Online ISBN: 978-1-4471-2155-8
