Gabor Based Lipreading with a New Audiovisual Mandarin Corpus

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11691)

Abstract

Human speech processing is a multimodal and cognitive activity in which visual information plays a role. Many lipreading systems use English speech data. However, Chinese is the most spoken language in the world and is of increasing interest, as is the development of lightweight feature extraction to improve learning time. This paper presents an improved character-level Gabor-based lipreading system that uses visual information for feature extraction and speech classification. We evaluate this system with a new Audiovisual Mandarin Chinese (AVMC) database composed of 4704 characters spoken by 10 volunteers. The system is trained on this dataset and uses the Dlib Region-of-Interest (ROI) method and Gabor filtering to extract lip features, providing a fast and lightweight approach without any mouth modelling. A character-level Convolutional Neural Network (CNN) is used to recognize Pinyin, with 64.96% accuracy and a Character Error Rate (CER) of 57.71%.
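
The abstract does not state how the Character Error Rate is computed. As a rough illustration only, the sketch below uses the common definition: the Levenshtein edit distance between the recognized and reference sequences, divided by the reference length. The function names and the example strings are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the first i-1 items of ref and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER under the assumed definition: edit distance divided by reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substituted character in a five-character reference.
print(character_error_rate("nihao", "nihau"))
```

Under this definition insertions also count as errors, so CER and accuracy are separate measures and need not sum to 100%.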

Acknowledgments

This work was supported by XJTLU Grant RDF 16-01-35, and partially funded by the Research Institute of Big Data Analytics.

Author information

Corresponding author

Correspondence to Andrew Abel.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, Y., Li, Y., Abel, A. (2020). Gabor Based Lipreading with a New Audiovisual Mandarin Corpus. In: Ren, J., et al. Advances in Brain Inspired Cognitive Systems. BICS 2019. Lecture Notes in Computer Science, vol 11691. Springer, Cham. https://doi.org/10.1007/978-3-030-39431-8_16

  • DOI: https://doi.org/10.1007/978-3-030-39431-8_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-39430-1

  • Online ISBN: 978-3-030-39431-8

  • eBook Packages: Computer Science, Computer Science (R0)
