Gabor Based Lipreading with a New Audiovisual Mandarin Corpus

Xu, Yan; Li, Yuexuan; Abel, Andrew

doi:10.1007/978-3-030-39431-8_16

Conference paper
First Online: 01 February 2020

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11691)

Abstract

Human speech processing is a multimodal and cognitive activity in which visual information plays a role. Many lipreading systems use English speech data. However, Chinese is the most spoken language in the world and is of increasing interest, as is the development of lightweight feature extraction to reduce learning time. This paper presents an improved character-level Gabor-based lipreading system that uses visual information for feature extraction and speech classification. We evaluate this system with a new Audiovisual Mandarin Chinese (AVMC) database composed of 4704 characters spoken by 10 volunteers. The system is trained on this dataset and uses the Dlib Region-of-Interest (ROI) method and Gabor filtering to extract lip features, which provides a fast and lightweight approach without any mouth modelling. A character-level Convolutional Neural Network (CNN) is used to recognize Pinyin, with 64.96% accuracy and a Character Error Rate (CER) of 57.71%.
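
To make the Gabor filtering step mentioned in the abstract more concrete, the following is a minimal sketch of applying an oriented Gabor filter to a mouth region. It is not the authors' implementation: the function names, the kernel size, and every parameter value are assumptions chosen for illustration, and a synthetic patch with a dark horizontal bar stands in for the Dlib-extracted ROI so that the example runs with NumPy alone.

```python
import numpy as np


def gabor_kernel(ksize=21, sigma=4.0, theta=0.0, lambd=10.0, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter. All default values are illustrative."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate frame by theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope multiplied by a cosine carrier along x_r.
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / lambd + psi)
    return envelope * carrier


def apply_filter(roi, kernel):
    """Circular convolution of a zero-mean ROI with the kernel, via the FFT."""
    roi = roi - roi.mean()
    kh, kw = kernel.shape
    padded = np.zeros_like(roi, dtype=float)
    padded[:kh, :kw] = kernel
    # Shift the kernel centre to the array origin so the output is not offset.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(roi) * np.fft.fft2(padded)))


def synthetic_mouth_roi(height=64, width=96):
    """Hypothetical stand-in for a lip ROI: a dark horizontal bar on a light patch."""
    roi = np.full((height, width), 0.8)
    roi[30:35, :] = 0.2
    return roi


roi = synthetic_mouth_roi()
# theta = pi/2 makes the carrier vary along the vertical axis, so the filter
# responds to horizontal structure such as the line between closed lips.
resp_horizontal = apply_filter(roi, gabor_kernel(theta=np.pi / 2))
resp_vertical = apply_filter(roi, gabor_kernel(theta=0.0))
print(np.abs(resp_horizontal).max(), np.abs(resp_vertical).max())
```

On this synthetic patch the filter tuned to horizontal structure responds far more strongly than the one tuned to vertical structure, which is the kind of orientation selectivity that makes Gabor responses usable as lightweight lip features. How the paper turns such responses into per-frame features for the CNN is not stated in the abstract and is not modelled here.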

Acknowledgments

This work was supported by XJTLU Grant RDF 16-01-35, and partially funded by the Research Institute of Big Data Analytics.

Author information

Authors and Affiliations

Research Institute of Big Data Analytics (RIBDA), Xi’an Jiaotong-Liverpool University, Suzhou, 215123, China
Yan Xu, Yuexuan Li & Andrew Abel

Corresponding author

Correspondence to Andrew Abel.

Editor information

Editors and Affiliations

University of Strathclyde, Glasgow, UK
Jinchang Ren
Edinburgh Napier University, Edinburgh, UK
Amir Hussain
Guangdong Polytechnic Normal University, Guangzhou, China
Huimin Zhao
Xi’an Jiaotong-Liverpool University, Suzhou, China
Kaizhu Huang
Northwestern Polytechnical University, Xi'an, China
Jiangbin Zheng
Guangdong Polytechnic Normal University, Guangzhou, China
Jun Cai
Guangdong Polytechnic Normal University, Guangzhou, China
Rongjun Chen
Guangdong Polytechnic Normal University, Guangzhou, China
Yinyin Xiao

Cite this paper

Xu, Y., Li, Y., Abel, A. (2020). Gabor Based Lipreading with a New Audiovisual Mandarin Corpus. In: Ren, J., et al. Advances in Brain Inspired Cognitive Systems. BICS 2019. Lecture Notes in Computer Science, vol 11691. Springer, Cham. https://doi.org/10.1007/978-3-030-39431-8_16

DOI: https://doi.org/10.1007/978-3-030-39431-8_16
Published: 01 February 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39430-1
Online ISBN: 978-3-030-39431-8
eBook Packages: Computer Science, Computer Science (R0)