
Review on research progress of machine lip reading

Survey · The Visual Computer

Abstract

Machine lip reading recognizes spoken content from the motion of a speaker's lips, a task with significant research and application value. With continuing breakthroughs in deep learning, lip reading research is advancing rapidly, and a large body of related work has been published. This paper examines the development of lip reading in detail, with an emphasis on the latest results. We focus on lip reading datasets and compare them, including several recently released datasets. We also introduce feature extraction methods for lip reading and compare the various approaches in detail. Finally, future directions for lip reading research are discussed.
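
To make the task concrete, here is a minimal sketch (not from the paper) of the pipeline that dominates recent lip reading work: a spatiotemporal convolutional frontend over cropped mouth-region frames, followed by a recurrent backend that maps the frame sequence to word classes. The input resolution (88x88 grayscale mouth crops), the layer widths, and the 500-way word vocabulary are illustrative assumptions loosely modeled on common word-level benchmarks, not the authors' method.

    # Illustrative sketch only; sizes and vocabulary are assumed, not taken from the paper.
    import torch
    import torch.nn as nn

    class LipReader(nn.Module):
        def __init__(self, num_classes=500, hidden=256):
            super().__init__()
            # 3D convolution captures short-range lip motion across neighboring frames.
            self.frontend = nn.Sequential(
                nn.Conv3d(1, 32, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
                nn.BatchNorm3d(32),
                nn.ReLU(),
                nn.MaxPool3d(kernel_size=(1, 2, 2)),
            )
            # A bidirectional GRU models longer-range temporal context over the clip.
            self.backend = nn.GRU(input_size=32 * 22 * 22, hidden_size=hidden,
                                  num_layers=2, bidirectional=True, batch_first=True)
            self.classifier = nn.Linear(2 * hidden, num_classes)

        def forward(self, x):                          # x: (batch, 1, frames, 88, 88)
            f = self.frontend(x)                       # (batch, 32, frames, 22, 22)
            f = f.permute(0, 2, 1, 3, 4).flatten(2)    # (batch, frames, 32*22*22)
            out, _ = self.backend(f)                   # (batch, frames, 2*hidden)
            return self.classifier(out.mean(dim=1))    # pool over time -> word logits

    logits = LipReader()(torch.randn(2, 1, 29, 88, 88))  # e.g., 29-frame word clips
    print(logits.shape)                                   # torch.Size([2, 500])

Sentence-level systems of the kind the survey covers typically replace the time-averaged classifier with a CTC or attention-based decoder that emits a character or word sequence instead of a single label.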



Funding

This study was funded by the Scientific Research Key Project of Hebei Provincial Department of Education (Grant No. ZD2020161) and the Natural Science Foundation of Hebei Province (Grant No. F2021409007).

Author information

Corresponding author

Correspondence to Huijuan Wang.

Ethics declarations

Conflict of interest

Author Gangqiang Pu declares that he has no conflict of interest. Author Huijuan Wang declares that she has no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Pu, G., Wang, H. Review on research progress of machine lip reading. Vis Comput 39, 3041–3057 (2023). https://doi.org/10.1007/s00371-022-02511-4
