Automatic lipreading based on optimized OLSDA and HMM

Lu, Yuanyao; Gu, Ke; Cai, Ying

doi:10.1007/s00500-022-06864-9


Foundations
Published: 01 March 2022

Soft Computing, volume 26, pages 4141–4150 (2022)


Abstract

Automatic visual lipreading, an efficient and convenient means of human–machine interaction, recognizes the content of a conversation from the dynamic visual features of the speaker. Because it relies on visual information alone, automatic lipreading is robust to interference in complex environments, particularly under noisy acoustic conditions. In this paper, we propose novel visual feature extraction and HMM classification methods for an automatic lipreading system: the feature dimension is reduced with the locality sensitive discriminant analysis (LSDA) algorithm, and the reduced features are quantized by K-means clustering. A model-based hybrid feature extraction method is proposed by optimizing the weight matrix of the LSDA algorithm. The effectiveness of the approach is demonstrated by preliminary experiments on an English video database. The results show that the optimized algorithm raises the recognition rate to as much as 97%, which is 18% higher than that of the original algorithm.
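The quantization and classification stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the feature vectors have already been reduced in dimension (the LSDA step is not shown), uses plain Lloyd's K-means to map each frame to a discrete symbol, and scores the symbol sequence against one discrete HMM per word with the scaled forward algorithm. All function names and the toy model parameters are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns the (k, d) array of centroids."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = quantize(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return centroids

def quantize(X, centroids):
    """Map each feature vector to the index of its nearest centroid."""
    X = np.asarray(X, dtype=float)
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | discrete HMM (pi, A, B))."""
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    log_p = np.log(scale)
    alpha = alpha / scale
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        scale = alpha.sum()
        log_p += np.log(scale)
        alpha = alpha / scale
    return log_p

def classify(obs, models):
    """Return the word whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda w: log_likelihood(obs, *models[w]))

# Toy two-state left-to-right models over two symbols (assumed values).
pi = np.array([1.0, 0.0])
A = np.array([[0.5, 0.5], [0.0, 1.0]])
models = {
    "a": (pi, A, np.array([[0.9, 0.1], [0.1, 0.9]])),  # symbol 0, then 1
    "b": (pi, A, np.array([[0.1, 0.9], [0.9, 0.1]])),  # symbol 1, then 0
}
```

In a full system the HMM parameters would be trained per word (for example with Baum–Welch) rather than written by hand, and the codebook from `kmeans` would be fitted on the training features before calling `quantize` on each test sequence.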


Data Availability

Enquiries about data availability should be directed to the authors.


Acknowledgements

The research was supported by the National Natural Science Foundation of China (61971007, 61571013) and by the Beijing Natural Science Foundation of China (4143061).

Funding

The authors have not disclosed any funding.

Author information

Authors and Affiliations

School of Information Science and Technology, North China University of Technology, Beijing, 100144, China
Yuanyao Lu, Ke Gu & Ying Cai


Corresponding author

Correspondence to Yuanyao Lu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Lu, Y., Gu, K. & Cai, Y. Automatic lipreading based on optimized OLSDA and HMM. Soft Comput 26, 4141–4150 (2022). https://doi.org/10.1007/s00500-022-06864-9


Accepted: 30 January 2022
Published: 01 March 2022
Issue Date: May 2022
DOI: https://doi.org/10.1007/s00500-022-06864-9
