
Automatic lipreading based on optimized OLSDA and HMM

  • Foundations
  • Published:
Soft Computing

Abstract

Automatic visual lipreading, an efficient and convenient means of human–machine interaction, recognizes the content of speech from the dynamic visual features of the speaker. Because it relies on visual information alone, automatic lipreading can effectively avoid acoustic interference in complex environments, particularly under noisy conditions. In this paper, we propose novel visual feature extraction and HMM classification methods for an automatic lipreading system: the feature dimension is reduced with the locality sensitive discriminant analysis (LSDA) algorithm, and the reduced features are quantized by K-means clustering. A model-based hybrid feature extraction method is obtained by optimizing the weight matrix of the LSDA algorithm. Preliminary experiments on an English video database demonstrate the effectiveness of the proposed approach: the optimized algorithm increases the recognition rate to as much as 97%, which is 18% higher than that of the original algorithm.
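The abstract describes a pipeline in which dimension-reduced visual features are vector-quantized with K-means and the resulting symbol sequences are classified with HMMs. The sketch below illustrates only those last two stages in plain Python, assuming the LSDA projection has already produced one low-dimensional feature vector per video frame; the optimized weight matrix is not described in the abstract and is not reproduced here. All function names, parameters and toy models are hypothetical, and scoring each word model with the standard scaled forward algorithm is an assumed stand-in for whatever decoding the authors actually use.

```python
import math
import random

# Hypothetical sketch: K-means vector quantization of (already LSDA-reduced)
# frame features, then discrete-HMM word classification by forward likelihood.


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns k centroids (the VQ codebook)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def quantize(frames, codebook):
    """Map each frame's feature vector to a discrete symbol (codebook index)."""
    return [nearest(f, codebook) for f in frames]


def log_forward(obs, pi, A, B):
    """log P(obs | pi, A, B) for a discrete HMM via the scaled forward algorithm.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][s] : probability that state i emits symbol s
    """
    n = len(pi)
    log_lik = 0.0
    alpha = []
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = [pi[i] * B[i][symbol] for i in range(n)]
        else:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbol]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return -math.inf  # the sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


def classify(obs, models):
    """Return the name of the word model with the highest log-likelihood."""
    return max(models, key=lambda name: log_forward(obs, *models[name]))
```

For example, with a hypothetical `codebook = kmeans(train_frames, k=32)` and per-word models `{"hello": (pi, A, B), ...}`, `classify(quantize(test_frames, codebook), models)` returns the most likely word. In practice each model's parameters would be trained on quantized training sequences (for instance with Baum-Welch re-estimation) rather than fixed by hand.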


Figs. 1–10


Data Availability

Enquiries about data availability should be directed to the authors.



Acknowledgements

The research was supported by the National Natural Science Foundation of China (61971007, 61571013) and by the Beijing Natural Science Foundation of China (4143061).

Funding

The authors have not disclosed any funding.

Author information


Correspondence to Yuanyao Lu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.



About this article


Cite this article

Lu, Y., Gu, K. & Cai, Y. Automatic lipreading based on optimized OLSDA and HMM. Soft Comput 26, 4141–4150 (2022). https://doi.org/10.1007/s00500-022-06864-9


  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00500-022-06864-9

