Abstract
Automatic lip motion recognition is an essential input for visual speech detection. It offers a technological aid for people who are hard of hearing or deaf, and for silent communication in everyday life. The recognition task is challenging, however, owing to variation in pronunciation, speaking speed, and gesture, as well as lip color, makeup, the video quality of the camera, and the choice of feature-extraction method. This paper proposes a solution for automatic lip motion recognition that identifies lip movements and characterizes their association with spoken words in the Amharic language, using only the information available in the lip movements. The input video is converted into consecutive image frames. The Viola-Jones object detection algorithm locates the face; the frames are then converted to the YIQ color space, and the saturation components are used to detect the lip region within the face area. Sobel edge detection and morphological image operations are applied to identify and extract the exact contour of the lips. We applied ANN and SVM classifiers to averaged shape-information features and obtained classification accuracies of 65.71% and 66.43% for the ANN and SVM, respectively. Amharic speech recognition is a newly introduced technology that can enhance the academic and linguistic skills of people with hearing impairments and support health-domain experts, physicians, and researchers. Future research directions are presented in light of these findings.
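The lip-localization steps summarized above (YIQ color-space conversion followed by Sobel edge detection) can be sketched in plain NumPy. This is a minimal illustration under assumed conventions (standard NTSC YIQ matrix, RGB values in [0, 1]); the function names are hypothetical and this is not the authors' implementation:

```python
import numpy as np

def rgb_to_yiq(img):
    """Convert an RGB image (H, W, 3), values in [0, 1], to YIQ.

    Uses the standard NTSC transform; I and Q carry the chrominance
    from which a saturation-like measure can be derived.
    """
    m = np.array([[0.299,  0.587,  0.114],
                  [0.596, -0.274, -0.322],
                  [0.211, -0.523,  0.312]])
    return img @ m.T

def sobel_magnitude(gray):
    """Gradient magnitude of a 2-D grayscale image via the 3x3 Sobel kernels.

    Returns an (H-2, W-2) array (valid convolution, no padding).
    """
    kx = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])
    ky = kx.T
    h, w = gray.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    # Accumulate the cross-correlation one kernel tap at a time.
    for i in range(3):
        for j in range(3):
            patch = gray[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)
```

In a pipeline like the one described, a chrominance/saturation map such as `np.hypot(I, Q)` would highlight the reddish lip region within the detected face, and `sobel_magnitude` applied to that map (followed by morphological cleanup) would trace the lip contour.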
Acknowledgments
We would like to thank the anonymous reviewers for their detailed review, valuable comments, and constructive suggestions. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethics declarations
Conflict of interest
The paper entitled "Augmenting Machine Learning for Amharic Speech Recognition: A Paradigm of Patient's Lips Motions Detection" has no conflict of interest, and we affirm that the contents of this technical paper are original. It has not been published elsewhere in any language, in full or in part, nor is it under review for publication elsewhere. No funding was received for this manuscript.
We affirm that all authors have seen and agreed to the submitted version of the technical paper and to the inclusion of their names as co-authors. If the paper is accepted, we agree to comply with the terms and conditions given on the journal's website, and the journal is free to publish the contribution in the journal and on its website.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Birara, M., Gebremeskel, G.B. Augmenting machine learning for Amharic speech recognition: a paradigm of patient’s lips motion detection. Multimed Tools Appl 81, 24377–24397 (2022). https://doi.org/10.1007/s11042-022-12399-w