Abstract
Using automatic lip-reading technology to promote the social interaction and integration of hearing-impaired and dysphonic people is a promising application of artificial intelligence in healthcare and rehabilitation. Because of inaccurate mouth shapes and unclear articulation, hearing-impaired and dysphonic people cannot communicate as hearing people do. In this paper, a speech training system for hearing-impaired and dysphonic people is built on state-of-the-art automatic lip-reading technology that combines a convolutional neural network (CNN) with a recurrent neural network (RNN). The system trains speech skills by comparing the mouth shapes of hearing-impaired users with those of hearing speakers, and consists of four parts. First, we create a speech training database that stores the mouth shapes of hearing speakers together with the corresponding sign language vocabulary. Second, the system performs automatic lip reading with a hybrid neural network that couples MobileNet with a long short-term memory (LSTM) network. Third, the system retrieves from the database the correct lip shape matched to a sign language word and compares it with the user's lip shape. Finally, the system produces comparison data and a similarity rate based on the user's lip size, the lip-opening angle, and the differences between lip shapes, and presents a standard lip-reading sequence for learning and training. Hearing-impaired and dysphonic people can then analyse and correct their vocal lip shapes from the comparison results and train independently to improve their mouth shapes. In addition, the system can help hearing-impaired users learn correct pronunciation in combination with medical devices such as cochlear implants.
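The hybrid network described above runs per-frame CNN features (MobileNet) through an LSTM to model the lip motion over time. As a minimal illustration of the temporal half of that pipeline, the sketch below implements a single standard LSTM step in numpy and feeds it a toy sequence of frame-feature vectors; the dimensions and weights are placeholders, not the paper's configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One standard LSTM step.
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias,
    with the four gate blocks stacked as [input, forget, output, candidate]."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[:H])          # input gate
    f = sigmoid(z[H:2 * H])     # forget gate
    o = sigmoid(z[2 * H:3 * H]) # output gate
    g = np.tanh(z[3 * H:])      # candidate cell state
    c = f * c_prev + i * g      # updated cell state
    h = o * np.tanh(c)          # updated hidden state
    return h, c

# Toy sequence: one feature vector per video frame (stand-ins for MobileNet output).
rng = np.random.default_rng(0)
D, H, T = 8, 4, 5  # feature dim, hidden size, number of frames
W = rng.normal(0.0, 0.1, (4 * H, D))
U = rng.normal(0.0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(T, D)):
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (4,) - final hidden state summarising the frame sequence
```

In the full system, the final (or per-frame) hidden states would feed a classifier over the sign language vocabulary.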
Experiments show that the speech training system based on automatic lip-reading recognition can effectively correct the lip shapes of hearing-impaired individuals as they speak and improve their speech ability without assistance from others.
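The paper does not give the exact formula behind the similarity rate, but the description (lip size, lip-opening angle, shape differences) suggests a comparison of simple geometric descriptors extracted from mouth landmarks. The sketch below is one plausible realisation under assumed landmark indices (0 = left corner, 1 = right corner, 2 = top centre, 3 = bottom centre), not the authors' implementation.

```python
import numpy as np

def lip_metrics(landmarks):
    """Geometric lip-shape descriptors from 2D mouth landmarks.
    landmarks: (N, 2) array; assumes index 0 = left corner, 1 = right corner,
    2 = top centre, 3 = bottom centre of the lips."""
    left, right, top, bottom = landmarks[:4]
    width = np.linalg.norm(right - left)    # lip size (horizontal)
    height = np.linalg.norm(top - bottom)   # lip opening (vertical)
    # Opening angle measured at the left mouth corner.
    v1, v2 = top - left, bottom - left
    cos_a = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
    return np.array([width, height, angle])

def similarity_rate(learner, reference):
    """Average per-metric relative agreement, in [0, 1]."""
    a, b = lip_metrics(learner), lip_metrics(reference)
    return float(np.mean(1.0 - np.abs(a - b) / np.maximum(a, b)))

# Usage: compare a learner's mouth frame against the stored reference frame.
learner = np.array([[0.0, 1.0], [4.0, 1.0], [2.0, 2.0], [2.0, 0.0]])
reference = learner * 1.1  # same shape, mouth 10% larger
score = similarity_rate(learner, reference)
print(round(score, 2))  # 0.94 - angle matches, size differs slightly
```

A score near 1 indicates the learner's lip shape matches the reference; the per-metric differences tell the user whether to adjust lip width, opening, or angle.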
Acknowledgements
The research was partially supported by the National Natural Science Foundation of China (nos. 61571013 and 61971007) and the Beijing Natural Science Foundation (no. 4143061).
Copyright information
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Lu, Y., Yang, S., Xu, Z., Wang, J. (2020). Speech Training System for Hearing Impaired Individuals Based on Automatic Lip-Reading Recognition. In: Nunes, I. (eds) Advances in Human Factors and Systems Interaction. AHFE 2020. Advances in Intelligent Systems and Computing, vol 1207. Springer, Cham. https://doi.org/10.1007/978-3-030-51369-6_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-51368-9
Online ISBN: 978-3-030-51369-6