Abstract
We have been developing a speech enhancement device for laryngectomees. Our approach uses lip-reading technology to recognize Japanese words from lip images and generate speech output on mobile devices. Target words are represented as sequences drawn from 36 registered visemes and converted into VAE (Variational Autoencoder) feature parameters; the corresponding words are then recognized by a CNN-based model. Previously, a PC-based experimental prototype was tested with 20 Japanese words and a single well-trained subject, yielding 65% recognition accuracy for the first candidate and 100% when the first and second candidates were included. In this paper, several methods for improving recognition performance were investigated with a larger vocabulary and multiple subjects. After adjusting speech rate and mouth movement, we obtained about 60% word recognition accuracy within the first through sixth candidates for inexperienced users. We also developed a mobile-device-based prototype and conducted a preliminary recognition experiment with 20 words and a single well-trained subject; 95% accuracy was obtained within the first through sixth candidates, which was almost equivalent to the PC-based system.
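The pipeline described above (lip images mapped to a registered viseme sequence, encoded into feature parameters, then matched to word candidates ranked by a classifier) can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: the viseme labels, the toy lexicon, the bag-of-visemes "encoder" in place of the VAE, and the distance-based scorer in place of the CNN are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the word-recognition pipeline:
# viseme sequence -> feature vector -> ranked top-N word candidates.
# All names and data below are hypothetical stand-ins.

from collections import Counter

# Placeholder inventory standing in for the 36 registered visemes
VISEMES = [f"v{i:02d}" for i in range(36)]

# Hypothetical lexicon: word -> registered viseme sequence
LEXICON = {
    "konnichiwa": ["v01", "v05", "v12", "v05", "v20"],
    "arigatou":   ["v00", "v09", "v01", "v15", "v22"],
    "sayounara":  ["v18", "v00", "v30", "v00", "v09"],
}

def featurize(viseme_seq):
    """Stand-in for the VAE encoder: a bag-of-visemes count vector."""
    counts = Counter(viseme_seq)
    return [counts.get(v, 0) for v in VISEMES]

def score(a, b):
    """Stand-in for the CNN score: negative L1 distance between features."""
    return -sum(abs(x - y) for x, y in zip(a, b))

def recognize(observed_seq, top_n=6):
    """Rank lexicon words against an observed viseme sequence."""
    feat = featurize(observed_seq)
    ranked = sorted(LEXICON,
                    key=lambda w: score(featurize(LEXICON[w]), feat),
                    reverse=True)
    return ranked[:top_n]

# An observed sequence with one misrecognized viseme still ranks the
# intended word first; the top-N list is what the user selects from.
print(recognize(["v01", "v05", "v12", "v05", "v21"]))
```

Returning a ranked candidate list rather than a single word mirrors the evaluation above, where accuracy is reported over the first through sixth candidates and the user can pick the intended word from a short list.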
Acknowledgments
This study was supported by JSPS Grant-in-Aid for Scientific Research 19K12905.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Eguchi, F., Matsui, K., Nakatoh, Y., Kato, Y.O., Rivas, A., Corchado, J.M. (2022). Development of Mobile Device-Based Speech Enhancement System Using Lip-Reading. In: Matsui, K., Omatu, S., Yigitcanlar, T., González, S.R. (eds) Distributed Computing and Artificial Intelligence, Volume 1: 18th International Conference. DCAI 2021. Lecture Notes in Networks and Systems, vol 327. Springer, Cham. https://doi.org/10.1007/978-3-030-86261-9_21
Print ISBN: 978-3-030-86260-2
Online ISBN: 978-3-030-86261-9
eBook Packages: Intelligent Technologies and Robotics