Enabling the Translation of Electromyographic Signals Into Speech: A Neural Network Based Decoding Approach

  • Original Research
  • Published in: SN Computer Science

Abstract

Speech, the principal mode of human interaction, involves the articulation of language through vocal sounds generated by the vocal apparatus. It takes various forms, including vocalized speech, whispering, silent speech, and subvocal speech. Silent speech refers to the absence of audible sound despite movement of the speech articulators, owing to minimized airflow. This study seeks to convert silently mouthed words into audible speech, with the aim of providing communication assistance to individuals with speech impairments. The research utilizes electromyographic (EMG) signals captured from facial muscles during speech production, together with corresponding audio recordings. Features are extracted from both the EMG signals and the audio recordings, and the extracted features are then used to train three distinct neural network models: a convolutional neural network (CNN), a gated recurrent unit (GRU) network, and a convolutional neural network with long short-term memory (CNN-LSTM). Each model predicts audio features from the EMG feature input, and the predictions are subsequently passed through a vocoder to reconstruct the speech audio. The models are tested on real-time data, and the corresponding metrics and plots are evaluated. The performance metrics establish the superiority of the CNN-LSTM model over the other models, with a mean squared error (MSE) as low as 0.036. Such an approach holds promise for improving communication aids and speech rehabilitation technologies.
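To make the pipeline concrete, the sketch below illustrates the kind of EMG-to-audio-feature regressor the abstract describes. It is a minimal sketch, not the authors' implementation: the framework (PyTorch), the class name, the layer sizes, and the feature dimensions (8 EMG channels, 80-dimensional mel-spectrogram frames) are all assumptions, and it presumes the EMG and audio features have already been extracted and time-aligned frame by frame.

    # Minimal sketch (not the authors' code): a CNN-LSTM that regresses
    # audio features (e.g., mel-spectrogram frames) from EMG features.
    # Feature dimensions and layer sizes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class EMGToSpeechCNNLSTM(nn.Module):  # hypothetical name
        def __init__(self, emg_dim=8, audio_dim=80, hidden=256):
            super().__init__()
            # 1-D convolutions capture local muscle-activation patterns.
            self.conv = nn.Sequential(
                nn.Conv1d(emg_dim, 128, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.Conv1d(128, 128, kernel_size=5, padding=2),
                nn.ReLU(),
            )
            # The LSTM adds longer-range temporal context across frames.
            self.lstm = nn.LSTM(128, hidden, batch_first=True,
                                bidirectional=True)
            self.out = nn.Linear(2 * hidden, audio_dim)

        def forward(self, emg):                  # emg: (batch, time, emg_dim)
            x = self.conv(emg.transpose(1, 2))   # -> (batch, 128, time)
            x, _ = self.lstm(x.transpose(1, 2))  # -> (batch, time, 2*hidden)
            return self.out(x)                   # predicted audio features

    model = EMGToSpeechCNNLSTM()
    loss_fn = nn.MSELoss()  # the MSE objective/metric reported above
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    # One training step on a random stand-in batch of aligned features.
    emg_batch = torch.randn(4, 200, 8)   # (batch, frames, EMG channels)
    mel_batch = torch.randn(4, 200, 80)  # (batch, frames, mel bins)
    optimizer.zero_grad()
    loss = loss_fn(model(emg_batch), mel_batch)
    loss.backward()
    optimizer.step()

At inference time, the predicted audio-feature frames would be passed to a neural vocoder, for example HiFi-GAN [12], to reconstruct the audible waveform.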


Data Availability

The surface EMG data and the audio signals were taken from the dataset of [1].

Code Availability

The code used in the experiments may be provided on request.

Materials Availability

Not applicable.

References

  1. Gaddy D. Voicing silent speech [PhD thesis]. Berkeley: University of California, Berkeley, Department of Electrical Engineering and Computer Sciences; 2022.

  2. Diener L, Janke M, Schultz T. Direct conversion from facial myoelectric signals to speech using deep neural networks. In: International Joint Conference on Neural Networks (IJCNN). 2015. https://doi.org/10.1109/IJCNN.2015.7280404.

  3. Accou B, Vanthornhout J, Van Hamme H, Francart T. Decoding of the speech envelope from EEG using the VLAAI deep neural network. Sci Rep. 2023;13:812. https://doi.org/10.1038/s41598-022-27332-2.

  4. Janke M, Wand M, Nakamura K, Schultz T. Further investigations on EMG-to-speech conversion. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2012. https://doi.org/10.1109/ICASSP.2012.6287892.

  5. Jou S-C, Schultz T, Walliczek M, Kraft F, Waibel A. Towards continuous speech recognition using surface electromyography. In: Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech). 2006. https://doi.org/10.21437/Interspeech.2006-212.

  6. Janke M, Diener L. Direct generation of speech from facial electromyographic signal. IEEE/ACM Trans Audio Speech Lang Process. 2017;25(12):2375–85. https://doi.org/10.1109/TASLP.2017.2738568.


  7. Bocquelet F, Hueber T, Girin L, Badin P, Yvert B. Robust articulatory speech synthesis using deep neural networks for BCI applications. In: 15th Annual Conference of the International Speech Communication Association. 2014. https://doi.org/10.21437/Interspeech.2014-449.

  8. Kapur A, Kapur S, Maes P. AlterEgo: a personalized wearable silent speech interface. In: 23rd International Conference on Intelligent User Interfaces (IUI). 2018. https://doi.org/10.1145/3172944.3172977.

  9. Kapur A, Sarawgi U, Wadkins E, Wu M. Non-invasive silent speech recognition in multiple sclerosis with dysphonia. Proc Mach Learn Res. 2020;116:25–38. https://proceedings.mlr.press/v116/kapur20a.html.

  10. Gaddy D, Klein D. Digital voicing of silent speech. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. https://doi.org/10.18653/v1/2020.emnlp-main.445.

  11. Oord A, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior AW, Kavukcuoglu K. WaveNet: a generative model for raw audio. 2016. arXiv:1609.03499.

  12. Kong J, Kim J, Bae J. HiFi-GAN: generative adversarial networks for efficient and high fidelity speech synthesis. In: Advances in Neural Information Processing Systems (NeurIPS). 2020. https://doi.org/10.48550/arXiv.2010.05646.

  13. Diener L, Herff C, Janke M, Schultz T. An initial investigation into the real-time conversion of facial surface EMG signals to audible speech. In: 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 2016. https://api.semanticscholar.org/CorpusID:19187108.

  14. Diener L, Felsch G, Angrick M, Schultz T. Session-independent array-based EMG-to-speech conversion using convolutional neural networks. In: Speech Communication; 13th ITG-Symposium, Oldenburg, Germany, pp. 1–5, 2018.

  15. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324. https://doi.org/10.1109/5.726791.

  16. Mao X-J, Shen C, Yang Y. Image restoration using very deep convolutional encoder–decoder networks with symmetric skip connections. In: Neural information processing systems. 2016. https://api.semanticscholar.org/CorpusID:10987457.

  17. Vojtech JM, Chan MD, Shiwani B, Roy SH, Heaton JT, Meltzner GS, Contessa P, Luca GD, Patel R, Kline JC. Surface electromyography-based recognition, synthesis, and perception of prosodic subvocal speech. J Speech Lang Hear Res. 2021. https://api.semanticscholar.org/CorpusID:234484078.

  18. Meltzner GS, Heaton JT, Deng Y, Luca GD, Roy SH, Kline JC. Development of sEMG sensors and algorithms for silent speech recognition. J Neural Eng. 2018;15(4):046031. https://doi.org/10.1088/1741-2552/aac965.

  19. Meltzner GS, Heaton JT, Deng Y, De Luca G, Roy SH, Kline JC. Silent speech recognition as an alternative communication device for persons with laryngectomy. IEEE/ACM Trans Audio Speech Lang Process. 2017;25(12):2386–98. https://doi.org/10.1109/TASLP.2017.2740000.

  20. Scheck K, Schultz T. Multi-speaker speech synthesis from electromyographic signals by soft speech unit prediction. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2023. https://doi.org/10.1109/ICASSP49357.2023.10097120.

  21. Yamagishi J, Veaux C, MacDonald K. CSTR VCTK corpus: English multi-speaker corpus for CSTR voice cloning toolkit (version 0.92). 2019. https://api.semanticscholar.org/CorpusID:213060286.

  22. Ito K, Johnson L. The LJ speech dataset. 2017. https://keithito.com/LJ-Speech-Dataset/.

  23. Brumberg JS, Nieto-Castanon A, Kennedy PR, Guenther FH. Brain-computer interfaces for speech communication. Speech Commun. 2010;52(4):367–79. https://doi.org/10.1016/j.specom.2010.01.001.


  24. Toth AR, Wand M, Schultz T. Synthesizing speech from electromyography using voice transformation techniques. In: Proceedings of Interspeech 2009. https://doi.org/10.21437/Interspeech.2009-229.

  25. Wand M, Schulte C, Janke M, Schultz T. Array-based electromyographic silent speech interface. In: Proceedings of the International Conference on Bio-inspired Systems and Signal Processing (BIOSIGNALS). 2013. https://doi.org/10.5220/0004252400890096.

  26. Doyle AC. The adventures of Sherlock Holmes. Newport Beach: Books on Tape; 1978.


  27. Wells HG. The war of the worlds. New York and London: Harper & Brothers; 1922. https://www.loc.gov/item/24022215/.

  28. Ding M. A systematic review on the development of speech synthesis. In: 2023 8th International Conference on Computer and Communication Systems (ICCCS). 2023. https://doi.org/10.1109/ICCCS57501.2023.10150729.

  29. Krichen M. Generative adversarial networks. In: 14th International Conference on Computing Communication and Networking Technologies (ICCCNT). 2023. https://doi.org/10.1109/ICCCNT56998.2023.10306417.


Funding

Not applicable.

Author information


Contributions

This research was a collective effort; all authors collaborated on and contributed to the work.

Corresponding author

Correspondence to Uddipan Hazarika.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics approval and consent to participate

Not applicable.

Consent for publication

All authors read the final version of the paper and approved it for publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Bharali, A., Borah, B.B., Hazarika, U. et al. Enabling the Translation of Electromyographic Signals Into Speech: A Neural Network Based Decoding Approach. SN COMPUT. SCI. 5, 1094 (2024). https://doi.org/10.1007/s42979-024-03457-1

