Enabling the Translation of Electromyographic Signals Into Speech: A Neural Network Based Decoding Approach

  • Original Research
  • Published in: SN Computer Science

Abstract

Speech, the principal mode of human interaction, involves the articulation of language through vocal sounds generated by the vocal apparatus. It takes various forms, including vocalized speech, whispering, silent speech, and subvocal speech. Silent speech refers to the absence of audible sound despite movement of the speech articulators, owing to minimized airflow. This study seeks to convert silently mouthed words into audible speech, with the aim of providing communication assistance to individuals with speech impairments. The research utilizes electromyographic (EMG) signals captured from facial muscles during speech production, together with corresponding audio recordings. Features are extracted from both the EMG signals and the audio recordings, and the extracted features are then used to train three distinct neural network models: a convolutional neural network (CNN), a gated recurrent unit (GRU) network, and a convolutional neural network with long short-term memory (CNN-LSTM). Each model predicts audio features from the EMG feature input, and the predictions are subsequently passed through a vocoder to reconstruct the speech audio. The models are tested on real-time data, and the corresponding metrics and plots are evaluated. The performance metrics establish the superiority of the CNN-LSTM model over the other models, with a mean squared error (MSE) as low as 0.036. Such an approach holds promise for improving communication aids and speech rehabilitation technologies.
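To make the pipeline concrete, the sketch below illustrates the kind of EMG-to-audio-feature regressor the abstract describes. It is a minimal sketch, not the authors' implementation: the framework (PyTorch), the class name, the layer sizes, and the feature dimensions (8 EMG channels, 80-dimensional mel-spectrogram frames) are all assumptions, and it presumes the EMG and audio features have already been extracted and time-aligned frame by frame.

    # Minimal sketch (not the authors' code): a CNN-LSTM that regresses
    # audio features (e.g., mel-spectrogram frames) from EMG features.
    # Feature dimensions and layer sizes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class EMGToSpeechCNNLSTM(nn.Module):  # hypothetical name
        def __init__(self, emg_dim=8, audio_dim=80, hidden=256):
            super().__init__()
            # 1-D convolutions capture local muscle-activation patterns.
            self.conv = nn.Sequential(
                nn.Conv1d(emg_dim, 128, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.Conv1d(128, 128, kernel_size=5, padding=2),
                nn.ReLU(),
            )
            # The LSTM adds longer-range temporal context across frames.
            self.lstm = nn.LSTM(128, hidden, batch_first=True,
                                bidirectional=True)
            self.out = nn.Linear(2 * hidden, audio_dim)

        def forward(self, emg):                  # emg: (batch, time, emg_dim)
            x = self.conv(emg.transpose(1, 2))   # -> (batch, 128, time)
            x, _ = self.lstm(x.transpose(1, 2))  # -> (batch, time, 2*hidden)
            return self.out(x)                   # predicted audio features

    model = EMGToSpeechCNNLSTM()
    loss_fn = nn.MSELoss()  # the MSE objective/metric reported above
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    # One training step on a random stand-in batch of aligned features.
    emg_batch = torch.randn(4, 200, 8)   # (batch, frames, EMG channels)
    mel_batch = torch.randn(4, 200, 80)  # (batch, frames, mel bins)
    optimizer.zero_grad()
    loss = loss_fn(model(emg_batch), mel_batch)
    loss.backward()
    optimizer.step()

At inference time, the predicted audio-feature frames would be passed to a neural vocoder, for example HiFi-GAN [12], to reconstruct the audible waveform.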


Data Availability

The surface EMG data and the audio signals were taken from the dataset of [1].

Code Availability

The code used in the experiments may be provided on request.

Materials Availability

Not applicable.

References

  1. Gaddy D. Voicing silent speech [PhD thesis]. Berkeley: University of California, Berkeley, Department of Electrical Engineering and Computer Sciences; 2022.

  2. Diener L, Janke M, Schultz T. Direct conversion from facial myoelectric signals to speech using deep neural networks. In: International Joint Conference on Neural Networks (IJCNN). 2015. https://doi.org/10.1109/IJCNN.2015.7280404.

  3. Accou B, Vanthornhout J, Van Hamme H, Francart T. Decoding of the speech envelope from EEG using the VLAAI deep neural network. Sci Rep. 2023;13:812. https://doi.org/10.1038/s41598-022-27332-2.

  4. Janke M, Wand M, Nakamura K, Schultz T. Further investigations on EMG-to-speech conversion. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2012. https://doi.org/10.1109/ICASSP.2012.6287892.

  5. Jou S-C, Schultz T, Walliczek M, Kraft F, Waibel A. Towards continuous speech recognition using surface electromyography. In: Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech). 2006. https://doi.org/10.21437/Interspeech.2006-212.

  6. Janke M, Diener L. Direct generation of speech from facial electromyographic signal. IEEE/ACM Trans Audio Speech Lang Process. 2017;25(12):2375–85. https://doi.org/10.1109/TASLP.2017.2738568.


  7. Bocquelet F, Hueber T, Girin L, Badin P, Yvert B. Robust articulatory speech synthesis using deep neural networks for BCI applications. In: 15th Annual Conference of the International Speech Communication Association. 2014. https://doi.org/10.21437/Interspeech.2014-449.

  8. Kapur A, Kapur S, Maes P. AlterEgo: a personalized wearable silent speech interface. In: 23rd International Conference on Intelligent User Interfaces (IUI). 2018. https://doi.org/10.1145/3172944.3172977.

  9. Kapur A, Sarawgi U, Wadkins E, Wu M. Non-invasive silent speech recognition in multiple sclerosis with dysphonia. Proc Mach Learn Res. 2020;116:25–38. https://proceedings.mlr.press/v116/kapur20a.html.

  10. Gaddy D, Klein D. Digital voicing of silent speech. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. https://doi.org/10.18653/v1/2020.emnlp-main.445.

  11. Oord A, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior AW, Kavukcuoglu K. WaveNet: a generative model for raw audio. 2016. arXiv:1609.03499.

  12. Kong J, Kim J, Bae J. HiFi-GAN: generative adversarial networks for efficient and high fidelity speech synthesis. In: Advances in Neural Information Processing Systems (NeurIPS). 2020. https://doi.org/10.48550/arXiv.2010.05646.

  13. Diener L, Herff C, Janke M, Schultz T. An initial investigation into the real-time conversion of facial surface EMG signals to audible speech. In: 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 2016. https://api.semanticscholar.org/CorpusID:19187108.

  14. Diener L, Felsch G, Angrick M, Schultz T. Session-independent array-based EMG-to-speech conversion using convolutional neural networks. In: Speech Communication; 13th ITG-Symposium, Oldenburg, Germany, pp. 1–5, 2018.

  15. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324. https://doi.org/10.1109/5.726791.

  16. Mao X-J, Shen C, Yang Y. Image restoration using very deep convolutional encoder–decoder networks with symmetric skip connections. In: Neural information processing systems. 2016. https://api.semanticscholar.org/CorpusID:10987457.

  17. Vojtech JM, Chan MD, Shiwani B, Roy SH, Heaton JT, Meltzner GS, Contessa P, Luca GD, Patel R, Kline JC. Surface electromyography-based recognition, synthesis, and perception of prosodic subvocal speech. J Speech Lang Hear Res. 2021. https://api.semanticscholar.org/CorpusID:234484078.

  18. Meltzner GS, Heaton JT, Deng Y, Luca GD, Roy SH, Kline JC. Development of sEMG sensors and algorithms for silent speech recognition. J Neural Eng. 2018;15(4):046031. https://doi.org/10.1088/1741-2552/aac965.

  19. Meltzner GS, Heaton JT, Deng Y, De Luca G, Roy SH, Kline JC. Silent speech recognition as an alternative communication device for persons with laryngectomy. IEEE/ACM Trans Audio Speech Lang Process. 2017;25(12):2386–98. https://doi.org/10.1109/TASLP.2017.2740000.

  20. Scheck K, Schultz T. Multi-speaker speech synthesis from electromyographic signals by soft speech unit prediction. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2023. https://doi.org/10.1109/ICASSP49357.2023.10097120.

  21. Yamagishi J, Veaux C, MacDonald K. CSTR VCTK corpus: English multi-speaker corpus for CSTR voice cloning toolkit (version 0.92). 2019. https://api.semanticscholar.org/CorpusID:213060286.

  22. Ito K, Johnson L. The LJ speech dataset. 2017. https://keithito.com/LJ-Speech-Dataset/.

  23. Brumberg JS, Nieto-Castanon A, Kennedy PR, Guenther FH. Brain-computer interfaces for speech communication. Speech Commun. 2010;52(4):367–79. https://doi.org/10.1016/j.specom.2010.01.001.


  24. Toth AR, Wand M, Schultz T. Synthesizing speech from electromyography using voice transformation techniques. In: Proceedings of Interspeech 2009. https://doi.org/10.21437/Interspeech.2009-229.

  25. Wand M, Schulte C, Janke M, Schultz T. Array-based electromyographic silent speech interface. In: Proceedings of the International Conference on Bio-inspired Systems and Signal Processing (BIOSIGNALS). 2013. https://doi.org/10.5220/0004252400890096.

  26. Doyle AC. The adventures of Sherlock Holmes. Newport Beach: Books on Tape; 1978.


  27. Wells HG. The war of the worlds. New York and London: Harper & Brothers; 1922. https://www.loc.gov/item/24022215/.

  28. Ding M. A systematic review on the development of speech synthesis. In: 2023 8th International Conference on Computer and Communication Systems (ICCCS). 2023. https://doi.org/10.1109/ICCCS57501.2023.10150729.

  29. Krichen M. Generative adversarial networks. In: 14th International Conference on Computing Communication and Networking Technologies (ICCCNT). 2023. https://doi.org/10.1109/ICCCNT56998.2023.10306417.


Funding

Not applicable.

Author information


Contributions

This research was a collective effort; all authors collaborated on and contributed to the work.

Corresponding author

Correspondence to Uddipan Hazarika.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics approval and consent to participate

Not applicable.

Consent for publication

All authors read the final version of the paper and approved it for publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Bharali, A., Borah, B.B., Hazarika, U. et al. Enabling the Translation of Electromyographic Signals Into Speech: A Neural Network Based Decoding Approach. SN COMPUT. SCI. 5, 1094 (2024). https://doi.org/10.1007/s42979-024-03457-1

