This paper analyzes how various system parameters influence the output quality of our neural-network-based real-time EMG-to-Speech conversion system. The system converts facial surface electromyographic (EMG) signals directly into audible speech in real time, enabling a closed-loop setup in which users receive direct audio feedback. Such a setup opens new avenues for research and applications through co-adaptation approaches. We evaluate the influence of several parameters on output quality: time context, EMG-to-audio delay, network size, training data size, and Mel spectrogram size. Output quality is measured with the objective short-time objective intelligibility (STOI) measure.
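To make two of the evaluated parameters concrete, the following is a minimal illustrative sketch of how a time context and an EMG-to-audio delay might enter the preparation of training pairs. It is not the authors' implementation: the function names `stack_context` and `align_with_delay`, the edge-padding strategy, the split into past and future frames, and the toy feature values are all assumptions made for illustration.

```python
def stack_context(frames, past, future):
    """Concatenate each frame with `past` preceding and `future` following frames.

    Assumption: edges are padded by repeating the first or last frame.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for offset in range(-past, future + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to valid range
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


def align_with_delay(emg_frames, audio_frames, delay):
    """Pair EMG frame t with audio frame t + delay (delay in frames, >= 0).

    Assumption: the EMG signal leads the audio, so targets are shifted forward.
    """
    n = min(len(emg_frames), len(audio_frames) - delay)
    return [(emg_frames[t], audio_frames[t + delay]) for t in range(max(n, 0))]


# Toy data (hypothetical): 4 EMG frames with 2 features, 4 audio frames with 1 value.
emg = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
audio = [[10.0], [11.0], [12.0], [13.0]]

inputs = stack_context(emg, past=1, future=1)      # each input now spans 3 frames
pairs = align_with_delay(inputs, audio, delay=1)   # targets shifted by one frame
```

In this sketch, a larger time context widens each network input, while any future context and the chosen delay both trade latency against the information available per frame, which matters in a real-time setting.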
Cite as: Diener, L., Schultz, T. (2018) Investigating Objective Intelligibility in Real-Time EMG-to-Speech Conversion. Proc. Interspeech 2018, 3162-3166, doi: 10.21437/Interspeech.2018-2080
@inproceedings{diener18_interspeech,
  author={Lorenz Diener and Tanja Schultz},
  title={{Investigating Objective Intelligibility in Real-Time EMG-to-Speech Conversion}},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3162--3166},
  doi={10.21437/Interspeech.2018-2080}
}