Severe speech impairments limit the precision and range of producible speech sounds. As a result, generic automatic speech recognition (ASR) and keyword spotting (KWS) systems fail to accurately recognize the utterances produced by individuals with severe speech impairments. This paper describes an approach in which a simple speech sound, namely an isolated open vowel (/a/), is used in lieu of more motorically demanding utterances. A neural network (NN) is trained to detect the isolated open vowel uttered by impaired speakers. The NN is trained in two phases: a pre-training phase uses samples from unimpaired speakers along with samples of background noise and unrelated speech, and a fine-tuning phase then uses vowel samples collected from individuals with speech impairments. The model can be built into an experimental mobile app to act as a switch that allows users to activate preconfigured actions such as alerting caregivers. Preliminary user testing indicates that the vowel spotter has the potential to be a useful and flexible emergency communication channel for motor- and speech-impaired individuals.
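The two-phase training idea can be illustrated with a minimal sketch. This is not the authors' model: a logistic-regression classifier stands in for the NN, and the two-dimensional synthetic features, sample counts, learning rates, and epoch counts are all hypothetical values chosen only for illustration. Phase 1 fits the classifier on plentiful "unimpaired vowel versus noise and unrelated speech" data; phase 2 continues training from those weights on a small set standing in for target-speaker samples.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, features):
    # weights = [w_1, ..., w_d, bias]; returns P(vowel | features).
    z = weights[-1] + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def mean_loss(weights, data):
    # Mean binary cross-entropy over (features, label) pairs.
    eps = 1e-12
    total = 0.0
    for features, label in data:
        p = predict(weights, features)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(data)


def train(weights, data, learning_rate, epochs):
    # Full-batch gradient descent, starting from the given weights.
    weights = list(weights)
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in data:
            err = predict(weights, features) - label
            for i, x in enumerate(features):
                grad[i] += err * x / n
            grad[-1] += err / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights


def make_samples(rng, center, label, count, spread=0.5):
    # Hypothetical synthetic features drawn around a class center.
    return [([rng.gauss(c, spread) for c in center], label) for _ in range(count)]


rng = random.Random(0)

# Phase 1 data (assumed): many unimpaired vowels (label 1) vs. noise/other speech (label 0).
pretrain_data = (make_samples(rng, [1.0, 1.0], 1, 200)
                 + make_samples(rng, [-1.0, -1.0], 0, 200))
# Phase 2 data (assumed): few target-speaker vowels, with shifted feature statistics.
finetune_data = (make_samples(rng, [0.2, 1.5], 1, 20)
                 + make_samples(rng, [-1.0, -1.0], 0, 20))

# Pre-train from scratch, then fine-tune from the pre-trained weights
# with a smaller learning rate and fewer epochs.
pretrained = train([0.0, 0.0, 0.0], pretrain_data, learning_rate=0.5, epochs=100)
finetuned = train(pretrained, finetune_data, learning_rate=0.1, epochs=50)

loss_before = mean_loss(pretrained, finetune_data)
loss_after = mean_loss(finetuned, finetune_data)
print("target-speaker loss before fine-tuning:", loss_before)
print("target-speaker loss after fine-tuning: ", loss_after)
```

Starting the second phase from the pre-trained weights, rather than from zeros, is the point of the sketch: the small target-speaker set only has to adjust an already reasonable decision boundary.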
Cite as: Cai, S., Lillianfeld, L., Seaver, K., Green, J.R., Brenner, M.P., Nelson, P.C., Sculley, D. (2021) A Voice-Activated Switch for Persons with Motor and Speech Impairments: Isolated-Vowel Spotting Using Neural Networks. Proc. Interspeech 2021, 4823-4827, doi: 10.21437/Interspeech.2021-330
@inproceedings{cai21c_interspeech,
  author={Shanqing Cai and Lisie Lillianfeld and Katie Seaver and Jordan R. Green and Michael P. Brenner and Philip C. Nelson and D. Sculley},
  title={{A Voice-Activated Switch for Persons with Motor and Speech Impairments: Isolated-Vowel Spotting Using Neural Networks}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={4823--4827},
  doi={10.21437/Interspeech.2021-330}
}