Automatic Speech Recognition of Disordered Speech: Personalized Models Outperforming Human Listeners on Short Phrases

Green, Jordan R.; MacDonald, Robert L.; Jiang, Pan-Pan; Cattiau, Julie; Heywood, Rus; Cave, Richard; Seaver, Katie; Ladewig, Marilyn A.; Tobin, Jimmy; Brenner, Michael P.; Nelson, Philip C.; Tomanek, Katrin

doi:10.21437/Interspeech.2021-1384

This study evaluated the accuracy of personalized automatic speech recognition (ASR) models for recognizing disordered speech from a large cohort of individuals with a wide range of underlying etiologies using an open vocabulary. The performance of these models was benchmarked against that of expert human transcribers and two speaker-independent ASR models trained on typical speech. A total of 432 individuals with self-reported disordered speech each recorded at least 300 short phrases using a web-based application. Word error rates (WERs) were estimated for the three ASR models and for the human transcribers. Metadata were collected to evaluate the potential impact of participant factors, atypical speech characteristics, and technical factors on recognition accuracy. Personalized models outperformed human transcribers, with median and maximum recognition accuracy gains of 9% and 80%, respectively. The accuracies of personalized models were high (median WER: 4.6%) and better than those of speaker-independent models (median WER: 31%). The most significant improvements were for the most severely affected speakers. Low signal-to-noise ratio and fewer training utterances were associated with poor word recognition, even for speakers with mild speech impairments. Our results demonstrate the efficacy of personalized ASR models in recognizing speech across a wide range of impairments and severities using an open vocabulary.
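For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The Python sketch below illustrates that standard definition only. The function name `word_error_rate`, the lowercase-and-split normalization, and the example phrases are assumptions made for illustration; the abstract does not describe the scoring pipeline actually used in the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch: text normalization here is just lowercasing and
    whitespace splitting, which is an assumption, not the study's method.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical short phrase with two substituted words out of four.
    print(word_error_rate("turn on the lights", "turn off the light"))
```

Under this definition a WER of 4.6% corresponds to roughly one word error per 22 reference words, and WER can exceed 100% when the hypothesis contains many inserted words.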

Cite as: Green, J.R., MacDonald, R.L., Jiang, P.-P., Cattiau, J., Heywood, R., Cave, R., Seaver, K., Ladewig, M.A., Tobin, J., Brenner, M.P., Nelson, P.C., Tomanek, K. (2021) Automatic Speech Recognition of Disordered Speech: Personalized Models Outperforming Human Listeners on Short Phrases. Proc. Interspeech 2021, 4778-4782, doi: 10.21437/Interspeech.2021-1384

@inproceedings{green21_interspeech,
  author={Jordan R. Green and Robert L. MacDonald and Pan-Pan Jiang and Julie Cattiau and Rus Heywood and Richard Cave and Katie Seaver and Marilyn A. Ladewig and Jimmy Tobin and Michael P. Brenner and Philip C. Nelson and Katrin Tomanek},
  title={{Automatic Speech Recognition of Disordered Speech: Personalized Models Outperforming Human Listeners on Short Phrases}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={4778--4782},
  doi={10.21437/Interspeech.2021-1384}
}