In treating people who stutter, clinicians often have their clients read a story in order to determine their stuttering frequency. As the client speaks, the clinician annotates each disfluency. For further analysis of the client’s speech, it is useful to have a word transcription of what was said. However, because the clinician’s annotations are made in real time, they are not always correct, and they usually lag behind the point where the actual disfluency occurred. We have built a tool that rescores a word lattice, taking into account the clinician’s annotations. In this paper, we describe how we incorporate the clinician’s annotations and report the improvement over a baseline version. This approach of leveraging clinician annotations can be used for other clinical tasks where a word transcription is useful for further or richer analysis.
Cite as: Heeman, P.A., Lunsford, R., McMillin, A., Yaruss, J.S. (2016) Using Clinician Annotations to Improve Automatic Speech Recognition of Stuttered Speech. Proc. Interspeech 2016, 2651-2655, doi: 10.21437/Interspeech.2016-1388
@inproceedings{heeman16_interspeech,
  author={Peter A. Heeman and Rebecca Lunsford and Andy McMillin and J. Scott Yaruss},
  title={{Using Clinician Annotations to Improve Automatic Speech Recognition of Stuttered Speech}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={2651--2655},
  doi={10.21437/Interspeech.2016-1388}
}