Abstract
Training models for speech recognition usually requires accurate word-level transcripts of the available speech data. In the domain of medical dictations, it is common instead to have "semi-literal" transcripts: large numbers of speech files, each paired with a formatted episode report whose content only partially overlaps with what was actually spoken. We present a semi-supervised method for generating acoustic training data by decoding dictations with an existing recognizer, using the associated report to confirm which sections of the hypothesis are correct, and repurposing those audio sections to train a new acoustic model. We demonstrate the method's effectiveness in two applications: first, adapting a model to new speakers, yielding a 19.7% relative word error reduction for those speakers; and second, supplementing an already diverse and robust acoustic model with a large quantity of additional data from known voices, yielding a 5.0% relative error reduction on a large test set of over one thousand speakers.
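The core selection step described above, keeping only the audio regions where the recognizer's output is confirmed by the report, can be sketched as an alignment between the ASR hypothesis and the report text. The following is a minimal illustration, not the authors' implementation: the function name, the use of `difflib.SequenceMatcher`, and the `min_run` threshold for discarding short coincidental matches are all assumptions for this sketch.

```python
from difflib import SequenceMatcher

def select_confirmed_segments(hyp_words, report_words, min_run=5):
    """Return (start, end) word spans of the ASR hypothesis that also
    appear, in order, in the report text. Words inside these spans are
    treated as correctly recognized and eligible for acoustic training.

    min_run discards short matches that may be coincidental.
    (Illustrative sketch only; not the paper's actual algorithm.)
    """
    matcher = SequenceMatcher(a=hyp_words, b=report_words, autojunk=False)
    segments = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_run:
            segments.append((block.a, block.a + block.size))
    return segments

# Toy example: the report is formatted, so it only partially overlaps
# with the literal dictation (e.g. spoken punctuation like "period").
hyp = "patient presents with acute chest pain radiating to left arm period".split()
rep = "the patient presents with acute chest pain radiating to the left arm".split()
print(select_confirmed_segments(hyp, rep))
```

In practice the confirmed word spans would be mapped back to time offsets in the audio (via the recognizer's word timings) to cut out training segments; that step is omitted here.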
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Finley, G.P. et al. (2018). Semi-Supervised Acoustic Model Retraining for Medical ASR. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science(), vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_19
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3