Abstract
Training models for speech recognition usually requires accurate word-level transcripts of the available speech data. In the domain of medical dictations, it is common instead to have "semi-literal" transcripts: large numbers of speech files, each paired with a formatted episode report whose content only partially overlaps with what was actually spoken. We present a semi-supervised method for generating acoustic training data by decoding dictations with an existing recognizer, using the associated report to confirm which sections of the hypothesis are correct, and repurposing those audio sections to train a new acoustic model. We demonstrate the method's effectiveness in two applications: first, adapting a model to new speakers, yielding a 19.7% relative word error reduction for those speakers; and second, supplementing an already diverse and robust acoustic model with a large quantity of additional data from known voices, yielding a 5.0% relative error reduction on a large test set of over one thousand speakers.
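The core selection step described above, keeping only the audio regions where the recognizer's output is confirmed by the report, can be sketched as an alignment between the ASR hypothesis and the report text. The following is a minimal illustration, not the authors' implementation: the function name, the use of `difflib.SequenceMatcher`, and the `min_run` threshold for discarding short coincidental matches are all assumptions for this sketch.

```python
from difflib import SequenceMatcher

def select_confirmed_segments(hyp_words, report_words, min_run=5):
    """Return (start, end) word spans of the ASR hypothesis that also
    appear, in order, in the report text. Words inside these spans are
    treated as correctly recognized and eligible for acoustic training.

    min_run discards short matches that may be coincidental.
    (Illustrative sketch only; not the paper's actual algorithm.)
    """
    matcher = SequenceMatcher(a=hyp_words, b=report_words, autojunk=False)
    segments = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_run:
            segments.append((block.a, block.a + block.size))
    return segments

# Toy example: the report is formatted, so it only partially overlaps
# with the literal dictation (e.g. spoken punctuation like "period").
hyp = "patient presents with acute chest pain radiating to left arm period".split()
rep = "the patient presents with acute chest pain radiating to the left arm".split()
print(select_confirmed_segments(hyp, rep))
```

In practice the confirmed word spans would be mapped back to time offsets in the audio (via the recognizer's word timings) to cut out training segments; that step is omitted here.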
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Finley, G.P. et al. (2018). Semi-Supervised Acoustic Model Retraining for Medical ASR. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science(), vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_19
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3