
Automatic Speech Recognition Model Adaptation to Medical Domain Using Untranscribed Audio

  • Conference paper
  • Digital Business and Intelligent Systems (Baltic DB&IS 2022)

Abstract

Automatic speech recognition (ASR) technologies can provide significant efficiency gains in the health sector by saving time and financial resources, allowing specialists to shift more time to high-value activities.

Creating customized ASR models requires domain- and task-related transcribed speech data. Unfortunately, producing such data is usually too expensive for medical institutions: it requires substantial funding, human resources, and expertise. Consequently, this paper explores a semi-supervised medical domain adaptation method for the Latvian language that benefits from untranscribed speech recordings. As the initial model, we use the currently available general-purpose hybrid ASR system, whose acoustic model is trained with the lattice-free maximum mutual information (LF-MMI) method. The initial system is applied to the domain-related untranscribed data to extract sequences of pseudo-labels. These automatic transcriptions are then added to the supervised data and used together with it to update the acoustic model. To improve our ASR system further, we have also updated its language model with additional in-domain texts.
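The adaptation loop described above can be sketched as generic pseudo-label self-training. This is a minimal illustration, not the authors' implementation: in practice decoding and LF-MMI retraining are done with a toolkit such as Kaldi, and the confidence-based filtering shown here is an assumed heuristic rather than a detail stated in the abstract.

```python
def select_pseudo_labels(decoded, min_confidence=0.9):
    """Keep only utterances whose 1-best transcript the decoder is confident in.

    `decoded` is a list of (audio_id, transcript, confidence) triples produced
    by running the initial general-purpose ASR system over untranscribed,
    domain-related audio. The threshold value is illustrative.
    """
    return [(uid, text) for uid, text, conf in decoded if conf >= min_confidence]


def build_training_set(supervised, decoded, min_confidence=0.9):
    """Pool manually transcribed data with the filtered pseudo-labelled data.

    The combined set is what would then be used to update (retrain) the
    acoustic model in the semi-supervised adaptation step.
    """
    return supervised + select_pseudo_labels(decoded, min_confidence)
```

In a real pipeline the confidence would come from the decoding lattice (or the pseudo-labels could be kept as lattices rather than 1-best sequences, as in lattice-based semi-supervised LF-MMI training).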

We have achieved significant improvements in speech recognition quality on all evaluation datasets: on the epicrises, psychiatry, and radiology datasets, the word error rate (WER) decreased by 39%, 27%–29%, and 21%, respectively.
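For reference, WER is the word-level Levenshtein (edit) distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. A minimal sketch of the standard computation:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / N,
    where N is the number of words in the reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

A relative WER reduction of 39%, as reported for the epicrises dataset, means the new WER is 61% of the baseline value, i.e. `reduction = (wer_old - wer_new) / wer_old`.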




Acknowledgements

This research has been supported by the ICT Competence Centre (www.itkc.lv) within the project “2.8. Automated voice communication solutions for the healthcare industry” of EU Structural funds, ID no 1.2.1.1/18/A/003.


Corresponding author

Correspondence to Askars Salimbajevs.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Salimbajevs, A., Kapočiūtė-Dzikienė, J. (2022). Automatic Speech Recognition Model Adaptation to Medical Domain Using Untranscribed Audio. In: Ivanovic, M., Kirikova, M., Niedrite, L. (eds) Digital Business and Intelligent Systems. Baltic DB&IS 2022. Communications in Computer and Information Science, vol 1598. Springer, Cham. https://doi.org/10.1007/978-3-031-09850-5_5

  • DOI: https://doi.org/10.1007/978-3-031-09850-5_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-09849-9

  • Online ISBN: 978-3-031-09850-5

  • eBook Packages: Computer Science, Computer Science (R0)
