
Automatic Speech Recognition Model Adaptation to Medical Domain Using Untranscribed Audio

  • Conference paper
  • Digital Business and Intelligent Systems (Baltic DB&IS 2022)

Abstract

Automatic speech recognition (ASR) technologies can provide significant efficiency gains in the health sector by saving time and financial resources, allowing specialists to shift more time to high-value activities.

Creating customized ASR models requires domain- and task-related transcribed speech data. Unfortunately, producing such data is usually too expensive for medical institutions: it requires substantial funding, human resources, and expertise. Consequently, this paper explores a semi-supervised medical domain adaptation method for the Latvian language that benefits from untranscribed speech recordings. As the initial model, we use the currently available general-purpose hybrid ASR system, whose acoustic model is trained with the lattice-free maximum mutual information (LF-MMI) method. The initial system is applied to the domain-related untranscribed data to extract sequences of pseudo-labels. These automatic transcriptions are then added to the supervised data and used together with it to update the acoustic model. To improve our ASR system further, we have also updated its language model with additional in-domain texts.
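The adaptation loop described above can be sketched as generic pseudo-label self-training. This is a minimal illustration, not the authors' implementation: in practice decoding and LF-MMI retraining are done with a toolkit such as Kaldi, and the confidence-based filtering shown here is an assumed heuristic rather than a detail stated in the abstract.

```python
def select_pseudo_labels(decoded, min_confidence=0.9):
    """Keep only utterances whose 1-best transcript the decoder is confident in.

    `decoded` is a list of (audio_id, transcript, confidence) triples produced
    by running the initial general-purpose ASR system over untranscribed,
    domain-related audio. The threshold value is illustrative.
    """
    return [(uid, text) for uid, text, conf in decoded if conf >= min_confidence]


def build_training_set(supervised, decoded, min_confidence=0.9):
    """Pool manually transcribed data with the filtered pseudo-labelled data.

    The combined set is what would then be used to update (retrain) the
    acoustic model in the semi-supervised adaptation step.
    """
    return supervised + select_pseudo_labels(decoded, min_confidence)
```

In a real pipeline the confidence would come from the decoding lattice (or the pseudo-labels could be kept as lattices rather than 1-best sequences, as in lattice-based semi-supervised LF-MMI training).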

We have achieved significant improvements in speech recognition quality on all evaluation datasets: on the epicrises, psychiatry, and radiology datasets, the word error rate (WER) decreased by 39%, 27%–29%, and 21%, respectively.
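For reference, WER is the word-level Levenshtein (edit) distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. A minimal sketch of the standard computation:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / N,
    where N is the number of words in the reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

A relative WER reduction of 39%, as reported for the epicrises dataset, means the new WER is 61% of the baseline value, i.e. `reduction = (wer_old - wer_new) / wer_old`.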




Acknowledgements

This research has been supported by the ICT Competence Centre (www.itkc.lv) within the project “2.8. Automated voice communication solutions for the healthcare industry” of EU Structural funds, ID no 1.2.1.1/18/A/003.


Corresponding author

Correspondence to Askars Salimbajevs.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Salimbajevs, A., Kapočiūtė-Dzikienė, J. (2022). Automatic Speech Recognition Model Adaptation to Medical Domain Using Untranscribed Audio. In: Ivanovic, M., Kirikova, M., Niedrite, L. (eds) Digital Business and Intelligent Systems. Baltic DB&IS 2022. Communications in Computer and Information Science, vol 1598. Springer, Cham. https://doi.org/10.1007/978-3-031-09850-5_5

  • DOI: https://doi.org/10.1007/978-3-031-09850-5_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-09849-9

  • Online ISBN: 978-3-031-09850-5

  • eBook Packages: Computer Science, Computer Science (R0)
