
Evaluation of Wav2Vec Speech Recognition for Speakers with Cognitive Disorders

  • Conference paper
  • Text, Speech, and Dialogue (TSD 2022)

Abstract

In this paper, we present a spoken dialog system used to collect data for future research on predicting dementia from speech. The dialog system was used to collect speech data from patients with mild cognitive deficits. The core task administered by the dialog system was a one-minute spoken description of the vivid shore picture. The patients also performed other simple speech-based tasks. All utterances were recorded and manually transcribed to obtain a ground-truth reference. We describe the architecture of the dialog system as well as the results of the first speech recognition experiments: a zero-shot Wav2Vec 2.0 speech recognizer was used, and recognition accuracy was evaluated at the word and character levels.
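The paper itself includes no code, but the word- and character-level evaluation mentioned above is conventionally expressed as word error rate (WER) and character error rate (CER) computed against the manual transcripts. The following is a minimal sketch of such scoring, assuming the third-party jiwer package; the example strings are illustrative placeholders, not data from the study, and this is not necessarily the authors' exact scoring pipeline.

```python
# Sketch: word- and character-level scoring of ASR output against a manual transcript.
# Requires the third-party `jiwer` package; the strings below are placeholders.
import jiwer

reference = "pacient popisuje obrázek se živým břehem"   # manual (ground-truth) transcript
hypothesis = "pacient popisuje obrazek se zivym brehem"  # ASR hypothesis

wer = jiwer.wer(reference, hypothesis)   # word error rate
cer = jiwer.cer(reference, hypothesis)   # character error rate

print(f"WER: {wer:.2%}  (word-level accuracy ~ {1 - wer:.2%})")
print(f"CER: {cer:.2%}  (character-level accuracy ~ {1 - cer:.2%})")
```

Word- and character-level accuracy can then be reported as 1 − WER and 1 − CER, respectively.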

The work has been supported by the grant of the University of West Bohemia, project No. SGS-2022-017, by the programme Cooperatio, Neuroscience, Charles University in Prague, and by the project "Research of selected physiological and pathological mechanisms of voice, language and speech, their evaluation and intervention in the context of speech-language therapy, special education and neurodevelopmental research" conducted at the Faculty of Education, Palacký University in Olomouc (IGA_PdF_2022_014).



Notes

  1. Available at https://huggingface.co/fav-kky/wav2vec2-base-cs-80k-ClTRUS.
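For context, the checkpoint referenced in this note is a self-supervised pre-trained Wav2Vec 2.0 model for Czech; producing transcriptions additionally requires a CTC output layer fine-tuned on labelled speech. The sketch below shows how such a model is typically applied with the Hugging Face transformers library; the fine-tuned checkpoint path and audio file name are hypothetical placeholders, and this is not the authors' exact pipeline.

```python
# Sketch: transcribing a 16 kHz mono recording with a Wav2Vec 2.0 CTC model.
# The ClTRUS checkpoint cited in the note is pre-trained only; the checkpoint
# path below stands in for a CTC-fine-tuned Czech ASR model (placeholder).
import torch
import soundfile as sf
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_name = "path/to/ctc-finetuned-czech-wav2vec2"  # hypothetical placeholder
processor = Wav2Vec2Processor.from_pretrained(model_name)
model = Wav2Vec2ForCTC.from_pretrained(model_name)

speech, sample_rate = sf.read("recording.wav")  # expected 16 kHz mono audio
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely token per frame, then collapse.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```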



Author information

Corresponding author

Correspondence to Jan Švec.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Švec, J., Polák, F., Bartoš, A., Zapletalová, M., Víta, M. (2022). Evaluation of Wav2Vec Speech Recognition for Speakers with Cognitive Disorders. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science, vol. 13502. Springer, Cham. https://doi.org/10.1007/978-3-031-16270-1_41

  • DOI: https://doi.org/10.1007/978-3-031-16270-1_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16269-5

  • Online ISBN: 978-3-031-16270-1

  • eBook Packages: Computer Science, Computer Science (R0)
