Abstract
In this paper, we present a spoken dialog system used to collect data for future research on dementia prediction from speech. The dialog system was used to record the speech of patients with mild cognitive deficits. The core task handled by the dialog system was a one-minute spoken description of a vivid shore picture; the patients also performed several other simple speech-based tasks. All utterances were recorded and manually transcribed to obtain a ground-truth reference. We describe the architecture of the dialog system as well as the results of the first speech recognition experiments, in which a zero-shot Wav2Vec 2.0 speech recognizer was used and recognition accuracy was evaluated at the word and character level.
The work has been supported by the grant of the University of West Bohemia, project No. SGS-2022-017, and by the programme Cooperatio, Neuroscience, Charles University in Prague. It was further supported by the project "Research of selected physiological and pathological mechanisms of voice, language and speech, their evaluation and intervention in the context of speech-language therapy, special education and neurodevelopmental research" conducted at the Faculty of Education, Palacký University Olomouc (IGA_PdF_2022_014).
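As a rough illustration of the evaluation pipeline summarized in the abstract, the following minimal Python sketch transcribes one recording with a pretrained Wav2Vec 2.0 CTC model and scores it against a manual transcript at the word and character level. The checkpoint name, file name, and the use of the jiwer library are illustrative assumptions; the paper's actual Czech model and scoring tooling are not specified in this excerpt.

```python
# Minimal sketch: zero-shot Wav2Vec 2.0 transcription followed by
# word- and character-level scoring against a manual transcript.
# Checkpoint, file name, and jiwer-based scoring are illustrative assumptions.
import torch
import torchaudio
import jiwer
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Placeholder English model; a Czech CTC-finetuned checkpoint is assumed
# for the actual experiments described in the paper.
MODEL_ID = "facebook/wav2vec2-base-960h"
processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID).eval()


def transcribe(wav_path: str) -> str:
    """Greedy CTC decoding of a single recording, down-mixed and resampled to 16 kHz."""
    speech, sr = torchaudio.load(wav_path)
    speech = speech.mean(dim=0)  # down-mix to mono
    if sr != 16_000:
        speech = torchaudio.functional.resample(speech, sr, 16_000)
    inputs = processor(speech.numpy(), sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    pred_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(pred_ids)[0].lower()


# Hypothetical file and reference transcript standing in for one patient utterance.
hypothesis = transcribe("patient_utterance.wav")
reference = "manual ground-truth transcript of the utterance"
print("WER:", jiwer.wer(reference, hypothesis))  # word error rate
print("CER:", jiwer.cer(reference, hypothesis))  # character error rate
```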
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Švec, J., Polák, F., Bartoš, A., Zapletalová, M., Víta, M. (2022). Evaluation of Wav2Vec Speech Recognition for Speakers with Cognitive Disorders. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science, vol. 13502. Springer, Cham. https://doi.org/10.1007/978-3-031-16270-1_41
DOI: https://doi.org/10.1007/978-3-031-16270-1_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16269-5
Online ISBN: 978-3-031-16270-1