Abstract
In this paper, we present a spoken dialog system used to collect data for future research on dementia prediction from speech. The dialog system was used to record the speech of patients with mild cognitive deficits. The core task handled by the dialog system was a one-minute spoken description of a vivid shore picture; the patients also performed several other simple speech-based tasks. All utterances were recorded and manually transcribed to obtain a ground-truth reference. We describe the architecture of the dialog system as well as the results of the first speech recognition experiments, in which a zero-shot Wav2Vec 2.0 speech recognizer was used and recognition accuracy was evaluated at the word and character level.
The work has been supported by the grant of the University of West Bohemia, project No. SGS-2022-017, and by the programme Cooperatio, Neuroscience, Charles University in Prague. It was further supported by the project "Research of selected physiological and pathological mechanisms of voice, language and speech, their evaluation and intervention in the context of speech-language therapy, special education and neurodevelopmental research" conducted at the Faculty of Education, Palacký University Olomouc (IGA_PdF_2022_014).
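As a rough illustration of the evaluation pipeline summarized in the abstract, the following minimal Python sketch transcribes one recording with a pretrained Wav2Vec 2.0 CTC model and scores it against a manual transcript at the word and character level. The checkpoint name, file name, and the use of the jiwer library are illustrative assumptions; the paper's actual Czech model and scoring tooling are not specified in this excerpt.

```python
# Minimal sketch: zero-shot Wav2Vec 2.0 transcription followed by
# word- and character-level scoring against a manual transcript.
# Checkpoint, file name, and jiwer-based scoring are illustrative assumptions.
import torch
import torchaudio
import jiwer
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Placeholder English model; a Czech CTC-finetuned checkpoint is assumed
# for the actual experiments described in the paper.
MODEL_ID = "facebook/wav2vec2-base-960h"
processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID).eval()


def transcribe(wav_path: str) -> str:
    """Greedy CTC decoding of a single recording, down-mixed and resampled to 16 kHz."""
    speech, sr = torchaudio.load(wav_path)
    speech = speech.mean(dim=0)  # down-mix to mono
    if sr != 16_000:
        speech = torchaudio.functional.resample(speech, sr, 16_000)
    inputs = processor(speech.numpy(), sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    pred_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(pred_ids)[0].lower()


# Hypothetical file and reference transcript standing in for one patient utterance.
hypothesis = transcribe("patient_utterance.wav")
reference = "manual ground-truth transcript of the utterance"
print("WER:", jiwer.wer(reference, hypothesis))  # word error rate
print("CER:", jiwer.cer(reference, hypothesis))  # character error rate
```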
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Švec, J., Polák, F., Bartoš, A., Zapletalová, M., Víta, M. (2022). Evaluation of Wav2Vec Speech Recognition for Speakers with Cognitive Disorders. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science, vol. 13502. Springer, Cham. https://doi.org/10.1007/978-3-031-16270-1_41
DOI: https://doi.org/10.1007/978-3-031-16270-1_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16269-5
Online ISBN: 978-3-031-16270-1