Multimodal fusion for alzheimer’s disease recognition

Ying, Yangwei; Yang, Tao; Zhou, Hong

doi:10.1007/s10489-022-04255-z


Published: 01 December 2022

Applied Intelligence, Volume 53, pages 16029–16040 (2023)


Abstract

Alzheimer’s disease (AD) is the most prevalent form of progressive degenerative dementia and has a great socioeconomic impact worldwide. In the vast majority of cases, AD is diagnosed by biochemical analysis, lumbar puncture, and advanced imaging examination, none of which can play a preventive role in the early stage of the disease. Speech signals carry abundant personal information, and AD patients typically present with speech disorders, which makes it possible to use speech to distinguish AD patients from healthy people. This paper aims to develop a new, noninvasive approach for the early detection of AD. We propose combining acoustic and linguistic speech features in a multimodal representation for speech-based recognition of AD. Three kinds of features are used for AD classification with an SVM classifier: IS10_paraling features, deep acoustic features extracted with a fine-tuned Wav2Vec2.0 model, and deep linguistic features extracted with a fine-tuned BERT model. Experiments on two publicly available datasets, NCMMSC2021 and ADReSSo, show that our model achieves state-of-the-art (SOTA) performance. Our best-performing model obtains accuracies of 89.1% and 84.0% on the long-audio and short-audio data of NCMMSC2021, respectively, and 83.7% on ADReSSo, which is promising for the early diagnosis and classification of AD patients.
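The abstract does not state how the three feature sets are combined before the SVM. As a minimal sketch, assuming simple feature-level fusion (standardize each modality, then concatenate per sample), the idea can be illustrated as follows. The function names `standardize` and `fuse`, the toy dimensions, and the toy values are all hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column of a list of equal-length feature vectors."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # A constant column has zero deviation; divide by 1.0 instead.
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def fuse(*modalities):
    """Standardize each modality, then concatenate its vectors per sample."""
    n = len(modalities[0])
    if any(len(m) != n for m in modalities):
        raise ValueError("every modality needs one vector per sample")
    scaled = [standardize(m) for m in modalities]
    return [[x for m in scaled for x in m[i]] for i in range(n)]


# Toy stand-ins for the three feature sets (3 speakers, tiny dimensions).
# Real IS10_paraling, Wav2Vec2.0, and BERT vectors are far larger.
is10 = [[0.1, 2.0], [0.3, 4.0], [0.5, 6.0]]
wav2vec = [[1.0, 1.0, 0.0], [2.0, 1.0, 0.5], [3.0, 1.0, 1.0]]
bert = [[-1.0], [0.0], [1.0]]

# NOTE: statistics are computed over all samples here for brevity; in a real
# experiment they would be fitted on the training split only.
fused = fuse(is10, wav2vec, bert)
print(len(fused), len(fused[0]))
```

Each fused vector could then be passed, together with its AD or healthy label, to an SVM implementation such as scikit-learn's `sklearn.svm.SVC`. Whether the authors fuse at the feature level or at the decision level is not specified in the abstract, so this should be read as one plausible arrangement only.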


Data Availability

The data are not fully open access. Details are available at http://www.ncmmsc2021.org/ (NCMMSC2021) and https://luzs.gitlab.io/adresso-2021/ (ADReSSo).

Funding

This work is supported by the Key Research and Development of Zhejiang Province of China under Grant 2021C03030 and the National Key Research and Development Program of China under Grant 2019YFC0118202.

Author information

Authors and Affiliations

Zhejiang Provincial Key Laboratory for Network Multimedia Technologies, Key Laboratory for Biomedical Engineering of Ministry of Education Zhejiang University, Hangzhou, 310027, China
Yangwei Ying, Tao Yang & Hong Zhou

Corresponding author

Correspondence to Hong Zhou.

Ethics declarations

Conflict of Interests

The authors have no conflicts of interest in this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Yangwei Ying and Tao Yang contributed equally to this work.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ying, Y., Yang, T. & Zhou, H. Multimodal fusion for alzheimer’s disease recognition. Appl Intell 53, 16029–16040 (2023). https://doi.org/10.1007/s10489-022-04255-z

Accepted: 09 October 2022
Published: 01 December 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s10489-022-04255-z
