Multimodal Data Fusion for Automatic Detection of Alzheimer’s Disease

Krstev, Ivan; Pavikjevikj, Milan; Toshevska, Martina; Gievska, Sonja

doi:10.1007/978-3-031-06018-2_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13320)

Included in the following conference series:

International Conference on Human-Computer Interaction

Abstract

This research explores the viability of multimodal fusion of linguistic and acoustic speech biomarkers for identifying a person with probable Alzheimer’s dementia symptoms. To capture the effect of dementia on a person’s language and verbal abilities, a novel disease-detection approach was explored, based on visual analysis of spectrogram images extracted from patients’ interview recordings. We propose three fusion methods that take advantage of major advances in representation learning. The objective of the empirical study and the ensuing discussion presented in this paper was threefold: 1) to examine the potential of state-of-the-art transformer-based architectures and transfer learning to assist disease diagnosis, 2) to map the problem of acoustic analysis into the realm of image processing by transforming spectrograms into images and employing pretrained deep neural networks, such as ResNet, to extract visual patterns, and 3) to investigate the interplay of multimodal biomarkers of Alzheimer’s dementia when fusing the representations learned in different modalities. We also present independent evaluations of the unimodal methods, which serve as the baselines against which the fusion methods were compared.
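The abstract does not specify the spectrogram parameters or how the three fusion methods combine modalities. As a rough illustration only, the sketch below assumes a plain short-time Fourier transform (Hann window, n_fft=512, hop=128, log1p magnitude) to turn a recording into an 8-bit grayscale image, and a simple feature-level fusion that L2-normalises each modality's embedding and concatenates them. The function names, parameters, and embedding sizes are hypothetical choices for this example, not the authors' pipeline.

```python
import numpy as np


def log_spectrogram(signal: np.ndarray, n_fft: int = 512, hop: int = 128) -> np.ndarray:
    """Log-magnitude STFT spectrogram with shape (n_fft // 2 + 1, n_frames)."""
    if signal.ndim != 1 or len(signal) < n_fft:
        raise ValueError("expected a 1-D signal at least n_fft samples long")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice overlapping frames and apply the analysis window to each one.
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # One-sided FFT per frame, transposed to frequency bins x time frames.
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log1p(magnitude)


def to_image(spec: np.ndarray) -> np.ndarray:
    """Min-max scale a spectrogram to uint8, flipped so low frequencies sit at the bottom."""
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / (hi - lo) if hi > lo else np.zeros_like(spec)
    return np.flipud(np.round(scaled * 255)).astype(np.uint8)


def fuse_by_concatenation(*embeddings: np.ndarray) -> np.ndarray:
    """Feature-level fusion: L2-normalise each modality's vector, then concatenate them."""
    normed = [e / (np.linalg.norm(e) + 1e-12) for e in embeddings]
    return np.concatenate(normed)


if __name__ == "__main__":
    sr = 16_000
    t = np.arange(sr) / sr  # one second of audio
    tone = np.sin(2 * np.pi * 440 * t)  # synthetic tone standing in for an interview recording
    img = to_image(log_spectrogram(tone))
    # Pretrained ImageNet CNNs usually expect 3 channels, so replicate the gray channel.
    rgb = np.repeat(img[..., None], 3, axis=-1)
    print(img.shape, img.dtype, rgb.shape)  # (257, 122) uint8 (257, 122, 3)

    # Random placeholders for a text embedding and an image embedding (sizes assumed).
    rng = np.random.default_rng(0)
    text_emb, audio_emb = rng.normal(size=768), rng.normal(size=512)
    print(fuse_by_concatenation(text_emb, audio_emb).shape)  # (1280,)
```

In a real pipeline the image would typically be resized and passed through a pretrained CNN such as ResNet to obtain the visual embedding, and the transcript through a transformer encoder; normalising before concatenation keeps one modality's scale from dominating a downstream classifier.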

Notes

1. https://dementia.talkbank.org/, last visited: 10.02.2022.

Author information

Authors and Affiliations

Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
Ivan Krstev, Milan Pavikjevikj, Martina Toshevska & Sonja Gievska

Corresponding author

Correspondence to Martina Toshevska.

Editor information

Editors and Affiliations

School of Industrial Engineering, Purdue University, West Lafayette, IN, USA
Vincent G. Duffy

About this paper

Cite this paper

Krstev, I., Pavikjevikj, M., Toshevska, M., Gievska, S. (2022). Multimodal Data Fusion for Automatic Detection of Alzheimer’s Disease. In: Duffy, V.G. (ed.) Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management. Health, Operations Management, and Design. HCII 2022. Lecture Notes in Computer Science, vol 13320. Springer, Cham. https://doi.org/10.1007/978-3-031-06018-2_6

DOI: https://doi.org/10.1007/978-3-031-06018-2_6
Published: 16 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06017-5
Online ISBN: 978-3-031-06018-2
eBook Packages: Computer Science, Computer Science (R0)