Abstract
This research explores the viability of multimodal fusion of linguistic and acoustic biomarkers in speech for identifying persons with symptoms of probable Alzheimer's dementia. To capture the effect of dementia on a person's language and verbal abilities, we explored a novel approach to disease detection based on visual analysis of spectrogram images extracted from patients' interview recordings. We put forward three fusion methods that leverage major advances in representation learning. The objective of the empirical study and ensuing discussion presented in this paper was threefold: 1) to examine the potential of state-of-the-art transformer-based architectures and transfer learning to assist disease diagnosis, 2) to map the problem of acoustic analysis into the realm of image processing by transforming spectrograms into images and employing pretrained deep neural networks, such as ResNet, to extract visual patterns, and 3) to investigate the interplay of multimodal biomarkers of Alzheimer's dementia when fusing the representations learned from different modalities. We present the results of independent evaluations of the unimodal methods, against which the fusion methods are compared.
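The spectrogram-to-image idea described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' pipeline: the frame length, hop size, Hann window, 80 dB dynamic range and all function names (`stft_magnitude`, `to_grayscale_image`, `early_fusion`) are hypothetical choices made for illustration, and a naive DFT stands in for the FFT a real system would use. The last helper shows plain concatenation of per-modality embeddings (early fusion), which is one common fusion strategy and not necessarily one of the three methods proposed in the paper.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude STFT (frequency x time) using a Hann window and a naive DFT.

    Illustrative only: the naive DFT is O(frame_len^2) per frame; real
    pipelines would use an FFT-based library routine instead.
    """
    # Periodic Hann window.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep only non-negative frequencies
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    # Transpose from time x frequency to the usual frequency x time layout.
    return [list(row) for row in zip(*frames)]


def to_grayscale_image(spec, top_db=80.0):
    """Map a magnitude spectrogram to 8-bit pixel values (0..255).

    Values are converted to decibels, clipped to `top_db` below the peak,
    scaled linearly to 0..255, and the rows are flipped so that low
    frequencies end up at the bottom of the image.
    """
    eps = 1e-10  # avoids log10(0) for silent bins
    db = [[20 * math.log10(max(v, eps)) for v in row] for row in spec]
    peak = max(max(row) for row in db)
    floor = peak - top_db
    image = []
    for row in reversed(db):
        image.append([round(255 * (max(v, floor) - floor) / top_db) for v in row])
    return image


def early_fusion(*embeddings):
    """Concatenate per-modality embedding vectors into one feature vector."""
    fused = []
    for emb in embeddings:
        fused.extend(emb)
    return fused


if __name__ == "__main__":
    sr = 8000  # assumed sample rate for the toy signal
    tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(1024)]
    img = to_grayscale_image(stft_magnitude(tone, frame_len=64, hop=32))
    print(len(img), "rows x", len(img[0]), "columns")
```

For example, a 1 kHz sine sampled at 8 kHz with `frame_len=64` falls exactly on frequency bin 8, so after the vertical flip the row at index 24 of the 33-row image is the brightest. In a full system, such a 2-D array would typically be resized and replicated across three channels before being fed to an ImageNet-pretrained CNN such as ResNet, and the resulting visual embedding could then be concatenated with a text embedding via a function like `early_fusion`.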
Notes
1. https://dementia.talkbank.org/, last visited: 10.02.2022.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Krstev, I., Pavikjevikj, M., Toshevska, M., Gievska, S. (2022). Multimodal Data Fusion for Automatic Detection of Alzheimer’s Disease. In: Duffy, V.G. (eds) Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management. Health, Operations Management, and Design. HCII 2022. Lecture Notes in Computer Science, vol 13320. Springer, Cham. https://doi.org/10.1007/978-3-031-06018-2_6
DOI: https://doi.org/10.1007/978-3-031-06018-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06017-5
Online ISBN: 978-3-031-06018-2
eBook Packages: Computer Science, Computer Science (R0)