Abstract:
Depression is a major mental health issue in contemporary society, with an estimated 350 million people affected globally, and the number of individuals diagnosed with depression continues to rise each year. Current clinical practice relies entirely on self-reporting and clinical assessment, which carries the risk of subjective bias. In this article, we propose a multimodal method based on facial expressions and pupil diameter to detect depression more objectively and precisely. Our method first extracts facial-expression and pupil-diameter features using residual networks and 1-D convolutional neural networks, respectively. Second, a cross-modal fusion model based on self-attention networks (CMF-SNs) is proposed, which applies cross-modal attention networks between modalities and parallel self-attention networks within each modality to extract cross-modal fusion features of facial expressions and pupil diameter, effectively complementing information across modalities. Finally, the fused features are passed through a fully connected layer to identify depression. Multiple controlled experiments show that, compared with single modalities, the multimodal fusion method based on self-attention networks recognizes depression more accurately, reaching a highest accuracy of 75.0%. In addition, we conducted comparative experiments under three different stimulation paradigms; the results show that classification accuracy under negative and neutral stimuli is higher than under positive stimuli, indicating an attentional bias of depressed patients toward negative images. The experimental results demonstrate the superiority of our multimodal fusion method.
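To make the described pipeline concrete, the following is a minimal PyTorch sketch of the architecture outlined above. The abstract specifies only the high-level components (a residual network for facial expressions, a 1-D CNN for pupil diameter, cross-modal and self-attention fusion, and a fully connected classifier), so all layer sizes, sequence lengths, the resnet18 backbone, and the mean-pooling step are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class CMFSNSketch(nn.Module):
    """Hypothetical sketch of the cross-modal fusion self-attention network.

    Dimensions, the ResNet variant, and pooling are illustrative assumptions;
    the abstract does not give architecture details.
    """

    def __init__(self, d_model=128, n_heads=4, n_classes=2):
        super().__init__()
        # Facial-expression branch: per-frame ResNet features projected to d_model.
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()            # yields 512-d features per frame
        self.face_net = backbone
        self.face_proj = nn.Linear(512, d_model)

        # Pupil branch: 1-D CNN over the pupil-diameter time series.
        self.pupil_net = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, d_model, kernel_size=5, padding=2), nn.ReLU(),
        )

        # Cross-modal attention: each modality queries the other.
        self.face2pupil = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.pupil2face = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Parallel self-attention refining each stream within its modality.
        self.face_self = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.pupil_self = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        # Fully connected classifier over the concatenated fused features.
        self.classifier = nn.Linear(2 * d_model, n_classes)

    def forward(self, frames, pupil):
        # frames: (B, T, 3, 224, 224) facial video; pupil: (B, 1, L) diameter series.
        B, T = frames.shape[:2]
        f = self.face_net(frames.flatten(0, 1)).view(B, T, -1)  # (B, T, 512)
        f = self.face_proj(f)                                   # (B, T, d_model)
        p = self.pupil_net(pupil).transpose(1, 2)               # (B, L, d_model)

        # Cross-modal attention lets each modality borrow information
        # from the other, complementing the two feature streams.
        f_cm, _ = self.face2pupil(f, p, p)
        p_cm, _ = self.pupil2face(p, f, f)
        # Self-attention within each fused stream, applied in parallel.
        f_sa, _ = self.face_self(f_cm, f_cm, f_cm)
        p_sa, _ = self.pupil_self(p_cm, p_cm, p_cm)

        # Pool over time, concatenate, and classify.
        fused = torch.cat([f_sa.mean(dim=1), p_sa.mean(dim=1)], dim=-1)
        return self.classifier(fused)
```

As a usage illustration, `CMFSNSketch()(torch.randn(2, 8, 3, 224, 224), torch.randn(2, 1, 256))` returns a `(2, 2)` logit tensor for binary depressed/control classification; ablating either branch reduces the model to the single-modality baselines the abstract compares against.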
Published in: IEEE Transactions on Computational Social Systems (Volume: 12, Issue: 1, February 2025)