Abstract
Depression is a prevalent mental health disorder. Its diagnosis hinges largely on the medical practitioner’s subjective assessment of the patient, and the multiple subjective factors involved in this process can further complicate the diagnosis. In this paper, we propose a novel approach to depression recognition using a graph neural network that incorporates potential connections within and between audio signals. Specifically, we first extract temporal information across frame-level audio features with a gated recurrent unit (GRU). We then construct two graph neural network modules to explore the potential intra-audio and inter-audio connections. In the first graph module, we construct a graph using the frame-level features of each audio signal as nodes and embed the output graph into a feature vector representation. In the second graph module, we represent each graph embedding feature vector as a node and encode the potential relationships between audio signals through node neighbourhood information propagation. Additionally, we extract emotional features related to depression using a pre-trained emotion recognition network and enhance the connection between the encoded audio signals through a self-attention mechanism to further improve the model’s performance. We conducted extensive experiments on three depression datasets, and our proposed model outperformed all benchmark models, demonstrating its effectiveness.
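The two-level graph idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how edges are built, which graph layer is used, or how a graph is pooled into a vector. The sketch therefore assumes a chain graph over neighbouring frames, a k-nearest-neighbour graph over audio clips, a GCN-style propagation rule, and mean pooling; all function names, dimensions, and the random stand-in for GRU-encoded features are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj):
    """GCN-style symmetric normalisation: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj, x, weight):
    """One neighbourhood propagation step: ReLU(A_norm @ X @ W)."""
    return np.maximum(normalized_adjacency(adj) @ x @ weight, 0.0)


def chain_adjacency(n):
    """Assumed intra-audio graph: each frame is linked to its temporal neighbours."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return adj


def knn_adjacency(emb, k):
    """Assumed inter-audio graph: link each clip to its k most cosine-similar clips."""
    unit = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)              # exclude self-matches
    nbrs = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros((emb.shape[0], emb.shape[0]))
    rows = np.repeat(np.arange(emb.shape[0]), k)
    adj[rows, nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)               # make the graph undirected


rng = np.random.default_rng(0)
num_clips, num_frames, feat_dim, hid_dim = 6, 20, 8, 4

# Random stand-in for GRU-encoded frame-level features (hypothetical data).
clips = rng.normal(size=(num_clips, num_frames, feat_dim))
w_intra = rng.normal(size=(feat_dim, hid_dim))
w_inter = rng.normal(size=(hid_dim, hid_dim))

# Module 1: frames are nodes; mean pooling (an assumption) gives one vector per clip.
frame_adj = chain_adjacency(num_frames)
clip_emb = np.stack([gcn_layer(frame_adj, c, w_intra).mean(axis=0) for c in clips])

# Module 2: clip embeddings are nodes; propagate information between clips.
clip_adj = knn_adjacency(clip_emb, k=2)
clip_out = gcn_layer(clip_adj, clip_emb, w_inter)
print(clip_out.shape)
```

In a trained model the weights would be learned and the per-clip outputs passed to a classifier; here they are random, so only the data flow between the two graph modules is meaningful.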
This work has been supported by the Natural Science Foundation of Ningbo (No. 2023J114).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sun, C., Dong, Y. (2024). An Audio Correlation-Based Graph Neural Network for Depression Recognition. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14432. Springer, Singapore. https://doi.org/10.1007/978-981-99-8543-2_32
DOI: https://doi.org/10.1007/978-981-99-8543-2_32
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8542-5
Online ISBN: 978-981-99-8543-2
eBook Packages: Computer Science, Computer Science (R0)