An Audio Correlation-Based Graph Neural Network for Depression Recognition

Sun, Chenjian; Dong, Yihong

doi:10.1007/978-981-99-8543-2_32


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14432)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Depression is a prevalent mental health disorder. Its diagnosis hinges largely on the medical practitioner’s subjective assessment of the patient, and the multiple subjective factors involved in this process can further complicate the diagnosis. In this paper, we propose a novel approach to depression recognition using a graph neural network that incorporates potential connections within and between audio signals. Specifically, we first use a GRU to extract temporal information across the frame-level features of each audio signal. We then construct two graph neural network modules to explore the potential intra-audio and inter-audio connections. In the first graph module, we construct a graph using the frame-level features of each audio as nodes and embed the output graph into a feature vector representation. In the second graph module, we represent each graph embedding feature vector as a node and encode the potential relationships between audio signals through node neighbourhood information propagation. Additionally, we extract emotional features related to depression using a pre-trained emotion recognition network and strengthen the connections between the encoded audio signals through a self-attention mechanism to further improve the model’s performance. We conducted extensive experiments on three depression datasets, and our proposed model outperformed all benchmark models, demonstrating its effectiveness.
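The two-level graph idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how edges are built or which graph layers are used, so the chain-shaped frame graph, the cosine-similarity threshold for linking audios, the weight-free GCN-style propagation, and all function names below are assumptions made purely for illustration. The GRU, emotion-feature, and self-attention stages are omitted.

```python
import math

def normalize_adjacency(adj):
    """GCN-style normalization: D^-1/2 (A + I) D^-1/2, as nested lists."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 thanks to the self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def propagate(adj, feats):
    """One round of neighbourhood propagation without learned weights."""
    a_norm = normalize_adjacency(adj)
    n, dim = len(feats), len(feats[0])
    return [[sum(a_norm[i][k] * feats[k][d] for k in range(n)) for d in range(dim)]
            for i in range(n)]

def chain_adjacency(n):
    """Assumed frame graph: each frame is linked to its temporal neighbours."""
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        adj[i][i + 1] = adj[i + 1][i] = 1.0
    return adj

def embed_audio(frames):
    """Intra-audio module: frames are nodes; mean-pool the graph to one vector."""
    h = propagate(chain_adjacency(len(frames)), frames)
    return [sum(row[d] for row in h) / len(h) for d in range(len(h[0]))]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def audio_graph(embeddings, threshold=0.5):
    """Assumed inter-audio graph: link audios whose embeddings are similar."""
    n = len(embeddings)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) > threshold:
                adj[i][j] = adj[j][i] = 1.0
    return adj

def encode_audios(audios, threshold=0.5):
    """Inter-audio module: each audio embedding is a node; propagate once."""
    embeddings = [embed_audio(frames) for frames in audios]
    return propagate(audio_graph(embeddings, threshold), embeddings)

# Three toy "audios", each with two 2-dimensional frame features.
audios = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
encoded = encode_audios(audios)
```

In this toy input the first two audios are linked to each other and the third stays isolated, so its encoding depends only on its own frames. A trained model would replace the fixed propagation with learned graph layers and feed the result to a classifier.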

This work has been supported by the Natural Science Foundation of Ningbo (No. 2023J114).

Author information

Authors and Affiliations

Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, 315211, China
Chenjian Sun & Yihong Dong
Zhejiang Key Laboratory of Mobile Network Application Technology, Ningbo University, Ningbo, 315211, China
Chenjian Sun & Yihong Dong

Corresponding author

Correspondence to Yihong Dong.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Sun, C., Dong, Y. (2024). An Audio Correlation-Based Graph Neural Network for Depression Recognition. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14432. Springer, Singapore. https://doi.org/10.1007/978-981-99-8543-2_32

DOI: https://doi.org/10.1007/978-981-99-8543-2_32
Published: 29 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8542-5
Online ISBN: 978-981-99-8543-2
eBook Packages: Computer Science, Computer Science (R0)
