
An Audio Correlation-Based Graph Neural Network for Depression Recognition

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14432)


Abstract

Depression is a prevalent mental health disorder. Its diagnosis hinges largely on the medical practitioner’s subjective assessment of the patient during the diagnostic process, and the involvement of multiple subjective factors can further complicate the diagnosis. In this paper, we propose a novel approach for depression recognition using a graph neural network that incorporates potential connections within and between audio signals. Specifically, we first extract time-series information from frame-level audio features through a GRU. We then construct two graph neural network modules to explore potential intra-audio and inter-audio connections. In the first graph module, we build a graph using the frame-level features of each audio signal as nodes and embed the output graph into a feature-vector representation. In the second graph module, we treat each graph-embedding feature vector as a node and encode the potential relationships between audio signals through propagation of node neighbourhood information. Additionally, we extract depression-related emotional features using a pre-trained emotion recognition network and strengthen the connections between encoded audio signals through a self-attention mechanism to further improve the model’s performance. We conducted extensive experiments on three depression datasets, and our proposed model outperformed all benchmark models, demonstrating its effectiveness.
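To make the pipeline described above concrete, the following is a minimal PyTorch sketch of the two-level graph design: a GRU encodes frame-level features, an intra-audio graph over the frames of each clip is pooled into a clip embedding, an inter-audio graph passes messages between clip embeddings, and a self-attention layer refines them before classification. All module names, dimensions, the dense-adjacency graph convolution, and the mean-pool readout are illustrative assumptions rather than the authors' implementation; the pre-trained emotion-feature branch mentioned in the abstract is omitted.

```python
# Minimal sketch of the two-level graph pipeline described in the abstract.
# All layer choices and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn


class SimpleGraphConv(nn.Module):
    """One dense-adjacency graph convolution: H' = relu(A_hat @ H @ W + b)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # Add self-loops and row-normalise the adjacency before propagation.
        adj = adj + torch.eye(adj.size(0), device=adj.device)
        adj = adj / adj.sum(dim=1, keepdim=True)
        return torch.relu(self.linear(adj @ x))


class AudioGraphDepressionNet(nn.Module):
    def __init__(self, feat_dim=40, hidden=64):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)  # frame-level temporal encoding
        self.intra_gnn = SimpleGraphConv(hidden, hidden)        # graph over frames of one clip
        self.inter_gnn = SimpleGraphConv(hidden, hidden)        # graph over clip-level embeddings
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(hidden, 2)                  # depressed / not depressed

    def forward(self, clips, frame_adjs, inter_adj):
        # clips: list of (num_frames, feat_dim) tensors, one per audio clip.
        clip_embeddings = []
        for frames, adj in zip(clips, frame_adjs):
            h, _ = self.gru(frames.unsqueeze(0))                # (1, T, hidden)
            h = self.intra_gnn(h.squeeze(0), adj)               # intra-audio graph over frames
            clip_embeddings.append(h.mean(dim=0))               # graph readout -> one vector per clip
        nodes = torch.stack(clip_embeddings)                    # (num_clips, hidden)
        nodes = self.inter_gnn(nodes, inter_adj)                # inter-audio message passing
        attended, _ = self.attn(nodes.unsqueeze(0), nodes.unsqueeze(0), nodes.unsqueeze(0))
        return self.classifier(attended.squeeze(0))             # per-clip logits


# Toy usage: three clips with random frame features and fully connected graphs.
clips = [torch.randn(torch.randint(20, 40, (1,)).item(), 40) for _ in range(3)]
frame_adjs = [torch.ones(c.size(0), c.size(0)) for c in clips]
logits = AudioGraphDepressionNet()(clips, frame_adjs, torch.ones(3, 3))
print(logits.shape)  # torch.Size([3, 2])
```

In this sketch the fully connected adjacency matrices stand in for whatever frame-level and clip-level graph construction the paper actually uses; only the overall intra-graph, readout, inter-graph, and attention ordering follows the abstract.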

This work has been supported by the Natural Science Foundation of Ningbo (No. 2023J114).



Author information

Corresponding author

Correspondence to Yihong Dong.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sun, C., Dong, Y. (2024). An Audio Correlation-Based Graph Neural Network for Depression Recognition. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14432. Springer, Singapore. https://doi.org/10.1007/978-981-99-8543-2_32

  • DOI: https://doi.org/10.1007/978-981-99-8543-2_32

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8542-5

  • Online ISBN: 978-981-99-8543-2

  • eBook Packages: Computer Science, Computer Science (R0)
