DOI: 10.1145/3347320.3357696

A Multi-Modal Hierarchical Recurrent Neural Network for Depression Detection

Published: 15 October 2019

Abstract

We propose a multi-modal method with a hierarchical recurrent neural structure that integrates vision, audio, and text features for depression detection. The method comprises two hierarchies of bidirectional long short-term memory (BiLSTM) networks that fuse the multi-modal features and predict the severity of depression. An adaptive sample weighting mechanism is introduced to accommodate the diversity of training samples. Experiments on the test set of a depression detection challenge demonstrate the effectiveness of the proposed method.
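The two-level structure described above can be sketched in PyTorch as follows: a lower BiLSTM fuses the concatenated per-frame modality features within each segment, and an upper BiLSTM operates over the resulting segment embeddings to regress a severity score. All dimensions, the mean-pooling between levels, and the module names here are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class HierarchicalBiLSTM(nn.Module):
    """Hypothetical sketch of a two-level BiLSTM for multi-modal fusion."""

    def __init__(self, feat_dims=(64, 32, 16), hidden=32):
        super().__init__()
        fused = sum(feat_dims)  # vision + audio + text feature dims
        # Level 1: BiLSTM over frames within each segment (fused modalities)
        self.frame_lstm = nn.LSTM(fused, hidden, batch_first=True,
                                  bidirectional=True)
        # Level 2: BiLSTM over segment-level embeddings
        self.seg_lstm = nn.LSTM(2 * hidden, hidden, batch_first=True,
                                bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # regress severity score

    def forward(self, vision, audio, text):
        # Each modality: (batch, segments, frames, dim); fuse on feature axis
        x = torch.cat([vision, audio, text], dim=-1)
        b, s, t, d = x.shape
        frame_out, _ = self.frame_lstm(x.view(b * s, t, d))
        seg_emb = frame_out.mean(dim=1).view(b, s, -1)  # pool frames
        seg_out, _ = self.seg_lstm(seg_emb)
        return self.head(seg_out.mean(dim=1)).squeeze(-1)
```

A forward pass on a batch of 2 sessions, each with 4 segments of 10 frames, yields one severity prediction per session. The adaptive sample weighting mentioned in the abstract would then reweight each session's loss term during training.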




    Published In

AVEC '19: Proceedings of the 9th International Audio/Visual Emotion Challenge and Workshop
    October 2019
    96 pages
    ISBN:9781450369138
    DOI:10.1145/3347320

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. depression detection
2. multi-modal learning
    3. neural networks

    Qualifiers

    • Research-article

    Funding Sources

    • National Science Foundation of China
    • National Key R&D Program of China

Conference

MM '19

    Acceptance Rates

    Overall Acceptance Rate 52 of 98 submissions, 53%
