DOI: 10.1145/3133944.3133950

Hybrid Depression Classification and Estimation from Audio Video and Text Information

Published: 23 October 2017

Abstract

In this paper, we design a hybrid depression classification and estimation framework based on audio, video and text descriptors. It contains three main components: 1) Deep Convolutional Neural Network (DCNN) and Deep Neural Network (DNN) based audio-visual multi-modal depression recognition frameworks, trained on depressed and not-depressed participants, respectively; 2) a Paragraph Vector (PV), Support Vector Machine (SVM) and Random Forest based depression classification framework operating on the interview transcripts; 3) a multivariate regression model fusing the audio-visual PHQ-8 estimations from the depressed and not-depressed DCNN-DNN models with the depression classification result from the text information. In the DCNN-DNN based depression estimation framework, audio/video feature descriptors are first fed into a DCNN to learn high-level features, which are then passed to a DNN to predict the PHQ-8 score; the initial predictions from the two modalities are fused via a DNN model. In the PV-SVM and Random Forest based depression classification framework, we explore semantic text features learned with PV as well as global text features. Experiments were carried out on the Distress Analysis Interview Corpus - Wizard of Oz (DAIC-WOZ) dataset for the Depression Sub-challenge of the 2017 Audio-Visual Emotion Challenge (AVEC). The results show that the proposed depression recognition framework is very promising, with a root mean square error (RMSE) of 3.088 and a mean absolute error (MAE) of 2.477 on the development set, and an RMSE of 5.400 and an MAE of 4.359 on the test set, all lower than the baseline results.
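
To make the final fusion step concrete, the following Python sketch illustrates component 3 under stated assumptions: it takes the PHQ-8 estimates produced by the two DCNN-DNN models (trained on depressed and not-depressed participants) together with the binary decision from the text-based classifier, and fuses them with a multivariate linear regression. The variable names and the synthetic data are illustrative only, and scikit-learn's LinearRegression stands in for whichever regressor the authors actually used; the upstream DCNN-DNN and PV-SVM/Random Forest models are assumed to have already produced their per-participant outputs.

    # Minimal sketch of the fusion stage (component 3). Assumption: the two
    # PHQ-8 estimates and the text-based class label have already been produced
    # by the upstream models; the synthetic arrays below merely stand in for them.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error, mean_squared_error

    rng = np.random.default_rng(0)
    n = 100  # hypothetical number of development-set participants

    # Hypothetical per-participant inputs:
    #   phq_dep     - PHQ-8 estimate from the DCNN-DNN model trained on depressed participants
    #   phq_not_dep - PHQ-8 estimate from the model trained on not-depressed participants
    #   text_label  - depressed (1) / not depressed (0) decision from the PV-SVM / Random Forest stage
    phq_dep = rng.uniform(0, 24, n)
    phq_not_dep = rng.uniform(0, 24, n)
    text_label = rng.integers(0, 2, n).astype(float)
    phq_true = rng.uniform(0, 24, n)  # ground-truth PHQ-8 scores (synthetic here)

    X = np.column_stack([phq_dep, phq_not_dep, text_label])

    # Multivariate linear regression fuses the three streams into one PHQ-8 score.
    fusion = LinearRegression().fit(X, phq_true)
    phq_pred = np.clip(fusion.predict(X), 0, 24)  # PHQ-8 scores are bounded to [0, 24]

    rmse = mean_squared_error(phq_true, phq_pred) ** 0.5
    mae = mean_absolute_error(phq_true, phq_pred)
    print(f"fusion RMSE={rmse:.3f}  MAE={mae:.3f}")

The RMSE and MAE printed here are computed on synthetic data purely to show the plumbing; the development- and test-set figures quoted in the abstract come from the DAIC-WOZ evaluation in the paper.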


    Published In

    AVEC '17: Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge
    October 2017
    78 pages
    ISBN:9781450355025
    DOI:10.1145/3133944

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dcnn-dnn
    2. depression classification
    3. depression recognition
    4. multi-modal
    5. pv-svm

    Qualifiers

    • Research-article

    Funding Sources

    • the Shaanxi Provincial International Science and Technology Collaboration Project
    • National Natural Science Foundation of China
    • VUB Interdisciplinary Research Program

    Conference

    MM '17
    Sponsor:
    MM '17: ACM Multimedia Conference
    October 23, 2017
    Mountain View, California, USA

    Acceptance Rates

    AVEC '17 Paper Acceptance Rate 8 of 17 submissions, 47%;
    Overall Acceptance Rate 52 of 98 submissions, 53%


