DOI: 10.1145/3133944.3133950

Hybrid Depression Classification and Estimation from Audio Video and Text Information

Published: 23 October 2017

Abstract

In this paper, we design a hybrid depression classification and estimation framework based on audio, video and text descriptors. It contains three main components: 1) Deep Convolutional Neural Network (DCNN) and Deep Neural Network (DNN) based audio-visual multi-modal depression recognition frameworks, trained on depressed and not-depressed participants, respectively; 2) a Paragraph Vector (PV), Support Vector Machine (SVM) and Random Forest based depression classification framework operating on the interview transcripts; 3) a multivariate regression model fusing the audio-visual PHQ-8 estimations from the depressed and not-depressed DCNN-DNN models with the depression classification result from the text information. In the DCNN-DNN based depression estimation framework, audio/video feature descriptors are first fed into a DCNN to learn high-level features, which are then passed to a DNN to predict the PHQ-8 score; the initial predictions from the two modalities are fused via a DNN model. In the PV-SVM and Random Forest based depression classification framework, we explore semantic text features learned with PV as well as global text features. Experiments were carried out on the Distress Analysis Interview Corpus - Wizard of Oz (DAIC-WOZ) dataset for the Depression Sub-challenge of the 2017 Audio-Visual Emotion Challenge (AVEC). The results show that the proposed depression recognition framework is very promising, with a root mean square error (RMSE) of 3.088 and a mean absolute error (MAE) of 2.477 on the development set, and an RMSE of 5.400 and an MAE of 4.359 on the test set, all lower than the baseline results.
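
To make the final fusion step concrete, the following Python sketch illustrates component 3 under stated assumptions: it takes the PHQ-8 estimates produced by the two DCNN-DNN models (trained on depressed and not-depressed participants) together with the binary decision from the text-based classifier, and fuses them with a multivariate linear regression. The variable names and the synthetic data are illustrative only, and scikit-learn's LinearRegression stands in for whichever regressor the authors actually used; the upstream DCNN-DNN and PV-SVM/Random Forest models are assumed to have already produced their per-participant outputs.

    # Minimal sketch of the fusion stage (component 3). Assumption: the two
    # PHQ-8 estimates and the text-based class label have already been produced
    # by the upstream models; the synthetic arrays below merely stand in for them.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error, mean_squared_error

    rng = np.random.default_rng(0)
    n = 100  # hypothetical number of development-set participants

    # Hypothetical per-participant inputs:
    #   phq_dep     - PHQ-8 estimate from the DCNN-DNN model trained on depressed participants
    #   phq_not_dep - PHQ-8 estimate from the model trained on not-depressed participants
    #   text_label  - depressed (1) / not depressed (0) decision from the PV-SVM / Random Forest stage
    phq_dep = rng.uniform(0, 24, n)
    phq_not_dep = rng.uniform(0, 24, n)
    text_label = rng.integers(0, 2, n).astype(float)
    phq_true = rng.uniform(0, 24, n)  # ground-truth PHQ-8 scores (synthetic here)

    X = np.column_stack([phq_dep, phq_not_dep, text_label])

    # Multivariate linear regression fuses the three streams into one PHQ-8 score.
    fusion = LinearRegression().fit(X, phq_true)
    phq_pred = np.clip(fusion.predict(X), 0, 24)  # PHQ-8 scores are bounded to [0, 24]

    rmse = mean_squared_error(phq_true, phq_pred) ** 0.5
    mae = mean_absolute_error(phq_true, phq_pred)
    print(f"fusion RMSE={rmse:.3f}  MAE={mae:.3f}")

The RMSE and MAE printed here are computed on synthetic data purely to show the plumbing; the development- and test-set figures quoted in the abstract come from the DAIC-WOZ evaluation in the paper.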


    Published In

    AVEC '17: Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge
    October 2017
    78 pages
    ISBN:9781450355025
    DOI:10.1145/3133944

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dcnn-dnn
    2. depression classification
    3. depression recognition
    4. multi-modal
    5. pv-svm

    Qualifiers

    • Research-article

    Funding Sources

    • the Shaanxi Provincial International Science and Technology Collaboration Project
    • National Natural Science Foundation of China
    • VUB Interdisciplinary Research Program

    Conference

    MM '17
    Sponsor:
    MM '17: ACM Multimedia Conference
    October 23, 2017
    Mountain View, California, USA

    Acceptance Rates

    AVEC '17 Paper Acceptance Rate 8 of 17 submissions, 47%;
    Overall Acceptance Rate 52 of 98 submissions, 53%


