Abstract:
In this paper, we propose an audio-visual multimodal depression recognition framework composed of deep convolutional neural network (DCNN) and deep neural network (DNN) models. For each modality, the corresponding feature descriptors are input into a DCNN to learn high-level global features with compact dynamic information, which are then fed into a DNN to predict the PHQ-8 score. For multimodal depression recognition, the PHQ-8 scores predicted from each modality are integrated by a DNN to produce the final prediction. In addition, we propose the Histogram of Displacement Range as a novel global visual descriptor that quantifies the range and speed of facial landmark displacements. Experiments carried out on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset for the Depression Sub-challenge of the Audio-Visual Emotion Challenge (AVEC 2016) show that the proposed multimodal framework obtains very promising results on both the development and test sets, outperforming the state-of-the-art results.
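The abstract does not define how the Histogram of Displacement Range is computed. As a minimal sketch under our own assumptions (not the authors' definition), one plausible reading bins the magnitude of each landmark's displacement over several frame intervals: short intervals reflect speed, longer intervals reflect range. The function name `displacement_range_histogram` and the parameters `intervals`, `bins`, and `max_disp` are hypothetical.

```python
import numpy as np


def displacement_range_histogram(landmarks, intervals=(1, 5, 10), bins=8, max_disp=20.0):
    """Hypothetical HDR-style global descriptor (an assumption, not the paper's definition).

    landmarks: array of shape (T, L, 2) holding T frames of L facial landmarks (x, y).
    For each frame interval d, the Euclidean displacement of every landmark between
    frames t and t + d is binned into a normalized histogram; all histograms are
    concatenated into one fixed-length vector of size len(intervals) * L * bins.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    T, L, _ = landmarks.shape
    edges = np.linspace(0.0, max_disp, bins + 1)
    features = []
    for d in intervals:
        if d >= T:
            # Too few frames for this interval: emit an all-zero block of the same size.
            features.append(np.zeros(L * bins))
            continue
        # Displacement magnitude of each landmark over d frames, shape (T - d, L).
        disp = np.linalg.norm(landmarks[d:] - landmarks[:-d], axis=-1)
        # Clip so very large motions land in the last bin instead of being dropped.
        disp = np.clip(disp, 0.0, max_disp)
        for j in range(L):
            hist, _ = np.histogram(disp[:, j], bins=edges)
            features.append(hist / hist.sum())  # normalize each histogram to sum to 1
    return np.concatenate(features)


# Usage: a synthetic random-walk track of 100 frames and 68 landmarks.
rng = np.random.default_rng(0)
track = np.cumsum(rng.normal(size=(100, 68, 2)), axis=0)
descriptor = displacement_range_histogram(track)
print(descriptor.shape)
```

With the default settings a 68-point landmark track yields a 3 × 68 × 8 = 1632-dimensional vector. A fixed-length global vector like this is independent of video length, which is one reason such a descriptor could serve as input to a per-modality DCNN of the kind described above.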
Published in: 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII)
Date of Conference: 23-26 October 2017
Date Added to IEEE Xplore: 01 February 2018
Electronic ISSN: 2156-8111