Automated depression analysis using convolutional neural networks from speech

https://doi.org/10.1016/j.jbi.2018.05.007
Under an Elsevier user license
open archive

Highlights

  • A framework for automatic diagnosis of depression from speech is proposed.

  • The combination of complementary information between deep-learned features and hand-crafted features can effectively measure the depression severity.

  • Useful characteristics of depression can be learned from speech by Deep Convolutional Neural Networks (DCNN).

  • The framework can help clinicians when designing features related to depression.

Abstract

To help clinicians efficiently diagnose the severity of a person’s depression, the affective computing community and the artificial intelligence field have shown a growing interest in designing automated systems. Speech features carry useful information for the diagnosis of depression. However, manual design and domain knowledge are still important for feature selection, which makes the process labor-intensive and subjective. In recent years, deep-learned features based on neural networks have shown superior performance to hand-crafted features in various areas. In this paper, to overcome the difficulties mentioned above, we propose a combination of hand-crafted and deep-learned features that can effectively measure the severity of depression from speech. In the proposed method, Deep Convolutional Neural Networks (DCNN) are first built to learn deep-learned features from spectrograms and raw speech waveforms. We then manually extract state-of-the-art texture descriptors, named median robust extended local binary patterns (MRELBP), from spectrograms. To capture the complementary information within the hand-crafted and deep-learned features, we propose joint fine-tuning layers that combine the raw and spectrogram DCNN to boost depression recognition performance. Moreover, to address the problem of small sample sizes, a data augmentation method is proposed. Experiments conducted on the AVEC2013 and AVEC2014 depression databases show that our approach is robust and effective for the diagnosis of depression when compared to state-of-the-art audio-based methods.
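
As a rough illustration of the hand-crafted branch described above, the following sketch turns a raw waveform into a log-magnitude spectrogram and summarises its texture with a local binary pattern histogram. It is not the authors' implementation. A basic 8-neighbour LBP computed on a median-filtered spectrogram stands in for the full MRELBP descriptor, and the DCNN branches, joint fine-tuning layers, and data augmentation are omitted. The function names, frame length, hop size, sample rate, and test tone are all assumptions chosen for the example.

```python
import cmath
import math
from statistics import median


def log_spectrogram(signal, frame_len=64, hop=32):
    """Return a log-magnitude spectrogram as a list of frames (time x frequency)."""
    # Hann window reduces spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT: adequate for a sketch; an FFT would be used in practice.
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            row.append(math.log(abs(coeff) + 1e-10))
        frames.append(row)
    return frames


def median_filter3(image):
    """Replace each interior pixel with the median of its 3x3 neighbourhood."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = median(image[r + dr][c + dc]
                               for dr in (-1, 0, 1) for dc in (-1, 0, 1))
    return out


# Offsets of the 8 neighbours, clockwise from the top-left corner.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(image):
    """Normalised 256-bin histogram of basic 8-neighbour LBP codes."""
    rows, cols = len(image), len(image[0])
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOURS):
                # Set the bit when the neighbour is at least as bright as the centre.
                if image[r + dr][c + dc] >= image[r][c]:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist]


# Toy input (hypothetical values): a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(2000)]
spec = log_spectrogram(tone)
texture = lbp_histogram(median_filter3(spec))
```

Here `texture` is a fixed-length vector that could be concatenated with learned features before a regressor. The median filtering step only hints at the noise robustness that motivates MRELBP; the full descriptor uses several radii and additional pattern types that this sketch does not reproduce.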

Keywords

Depression
Automatic diagnosis
Median robust extended local binary patterns (MRELBP)
Speech processing
