Abstract:
Computational methods for speech-based detection of depression are still relatively new, and have focused on either a standard set of features or on specific additional approaches. We systematically study the effects of feature type, machine learning approach, and speaking style (read versus spontaneous) on depression prediction in the AVEC-2014 evaluation corpus, using features related to speech production, perception, acoustic phonetics, and prosody. Using a multilayer ANN, we find that one feature type, MMEDuSA [2], results in a 25% relative error reduction over the AVEC-2014 baseline system [1] for both mean absolute error (MAE) and root mean squared error (RMSE). Other individual feature types perform comparably to the baseline, but have much lower dimensionality and are simpler to interpret. Further improvements were achieved by fusing diverse features and systems. Finally, results suggest that the relative contribution of different feature types depends on whether the speech is spontaneous or read. Overall, spontaneous speech led to lower error rates than read speech, an important consideration for the collection of future clinical data.
Published in: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 19–24 April 2015
Date Added to IEEE Xplore: 06 August 2015
Electronic ISBN: 978-1-4673-6997-8