Abstract
Because traditional depression diagnosis methods suffer from false positives, this paper proposes a multimodal fusion algorithm for depression diagnosis based on speech signals and facial image sequences. Spectral subtraction is introduced to enhance the depressed speech signal, and the cepstrum method is used to extract pitch frequency features with a large variation rate and formant features with significant differences. The short-time energy and Mel-frequency cepstral coefficient (MFCC) parameters of speech with different emotions are analyzed in both the time domain and the frequency domain, and a model is established for training and identification. In parallel, the orthogonal matching pursuit (OMP) algorithm is used to obtain a sparse linear combination that represents each face test sample, and the speech and facial emotion results are then combined in proportion. The experimental results show that the recognition rate of the depression detection algorithm that fuses speech and facial emotions reaches 81.14%. Compared with the existing doctors' accuracy rate of 47.3%, combining diagnosis with the proposed method yields a relative improvement of 71.54% in accuracy. The method can also be applied at low cost to the hardware and software of existing hospital instruments. It is therefore an accurate and effective method for diagnosing depression.
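The sparse face representation step mentioned in the abstract can be illustrated with a minimal sketch of orthogonal matching pursuit. This is not the authors' implementation: the function names `omp` and `classify`, the dictionary layout (one training sample per column), and the smallest-class-residual decision rule are assumptions chosen for illustration, following the usual sparse-representation classification recipe.

```python
import numpy as np

def omp(D, y, n_nonzero, tol=1e-10):
    """Greedy OMP: approximate y by at most n_nonzero columns (atoms) of D."""
    D = np.asarray(D, dtype=float)
    y = np.asarray(y, dtype=float)
    # Normalise columns so correlations are comparable across atoms.
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0
    Dn = D / norms
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) <= tol:
            break
        corr = np.abs(Dn.T @ residual)
        if support:
            corr[support] = 0.0  # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Least-squares refit on all selected atoms (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(Dn[:, support], y, rcond=None)
        residual = y - Dn[:, support] @ coef
    x = np.zeros(D.shape[1])
    if support:
        x[support] = coef / norms[support]  # undo the column normalisation
    return x

def classify(D, labels, y, n_nonzero):
    """Assumed decision rule: pick the class whose atoms reconstruct y best."""
    D = np.asarray(D, dtype=float)
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    x = omp(D, y, n_nonzero)
    best_label, best_err = None, np.inf
    for c in np.unique(labels):
        x_c = np.where(labels == c, x, 0.0)  # keep only class-c coefficients
        err = np.linalg.norm(y - D @ x_c)
        if err < best_err:
            best_label, best_err = c, err
    return best_label
```

In this sketch each column of `D` would hold a vectorised training face, `labels` gives the emotion class of each column, and `n_nonzero` caps how many training faces may contribute to the reconstruction of a test face `y`.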

Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61379010).
Cite this article
Zhao, J., Su, W., Jia, J. et al. Research on depression detection algorithm combine acoustic rhythm with sparse face recognition. Cluster Comput 22 (Suppl 4), 7873–7884 (2019). https://doi.org/10.1007/s10586-017-1469-0
DOI: https://doi.org/10.1007/s10586-017-1469-0