Abstract
Although automated depression diagnosis has made great progress, most recent works have focused on combining multiple modalities rather than strengthening a single one. In this work, we present a unimodal framework for depression detection based on facial expression and facial motion analysis. We investigate a wide set of visual features extracted from different facial regions. Due to the high dimensionality of the obtained feature sets, identifying informative and discriminative features is a challenge. This paper proposes a hybrid dimensionality reduction approach that leverages the advantages of both filter and wrapper methods. First, we use a univariate filter method, the Fisher Discriminant Ratio, to initially reduce the size of each feature set. Subsequently, we propose an Incremental Linear Discriminant Analysis (ILDA) approach to find an optimal combination of complementary and relevant feature sets. We compare the performance of the proposed ILDA with batch-mode LDA and with the Composite Kernel based Support Vector Machine (CKSVM) method. Experiments conducted on the Distress Analysis Interview Corpus Wizard-of-Oz (DAIC-WOZ) dataset demonstrate that the best depression classification performance is obtained by using different feature extraction methods in combination rather than individually. ILDA generates better depression classification results than the CKSVM. Moreover, ILDA-based wrapper feature selection incurs lower computational cost than the CKSVM and batch-mode LDA methods. The proposed framework significantly improves depression classification performance, achieving an F1 score of 0.805, which exceeds all video-based depression detection models reported in the literature for the DAIC-WOZ dataset. Salient facial regions and well-performing visual feature extraction methods are also identified.
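The filter stage described above, the Fisher Discriminant Ratio (FDR), scores each feature by the squared difference of the class means divided by the sum of the class variances. The following is a minimal sketch on toy data; the function name, the toy dimensions, and the planted signal are illustrative choices of ours, not the paper's code:

```python
import numpy as np

def fisher_discriminant_ratio(X, y):
    """Score each column of X by (mu0 - mu1)^2 / (var0 + var1)
    for the two classes indicated by the binary labels y."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12  # guard against zero variance
    return num / den

# Toy high-dimensional feature set: 100 samples, 500 features,
# of which only the first 10 carry class information.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 500))
y = np.repeat([0, 1], 50)
X[y == 1, :10] += 2.0                      # plant a discriminative signal

scores = fisher_discriminant_ratio(X, y)
top_k = np.argsort(scores)[::-1][:10]      # keep the k highest-scoring features
```

Since the FDR is univariate, it is cheap on high-dimensional sets, which is why it serves as the initial filter before the ILDA-based wrapper stage.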
References
Ahonen T, Hadid A, Pietikainen M (2006) Face description with local binary patterns: Application to face recognition. IEEE Trans Pattern Anal Mach Intell 28(12):2037–2041
Al Jazaery M, Guo G (2018) Video-based depression level analysis by encoding deep spatiotemporal features. IEEE Trans Affect Comput 12(1):262–268
Alghowinem S et al (2016) Multimodal depression detection: fusion analysis of paralinguistic, head pose and eye gaze behaviors. IEEE Trans Affect Comput 9(4):478–490
American Psychiatric Association (2013) Diagnostic and statistical manual of mental disorders (DSM-5). American Psychiatric Association, Washington, DC
Baltrušaitis T, Robinson P, Morency L-P (2016) Openface: an open source facial behavior analysis toolkit. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–10
Beck AT, Steer RA, Brown GK (1996) Beck depression inventory-II. San Antonio 78(2):490–498
Bellantonio M et al (2016) Spatio-temporal pain recognition in cnn-based super-resolved facial images. In: Video Analytics. Face and Facial Expression Recognition and Audience Measurement. Springer, pp 151–162
Buyukdura JS, McClintock SM, Croarkin PE (2011) Psychomotor retardation in depression: biological underpinnings, measurement, and treatment. Prog Neuro-Psychopharmacol Biol Psychiatry 35(2):395–409
Castro E, Martínez-Ramón M, Pearlson G, Sui J, Calhoun VD (2011) Characterization of groups using composite kernels and multi-source fMRI analysis data: application to schizophrenia. Neuroimage 58(2):526–536
Chen J et al (2009) WLD: A robust local image descriptor. IEEE Trans Pattern Anal Mach Intell 32(9):1705–1720
Cohen I, Garg A, Huang TS (2000) Emotion recognition from facial expressions using multilevel HMM. In: Neural information processing systems, vol. 2
Cohn JF et al (2009) Detecting depression from facial actions and vocal prosody. pp. 1–7.
Cummins N, Joshi J, Dhall A, Sethu V, Goecke R, Epps J (2013) Diagnosis of depression by behavioural signals: a multimodal approach. pp. 11–20
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. vol. 1, pp. 886–893
de Melo WC, Granger E, Hadid A (2019) Combining global and local convolutional 3d networks for detecting depression from facial expressions. pp. 1–8
Who.int (2020) Depression. [online] Available at: https://www.who.int/news-room/fact-sheets/detail/depression. Accessed 18 June 2020
Dibeklioğlu H, Hammal Z, Yang Y, Cohn JF (2015) Multimodal detection of depression in clinical interviews. Proc ACM Int Conf Multimodal Interact 2015:307–310
Duda RO, Hart PE, Stork DG (2006) Pattern classification. John Wiley & Sons
Fukunaga K (2013) Introduction to statistical pattern recognition. Elsevier
Giannakakis G et al (2017) Stress and anxiety detection using facial cues from videos. Biomedical Signal Proc Control 31:89–101
Girard JM, Cohn JF, Mahoor MH, Mavadati S, Rosenwald DP (2013) Social risk and depression: Evidence from manual and automatic facial expression analysis. pp. 1–8
Gong Y, Poellabauer C (2017) Topic modeling based multi-modal depression detection. pp. 69–76
Gratch J et al (2014) The distress analysis interview corpus of human and computer interviews. pp. 3123–3128
Gupta R et al (2014) Multimodal prediction of affective dimensions and depression in human-computer interactions. pp. 33–40
Haque A, Guo M, Miner AS, Fei-Fei L (2018) Measuring depression symptom severity from spoken language and 3D facial expressions. arXiv preprint arXiv:1811.08592
Hawton K, Comabella CCI, Haw C, Saunders K (2013) Risk factors for suicide in individuals with depression: a systematic review. J Affect Disord 147(1–3):17–28
He S, Soraghan JJ, O’Reilly BF, Xing D (2009) Quantitative analysis of facial paralysis using local binary patterns in biomedical videos. IEEE Trans Biomed Eng 56(7):1864–1870
He L, Jiang D, Sahli H (2018) Automatic Depression Analysis using Dynamic Facial Appearance Descriptor and Dirichlet Process Fisher Encoding. IEEE Trans Multimedia 21:1476–1486
Hill D (1974) Non-verbal behaviour in mental illness. Br J Psychiatry 124(580):221–230
Jain V, Crowley JL, Dey AK, Lux A (2014) Depression estimation using audiovisual features and fisher vector encoding. pp. 87–91
James SL et al (2018) Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet 392(10159):1789–1858
Jan A, Meng H, Gaus YFA, Zhang F, Turabzadeh S (2014) Automatic depression scale prediction using facial expression dynamics and regression. pp. 73–80
Jan A, Meng H, Gaus YFBA, Zhang F (2017) Artificial intelligent system for automatic depression level analysis through visual and vocal expressions. IEEE Trans Cogn Dev Syst 10(3):668–680
Joshi J et al (2013) Multimodal assistive technologies for depression diagnosis and monitoring. J Multimodal User Interfaces 7(3):217–228
Kroenke K, Spitzer RL, Williams JB (2001) The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med 16(9):606–613
Manfredonia J et al (2019) Automatic recognition of posed facial expression of emotion in individuals with autism spectrum disorder. J Autism Dev Disord 49(1):279–293
Marsaglia G, Styan GP (1974) Rank conditions for generalized inverses of partitioned matrices. Sankhyā: The Indian Journal of Statistics, Series A:437–442
Mehrabian A, Russell JA (1974) An approach to environmental psychology. the MIT Press
Meng H, Pears N, Freeman M, Bailey C (2009) Motion history histograms for human action recognition. In: Embedded Computer Vision. Springer, pp 139–162. https://doi.org/10.1007/978-1-84800-304-0_7
Meng H, Huang D, Wang H, Yang H, Ai-Shuraifi M, Wang Y (2013) Depression recognition based on dynamic facial and vocal expression features using partial least square regression. pp. 21–30
Nasir M, Jati A, Shivakumar PG, Nallan Chakravarthula S, Georgiou P (2016) Multimodal and multiresolution depression detection from speech and facial landmark features. pp. 43–50
Neumann D, Langner T, Ulbrich F, Spitta D, Goehring D (2017) Online vehicle detection using Haar-like, LBP and HOG feature based image classifiers with stereo vision preselection. In: 2017 IEEE Intelligent Vehicles Symposium (IV), pp. 773–778
Nhat HTM, Hoang VT (2019) Feature fusion by using LBP, HOG, GIST descriptors and Canonical Correlation Analysis for face recognition. In: 2019 26th international conference on telecommunications (ICT). pp. 371–375
Ojala T, Pietikäinen M, Mäenpää T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
Ojansivu V, Heikkilä J (2008) Blur insensitive texture classification using local phase quantization. pp. 236–243
Ouellette DV (1981) Schur complements and statistics. Linear Algebra Appl 36:187–295
Pampouchidou A et al (2016) Depression assessment by fusing high and low level features from audio, video, and text. pp. 27–34
Ringeval F et al (2017) Avec 2017: Real-life depression, and affect recognition workshop and challenge. pp. 3–9
Senoussaoui M, Sarria-Paja M, Santos JF, Falk TH (2014) Model fusion for multimodal depression classification and level detection. pp. 57–63
Shao L, Mattivi R (2010) Feature detector and descriptor evaluation in human action recognition. In: Proceedings of the ACM International Conference on Image and Video Retrieval. pp. 477–484
Song S, Shen L, Valstar M (2018) Human behaviour-based automatic depression analysis using hand-crafted statistics and deep learned spectral features. pp. 158–165
Stratou G, Scherer S, Gratch J, Morency L-P (2015) Automatic nonverbal behavior indicators of depression and ptsd: the effect of gender. J Multimodal User Interfaces 9(1):17–29
Sun B et al (2017) A random forest regression method with selected-text feature for depression assessment. pp. 61–68
Syed ZS, Sidorov K, Marshall D (2017) Depression severity prediction based on biomarkers of psychomotor retardation. pp. 37–43
Tan X, Triggs B (2010) Enhanced local texture feature sets for face recognition under difficult lighting conditions. IEEE Trans Image Process 19(6):1635–1650
Turan C, Lam K-M (2018) Histogram-based local descriptors for facial expression recognition (FER): A comprehensive study. J Vis Commun Image Represent 55:331–341
Valstar M et al (2013) AVEC 2013: the continuous audio/visual emotion and depression recognition challenge. pp. 3–10
Valstar M et al (2014) Avec 2014: 3d dimensional affect and depression recognition challenge. pp. 3–10
Valstar M et al (2016) AVEC 2016: Depression, Mood, and Emotion Recognition Workshop and Challenge. In: Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge - AVEC ‘16, Amsterdam, The Netherlands, pp. 3–10. https://doi.org/10.1145/2988257.2988258.
Wang Y et al (2020) Automatic Depression Detection via Facial Expressions Using Multiple Instance Learning. pp. 1933–1936
Wen L, Li X, Guo G, Zhu Y (2015) Automated depression diagnosis based on facial dynamic analysis and sparse coding. IEEE Trans Inf Forensics Secur 10(7):1432–1441
Williams JB (1988) A structured interview guide for the Hamilton Depression Rating Scale. Arch Gen Psychiatry 45(8):742–747
Williamson JR, Quatieri TF, Helfer BS, Horwitz R, Yu B, Mehta DD (2013) Vocal biomarkers of depression based on motor incoordination. pp. 41–48
Williamson JR, Quatieri TF, Helfer BS, Ciccarelli G, Mehta DD (2014) Vocal and facial biomarkers of depression based on motor incoordination and timing. pp. 65–72
Williamson JR et al (2016) Detecting depression using vocal, facial and semantic communication cues. pp. 11–18
Yang M, Zhang L, Shiu SC-K, Zhang D (2012) Monogenic binary coding: An efficient local feature extraction approach to face recognition. IEEE Trans Inf Forensics Secur 7(6):1738–1751
Yang B-Q, Zhang T, Gu C-C, Wu K-J, Guan X-P (2016) A novel face recognition method based on IWLD and IWBC. Multimed Tools Appl 75(12):6979–7002
Yang L, Jiang D, He L, Pei E, Oveneke MC, Sahli H (2016) Decision tree based depression classification from audio video and language information. pp. 89–96
Zheng W, Yan L, Gou C, Wang F-Y (2020) Graph Attention Model Embedded With Multi-Modal Knowledge For Depression Detection. pp. 1–6
Zhou X, Jin K, Shang Y, Guo G (2018) Visually interpretable representation learning for depression recognition from facial images. IEEE Trans Affect Comput 11(3):542–552
Zhu Y, Shang Y, Shao Z, Guo G (2018) Automated depression diagnosis based on deep networks to encode facial appearance and dynamics. IEEE Trans Affect Comput 9(4):578–584
Data Availability (data transparency)
Not Applicable
Code availability (software application or custom code)
The authors do not wish to share the code at this stage.
Funding
Not Applicable
Ethics declarations
Conflicts of interest/Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1. Composite kernel support vector machine (CKSVM)
Castro et al. [9] handled the high dimensionality of fMRI data for schizophrenia detection by using composite kernels and recursive feature elimination. The non-linear relationships between the features (voxels) within a brain region were captured with the Gaussian kernel, which maps the features to an infinite-dimensional Hilbert space equipped with a kernel inner product. To capture the linear relationships between regions, the mapped voxels of the different regions were combined in the Hilbert space using a summation kernel; this linear combination of kernels is called the composite kernel. The parameters of a composite kernel based support vector machine (SVM) classifier give the relevance of entire regions rather than of individual features, and recursive feature elimination was then used to identify the regions that best distinguished patients from controls. Motivated by this work, we used CKSVM to rank the 19 feature sets and applied a forward selection approach to incrementally add the feature set most relevant to the task of depression detection.
Let v_{i,f} denote a feature vector from the f-th feature set, 1 ≤ f ≤ F, for the i-th sample, 1 ≤ i ≤ N. In our study, N = 107 for the training set, N = 35 for the test set, and F = 19. Using a non-linear transformation φ_f, the feature vectors of feature set f are mapped to a high-dimensional Hilbert space such that

⟨φ_f(v_{i,f}), φ_f(v_{j,f})⟩ = k_f(v_{i,f}, v_{j,f}),
where ⟨·,·⟩ denotes the inner product for a pair of feature vectors in the Hilbert space and k_f(·,·) is a Mercer kernel function. We used the Gaussian kernel to non-linearly transform each of the F feature sets. Corresponding to feature set f, a kernel matrix K_f is generated, whose component (i, j) is computed as

K_f(i, j) = exp(−‖v_{i,f} − v_{j,f}‖² / (2σ²)),
where σ is the Gaussian kernel parameter. The individually mapped feature sets can be concatenated into a single vector as

φ(v_i) = [φ_1(v_{i,1}), φ_2(v_{i,2}), …, φ_F(v_{i,F})].
The inner product for a pair of vectors v_i and v_j can then be given as

⟨φ(v_i), φ(v_j)⟩ = Σ_{f=1}^{F} k_f(v_{i,f}, v_{j,f}).
The above inner product is a composite kernel, expressed as the sum of the kernels over the F feature sets. Accordingly, the dual optimization problem of the conventional support vector machine can be modified as:

max_α  Σ_{i=1}^{N} α_i − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j Σ_{f=1}^{F} k_f(v_{i,f}, v_{j,f}),
subject to 0 ≤ α_i ≤ C and Σ_{i=1}^{N} α_i y_i = 0.
Similarly, the decision function of the SVM learning algorithm is modified as:

y(v) = sign( Σ_{i=1}^{N} α_i y_i Σ_{f=1}^{F} k_f(v_{i,f}, v_f) + b ),
where α_i and b are the classifier parameters. Using the composite kernel and the SVM parameters α, the relevance of a particular feature set can be computed as

‖w_f‖² = Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j k_f(v_{i,f}, v_{j,f}).
The more relevant a feature set f is, the higher the quadratic norm ‖w_f‖². Using the forward selection approach, an optimal combination of feature sets for depression detection is determined incrementally, based on ‖w_f‖² for each distinct combination of feature sets. The CKSVM method entails high time complexity due to the initial kernel computation and the tuning of the Gaussian kernel parameter σ.
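The composite kernel and the ‖w_f‖² relevance computation above can be sketched as follows, assuming scikit-learn's precomputed-kernel SVC. The two toy feature sets, the σ value, and the planted class signal are our own illustrative choices, not the paper's 19 feature sets:

```python
import numpy as np
from sklearn.svm import SVC

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix: K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Toy data: two "feature sets" describing the same N samples.
rng = np.random.default_rng(1)
N = 60
y = np.repeat([0, 1], N // 2)
X1 = rng.normal(size=(N, 5))
X1[y == 1] += 2.0                      # informative feature set
X2 = rng.normal(size=(N, 8))           # pure-noise feature set
sigma = 3.0

# Composite kernel: sum of the per-feature-set Gaussian kernels.
kernels = [gaussian_kernel(X, X, sigma) for X in (X1, X2)]
K = sum(kernels)

# Train an SVM directly on the precomputed composite kernel.
svm = SVC(kernel="precomputed").fit(K, y)

# Signed dual coefficients alpha_i * y_i, expanded to all N samples.
s = np.zeros(N)
s[svm.support_] = svm.dual_coef_.ravel()

# Relevance of each feature set: ||w_f||^2 = s^T K_f s.
relevance = [float(s @ Kf @ s) for Kf in kernels]
# The informative set (X1) should obtain the larger ||w_f||^2.
```

In a forward selection loop, this relevance score would be recomputed for each candidate combination of feature sets, which is exactly the repeated kernel computation that makes CKSVM costly compared to ILDA.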
About this article
Cite this article
Rathi, S., Kaur, B. & Agrawal, R. Selection of Relevant Visual Feature Sets for Enhanced Depression Detection using Incremental Linear Discriminant Analysis. Multimed Tools Appl 81, 17703–17727 (2022). https://doi.org/10.1007/s11042-022-12420-2