
Selection of Relevant Visual Feature Sets for Enhanced Depression Detection using Incremental Linear Discriminant Analysis

  • Published in: Multimedia Tools and Applications

Abstract

Although automated depression diagnosis has made great progress, most recent works have focused on combining multiple modalities rather than strengthening a single one. In this work, we present a unimodal framework for depression detection based on facial expression and facial motion analysis. We investigate a wide set of visual features extracted from different facial regions. Due to the high dimensionality of the resulting feature sets, identifying informative and discriminative features is a challenge. This paper proposes a hybrid dimensionality reduction approach that leverages the advantages of both filter and wrapper methods. First, we use a univariate filter method, the Fisher Discriminant Ratio, to reduce the size of each feature set. Subsequently, we propose an Incremental Linear Discriminant Analysis (ILDA) approach to find an optimal combination of complementary and relevant feature sets. We compare the performance of the proposed ILDA with batch-mode LDA and with the Composite Kernel based Support Vector Machine (CKSVM) method. Experiments conducted on the Distress Analysis Interview Corpus Wizard-of-Oz (DAIC-WOZ) dataset demonstrate that the best depression classification performance is obtained by combining different feature extraction methods rather than using them individually. ILDA yields better depression classification results than the CKSVM, and ILDA-based wrapper feature selection incurs a lower computational cost than both the CKSVM and batch-mode LDA. The proposed framework significantly improves depression classification performance, achieving an F1 score of 0.805, which surpasses all video-based depression detection models reported in the literature for the DAIC-WOZ dataset. Salient facial regions and well-performing visual feature extraction methods are also identified.


References

  1. Ahonen T, Hadid A, Pietikainen M (2006) Face description with local binary patterns: Application to face recognition. IEEE Trans Pattern Anal Mach Intell 28(12):2037–2041


  2. Al Jazaery M, Guo G (2018) Video-based depression level analysis by encoding deep spatiotemporal features. IEEE Trans Affect Comput 12(1):262–268


  3. Alghowinem S et al (2016) Multimodal depression detection: fusion analysis of paralinguistic, head pose and eye gaze behaviors. IEEE Trans Affect Comput 9(4):478–490


  4. American Psychiatric Association (2013) Diagnostic and statistical manual of mental disorders (DSM-5®). American Psychiatric Association, Washington, DC

  5. Baltrušaitis T, Robinson P, Morency L-P (2016) Openface: an open source facial behavior analysis toolkit. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–10

  6. Beck AT, Steer RA, Brown GK (1996) Beck depression inventory-II. San Antonio 78(2):490–498


  7. Bellantonio M et al (2016) Spatio-temporal pain recognition in cnn-based super-resolved facial images. In: Video Analytics. Face and Facial Expression Recognition and Audience Measurement. Springer, pp 151–162

  8. Buyukdura JS, McClintock SM, Croarkin PE (2011) Psychomotor retardation in depression: biological underpinnings, measurement, and treatment. Prog Neuro-Psychopharmacol Biol Psychiatry 35(2):395–409


  9. Castro E, Martínez-Ramón M, Pearlson G, Sui J, Calhoun VD (2011) Characterization of groups using composite kernels and multi-source fMRI analysis data: application to schizophrenia. Neuroimage 58(2):526–536


  10. Chen J et al (2009) WLD: A robust local image descriptor. IEEE Trans Pattern Anal Mach Intell 32(9):1705–1720


  11. Cohen I, Garg A, Huang TS (2000) Emotion recognition from facial expressions using multilevel HMM. In: Neural information processing systems, vol. 2

  12. Cohn JF et al (2009) Detecting depression from facial actions and vocal prosody. pp. 1–7.

  13. Cummins N, Joshi J, Dhall A, Sethu V, Goecke R, Epps J (2013) Diagnosis of depression by behavioural signals: a multimodal approach. pp. 11–20

  14. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. vol. 1, pp. 886–893

  15. de Melo WC, Granger E, Hadid A (2019) Combining global and local convolutional 3d networks for detecting depression from facial expressions. pp. 1–8

  16. Who.int (2020) Depression. [online] Available at: https://www.who.int/news-room/fact-sheets/detail/depression. Accessed 18 June 2020

  17. Dibeklioğlu H, Hammal Z, Yang Y, Cohn JF (2015) Multimodal detection of depression in clinical interviews. Proc ACM Int Conf Multimodal Interact 2015:307–310


  18. Duda RO, Hart PE, Stork DG (2006) Pattern classification. John Wiley & Sons


  19. Fukunaga K (2013) Introduction to statistical pattern recognition. Elsevier


  20. Giannakakis G et al (2017) Stress and anxiety detection using facial cues from videos. Biomedical Signal Proc Control 31:89–101


  21. Girard JM, Cohn JF, Mahoor MH, Mavadati S, Rosenwald DP (2013) Social risk and depression: Evidence from manual and automatic facial expression analysis. pp. 1–8

  22. Gong Y, Poellabauer C (2017) Topic modeling based multi-modal depression detection. pp. 69–76

  23. Gratch J et al (2014) The distress analysis interview corpus of human and computer interviews. pp. 3123–3128

  24. Gupta R et al (2014) Multimodal prediction of affective dimensions and depression in human-computer interactions. pp. 33–40

  25. Haque A, Guo M, Miner AS, Fei-Fei L (2018) Measuring depression symptom severity from spoken language and 3D facial expressions. arXiv preprint arXiv:1811.08592

  26. Hawton K, Comabella CCI, Haw C, Saunders K (2013) Risk factors for suicide in individuals with depression: a systematic review. J Affect Disord 147(1–3):17–28


  27. He S, Soraghan JJ, O’Reilly BF, Xing D (2009) Quantitative analysis of facial paralysis using local binary patterns in biomedical videos. IEEE Trans Biomed Eng 56(7):1864–1870


  28. He L, Jiang D, Sahli H (2018) Automatic Depression Analysis using Dynamic Facial Appearance Descriptor and Dirichlet Process Fisher Encoding. IEEE Trans Multimedia 21:1476–1486


  29. Hill D (1974) Non-verbal behaviour in mental illness. Br J Psychiatry 124(580):221–230


  30. Jain V, Crowley JL, Dey AK, Lux A (2014) Depression estimation using audiovisual features and fisher vector encoding. pp. 87–91

  31. James SL et al (2018) Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet 392(10159):1789–1858


  32. Jan A, Meng H, Gaus YFA, Zhang F, Turabzadeh S (2014) Automatic depression scale prediction using facial expression dynamics and regression. pp. 73–80

  33. Jan A, Meng H, Gaus YFBA, Zhang F (2017) Artificial intelligent system for automatic depression level analysis through visual and vocal expressions. IEEE Trans Cogn Dev Syst 10(3):668–680


  34. Joshi J et al (2013) Multimodal assistive technologies for depression diagnosis and monitoring. J Multimodal User Interfaces 7(3):217–228


  35. Kroenke K, Spitzer RL, Williams JB (2001) The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med 16(9):606–613


  36. Manfredonia J et al (2019) Automatic recognition of posed facial expression of emotion in individuals with autism spectrum disorder. J Autism Dev Disord 49(1):279–293


  37. Marsaglia G, Styan GP (1974) Rank conditions for generalized inverses of partitioned matrices. Sankhyā: The Indian Journal of Statistics, Series A:437–442

  38. Mehrabian A, Russell JA (1974) An approach to environmental psychology. the MIT Press


  39. Meng H, Pears N, Freeman M, Bailey C (2009) Motion history histograms for human action recognition. In: Embedded Computer Vision. Springer, pp 139–162. https://doi.org/10.1007/978-1-84800-304-0_7

  40. Meng H, Huang D, Wang H, Yang H, Ai-Shuraifi M, Wang Y (2013) Depression recognition based on dynamic facial and vocal expression features using partial least square regression. pp. 21–30

  41. Nasir M, Jati A, Shivakumar PG, Nallan Chakravarthula S, Georgiou P (2016) Multimodal and multiresolution depression detection from speech and facial landmark features. pp. 43–50

  42. Neumann D, Langner T, Ulbrich F, Spitta D, Goehring D (2017) Online vehicle detection using Haar-like, LBP and HOG feature based image classifiers with stereo vision preselection. In: 2017 IEEE Intelligent Vehicles Symposium (IV), pp. 773–778

  43. Nhat HTM, Hoang VT (2019) Feature fusion by using LBP, HOG, GIST descriptors and Canonical Correlation Analysis for face recognition. In: 2019 26th international conference on telecommunications (ICT). pp. 371–375

  44. Ojala T, Pietikäinen M, Mäenpää T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987

  45. Ojansivu V, Heikkilä J (2008) Blur insensitive texture classification using local phase quantization. pp. 236–243

  46. Ouellette DV (1981) Schur complements and statistics. Linear Algebra Appl 36:187–295


  47. Pampouchidou A et al (2016) Depression assessment by fusing high and low level features from audio, video, and text. pp. 27–34

  48. Ringeval F et al (2017) Avec 2017: Real-life depression, and affect recognition workshop and challenge. pp. 3–9

  49. Senoussaoui M, Sarria-Paja M, Santos JF, Falk TH (2014) Model fusion for multimodal depression classification and level detection. pp. 57–63

  50. Shao L, Mattivi R (2010) Feature detector and descriptor evaluation in human action recognition. In: Proceedings of the ACM International Conference on Image and Video Retrieval. pp. 477–484

  51. Song S, Shen L, Valstar M (2018) Human behaviour-based automatic depression analysis using hand-crafted statistics and deep learned spectral features. pp. 158–165

  52. Stratou G, Scherer S, Gratch J, Morency L-P (2015) Automatic nonverbal behavior indicators of depression and ptsd: the effect of gender. J Multimodal User Interfaces 9(1):17–29


  53. Sun B et al (2017) A random forest regression method with selected-text feature for depression assessment. pp. 61–68

  54. Syed ZS, Sidorov K, Marshall D (2017) Depression severity prediction based on biomarkers of psychomotor retardation. pp. 37–43

  55. Tan X, Triggs B (2010) Enhanced local texture feature sets for face recognition under difficult lighting conditions. IEEE Trans Image Process 19(6):1635–1650


  56. Turan C, Lam K-M (2018) Histogram-based local descriptors for facial expression recognition (FER): A comprehensive study. J Vis Commun Image Represent 55:331–341


  57. Valstar M et al (2013) AVEC 2013: the continuous audio/visual emotion and depression recognition challenge. pp. 3–10

  58. Valstar M et al (2014) Avec 2014: 3d dimensional affect and depression recognition challenge. pp. 3–10

  59. Valstar M et al (2016) AVEC 2016: Depression, Mood, and Emotion Recognition Workshop and Challenge. In: Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge - AVEC ‘16, Amsterdam, The Netherlands, pp. 3–10. https://doi.org/10.1145/2988257.2988258.

  60. Wang Y et al (2020) Automatic Depression Detection via Facial Expressions Using Multiple Instance Learning. pp. 1933–1936

  61. Wen L, Li X, Guo G, Zhu Y (2015) Automated depression diagnosis based on facial dynamic analysis and sparse coding. IEEE Trans Inf Forensics Secur 10(7):1432–1441


  62. Williams JB (1988) A structured interview guide for the Hamilton Depression Rating Scale. Arch Gen Psychiatry 45(8):742–747


  63. Williamson JR, Quatieri TF, Helfer BS, Horwitz R, Yu B, Mehta DD (2013) Vocal biomarkers of depression based on motor incoordination. pp. 41–48

  64. Williamson JR, Quatieri TF, Helfer BS, Ciccarelli G, Mehta DD (2014) Vocal and facial biomarkers of depression based on motor incoordination and timing. pp. 65–72

  65. Williamson JR et al (2016) Detecting depression using vocal, facial and semantic communication cues. pp. 11–18

  66. Yang M, Zhang L, Shiu SC-K, Zhang D (2012) Monogenic binary coding: An efficient local feature extraction approach to face recognition. IEEE Trans Inf Forensics Secur 7(6):1738–1751


  67. Yang B-Q, Zhang T, Gu C-C, Wu K-J, Guan X-P (2016) A novel face recognition method based on IWLD and IWBC. Multimed Tools Appl 75(12):6979–7002


  68. Yang L, Jiang D, He L, Pei E, Oveneke MC, Sahli H (2016) Decision tree based depression classification from audio video and language information. pp. 89–96

  69. Zheng W, Yan L, Gou C, Wang F-Y (2020) Graph Attention Model Embedded With Multi-Modal Knowledge For Depression Detection. pp. 1–6

  70. Zhou X, Jin K, Shang Y, Guo G (2018) Visually interpretable representation learning for depression recognition from facial images. IEEE Trans Affect Comput 11(3):542–552

  71. Zhu Y, Shang Y, Shao Z, Guo G (2018) Automated depression diagnosis based on deep networks to encode facial appearance and dynamics. IEEE Trans Affect Comput 9(4):578–584



Data Availability (data transparency)

Not Applicable

Code availability (software application or custom code)

The authors do not wish to share the code at this stage.

Funding

Not Applicable

Author information


Corresponding author

Correspondence to Swati Rathi.

Ethics declarations

Conflicts of interest/Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1. Composite kernel support vector machine (CKSVM)

Castro et al. [9], in their work on schizophrenia detection, handled the high dimensionality of fMRI data using composite kernels and recursive feature elimination. The non-linear relationship between the features (voxels) within a region was captured by a Gaussian kernel, which maps the features into an infinite-dimensional Hilbert space equipped with a kernel inner product. To capture the linear relationship between regions, the voxels of different regions in the Hilbert space were combined using a summation kernel; this linear combination of kernels is called a composite kernel. The parameters of a composite kernel-based support vector machine (SVM) classifier then indicate the relevance of whole regions rather than of individual features. Recursive feature elimination was further used to identify the regions that best distinguished patients from controls. Motivated by Castro et al.'s work, we used CKSVM to rank the 19 feature sets and applied a forward selection approach to incrementally add the feature set most relevant to depression detection.

Let vi, f denote a feature vector from the fth feature set, 1 ≤ f ≤ F, for the ith sample, 1 ≤ i ≤ N. In our study, N = 107 for the training set, N = 35 for the test set, and F = 19. Using a non-linear transformation φf, the feature vectors of feature set f are mapped to a high-dimensional Hilbert space such that

$$ \left\langle \varphi_f\left(\mathbf{v}_{i,f}\right),\ \varphi_f\left(\mathbf{v}_{j,f}\right) \right\rangle = k_f\left(\mathbf{v}_{i,f}, \mathbf{v}_{j,f}\right) $$
(13)

where ⟨·, ·⟩ denotes the inner product of a pair of feature vectors in the Hilbert space and kf(·, ·) is a Mercer kernel function. We used the Gaussian kernel to non-linearly transform each of the F feature sets. For feature set f, a kernel matrix Kf is generated, whose component (i, j) is computed as

$$ \mathbf{K}_f\left(i,j\right) = k_f\left(\mathbf{v}_{i,f}, \mathbf{v}_{j,f}\right) = e^{-\frac{\left\| \mathbf{v}_{i,f} - \mathbf{v}_{j,f} \right\|^2}{2\sigma^2}} $$
(14)
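As an illustration, the Gaussian kernel matrix of Eq. 14 for one feature set can be computed with NumPy as follows (a minimal sketch; the toy data and the σ value are illustrative, not the paper's tuned settings):

```python
import numpy as np

def gaussian_kernel_matrix(V, sigma):
    """N x N Gaussian kernel matrix K_f for one feature set (Eq. 14).

    V     : (N, d) array whose i-th row is the feature vector v_{i,f}.
    sigma : Gaussian kernel width, a tuning parameter of the method.
    """
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(V ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * V @ V.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Toy example: N = 3 samples with 2-dimensional features
K = gaussian_kernel_matrix(np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]), 1.0)
```

The resulting matrix is symmetric with a unit diagonal, since every vector has zero distance to itself.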

where σ is the Gaussian kernel parameter. The individually mapped feature sets can be concatenated into a single vector as

$$ \varphi\left(\mathbf{v}_i\right) = \left[ \varphi_1^{T}\left(\mathbf{v}_{i,1}\right) \cdots \varphi_F^{T}\left(\mathbf{v}_{i,F}\right) \right]^{T} $$
(15)

The inner product of a pair of vectors vi and vj can then be written as

$$ \begin{array}{rl} \left\langle \varphi\left(\mathbf{v}_i\right),\ \varphi\left(\mathbf{v}_j\right) \right\rangle &= \left[ \varphi_1^{T}\left(\mathbf{v}_{i,1}\right) \cdots \varphi_F^{T}\left(\mathbf{v}_{i,F}\right) \right] \cdot \left[ \varphi_1^{T}\left(\mathbf{v}_{j,1}\right) \cdots \varphi_F^{T}\left(\mathbf{v}_{j,F}\right) \right]^{T} \\ &= \sum\limits_{f=1}^{F} \varphi_f^{T}\left(\mathbf{v}_{i,f}\right)\, \varphi_f\left(\mathbf{v}_{j,f}\right) = \sum\limits_{f=1}^{F} k_f\left(\mathbf{v}_{i,f}, \mathbf{v}_{j,f}\right) \end{array} $$
(16)

The inner product above is a composite kernel, expressed as the sum of the kernels of the F feature sets. Accordingly, the optimization problem of the conventional support vector machine is modified as:

$$ \begin{array}{c} \max\limits_{\alpha}\ \sum\limits_{i=1}^{N} \alpha_i - \frac{1}{2} \sum\limits_{i=1}^{N} \sum\limits_{j=1}^{N} \alpha_i \alpha_j y_i y_j \sum\limits_{f=1}^{F} k_f\left(\mathbf{v}_{i,f}, \mathbf{v}_{j,f}\right) \\ \text{s.t.}\quad \sum\limits_{i=1}^{N} \alpha_i y_i = 0,\quad \alpha_i \ge 0,\quad 1 \le i, j \le N,\quad 1 \le f \le F \end{array} $$
(17)
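To make the modified optimization concrete, the sketch below evaluates the dual objective of Eq. 17 for a given α using NumPy (illustrative only; in practice a quadratic-programming solver maximises this objective subject to the stated constraints):

```python
import numpy as np

def dual_objective(alpha, y, K_composite):
    """Dual objective of Eq. 17 with a composite (summed) kernel.

    alpha       : (N,) dual variables, alpha_i >= 0.
    y           : (N,) class labels in {-1, +1}.
    K_composite : (N, N) sum of the F per-feature-set kernel matrices.
    """
    ay = alpha * y
    # sum_i alpha_i - 0.5 * sum_ij alpha_i alpha_j y_i y_j K(i, j)
    return alpha.sum() - 0.5 * ay @ K_composite @ ay

# Toy check: two samples, identity composite kernel
obj = dual_objective(np.array([1.0, 1.0]), np.array([1.0, -1.0]), np.eye(2))
```

Because the composite kernel enters only through the Gram matrix, any standard SVM solver that accepts a precomputed kernel can be used unchanged.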

Similarly, the equation for predicting the output of the SVM is modified as:

$$ y^{\ast} = \sum\limits_{i=1}^{N} \alpha_i y_i \sum\limits_{f=1}^{F} k_f\left(\mathbf{v}_{i,f}, \mathbf{v}_{\ast,f}\right) + b $$
(18)

where αi and b are the classifier parameters. Using composite kernels and the SVM parameters α, the relevance of a particular feature set can be computed as

$$ \left\| \mathbf{w}_f \right\|^2 = \alpha^{T} \mathbf{K}_f \alpha $$
(19)

The higher the relevance of feature set f, the higher the quadratic norm of wf. Using the forward selection approach, an optimal combination of feature sets for depression detection is determined incrementally, based on ‖wf‖2 for each distinct combination of feature sets. The CKSVM method incurs high time complexity due to the initial kernel computation and the tuning of the Gaussian kernel parameter σ.
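The relevance computation of Eq. 19 and the greedy forward selection over feature sets can be sketched as follows. This is a simplified illustration: in the actual method, α comes from the trained composite-kernel SVM and the classifier is re-evaluated for each candidate combination; the toy kernel matrices and α below are hypothetical.

```python
import numpy as np

def relevance(K_f, alpha):
    """Feature-set relevance ||w_f||^2 = alpha^T K_f alpha (Eq. 19)."""
    return float(alpha @ K_f @ alpha)

def forward_select(kernel_mats, alpha, n_select):
    """Greedily add the feature set that maximises composite relevance."""
    selected, remaining = [], list(range(len(kernel_mats)))
    while remaining and len(selected) < n_select:
        best = max(remaining,
                   key=lambda f: relevance(sum(kernel_mats[g]
                                               for g in selected + [f]), alpha))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy kernels for F = 3 feature sets and a fixed alpha
mats = [0.1 * np.eye(3), np.eye(3), 3.0 * np.eye(3)]
order = forward_select(mats, np.array([1.0, -1.0, 1.0]), 2)
```

With these toy kernels, the set with the largest α-weighted norm is chosen first, then the next best in combination with it.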


About this article


Cite this article

Rathi, S., Kaur, B. & Agrawal, R. Selection of Relevant Visual Feature Sets for Enhanced Depression Detection using Incremental Linear Discriminant Analysis. Multimed Tools Appl 81, 17703–17727 (2022). https://doi.org/10.1007/s11042-022-12420-2

