Abstract
Upper body gestures have been shown to provide additional information about a person’s depressive state when combined with facial expressions. While several studies on automatic depression analysis have examined this effect, little is known about how a convolutional neural network (CNN) uses such information to predict depression severity levels. This study compares the performance of several CNN models when estimating depression severity levels on a regression scale from facial images alone versus images that include the upper body. To assess how well model performance generalises, two vastly different datasets were used: one collected by the Black Dog Institute and the other from the 2013 Audio/Visual Emotion Challenge (AVEC). Results show that the differences in model performance between face-only and upper-body input are slight: performance is very similar across architectures but varies between datasets.
Acknowledgements
This research was partially supported by the Australian Government through the Australian Research Council’s Discovery Projects funding scheme (project DP190101294).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ahmad, D., Goecke, R., Ireland, J. (2021). CNN Depression Severity Level Estimation from Upper Body vs. Face-Only Images. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12666. Springer, Cham. https://doi.org/10.1007/978-3-030-68780-9_56
DOI: https://doi.org/10.1007/978-3-030-68780-9_56
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68779-3
Online ISBN: 978-3-030-68780-9
eBook Packages: Computer Science, Computer Science (R0)