Abstract
This paper introduces a framework for extracting features relevant to monitoring the speech therapy progress of individuals with social anxiety or depression. The framework is multi-modal and uses decision fusion: it incorporates audio and video recordings of a patient and the corresponding interviewer from two separate test assessment sessions. The data come from an ongoing project in a day-hospital and outpatient setting in Germany, which investigates whether an established speech therapy group program for adolescents, implemented in a stationary and semi-stationary setting, can be carried out successfully via telemedicine. The proposed features could form the basis for interpretation and analysis by medical experts and therapists, complementing data acquired through questionnaires. The audio features focus on prosody (intonation, stress, rhythm, and timing) and on predictions from a deep neural network model inspired by the Pleasure, Arousal, Dominance (PAD) emotional model space. The video features are based on a pipeline designed to visualize the interaction between patient and interviewer in terms of Facial Emotion Recognition (FER), using the mini-Xception network architecture.
T. Weise and P. A. Pérez-Toro contributed equally to this work.
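The abstract names decision fusion as the way the audio and video modalities are combined but does not state the fusion rule. The following is a minimal sketch of one common form of decision-level (late) fusion, a weighted average of per-modality class probabilities. The label set, the function names `fuse_decisions` and `predicted_label`, the equal weights, and the assumption that both modalities score the same classes are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical shared label set; the paper's actual classes are not given in the abstract.
EMOTIONS: List[str] = ["angry", "happy", "neutral", "sad"]


def fuse_decisions(
    modality_probs: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-modality class probabilities (late fusion)."""
    if not modality_probs:
        raise ValueError("need at least one modality")
    n_classes = len(next(iter(modality_probs.values())))
    total_weight = sum(weights[name] for name in modality_probs)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = [0.0] * n_classes
    for name, probs in modality_probs.items():
        if len(probs) != n_classes:
            raise ValueError(f"modality {name!r} has a different number of classes")
        for i, p in enumerate(probs):
            # Normalizing by the total weight keeps the result a distribution.
            fused[i] += weights[name] * p / total_weight
    return fused


def predicted_label(fused: List[float]) -> str:
    """Return the label with the highest fused probability."""
    best_index = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best_index]


# Made-up per-modality outputs for a single time window (illustrative values only).
audio = [0.10, 0.20, 0.30, 0.40]
video = [0.05, 0.15, 0.60, 0.20]
fused = fuse_decisions(
    {"audio": audio, "video": video},
    {"audio": 0.5, "video": 0.5},
)
label = predicted_label(fused)
```

Averaging after each modality has produced its own prediction keeps the audio and video pipelines independent, so either one can be replaced or inspected separately. Other rules, such as majority voting or a learned combiner, would fit the same interface.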
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Weise, T. et al. (2024). Multi-modal Biomarker Extraction Framework for Therapy Monitoring of Social Anxiety and Depression Using Audio and Video. In: Maier, A.K., Schnabel, J.A., Tiwari, P., Stegle, O. (eds) Machine Learning for Multimodal Healthcare Data. ML4MHD 2023. Lecture Notes in Computer Science, vol 14315. Springer, Cham. https://doi.org/10.1007/978-3-031-47679-2_3
DOI: https://doi.org/10.1007/978-3-031-47679-2_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47678-5
Online ISBN: 978-3-031-47679-2
eBook Packages: Computer Science, Computer Science (R0)