
Multi-modal Biomarker Extraction Framework for Therapy Monitoring of Social Anxiety and Depression Using Audio and Video

  • Conference paper
  • Machine Learning for Multimodal Healthcare Data (ML4MHD 2023)

Abstract

This paper introduces a framework for extracting features relevant to monitoring the speech therapy progress of individuals suffering from social anxiety or depression. It operates multi-modally (decision fusion), incorporating audio and video recordings of a patient and the corresponding interviewer at two separate assessment sessions. The data used is provided by an ongoing project in a day-hospital and outpatient setting in Germany, whose goal is to investigate whether an established speech therapy group program for adolescents, implemented in a stationary and semi-stationary setting, can be successfully carried out via telemedicine. The features proposed in this multi-modal approach could form the basis for interpretation and analysis by medical experts and therapists, in addition to data acquired in the form of questionnaires. The extracted audio features focus on prosody (intonation, stress, rhythm, and timing), as well as predictions from a deep neural network model inspired by the Pleasure, Arousal, Dominance (PAD) emotional model space. The video features are based on a pipeline designed to enable visualization of the interaction between the patient and the interviewer in terms of Facial Emotion Recognition (FER), utilizing the mini-Xception network architecture.

T. Weise and P. A. Pérez-Toro contributed equally to this work.
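As a rough illustration of the two feature streams mentioned in the abstract, the sketch below extracts coarse prosodic descriptors from an audio recording and per-frame facial-emotion probabilities from a video. It is a minimal sketch, not the authors' pipeline: librosa and OpenCV are assumed tools, the Keras model file merely stands in for a trained mini-Xception FER classifier, and all function names and paths are hypothetical.

```python
# Minimal sketch of the two feature streams described in the abstract.
# NOT the authors' implementation: librosa, OpenCV, and a generic Keras
# classifier (standing in for the trained mini-Xception FER model) are
# assumptions, and all names/paths below are hypothetical.
import numpy as np
import librosa
import cv2
from tensorflow.keras.models import load_model

# FER2013-style label set used by the original mini-Xception work.
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]


def prosodic_features(wav_path: str, sr: int = 16000) -> dict:
    """Coarse prosody descriptors: intonation, stress, and rhythm/timing proxies."""
    y, sr = librosa.load(wav_path, sr=sr)
    # Fundamental-frequency contour (intonation) via probabilistic YIN.
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C7"), sr=sr)
    f0 = f0[~np.isnan(f0)]
    # Short-time energy as a simple stress proxy.
    rms = librosa.feature.rms(y=y)[0]
    # Onset rate per second as a rough rhythm/speaking-rate proxy.
    onsets = librosa.onset.onset_detect(y=y, sr=sr)
    duration = len(y) / sr
    return {
        "f0_mean": float(np.mean(f0)) if f0.size else 0.0,
        "f0_std": float(np.std(f0)) if f0.size else 0.0,
        "energy_mean": float(np.mean(rms)),
        "energy_std": float(np.std(rms))
,
        "onset_rate": len(onsets) / duration if duration > 0 else 0.0,
    }


def facial_emotion_probs(video_path: str, model_path: str = "fer_model.h5") -> np.ndarray:
    """Per-frame emotion probabilities for the first detected face in each frame."""
    model = load_model(model_path)  # e.g. a mini-Xception trained on FER2013
    face_det = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    probs = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_det.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
        if len(faces) == 0:
            continue  # skip frames without a detected face
        x, y, w, h = faces[0]
        face = cv2.resize(gray[y:y + h, x:x + w], (64, 64)).astype("float32") / 255.0
        probs.append(model.predict(face[None, :, :, None], verbose=0)[0])
    cap.release()
    return np.asarray(probs)  # shape: (n_frames_with_face, len(EMOTIONS))
```

In a decision-fusion setup such as the one described, each modality would presumably be summarized per speaker and per session (e.g., statistics of the prosodic descriptors and of the emotion-probability trajectories) before the two assessment sessions are compared or the streams are combined.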


Notes

  1. https://www.gebo-med.de/tele-just

References

  1. Arevalo, J., Solorio, T., et al.: Gated multimodal units for information fusion. arXiv preprint arXiv:1702.01992 (2017)

  2. Arkowitz, H., Burke, B.L.: Motivational interviewing as an integrative framework for the treatment of depression. In: Motivational Interviewing in the Treatment of Psychological Problems, pp. 145–172 (2008)

  3. Arriaga, O., Valdenegro-Toro, M., Plöger, P.: Real-time convolutional neural networks for emotion and gender classification. arXiv preprint arXiv:1710.07557 (2017)

  4. Bourke, C., Douglas, K., Porter, R.: Processing of facial emotion expression in major depression: a review. Aust. NZ J. Psychiatry 44(8), 681–696 (2010)

  5. Busso, C., et al.: IEMOCAP: interactive emotional dyadic motion capture database. Lang. Resour. Eval. 42, 335–359 (2008)

  6. Choi, I.C., Comstock, G.W.: Interviewer effect on responses to a questionnaire relating to mood. Am. J. Epidemiol. 101(1), 84–92 (1975)

  7. Cummins, N., et al.: A review of depression and suicide risk assessment using speech analysis. Speech Commun. 71, 10–49 (2015)

  8. Ekman, P.: Facial expression and emotion. Am. Psychol. 48(4), 384 (1993)

  9. Freira, S., Lemos, M.S., et al.: Effect of motivational interviewing on depression scale scores of adolescents with obesity and overweight. Psychiatry Res. 252, 340–345 (2017)

  10. Goodfellow, I.J., et al.: Challenges in representation learning: a report on three machine learning contests. In: Lee, M., Hirose, A., Hou, Z.-G., Kil, R.M. (eds.) ICONIP 2013. LNCS, vol. 8228, pp. 117–124. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-42051-1_16

  11. Gur, R.C., Erwin, R.J., et al.: Facial emotion discrimination: II. Behavioral findings in depression. Psychiatry Res. 42(3), 241–251 (1992)

  12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  13. Joormann, J., Gotlib, I.H.: Is this happiness I see? Biases in the identification of emotional facial expressions in depression and social phobia. J. Abnorm. Psychol. 115(4), 705 (2006)

  14. Klaar, L., Nagels, A., et al.: Sprachliche Besonderheiten in der Spontansprache von Patientinnen mit Depression. Logos (2020)

  15. Kohler, C.G., Hoffman, L.J., Eastman, L.B., Healey, K., Moberg, P.J.: Facial emotion perception in depression and bipolar disorder: a quantitative review. Psychiatry Res. 188(3), 303–309 (2011)

  16. Leppänen, J.M., et al.: Depression biases the recognition of emotionally neutral faces. Psychiatry Res. 128(2), 123–133 (2004)

  17. Martin, G.: Depression in teenagers. Curr. Therapeutics 37(6), 57–67 (1996)

  18. Mehrabian, A.: Pleasure-arousal-dominance: a general framework for describing and measuring individual differences in temperament. Curr. Psychol. 14, 261–292 (1996)

  19. Mehrabian, A.: Comparison of the PAD and PANAS as models for describing emotions and for differentiating anxiety from depression. J. Psychopathol. Behav. Assess. 19, 331–357 (1997)

  20. Orsolini, L., Pompili, S., et al.: A systematic review on telemental health in youth mental health: focus on anxiety, depression and obsessive-compulsive disorder. Medicina 57(8), 793 (2021)

  21. Pérez-Toro, P.A., Bayerl, S.P., et al.: Influence of the interviewer on the automatic assessment of Alzheimer's disease in the context of the ADReSSo challenge. In: Interspeech, pp. 3785–3789 (2021)

  22. Rude, S., Gortner, E.M., Pennebaker, J.: Language use of depressed and depression-vulnerable college students. Cogn. Emotion 18(8), 1121–1133 (2004)

  23. Rutter, L.A., Passell, E., et al.: Depression severity is associated with impaired facial emotion processing in a large international sample. J. Affect. Disord. 275, 175–179 (2020)

  24. Schwartz, G.E., et al.: Facial muscle patterning to affective imagery in depressed and nondepressed subjects. Science 192(4238), 489–491 (1976)

  25. Shugaley, A., Altmann, U., et al.: Klang der Depression. Psychotherapeut 67(2), 158–165 (2022)

  26. Strätz, T.: Sprachtherapie mit ängstlichen und depressiven Jugendlichen - ein Erfahrungsbericht (2022)

  27. Surguladze, S., et al.: A differential pattern of neural response toward sad versus happy facial expressions in major depressive disorder. Biol. Psychiat. 57(3), 201–209 (2005)

  28. Szegedy, C., Ioffe, S., et al.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31 (2017)

  29. Tarasenko, S.: Emotionally colorful reflexive games. arXiv preprint arXiv:1101.0820 (2010)

  30. Torro-Alves, N., et al.: Facial emotion recognition in social anxiety: the influence of dynamic information. Psychol. Neurosci. 9(1), 1 (2016)

  31. Zhang, Q., Ran, G., Li, X.: The perception of facial emotional change in social anxiety: an ERP study. Front. Psychol. 9, 1737 (2018)

  32. Zwirnmann, S., et al.: Fachbeitrag: Sprachliche und emotional-soziale Beeinträchtigungen. Komorbiditäten und Wechselwirkungen. Vierteljahresschrift für Heilpädagogik und ihre Nachbargebiete (2023)


Author information


Corresponding authors

Correspondence to Tobias Weise or Paula Andrea Pérez-Toro.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Weise, T. et al. (2024). Multi-modal Biomarker Extraction Framework for Therapy Monitoring of Social Anxiety and Depression Using Audio and Video. In: Maier, A.K., Schnabel, J.A., Tiwari, P., Stegle, O. (eds) Machine Learning for Multimodal Healthcare Data. ML4MHD 2023. Lecture Notes in Computer Science, vol 14315. Springer, Cham. https://doi.org/10.1007/978-3-031-47679-2_3


  • DOI: https://doi.org/10.1007/978-3-031-47679-2_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47678-5

  • Online ISBN: 978-3-031-47679-2

  • eBook Packages: Computer Science, Computer Science (R0)
