Abstract
We present a comparison of various model/feature combinations for the task of detecting anxiety and depression from audio recordings of spontaneous speech. The adopted models comprise several advanced deep neural networks, including CNN, LSTM, and attention networks, and are compared against traditional, shallow machine learning models. As input features, we compare supra-segmental paralinguistic feature sets against classical Mel-frequency cepstral coefficients (MFCCs) and pre-trained X-vector and Wav2Vec2 embeddings. Our models are trained on self-assessment scores: GAD-7 for anxiety and PHQ-8 for depression. We present binary classification results for anxiety and depression separately and show that, despite the noisy self-assessment labels, our best model achieves an unweighted average recall (UAR) of 0.60 for anxiety and 0.63 for depression. The anxiety result almost reaches the reported self-scored GAD-7 screening reliability of 0.64. This shows that our best audio-based model can be deployed as an anxiety and depression screening tool.
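For readers unfamiliar with the evaluation setup, the short Python sketch below illustrates how binary labels and the UAR reported above are typically computed. It is not the authors' implementation: the cutoff of 10 (the commonly published screening threshold for both GAD-7 and PHQ-8) and the example scores are assumptions for illustration, and macro-averaged recall from scikit-learn is used as the UAR.

# A minimal sketch, assuming NumPy and scikit-learn; not the authors' code.
import numpy as np
from sklearn.metrics import recall_score

def binarize_scores(scores, cutoff=10):
    # Map GAD-7 or PHQ-8 self-assessment scores to binary class labels.
    # A cutoff of 10 is the commonly published screening threshold; the
    # paper's exact binarization may differ.
    return (np.asarray(scores) >= cutoff).astype(int)

def unweighted_average_recall(y_true, y_pred):
    # UAR is the mean of per-class recalls, i.e. macro-averaged recall,
    # which makes the metric robust to class imbalance.
    return recall_score(y_true, y_pred, average="macro")

# Hypothetical GAD-7 scores and model predictions, for illustration only.
y_true = binarize_scores([3, 12, 8, 15, 10, 2])  # -> [0, 1, 0, 1, 1, 0]
y_pred = np.array([0, 1, 1, 1, 0, 0])
print(f"UAR: {unweighted_average_recall(y_true, y_pred):.2f}")  # UAR: 0.67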
Notes
1.
2. One could, for example, assign a person to the depressed class based on independent depression screening tests.
3. Paralinguistics is the study of paralanguage, which connotes “alongside language” and generally describes the non-verbal elements of human communication, i.e. all meta-information that accompanies and complements language [6].
4.
5. Note that this data set was labeled by trained clinical assessors, not relying on self-assessment labels.
References
Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). Software available from tensorflow.org
Arroll, B., et al.: Validation of PHQ-2 and PHQ-9 to screen for major depression in the primary care population. Ann. Fam. Med. 8(4), 348 (2010)
Baevski, A., Zhou, H., Mohamed, A., Auli, M.: Wav2vec 2.0: a framework for self-supervised learning of speech representations. In: Advances in Neural Information Processing Systems, vol. 33 (NeurIPS 2020). Curran Associates Inc., Red Hook, NY, USA (2020)
Bandelow, B., Michaelis, S.: Epidemiology of anxiety disorders in the 21st century. Dialogues Clin. Neurosci. 17, 327–335 (2015)
Beard, C., Björgvinsson, T.: Beyond generalized anxiety disorder: psychometric properties of the GAD-7 in a heterogeneous psychiatric sample. J. Anxiety Disord. 28(6), 547–552 (2014)
Brueckner, R.: Application of Deep Learning Methods in Computational Paralinguistics. Ph.D. thesis, Technische Universität München (2020)
Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’16, pp. 785–794. ACM, New York (2016)
Davis, S., Mermelstein, P.: Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Sig. Process. 28(4), 357–366 (1980)
De Angel, V., et al.: Digital health tools for the passive monitoring of depression: a systematic review of methods. NPJ Digit. Med. 5(1), 3 (2022)
Endler, N.S., Kocovski, N.L.: State and trait anxiety revisited. J. Anxiety Disord. 15(3), 231–245 (2001)
Eyben, F.: Real-Time Speech and Music Classification by Large Audio Feature Space Extraction. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-27299-3
Eyben, F., et al.: The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Trans. Affect. Comput. 7(2), 190–202 (2016)
Eyben, F., Wöllmer, M., Schuller, B.: openSMILE – The Munich versatile and fast open-source audio feature extractor. In: Proceedings of the 18th ACM International Conference on Multimedia, pp. 1459–1462. ACM, Florence, Italy (2010)
Huang, Z., Epps, J., Joachim, D.: Investigation of speech landmark patterns for depression detection. IEEE Trans. Affect. Comput. 13(2), 666–679 (2022)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456. PMLR (2015)
Jeancolas, L., et al.: X-vectors: new quantitative biomarkers for early Parkinson’s disease detection from speech. Front. Neuroinform. 15, 578369 (2021)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Kroenke, K., Spitzer, R.L., Williams, J.B.W.: The PHQ-9: validity of a brief depression severity measure. J. Gen. Intern. Med. 16(9), 606–613 (2001)
Ma, X., Yang, H., Chen, Q., Huang, D., Wang, Y.: DepAudioNet: an efficient deep model for audio-based depression classification. In: Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, pp. 35–42 (2016)
Moro-Velazquez, L., Villalba, J., Dehak, N.: Using X-vectors to automatically detect Parkinson’s disease from speech. In: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1155–1159. IEEE (2020)
Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: International Conference on Machine Learning (ICML) 2010, pp. 807–814 (2010)
Nirjhar, E.H., Behzadan, A., Chaspari, T.: Exploring bio-behavioral signal trajectories of state anxiety during public speaking. In: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1294–1298. IEEE (2020)
Pappagari, R., Cho, J., Moro-Velazquez, L., Dehak, N.: Using state of the art speaker recognition and natural language processing technologies to detect Alzheimer’s disease and assess its severity. In: INTERSPEECH, pp. 2177–2181 (2020)
Pappagari, R., Wang, T., Villalba, J., Chen, N., Dehak, N.: X-vectors meet emotions: a study on dependencies between emotion and speaker recognition. In: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7169–7173. IEEE (2020)
Parra-Gallego, L.F., Orozco-Arroyave, J.R.: Classification of emotions and evaluation of customer satisfaction from speech in real world acoustic environments. Digit. Sig. Process. 120, 103286 (2022)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Povey, D., et al.: The Kaldi speech recognition toolkit. In: IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society (2011)
Raj, D., Snyder, D., Povey, D., Khudanpur, S.: Probing the information encoded in x-vectors. In: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 726–733. IEEE (2019)
Ringeval, F., et al.: AV+EC 2015: the first affect recognition challenge bridging across audio, video, and physiological data. In: Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, pp. 3–8. ACM, Brisbane, Australia (2015)
Sakib, Md.N., Nirjhar, E.H., Feng, K., Behzadan, A., Chaspari, T.: Exploring individual differences of public speaking anxiety in real-life and virtual presentations. IEEE Trans. Affect. Comput. 1 (2021)
Salekin, A., Eberle, J.W., Glenn, J.J., Teachman, B.A., Stankovic, J.A.: A weakly supervised learning framework for detecting social anxiety and depression. Proc. ACM Interact. Mob. Wearable Ubiquit. Technol. 2(2), 1–26 (2018)
Schuller, B.: Intelligent Audio Analysis – Speech, Music, and Sound Recognition in Real-Life Conditions. Habilitation thesis, Technische Universität München, Munich, Germany (2012)
Schuller, B., Batliner, A.: Computational Paralinguistics: Emotion, Affect and Personality in Speech and Language Processing. Wiley, Chichester (2014)
Schuller, B., Steidl, S., Batliner, A.: The INTERSPEECH 2009 emotion challenge. In: Proceedings of the 10th Annual Conference of the International Speech Communication Association (INTERSPEECH). ISCA, Brighton, UK (2009)
Schuller, B., et al.: The INTERSPEECH 2014 computational paralinguistics challenge: cognitive & physical load. In: Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH), Singapore (2014)
Schuller, B., et al.: Affective and behavioural computing: lessons learnt from the first computational paralinguistics challenge. Comput. Speech Lang. 53, 156–180 (2019)
Schuller, B.W., et al.: The INTERSPEECH 2016 computational paralinguistics challenge: deception, sincerity & native language. In: Proceedings of the 17th Annual Conference of the International Speech Communication Association (INTERSPEECH), vol. 2016, pp. 2001–2005. ISCA, San Francisco, CA, USA (2016)
Schuller, B.W., et al.: The INTERSPEECH 2021 computational paralinguistics challenge: COVID-19 cough, COVID-19 speech, escalation & primates. In: Proceedings of the 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 431–435 (2021)
Snyder, D., Garcia-Romero, D., McCree, A., Sell, G., Povey, D., Khudanpur, S.: Spoken language recognition using x-vectors. In: Odyssey, pp. 105–111 (2018)
Snyder, D., Garcia-Romero, D., Sell, G., Povey, D., Khudanpur, S.: X-vectors: robust DNN embeddings for speaker recognition. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5329–5333. IEEE (2018)
Spitzer, R.L., Kroenke, K., Williams, J.B.W., Löwe, B.: A brief measure for assessing generalized anxiety disorder: the GAD-7. Arch. Intern. Med. 166(10), 1092–1097 (2006)
Ting, K.M.: Precision and recall. In: Sammut, C., Webb, G.I. (eds.) Encyclopedia of Machine Learning, p. 781. Springer, Boston (2010). https://doi.org/10.1007/978-0-387-30164-8_652
Valstar, M.F., Gratch, J., Schuller, B.W., Ringeval, F., Cowie, R., Pantic, M. (eds.): Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, AVEC@MM 2016. ACM, Amsterdam (2016)
Valstar, M.F., et al.: AVEC 2013: the continuous audio/visual emotion and depression recognition challenge. In: Schuller, B.W., Valstar, M.F., Cowie, R., Krajewski, J., Pantic, M. (eds.) Proceedings of the 3rd ACM International Workshop on Audio/Visual Emotion Challenge, AVEC@ACM Multimedia 2013, Barcelona, Spain, 21 October 2013, pp. 3–10. ACM (2013)
Waibel, A.H., Hanazawa, T., Hinton, G.E., Shikano, K., Lang, K.J.: Phoneme recognition using time-delay neural networks. IEEE Trans. Acoust. Speech Sig. Process. 37, 328–339 (1989)
Weninger, F., Eyben, F., Schuller, B.W., Mortillaro, M., Scherer, K.R.: On the acoustics of emotion in audio: what speech, music, and sound have in common. Front. Psychol. 4 (2013)
Werneck, A.O., Silva, D.R.: Population density, depressive symptoms, and suicidal thoughts. Revista Brasileira de Psiquiatria (2020)
Yin, W., Levis, B., Riehm, K.E., et al.: Equivalency of the diagnostic accuracy of the PHQ-8 and PHQ-9: a systematic review and individual participant data meta-analysis. Psychol. Med. 50(8), 1368–1380 (2020)
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Brueckner, R., Kwon, N., Subramanian, V., Blaylock, N., O’Connell, H. (2024). Audio-Based Detection of Anxiety and Depression via Vocal Biomarkers. In: Arai, K. (eds) Advances in Information and Communication. FICC 2024. Lecture Notes in Networks and Systems, vol 919. Springer, Cham. https://doi.org/10.1007/978-3-031-53960-2_9
DOI: https://doi.org/10.1007/978-3-031-53960-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53959-6
Online ISBN: 978-3-031-53960-2
eBook Packages: Intelligent Technologies and Robotics