ABSTRACT
Depression is a prevalent psychiatric condition that must be identified and treated promptly; in severe cases it can lead to suicidal ideation. The need for an effective audio-based automated depression detection system has recently attracted considerable research interest. Most studies to date rely on a broad range of expertly hand-crafted audio features for depression diagnosis. This enlarges the feature space and creates a high-dimensionality problem, which complicates pattern recognition and increases the risk of data imbalance. This paper proposes a deep learning autoencoder-based method to extract compact, relevant features from speech signals in order to diagnose depression more precisely. The performance and efficacy of the proposed approach are evaluated on the DAIC-WoZ dataset, and the results are compared with those of other notable machine learning algorithms. The findings show that, when paired with an SVM classifier, this technique outperforms existing audio-based depression detection models, achieving an accuracy of 97% in diagnosing depression.
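To illustrate the core idea of the abstract, the sketch below trains a minimal linear autoencoder that compresses a high-dimensional acoustic feature vector into a small bottleneck code, which would then feed a downstream classifier such as an SVM. This is a hypothetical toy example, not the paper's implementation: the input dimensionality (20 features per frame), bottleneck size (4), learning rate, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: a one-hidden-layer linear autoencoder compressing
# 20 hand-crafted audio descriptors per frame into a 4-dimensional code.
# All sizes and hyperparameters are illustrative, not from the paper.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))          # 200 frames x 20 audio features (synthetic)
d_in, d_code = X.shape[1], 4

W_enc = rng.normal(scale=0.1, size=(d_in, d_code))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(d_code, d_in))   # decoder weights
lr = 0.01

def mse(A, B):
    """Mean squared reconstruction error."""
    return float(np.mean((A - B) ** 2))

err_before = mse(X, X @ W_enc @ W_dec)
for _ in range(500):
    Z = X @ W_enc                        # encode: compact bottleneck features
    X_hat = Z @ W_dec                    # decode: reconstruct the input
    G = 2.0 * (X_hat - X) / X.shape[0]   # gradient of the loss w.r.t. X_hat
    W_dec -= lr * (Z.T @ G)              # backprop through the decoder
    W_enc -= lr * (X.T @ (G @ W_dec.T))  # backprop through the encoder
err_after = mse(X, X @ W_enc @ W_dec)

# The 4-dimensional codes replace the raw high-dimensional features
# as input to a downstream depression classifier (e.g., an SVM).
codes = X @ W_enc
```

In the paper's pipeline the autoencoder plays the role of a learned dimensionality reducer, so the classifier sees a condensed representation rather than the full hand-crafted feature set, which is what mitigates the high-dimensionality problem described above.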
Index Terms
- Deep Learning Technique to Diagnose Depression in Audio