Abstract
Efficient detection of depression is a challenging problem in speech signal processing. Since speech signals support a better diagnosis of depression, a reliable detection methodology is required. However, manual examination by clinicians can be time-consuming and may not be feasible in complex circumstances. Diverse detection methodologies have been proposed previously, but they have been found to be inaccurate, time-consuming and prone to high error rates. This article presents an effective, automatic deep learning-based method for depression detection from speech signal data. The steps involved in depression prediction are data acquisition, pre-processing, feature extraction, feature selection and classification. Data acquisition collects speech signals from the Distress Analysis Interview Corpus (DAIC-WOZ) and Sonde Health-free speech (SH2-FS) datasets. The collected data are pre-processed with a Multi-stage Discrete Wavelet Transform (MS_DWT) to remove noise and improve signal quality. The relevant features are extracted through the Hilbert Huang (H-H) transform, linear prediction cepstrum coefficients (LPCC), fundamental frequency, formants, speaking rate and Mel frequency cepstral coefficients (MFCC). From the extracted features, those that best improve detection accuracy are selected using the Price Auction optimization algorithm (PAOA). Finally, the depressed and non-depressed states are classified using a deep convolutional Attention Cascaded two directional long short-term memory (DAttn_Conv 2D LSTM) network with a softmax classifier. The overall accuracy obtained in classifying the depressed and non-depressed classes is 97.82% and 98.91%, respectively.
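The pre-processing step can be illustrated with a minimal sketch of multi-stage wavelet denoising. The abstract names MS_DWT but does not state the wavelet family, the number of stages or the thresholding rule, so the Haar wavelet, the three stages, the soft-threshold rule and every function name below are assumptions made for illustration only, not the authors' implementation.

```python
import math

# Illustrative sketch only: Haar wavelet, soft thresholding and the default
# stage count are assumptions; the source does not specify these choices.

def haar_dwt(signal):
    """One Haar DWT stage: return (approximation, detail) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Invert one Haar DWT stage."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out

def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]

def multistage_dwt_denoise(signal, stages=3, threshold=0.1):
    """Decompose over several stages, shrink the details, then reconstruct."""
    if stages < 1 or len(signal) % (2 ** stages) != 0:
        raise ValueError("signal length must be a multiple of 2**stages")
    approx = list(signal)
    details = []
    for _ in range(stages):
        # Each stage splits the current approximation into a coarser
        # approximation and a detail band, which is then thresholded.
        approx, detail = haar_dwt(approx)
        details.append(soft_threshold(detail, threshold))
    for detail in reversed(details):
        # Rebuild from the coarsest stage back to the original resolution.
        approx = haar_idwt(approx, detail)
    return approx
```

Calling `multistage_dwt_denoise(samples, stages=3, threshold=0.1)` on a list of samples whose length is a multiple of 8 returns a list of the same length. With `threshold=0.0` the input is reconstructed unchanged (up to floating-point error), which is a convenient sanity check before tuning the threshold.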
Data availability
Data sharing is not applicable to this article.
Funding
No funding is provided for the preparation of the manuscript.
Author information
Contributions
All authors contributed equally to this work.
Ethics declarations
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to participate
All the authors involved have agreed to participate in this submitted article.
Consent to publish
All the authors involved in this manuscript give full consent for publication of this submitted article.
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Gupta, S., Agarwal, G., Agarwal, S. et al. Depression detection using cascaded attention based deep learning framework using speech data. Multimed Tools Appl 83, 66135–66173 (2024). https://doi.org/10.1007/s11042-023-18076-w
DOI: https://doi.org/10.1007/s11042-023-18076-w