Depression detection using cascaded attention based deep learning framework using speech data

Multimedia Tools and Applications

Abstract

Efficient detection of depression is a challenging problem in speech signal processing. Because speech signals carry useful cues for diagnosing depression, an effective detection methodology is needed. Manual examination by clinicians, however, is time-consuming and may not be feasible in complex circumstances. Various detection methods have been proposed previously, but they tend to be less accurate, time-consuming and prone to high error rates. This article presents an effective, automatic deep learning-based approach to depression detection from speech signal data. The depression prediction pipeline consists of data acquisition, pre-processing, feature extraction, feature selection and classification. In the data acquisition step, speech signals are collected from the Distress Analysis Interview Corpus (DAIC-WOZ) and Sonde Health free speech (SH2-FS) datasets. The collected data are pre-processed with a multi-stage discrete wavelet transform (MS_DWT) to suppress noise and improve signal quality. The features required for processing the speech signal are then extracted using the Hilbert-Huang (H-H) transform, linear prediction cepstral coefficients (LPCC), fundamental frequency, formants, speaking rate and Mel-frequency cepstral coefficients (MFCC). From the extracted features, the subset that best improves detection accuracy is selected with the Price Auction optimization algorithm (PAOA). Finally, depressed and non-depressed states are classified using a deep convolutional attention-cascaded two-directional long short-term memory network (DAttn_Conv 2D LSTM) with a softmax classifier. The overall accuracy obtained in classifying the depressed and non-depressed classes is 97.82% and 98.91%, respectively.
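LPCC is one of the features listed above. The article's frame length, window, LPC order and number of cepstral coefficients are not given in this excerpt, so the following is only a minimal NumPy sketch under assumed settings: a Hamming-windowed frame, LPC coefficients from the autocorrelation method with the Levinson-Durbin recursion, and the standard LPC-to-cepstrum recursion. The helper names `levinson_durbin`, `lpc_to_cepstrum` and `lpcc`, and their default orders, are hypothetical and not taken from the paper.

```python
import numpy as np


def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_p from autocorrelation lags r[0..p].

    Convention: x[n] is predicted as sum_{k=1}^{p} a_k * x[n - k].
    Returns (a, err) where a[k - 1] holds a_k and err is the final
    prediction-error power.
    """
    a = np.zeros(order + 1)  # a[0] unused; a[1..p] hold the coefficients
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient of stage i
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        new_a = a.copy()
        new_a[i] = k
        # Update lower-order coefficients: a_j <- a_j - k * a_{i-j}
        new_a[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_N of the all-pole model 1 / (1 - sum a_k z^-k)."""
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        # c_n = a_n + sum_{k} (k / n) * c_k * a_{n-k}, with 1 <= n-k <= p
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c


def lpcc(frame, order=12, n_ceps=12):
    """LPCC of one speech frame (autocorrelation method, Hamming window)."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    full = np.correlate(x, x, mode="full")
    r = full[len(x) - 1: len(x) + order]  # autocorrelation lags 0..order
    if r[0] <= 0.0:
        return np.zeros(n_ceps)  # silent frame: nothing to model
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)
```

For a synthetic AR(1) signal generated with coefficient 0.9, `lpcc(x, order=1, n_ceps=3)` should return values close to the theoretical all-pole cepstrum 0.9^n / n, i.e. about [0.9, 0.405, 0.243].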

Data availability

Data sharing is not applicable to this article.

Funding

No funding was received for the preparation of this manuscript.

Author information

Contributions

All authors contributed equally to this work.

Corresponding author

Correspondence to Gaurav Agarwal.

Ethics declarations

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to participate

All authors agreed to participate in this submitted article.

Consent to publish

All authors consent to the publication of this article.

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Figure 12 shows the DWT of the speech signal.

Fig. 12 DWT of the speech signal
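Figure 12 itself is not reproduced here. As an illustration only, the sketch below computes a multi-level DWT with the orthonormal Haar wavelet in plain NumPy. The wavelet family, number of levels and padding used by the authors' MS_DWT stage are not stated in this excerpt, so those choices (and the helper names `haar_dwt_level` and `haar_wavedec`) are assumptions. Coefficients are returned coarsest-first, `[cA_L, cD_L, ..., cD_1]`, the same ordering PyWavelets' `wavedec` uses.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt_level(x):
    """One level of the orthonormal Haar DWT on an even-length signal."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / SQRT2  # low-pass branch, downsampled by 2
    detail = (even - odd) / SQRT2  # high-pass branch, downsampled by 2
    return approx, detail


def haar_wavedec(signal, levels):
    """Multi-level Haar decomposition (hypothetical helper).

    Zero-pads the signal to a multiple of 2**levels, then returns
    [cA_L, cD_L, ..., cD_1], coarsest approximation first.
    """
    x = np.asarray(signal, dtype=float)
    pad = (-len(x)) % (2 ** levels)
    x = np.pad(x, (0, pad))
    details = []
    approx = x
    for _ in range(levels):
        # Only the approximation band is split again at the next level
        approx, detail = haar_dwt_level(approx)
        details.append(detail)  # collected finest-first: cD_1, cD_2, ...
    return [approx] + details[::-1]
```

Each level halves the nominal bandwidth: for a 16 kHz recording (an assumed rate), cD_1 covers roughly 4 to 8 kHz, cD_2 roughly 2 to 4 kHz, and so on, while cA_L keeps the lowest band. Because the Haar transform is orthonormal, the total energy of the coefficients equals the energy of the (padded) signal.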

Figure 13 shows the inverse DWT of the speech signal.

Fig. 13 Inverse DWT of the speech signal
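The inverse transform rebuilds the signal from its wavelet coefficients. The sketch below pairs a Haar decomposition with its inverse. As an assumed stand-in for the MS_DWT noise-removal step, whose thresholding rule is not described in this excerpt, it applies soft thresholding to every detail band using the universal threshold sigma * sqrt(2 ln N). Here sigma is estimated from the finest detail band by the median absolute deviation. The function names `haar_wavedec`, `haar_waverec` and `wavelet_denoise` and the default of four levels are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_wavedec(x, levels):
    """Forward multi-level Haar DWT; returns [cA_L, cD_L, ..., cD_1]."""
    approx = np.asarray(x, dtype=float)
    if len(approx) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / SQRT2)  # high-pass band
        approx = (even + odd) / SQRT2  # low-pass band, split again next level
    return [approx] + details[::-1]


def haar_waverec(coeffs):
    """Inverse DWT: rebuild the signal from [cA_L, cD_L, ..., cD_1]."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        x = np.empty(2 * len(approx))
        x[0::2] = (approx + detail) / SQRT2  # recovers even-indexed samples
        x[1::2] = (approx - detail) / SQRT2  # recovers odd-indexed samples
        approx = x
    return approx


def wavelet_denoise(x, levels=4):
    """Soft-threshold every detail band, then invert (assumed rule)."""
    coeffs = haar_wavedec(x, levels)
    # Robust noise estimate from the finest detail band (MAD / 0.6745)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2.0 * np.log(len(x)))
    shrunk = [coeffs[0]] + [
        np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0) for d in coeffs[1:]
    ]
    return haar_waverec(shrunk)
```

Without thresholding, `haar_waverec(haar_wavedec(x, L))` returns `x` up to floating-point error (perfect reconstruction). The denoising step changes only the detail bands before inversion and leaves the coarse approximation untouched.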

Table 13 lists the performance metrics used to assess the proposed model.

Table 13 Performance metrics
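The contents of Table 13 are not reproduced in this excerpt. Assuming it covers the usual binary-classification metrics, the sketch below derives accuracy, precision, recall (sensitivity), specificity and F1 from the confusion matrix. It treats the depressed class as the positive label. The function name `binary_metrics` and the 0/1 label encoding are hypothetical.

```python
import numpy as np


def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a depressed / non-depressed task."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == positive) & (y_pred == positive)))
    tn = int(np.sum((y_true != positive) & (y_pred != positive)))
    fp = int(np.sum((y_true != positive) & (y_pred == positive)))
    fn = int(np.sum((y_true == positive) & (y_pred != positive)))

    def ratio(num, den):
        return num / den if den else 0.0  # guard against empty classes

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # fraction of depressed samples found
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),  # fraction of non-depressed correct
        "f1": ratio(2 * precision * recall, precision + recall),
        "confusion": {"tp": tp, "tn": tn, "fp": fp, "fn": fn},
    }
```

Recall is the fraction of depressed samples classified correctly, and specificity is the fraction of non-depressed samples classified correctly. Together they give the per-class accuracies, whereas `accuracy` pools both classes.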

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gupta, S., Agarwal, G., Agarwal, S. et al. Depression detection using cascaded attention based deep learning framework using speech data. Multimed Tools Appl 83, 66135–66173 (2024). https://doi.org/10.1007/s11042-023-18076-w

DOI: https://doi.org/10.1007/s11042-023-18076-w