Depression detection using cascaded attention based deep learning framework using speech data

Gupta, Sachi; Agarwal, Gaurav; Agarwal, Shivani; Pandey, Dilkeshwar

doi:10.1007/s11042-023-18076-w

Published: 22 January 2024

Multimedia Tools and Applications, volume 83, pages 66135–66173 (2024)

Abstract

Efficient detection of depression remains a challenge in speech signal processing. Because speech signals can support the diagnosis of depression, a reliable methodology is needed to detect it from them. Manual examination performed by radiologists can be time-consuming and may not be feasible in complex circumstances. Diverse detection methodologies have been proposed previously, but they are found to be less accurate and time-consuming, and they lead to high error rates. This article presents an effective, automatic deep learning-based approach to depression detection from speech signal data. The steps involved in depression prediction are data acquisition, pre-processing, feature extraction, feature selection and classification. In the data acquisition step, speech signals are collected from the Distress Analysis Interview Corpus (DAIC-WOZ) and Sonde Health-free speech (SH2-FS) datasets. The collected data are pre-processed with a Multi-stage Discrete Wavelet Transform (MS_DWT) to remove noise and improve signal quality. The relevant features are then extracted: the Hilbert Huang (H-H) transform, linear prediction cepstrum coefficients (LPCC), fundamental frequency, formants, speaking rate and Mel frequency cepstral coefficients (MFCC). From the extracted features, those that best improve detection accuracy are selected using the Price Auction optimization algorithm (PAOA). Finally, the depressed and non-depressed states are classified using a deep convolutional Attention Cascaded two directional long short-term memory network (DAttn_Conv 2D LSTM) with a softmax classifier. The overall accuracy obtained in classifying the depressed and non-depressed classes is 97.82% and 98.91%, respectively.
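The final classification step combines an attention mechanism with a softmax classifier. The following is a minimal sketch of that general idea only, not the authors' DAttn_Conv 2D LSTM: it assumes frame-level vectors are already available (for example, as outputs of a recurrent layer), scores each frame against a hypothetical learned `query` vector, pools the frames with the resulting attention weights, and applies a hypothetical two-class linear layer followed by softmax. All numeric values are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Weight each frame by softmax(dot(frame, query)) and sum the frames."""
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights


def classify(pooled, class_weights):
    """Linear layer followed by softmax; returns one probability per class."""
    logits = [sum(p * w for p, w in zip(pooled, row)) for row in class_weights]
    return softmax(logits)


# Toy input: three frame-level vectors (stand-ins for recurrent-layer outputs).
frames = [[0.1, 0.2], [0.9, 0.4], [0.3, 0.8]]
query = [1.0, 0.5]                           # hypothetical attention vector
class_weights = [[1.0, -1.0], [-1.0, 1.0]]   # hypothetical output layer

pooled, weights = attention_pool(frames, query)
probs = classify(pooled, class_weights)
```

Here `weights` shows how much each frame contributes to the pooled representation, and `probs` holds the two class probabilities. In a trained model, `query` and `class_weights` would be learned rather than fixed by hand.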

Data availability

Data sharing is not applicable to this article.

Funding

No funding is provided for the preparation of the manuscript.

Author information

Authors and Affiliations

Department of Computer Science & Engineering, Galgotias College of Engineering & Technology, Greater Noida, Uttar Pradesh, 201310, India
Sachi Gupta
School of Computer Science & Engineering, Galgotias University, Gr. Noida, Uttar Pradesh, 203201, India
Gaurav Agarwal
Department of Information Technology, Ajay Kumar Garg Engineering College, Ghaziabad, Uttar Pradesh, 201009, India
Shivani Agarwal
Department of Computer Science & Engineering, KIET Group of Institutions, Ghaziabad, Uttar Pradesh, 201206, India
Dilkeshwar Pandey

Contributions

All authors contributed equally to this work.

Corresponding author

Correspondence to Gaurav Agarwal.

Ethics declarations

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to participate

All the authors involved have agreed to participate in this submitted article.

Consent to publish

All the authors involved in this manuscript give full consent for publication of this submitted article.

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The representation of the DWT speech signal is illustrated in Fig. 12 below.

The representation of inverse DWT of the speech signal is illustrated in Fig. 13.
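As a rough illustration of the forward and inverse DWT referred to above, the sketch below decomposes a signal over several stages, soft-thresholds the detail coefficients, and reconstructs the signal. It assumes the Haar wavelet and a fixed threshold `t`; the wavelet family, number of stages, and thresholding rule used in the paper's MS_DWT are not stated here, so these choices are assumptions. The signal length must be divisible by `2 ** levels`.

```python
import math

SCALE = 1 / math.sqrt(2)  # orthonormal Haar scaling factor


def haar_dwt(signal):
    """One Haar analysis stage: returns (approximation, detail) coefficients."""
    approx = [(signal[i] + signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """One Haar synthesis stage: inverts haar_dwt."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SCALE, (a - d) * SCALE])
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(signal, levels, t):
    """Multi-stage decomposition, detail thresholding, then reconstruction."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(soft_threshold(detail, t))
    for detail in reversed(details):
        approx = haar_idwt(approx, detail)
    return approx


# Made-up samples; with t = 0.0 the round trip reproduces the input.
samples = [4.0, 4.2, 3.9, 4.1, 8.0, 8.1, 7.9, 8.2]
roundtrip = denoise(samples, levels=2, t=0.0)
smoothed = denoise(samples, levels=2, t=0.2)
```

With a zero threshold the inverse transform recovers the original samples up to floating-point error, which is a quick check that the analysis and synthesis stages match. A larger threshold removes more of the fine-scale detail.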

Table 13 below describes the performance metrics used to assess the proposed model.

Table 13 Performance metrics
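The body of Table 13 is not reproduced here. As a hedged illustration, the sketch below computes metrics that are commonly reported for binary depression classification (accuracy, precision, recall, specificity and F1 score) from confusion-matrix counts. Whether these match the exact set listed in Table 13 is an assumption, and the counts in the example are made up.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion counts.

    tp/fn count depressed samples predicted correctly/incorrectly;
    tn/fp count non-depressed samples predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Made-up counts, for illustration only.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
```

Reporting recall and specificity alongside accuracy matters when the depressed and non-depressed classes are imbalanced, since accuracy alone can hide poor performance on the smaller class.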

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gupta, S., Agarwal, G., Agarwal, S. et al. Depression detection using cascaded attention based deep learning framework using speech data. Multimed Tools Appl 83, 66135–66173 (2024). https://doi.org/10.1007/s11042-023-18076-w

Received: 20 April 2023
Revised: 06 October 2023
Accepted: 26 December 2023
Published: 22 January 2024
Issue Date: July 2024
DOI: https://doi.org/10.1007/s11042-023-18076-w
