Rhythm Formant Analysis for Automatic Depression Classification

Kaustubh, Kumar; Gogoi, Parismita; Prasanna, S.R.M

doi:10.1007/978-3-031-48309-7_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14338)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

This paper studies the application of Rhythm Formant Analysis (RFA) to automatic depression classification from speech signals. The research uses the EATD-corpus, a dataset designed for studying depression in speech. The goal is to develop a classification system that distinguishes depressed from non-depressed speech based on Rhythm Formant (RF) features. The proposed methodology extracts RFs from the speech signals using signal processing techniques. Two kinds of RFs, namely Amplitude Modulation (AM) RFs and Frequency Modulation (FM) RFs, and their combinations are analyzed and used as features for classification. These features provide information about the temporal and spectral characteristics of the speech. The classification system is built using a Decision Tree (DT) classifier, and its results are compared with those of logistic regression and random forest classifiers. The model's performance is evaluated using accuracy, the F1 score for each class, and the macro and weighted averages of the F1 scores. With FM RFs as features, the DT classifier achieves an accuracy of 70%, a weighted average F1 score of 0.73, and a macro average F1 score of 0.53, outperforming the other features and classifiers. These results indicate that the proposed approach captures discriminative features related to depression in speech signals, and the findings suggest that RFs have the potential to serve as a valuable tool for building automatic depression classification systems.
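
The abstract reports both a macro and a weighted average F1 score, and the two can differ noticeably. The following minimal sketch shows how each average is computed from per-class F1 scores. It is an illustration only: the function names and the toy labels (0 = non-depressed, 1 = depressed) are assumptions made for this example and are not taken from the paper or from the EATD-corpus.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for each label in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of the per-class F1 scores, weighted by class support."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Hypothetical, imbalanced toy data: 8 samples of class 0, 2 of class 1.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 7 + [1] + [1, 0]

print(per_class_f1(y_true, y_pred))  # per-class F1 for labels 0 and 1
print(macro_f1(y_true, y_pred))      # treats both classes equally
print(weighted_f1(y_true, y_pred))   # dominated by the majority class
```

In this toy case the majority class scores higher than the minority class, so the weighted average exceeds the macro average. A gap in the same direction, such as the 0.73 versus 0.53 reported above, typically arises when the classes are imbalanced and the smaller class is harder to classify, although the abstract itself does not state the class distribution.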

Author information

Authors and Affiliations

Indian Institute of Technology, Dharwad, India
Kumar Kaustubh & S.R.M Prasanna
Indian Institute of Technology, Guwahati, India
Parismita Gogoi
DUIET, Dibrugarh University, Dibrugarh, India
Parismita Gogoi

Corresponding author

Correspondence to Kumar Kaustubh.

Editor information

Editors and Affiliations

St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Koneru Lakshmaiah Education Foundation, Vaddeswaram, India
K. Samudravijaya
Indian Institute of Information Technology Dharwad, Dharwad, India
K. T. Deepak
Indian Institute of Technology Dharwad, Dharwad, India
Rajesh M. Hegde
KIIT Group of Colleges, Gurugram, India
Shyam S. Agrawal
Indian Institute of Technology Dharwad, Dharwad, India
S. R. Mahadeva Prasanna

Cite this paper

Kaustubh, K., Gogoi, P., Prasanna, S. (2023). Rhythm Formant Analysis for Automatic Depression Classification. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_8

DOI: https://doi.org/10.1007/978-3-031-48309-7_8
Published: 22 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48308-0
Online ISBN: 978-3-031-48309-7
eBook Packages: Computer Science, Computer Science (R0)
