Abstract
This paper studies the application of Rhythm Formant Analysis (RFA) to automatic depression classification from speech signals. The research uses the EATD-corpus, a dataset designed for studying depression in speech. The goal is to develop a classification system that distinguishes depressed from non-depressed speech using Rhythm Formant (RF) features. The proposed methodology extracts RFs from the speech signals using signal processing techniques. Two kinds of RFs, Amplitude Modulation (AM) RFs and Frequency Modulation (FM) RFs, are analyzed separately and in combination as features for classification. These features carry information about the temporal and spectral characteristics of the speech. The classification system is built using a Decision Tree (DT) classifier, and its results are compared with those of logistic regression and random forest classifiers. Performance is evaluated using accuracy, the F1 score for each class, and the macro and weighted averages of the F1 scores. With FM RFs as features, the DT classifier achieves an accuracy of 70%, a weighted average F1 score of 0.73, and a macro average F1 score of 0.53, which is much better than the other features and classifiers tested. These results indicate that the proposed approach captures discriminative features related to depression in speech signals. The findings suggest that RFs have the potential to serve as a valuable tool for building automatic depression classification systems.
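The abstract reports accuracy, per-class F1, and both macro and weighted F1 averages. The following is a minimal sketch of how these metrics relate to one another, written in plain Python. The function names and the toy labels are hypothetical and invented purely for illustration; they are not the paper's data or code. The example assumes an imbalanced two-class setting (0 = non-depressed, 1 = depressed), which is one common reason a weighted average F1 can sit well above the macro average.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def summarize(y_true, y_pred):
    """Compute accuracy, per-class F1, macro F1 and support-weighted F1."""
    f1 = per_class_f1(y_true, y_pred)
    support = Counter(y_true)  # number of true samples per class
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    macro = sum(f1.values()) / len(f1)  # every class counts equally
    weighted = sum(f1[c] * support[c] for c in f1) / n  # classes count by size
    return {"accuracy": accuracy, "f1": f1,
            "macro_f1": macro, "weighted_f1": weighted}


# Toy, made-up labels: 8 non-depressed (0) and 2 depressed (1) samples.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

result = summarize(y_true, y_pred)
print(result)
```

On this toy input the majority class scores a higher F1 than the minority class, so the weighted average exceeds the macro average. The macro average treats both classes equally and is therefore the more sensitive indicator of how well the minority (depressed) class is recognized.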
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kaustubh, K., Gogoi, P., Prasanna, S. (2023). Rhythm Formant Analysis for Automatic Depression Classification. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_8
DOI: https://doi.org/10.1007/978-3-031-48309-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48308-0
Online ISBN: 978-3-031-48309-7
eBook Packages: Computer Science, Computer Science (R0)