Abstract
Dysarthria is a motor speech impairment that affects verbal articulation and coordination. Detecting dysarthria is an essential first step toward early diagnosis and treatment. In this paper, we detect dysarthric speech from telephone-quality speech using pitch perturbation (PP) measures computed with the recently introduced continuous wavelet transform (CWT)-based epoch extraction approach. The main advantage of this approach is that it is highly robust to telephone channel degradations. Six PP measures were computed from the extracted epochs. For comparison, the PP measures were also derived using two well-known epoch extraction methods: zero-frequency filtering (ZFF) and the dynamic programming phase slope algorithm (DYPSA). The experiments were carried out on the TORGO dysarthric speech database, which contains speech from 7 healthy speakers and 8 dysarthric speakers. The G.191 software tools were used to convert clean speech to telephone speech. The results show that the PP measures computed with the CWT-based approach discriminate dysarthric from healthy speakers in telephone conditions better than those computed with the other two epoch extraction methods.
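To illustrate how PP measures follow from epoch locations, here is a minimal sketch in Python. The abstract does not name the six measures used, so the two jitter-style measures below (local jitter and a k-point period perturbation quotient) are assumptions chosen for illustration, and the function names are hypothetical. The sketch also assumes the epochs come from a single continuous voiced segment, so that the interval between successive epochs is one pitch period.

```python
import numpy as np

def pitch_periods(epoch_times):
    """Pitch periods (s) as differences between successive epoch times (s)."""
    return np.diff(np.asarray(epoch_times, dtype=float))

def jitter_local(periods):
    """Mean absolute difference of consecutive periods, relative to the mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def jitter_ppq(periods, k=3):
    """k-point period perturbation quotient (k odd); k=3 is commonly called RAP."""
    periods = np.asarray(periods, dtype=float)
    half = k // 2
    # Moving average over k neighbouring periods, kept only where fully defined.
    smoothed = np.convolve(periods, np.ones(k) / k, mode="valid")
    centre = periods[half:len(periods) - half]
    return np.mean(np.abs(centre - smoothed)) / np.mean(periods)

# Illustrative (made-up) epochs: periods alternate between 10 ms and 11 ms.
example_periods = np.array([0.010, 0.011] * 10)
example_epochs = np.concatenate(([0.0], np.cumsum(example_periods)))

periods = pitch_periods(example_epochs)
print(jitter_local(periods), jitter_ppq(periods, k=3))
```

Whichever epoch extractor is used (CWT-based, ZFF, or DYPSA), only the epoch times passed to such functions change, which is why errors in epoch locations under telephone degradation carry over directly into the perturbation measures.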


Acknowledgements
The authors thank Tata Consultancy Services (TCS) for sponsoring this research under the TCS Research Scholar Program, Cycle 15.
About this article
Cite this article
Madhu Keerthana, Y., Sreenivasa Rao, K. & Mitra, P. Dysarthric speech detection from telephone quality speech using epoch-based pitch perturbation features. Int J Speech Technol 25, 967–973 (2022). https://doi.org/10.1007/s10772-022-10013-w
DOI: https://doi.org/10.1007/s10772-022-10013-w