
Dysarthric speech detection from telephone quality speech using epoch-based pitch perturbation features

International Journal of Speech Technology

Abstract

Dysarthria is a motor speech impairment that affects articulation and the coordination of speech movements. Detecting dysarthria is an essential first step toward early diagnosis and treatment. In this paper, we address dysarthric speech detection from telephone-quality speech using pitch perturbation (PP) measures computed with the recently introduced continuous wavelet transform (CWT)-based epoch extraction approach. The main advantage of this approach is its robustness to telephone channel degradations. Six PP measures are computed from the extracted epochs. For comparison, the same PP measures are also derived using two well-known epoch extraction methods: zero-frequency filtering (ZFF) and the dynamic programming phase slope algorithm (DYPSA). The experiments use the TORGO dysarthric speech database, which contains speech from 7 healthy speakers and 8 dysarthric speakers. The ITU-T G.191 software tools are used to convert clean speech to telephone speech. The results show that, under telephone conditions, the PP measures computed with the CWT-based approach discriminate between dysarthric and healthy speakers better than those computed with the other two epoch extraction methods.
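To make the idea of epoch-based pitch perturbation concrete, the sketch below shows how perturbation measures can be derived once epoch locations (glottal closure instants) are available. It is an illustration only: the abstract does not list the six PP measures used in the paper, so the two measures shown here (local jitter and a k-point period perturbation quotient) are common textbook definitions chosen as assumptions, and the function names and epoch times are hypothetical. Epoch extraction itself (CWT-based, ZFF, or DYPSA) is not implemented; in practice the periods would also be restricted to voiced segments.

```python
from statistics import mean


def periods_from_epochs(epoch_times):
    """Pitch periods as differences between successive epoch times (seconds)."""
    return [b - a for a, b in zip(epoch_times, epoch_times[1:])]


def local_jitter(periods):
    """Mean absolute difference between consecutive periods,
    divided by the mean period (assumed textbook definition)."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return mean(diffs) / mean(periods)


def ppq(periods, k=3):
    """k-point period perturbation quotient (k odd): mean absolute deviation
    of each period from the average of the k-period window centred on it,
    divided by the mean period (assumed textbook definition)."""
    half = k // 2
    devs = [
        abs(periods[i] - mean(periods[i - half:i + half + 1]))
        for i in range(half, len(periods) - half)
    ]
    return mean(devs) / mean(periods)


# Hypothetical epoch times in seconds, standing in for an extractor's output.
epochs = [0.000, 0.010, 0.021, 0.031, 0.042, 0.052, 0.063]
periods = periods_from_epochs(epochs)
print("local jitter:", local_jitter(periods))
print("PPQ3:", ppq(periods, 3))
```

Because every measure depends only on the epoch times, errors in epoch extraction caused by channel degradation carry over directly into the features, which is why the choice of epoch extractor matters for telephone speech.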

Acknowledgements

The authors thank Tata Consultancy Services (TCS) for sponsoring this research under the TCS Research Scholar Program (Cycle 15).

Author information

Corresponding author

Correspondence to Y. Madhu Keerthana.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Madhu Keerthana, Y., Sreenivasa Rao, K. & Mitra, P. Dysarthric speech detection from telephone quality speech using epoch-based pitch perturbation features. Int J Speech Technol 25, 967–973 (2022). https://doi.org/10.1007/s10772-022-10013-w

  • DOI: https://doi.org/10.1007/s10772-022-10013-w
