Dysarthric speech detection from telephone quality speech using epoch-based pitch perturbation features

Madhu Keerthana, Y.; Sreenivasa Rao, K.; Mitra, Pabitra

doi:10.1007/s10772-022-10013-w

Published: 30 October 2022

Volume 25, pages 967–973, (2022)

International Journal of Speech Technology

Abstract

Dysarthria is a motor speech impairment that affects verbal articulation and coordination. Detecting dysarthria is an essential first step toward early diagnosis and treatment. In this paper, we attempt dysarthric speech detection from telephone quality speech using pitch perturbation (PP) measures computed with the recently introduced continuous wavelet transform (CWT)-based epoch extraction approach. A key advantage of this approach is that it is highly robust to telephone channel degradations. Six PP measures were computed from the extracted epochs. For comparison, the PP measures were also derived using two well-known epoch extraction methods: zero-frequency filtering (ZFF) and the dynamic programming phase slope algorithm (DYPSA). The experiments were carried out on the TORGO dysarthric speech database, which contains speech from 7 healthy speakers and 8 dysarthric speakers. The G.191 software tools were used to convert clean speech to telephone speech. The results show that, under telephone conditions, the PP measures computed with the CWT-based approach discriminate dysarthric from healthy speakers better than those extracted with the other two epoch extraction methods.
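As a rough illustration of how pitch perturbation measures can be derived from epoch locations, the sketch below computes three widely used jitter-style quantities from a list of epoch times. The abstract does not name the six PP measures used in the paper, so the choice of local jitter, RAP (3-point) and PPQ5 (5-point) here is an assumption, as are the function names and the toy epoch values. It also assumes the epochs come from a single voiced segment.

```python
from statistics import mean

def pitch_periods(epochs_s):
    """Differences between successive epoch times (in seconds) give pitch periods."""
    return [b - a for a, b in zip(epochs_s, epochs_s[1:])]

def local_jitter(periods):
    """Mean absolute difference of consecutive periods, relative to the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return mean(diffs) / mean(periods)

def ppq(periods, k):
    """k-point period perturbation quotient (k odd, assumed definition).

    Each period is compared with the average of its k-point neighbourhood;
    the mean absolute deviation is normalised by the mean period.
    """
    half = k // 2
    devs = [
        abs(periods[i] - mean(periods[i - half:i + half + 1]))
        for i in range(half, len(periods) - half)
    ]
    return mean(devs) / mean(periods)

# Hypothetical epoch times (seconds) for one voiced segment; not data from the paper.
epochs = [0.000, 0.010, 0.021, 0.030, 0.041, 0.050, 0.061, 0.070]
T = pitch_periods(epochs)
features = {
    "jitter_local": local_jitter(T),
    "rap": ppq(T, 3),    # 3-point quotient, often called RAP
    "ppq5": ppq(T, 5),   # 5-point quotient
}
print(features)
```

Under this sketch, swapping the epoch extractor (CWT-based, ZFF or DYPSA) changes only the `epochs` input; the perturbation measures themselves are computed the same way.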

References

Adiga, N., Vikram, C. M., Pullela, K., & Prasanna, S. M. (2017). Zero frequency filter based analysis of voice disorders. In Proceedings of the Interspeech 2017, August 20–24, Stockholm, Sweden.
Berisha, V., Liss, J., Sandoval, S., Utianski, R., & Spanias, A. (2014). Modeling pathological speech perception from data with similarity labels. In Proceedings of the international conference on acoustics, speech, and signal processing (ICASSP) (pp. 915–919).
Bhat, C., Vachhani, B., & Kopparapu, S. K. (2017). Automatic assessment of dysarthria severity level using audio descriptors. In Proceedings of the international conference on acoustics, speech, and signal processing (ICASSP) (pp. 5070–5074).
Black, A. W., King, S., & Tokuda, K. (2009). The Blizzard Challenge 2009. In Proceedings of the Blizzard Challenge (pp. 1–24).
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
Daoudi, K., & Kumar, A. J. (2015). Pitch-based speech perturbation measures using a novel GCI detection algorithm: Application to pathological voice classification. In Proceedings of the Interspeech.
Duffy, J. R. (2012). Motor speech disorders: Substrates, differential diagnosis, and management (3rd ed.). Elsevier Health Sciences.
Enderby, P. M. (1983). Frenchay dysarthria assessment. College Hill Press.
Eyben, F., Weninger, F., Gross, F., & Schuller, B. (2013). Recent developments in openSMILE, the Munich open-source multimedia feature extractor. In Proceedings of the ACM international conference on multimedia (pp. 835–838).
Falk, T. H., Chan, W.-Y., & Shein, F. (2012). Characterization of atypical vocal source excitation temporal dynamics and prosody for objective measurement of dysarthric word intelligibility. Speech Communication, 54, 622–631.
Gillespie, S., Logan, Y.-Y., Moore, E., Laures-Gore, J., Russell, S., & Patel, R. (2017). Cross-database models for the classification of dysarthria presence. In Proceedings of the Interspeech (pp. 3127–3131).
Gurugubelli, K., & Vuppala, A. K. (2019). Perceptually enhanced single frequency filtering for dysarthric speech detection and intelligibility assessment. In Proceedings of the international conference on acoustics, speech, and signal processing (ICASSP) (pp. 6410–6414).
ITU-T Recommendation G.191. (2005). Software tools for speech and audio coding standardization. International Telecommunication Union. Retrieved from https://www.itu.int/rec/T-REC-G.191/en
Kim, J., Kumar, N., Tsiartas, A., Li, M., & Narayanan, S. S. (2015). Automatic intelligibility classification of sentence level pathological speech. Computer Speech & Language, 29, 132–144.
Madhu Keerthana, Y., Kiran Reddy, M., & Sreenivasa Rao, K. (2019). CWT-based approach for epoch extraction from telephone quality speech. IEEE Signal Processing Letters, 26, 1107–1111.
Murty, K. S. R., & Yegnanarayana, B. (2008). Epoch extraction from speech signals. IEEE Transactions on Audio, Speech, and Language Processing, 16(8), 1602–1613.
Narendra, N. P., & Alku, P. (2018). Dysarthric speech classification using glottal features computed from non-words, words and sentences. In Proceedings of the Interspeech (pp. 3403–3407).
Narendra, N. P., & Alku, P. (2019). Dysarthric speech classification from coded telephone speech using glottal features. Speech Communication, 110, 47–55.
Naylor, P. A., Kounoudes, A., Gudnason, J., & Brookes, M. (2007). Estimation of glottal closure instants in voiced speech using the DYPSA algorithm. IEEE Transactions on Audio, Speech, and Language Processing, 15(1), 34–43.
Paja, M. S., & Falk, T. H. (2012). Automated dysarthria severity classification for improved objective intelligibility assessment of spastic dysarthric speech. In Proceedings of the Interspeech (pp. 62–65).
Reddy, M. K., Alku, P., & Rao, K. S. (2020). Detection of specific language impairment in children using glottal source features. IEEE Access, 8, 15273–15279.
Reddy, M. K., Helkkula, P., Keerthana, Y. M., Kaitue, K., Minkkinen, M., Tolppanen, H., et al. (2021). The automatic detection of heart failure using speech signals. Computer Speech & Language, 69, 101205.
Rudzicz, F. (2009). Phonological features in discriminative classification of dysarthric speech. In Proceedings of the international conference on acoustics, speech, and signal processing (ICASSP) (pp. 4605–4608).
Rudzicz, F., Namasivayam, A. K., & Wolff, T. (2012). The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Language Resources and Evaluation, 46, 523–541.

Acknowledgements

The authors thank Tata Consultancy Services (TCS) for sponsoring this research under the TCS Research Scholar Program, Cycle 15.

Author information

Authors and Affiliations

Advanced Technology Development Centre, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India
Y. Madhu Keerthana
Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India
K. Sreenivasa Rao & Pabitra Mitra

Corresponding author

Correspondence to Y. Madhu Keerthana.

About this article

Cite this article

Madhu Keerthana, Y., Sreenivasa Rao, K. & Mitra, P. Dysarthric speech detection from telephone quality speech using epoch-based pitch perturbation features. Int J Speech Technol 25, 967–973 (2022). https://doi.org/10.1007/s10772-022-10013-w

Received: 29 January 2022
Accepted: 05 October 2022
Published: 30 October 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s10772-022-10013-w
