Abstract
Systems that recognize the speech of typical speakers have been developed for many decades, and research is still progressing on strategies to recognize speech uttered by people with hearing impairment, autism spectrum disorder, or dysarthria. This work applies various speech enhancement techniques to increase the intelligibility of spoken utterances and uses perceptual features with different modelling techniques to develop a dysarthric speech recognition system. Perceptual features are extracted from both raw and intelligibility-enhanced utterances, and models are created from them. The features extracted from each test utterance are given to the models, and the classifier associates the test utterance with the best-matching model. Applying speech enhancement techniques yields better accuracy. Decision-level fusion classification, integrating features, models, and speech enhancement techniques, has provided an overall accuracy of 81% for recognizing isolated digits spoken by a small set of dysarthric speakers; better accuracy can be expected with a database containing more utterances from many dysarthric speakers. This system would help caretakers understand the speech of persons affected by dysarthria and provide the necessary assistance.
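The decision-level fusion mentioned above can be illustrated with a minimal sketch. The code below is not the authors' implementation; it simply shows one common fusion rule, majority voting, in which several classifiers (e.g., trained on raw versus intelligibility-enhanced speech, or on different feature/model combinations) each predict a digit label per test utterance, and the fused decision is the label most classifiers agree on. The classifier outputs shown are hypothetical.

```python
import numpy as np

def majority_vote_fusion(predictions):
    """Decision-level fusion by majority voting.

    predictions: array-like of shape (n_classifiers, n_utterances),
    each entry a predicted digit label (0-9).
    Returns the fused label per utterance; ties are broken by the
    smallest label, a simplification for this sketch.
    """
    predictions = np.asarray(predictions)
    fused = []
    for col in predictions.T:  # one column per test utterance
        labels, counts = np.unique(col, return_counts=True)
        fused.append(int(labels[np.argmax(counts)]))
    return fused

# Three hypothetical classifiers voting on four test utterances.
preds = [
    [3, 1, 7, 0],   # e.g., model trained on raw speech
    [3, 1, 5, 0],   # e.g., model trained on enhanced speech
    [3, 2, 7, 0],   # e.g., model with a different feature set
]
print(majority_vote_fusion(preds))  # → [3, 1, 7, 0]
```

In practice, fusion schemes may also weight each classifier's vote by its validation accuracy or combine posterior probabilities rather than hard labels; the majority vote is only the simplest instance.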
Data Availability
The datasets generated and analyzed during the current study are available from the corresponding author upon reasonable request.
Acknowledgements
The authors thank the Department of Science & Technology, New Delhi, for the FIST funding (SR/FST/ET-I/2018/221(C)). Furthermore, the authors also wish to thank the Intrusion Detection Lab at the School of Electrical & Electronics Engineering, SASTRA Deemed University, for providing infrastructural support to carry out this research work.
Funding
FIST funding (SR/FST/ET-I/2018/221(C)).
Ethics declarations
Conflict of interest
The authors have no relevant conflicts of interest to disclose.
Ethical Approval
This article contains no studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Revathi, A., Sasikaladevi, N., Arunprasanth, D. et al. A Strategic Approach for Robust Dysarthric Speech Recognition. Wireless Pers Commun 134, 2315–2346 (2024). https://doi.org/10.1007/s11277-024-11029-y