Abstract
Research on recognizing the speech of typical speakers has been practiced for many years. Nevertheless, a complete system for recognizing the speech of persons with a speech impairment is still under development. In this work, an isolated digit recognition system is developed to recognize the speech of people affected by dysarthria. Because the utterances of dysarthric speakers exhibit erratic behavior, developing a robust speech recognition system for them is more challenging; even manual recognition of their speech can be futile. This work analyzes the use of multiple features and speech enhancement techniques in implementing a cluster-based speech recognition system for dysarthric speakers. Speech enhancement techniques are used to improve the intelligibility or reduce the distortion level of their speech. The system is evaluated using gammatone energy (GFE) features with filters calibrated on different non-linear frequency scales, Stockwell features, the modified group delay cepstrum (MGDFC), speech enhancement techniques, and a vector quantization (VQ) based classifier. Decision-level fusion of all features and speech enhancement techniques yields a 4% word error rate (WER) for a speaker with 6% speech intelligibility. The experimental evaluation provides better results than subjective assessment of the utterances of dysarthric speakers. The system is also evaluated for a dysarthric speaker with 95% speech intelligibility; the WER is 0% for all digits with decision-level fusion of speech enhancement techniques and GFE features. The system can be utilized as an assistive tool by caretakers of people affected by dysarthria.
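The pipeline summarized above (per-feature-stream VQ templates for each digit, followed by decision-level fusion of the per-stream decisions) can be illustrated with a minimal sketch. The code below is not the authors' implementation: scipy's k-means stands in for codebook training, random arrays stand in for the GFE, Stockwell, and MGDFC feature frames, and majority voting is used as one plausible decision-level fusion rule; all function and variable names are hypothetical.

# Minimal sketch (not the authors' implementation) of a VQ-template isolated
# digit recognizer with decision-level fusion across feature streams.
# Assumptions: scipy k-means as codebook trainer, random placeholder features,
# majority-vote fusion. All names are hypothetical.

import numpy as np
from scipy.cluster.vq import kmeans, vq

def train_codebook(frames, codebook_size=32):
    # One VQ codebook (set of k-means centroids) per digit and per feature stream.
    codebook, _ = kmeans(frames.astype(float), codebook_size)
    return codebook

def average_distortion(frames, codebook):
    # Mean distance from each test frame to its nearest codeword.
    _, distances = vq(frames.astype(float), codebook)
    return float(np.mean(distances))

def classify_stream(frames, codebooks_for_stream):
    # Choose the digit whose codebook gives the lowest average distortion.
    scores = {digit: average_distortion(frames, cb)
              for digit, cb in codebooks_for_stream.items()}
    return min(scores, key=scores.get)

def fuse_decisions(per_stream_labels):
    # Decision-level fusion: majority vote over the labels from each feature
    # stream (ties fall back to the first label encountered).
    labels, counts = np.unique(per_stream_labels, return_counts=True)
    return int(labels[np.argmax(counts)])

# Hypothetical usage with random frames standing in for real feature extraction.
rng = np.random.default_rng(0)
streams = ["gfe", "stockwell", "mgdfc"]
digits = range(10)
codebooks = {s: {d: train_codebook(rng.normal(d, 1.0, size=(200, 13)))
                 for d in digits}
             for s in streams}
test_frames = {s: rng.normal(7, 1.0, size=(80, 13)) for s in streams}  # an utterance of "seven"
per_stream = [classify_stream(test_frames[s], codebooks[s]) for s in streams]
print("fused digit:", fuse_decisions(per_stream))

For reference, WER is computed as (substitutions + deletions + insertions) divided by the number of reference words; in an isolated digit task each utterance contributes one reference word, so a 4% WER corresponds roughly to 4 misrecognized digits per 100 test utterances.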
Data availability
All relevant data are within the paper and its supporting information files.
Acknowledgments
This is the authors' own work; no grant or contribution numbers are applicable.
Ethics declarations
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
This article does not contain any studies with human participants.
Competing interests
The authors have declared that no competing interest exists.
Conflict of interest
All the authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Revathi, A., Nagakrishnan, R. & Sasikaladevi, N. Comparative analysis of Dysarthric speech recognition: multiple features and robust templates. Multimed Tools Appl 81, 31245–31259 (2022). https://doi.org/10.1007/s11042-022-12937-6