Abstract
Within the field of automatic speech recognition, processing dysarthric speech is challenging because standard approaches perform poorly in the presence of dysarthria. This paper presents preliminary evidence that the performance of speaker-dependent speech recognition systems trained for speakers with dysarthria may be substantially improved by tuning the size and shift of the spectral analysis window used to compute the initial short-time Fourier transform in many speech front ends. This evidence comes from a set of experiments performed on a small collection of Italian speech (isolated words) from five speakers with different degrees of dysarthria. The experimental framework builds speaker-dependent GMM-HMM speech recognition models using the Kaldi triphone recipe with varying choices of spectral analysis window size and shift. Results show an improvement that varies with the speaker, ranging from 31% to 81%.
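The central parameters in this work are the size and shift of the analysis window used for the short-time Fourier transform. The Python/NumPy sketch below is not the authors' Kaldi-based pipeline; it only illustrates how those two parameters determine the number of frames, the frame rate, and the spectral resolution. The function name `stft_frames`, the Hamming window, the power-of-two FFT size, and the 25 ms / 10 ms defaults are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def stft_frames(signal, sample_rate, win_ms=25.0, shift_ms=10.0):
    """Magnitude spectra of `signal` for a given window size and shift.

    win_ms and shift_ms are the two knobs varied in the experiments;
    the 25 ms / 10 ms defaults are common front-end values (assumption).
    """
    win_len = int(round(sample_rate * win_ms / 1000.0))    # samples per frame
    hop_len = int(round(sample_rate * shift_ms / 1000.0))  # samples between frame starts
    if len(signal) < win_len:
        raise ValueError("signal shorter than one analysis window")
    n_frames = 1 + (len(signal) - win_len) // hop_len       # only frames that fit completely
    n_fft = 1 << (win_len - 1).bit_length()                 # next power of two >= win_len
    window = np.hamming(win_len)
    frames = np.stack([
        signal[i * hop_len : i * hop_len + win_len] * window
        for i in range(n_frames)
    ])
    # rfft zero-pads each frame to n_fft and keeps the n_fft // 2 + 1 non-negative bins
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(16000)  # 1 s of synthetic noise at 16 kHz
    for win, shift in [(25, 10), (50, 20)]:
        spec = stft_frames(x, 16000, win, shift)
        # 25/10 -> (98, 257) at 100 frames/s; 50/20 -> (48, 513) at 50 frames/s
        print(win, shift, spec.shape, 1000 / shift, "frames/s")
```

Doubling the shift halves the frame rate, so a given word yields roughly half as many feature vectors for the HMM to align, while a longer window narrows the spacing between FFT bins at the cost of time resolution.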
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Marini, M., Meoni, G., Mulfari, D., Vanello, N., Fanucci, L. (2021). Enabling Smart Home Voice Control for Italian People with Dysarthria: Preliminary Analysis of Frame Rate Effect on Speech Recognition. In: Saponara, S., De Gloria, A. (eds) Applications in Electronics Pervading Industry, Environment and Society. ApplePies 2020. Lecture Notes in Electrical Engineering, vol 738. Springer, Cham. https://doi.org/10.1007/978-3-030-66729-0_13
DOI: https://doi.org/10.1007/978-3-030-66729-0_13
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66728-3
Online ISBN: 978-3-030-66729-0
eBook Packages: Engineering, Engineering (R0)