Enabling Smart Home Voice Control for Italian People with Dysarthria: Preliminary Analysis of Frame Rate Effect on Speech Recognition

Marini, Marco; Meoni, Gabriele; Mulfari, Davide; Vanello, Nicola; Fanucci, Luca

doi:10.1007/978-3-030-66729-0_13

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 738)

Included in the following conference series:

International Conference on Applications in Electronics Pervading Industry, Environment and Society

Abstract

Within the field of automatic speech recognition, the processing of dysarthric speech is a challenge because standard approaches are ineffective in the presence of dysarthria. This paper presents preliminary evidence that the performance of speaker-dependent speech recognition systems trained for speakers with dysarthria may be substantially improved by tuning the size and shift of the spectral analysis window that many speech front ends use to compute the initial short-time Fourier transform. Evidence for this comes from a set of experiments performed on a small collection of Italian speech (isolated words) from five speakers with different degrees of dysarthria. The experimental framework constructs speaker-dependent GMM-HMM speech recognition models using the triphone Kaldi recipe while varying the spectral analysis window size and shift. Results show an improvement that ranges from 31% to 81%, depending on the speaker.
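To make the role of the two tuned parameters concrete, the following minimal sketch shows how window size and shift determine how many analysis frames a front end extracts from an utterance. It is an illustration only, not the authors' code: the function names, the 16 kHz sample rate, the Hamming window, and the specific window/shift values are assumptions chosen for the example, and the frame count follows the common convention of keeping only frames that fit entirely inside the signal.

```python
import math


def frame_count(num_samples, sample_rate, window_ms, shift_ms):
    """Number of full analysis frames when the signal is not padded."""
    window = round(sample_rate * window_ms / 1000)  # window length in samples
    shift = round(sample_rate * shift_ms / 1000)    # hop between frame starts
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift


def windowed_frames(samples, sample_rate, window_ms, shift_ms):
    """Cut the signal into overlapping frames and apply a Hamming window.

    The Hamming window is an assumption for this sketch; the frames returned
    here are what a short-time Fourier transform would then be applied to.
    """
    window = round(sample_rate * window_ms / 1000)
    shift = round(sample_rate * shift_ms / 1000)
    weights = [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (window - 1))
        for n in range(window)
    ]
    count = frame_count(len(samples), sample_rate, window_ms, shift_ms)
    return [
        [s * w for s, w in zip(samples[i * shift:i * shift + window], weights)]
        for i in range(count)
    ]


if __name__ == "__main__":
    one_second = [1.0] * 16000  # hypothetical 1 s utterance at 16 kHz
    # Illustrative (window, shift) pairs in milliseconds.
    for window_ms, shift_ms in [(25, 10), (25, 5), (50, 10)]:
        n = frame_count(len(one_second), 16000, window_ms, shift_ms)
        print(f"window={window_ms} ms, shift={shift_ms} ms -> {n} frames")
```

Halving the shift roughly doubles the number of frames for the same utterance, while a longer window trades time resolution for frequency resolution; these are the two trade-offs that tuning the window size and shift explores.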

Author information

Authors and Affiliations

Department of Information Engineering, University of Pisa, via G. Caruso 16, 56122, Pisa, Italy
Marco Marini, Gabriele Meoni, Davide Mulfari, Nicola Vanello & Luca Fanucci

Corresponding author

Correspondence to Marco Marini.

Editor information

Editors and Affiliations

DII, University of Pisa, Pisa, Italy
Sergio Saponara
DITEN, University of Genoa, Genoa, Italy
Alessandro De Gloria

About this paper

Cite this paper

Marini, M., Meoni, G., Mulfari, D., Vanello, N., Fanucci, L. (2021). Enabling Smart Home Voice Control for Italian People with Dysarthria: Preliminary Analysis of Frame Rate Effect on Speech Recognition. In: Saponara, S., De Gloria, A. (eds) Applications in Electronics Pervading Industry, Environment and Society. ApplePies 2020. Lecture Notes in Electrical Engineering, vol 738. Springer, Cham. https://doi.org/10.1007/978-3-030-66729-0_13

DOI: https://doi.org/10.1007/978-3-030-66729-0_13
Published: 26 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66728-3
Online ISBN: 978-3-030-66729-0
eBook Packages: Engineering, Engineering (R0)
