
Enabling Smart Home Voice Control for Italian People with Dysarthria: Preliminary Analysis of Frame Rate Effect on Speech Recognition

  • Conference paper
Applications in Electronics Pervading Industry, Environment and Society (ApplePies 2020)

Abstract

Within the field of automatic speech recognition, processing dysarthric speech is a challenge because standard approaches are ineffective in the presence of dysarthria. This paper presents preliminary evidence that the performance of speaker-dependent speech recognition systems trained for speakers with dysarthria may be substantially improved by tuning the size and shift of the spectral analysis window used to compute the initial short-time Fourier transform in many speech front ends. This evidence comes from a set of experiments on a small collection of Italian speech (isolated words) from five speakers with different degrees of dysarthria. The experimental framework builds speaker-dependent GMM-HMM speech recognition models with the Kaldi triphone recipe while varying the spectral analysis window size and shift. Results show an improvement ranging from 31% to 81%, depending on the speaker.
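The two parameters tuned in the paper, window size and window shift, control how a recording is cut into overlapping frames before the short-time Fourier transform. The pure-Python sketch below is only an illustration under stated assumptions, not the authors' code and not Kaldi's implementation: the helper names `frame_params`, `num_frames`, `hamming`, and `stft_magnitude` are hypothetical, the Hamming taper is an assumption (Kaldi's default window type is `povey`), and frames that do not fit entirely inside the signal are dropped, similar in spirit to Kaldi's default `--snip-edges=true`. Kaldi's defaults are a 25 ms window and a 10 ms shift; the shift alone sets the frame rate (1000 / shift_ms frames per second).

```python
import cmath
import math
from typing import List, Tuple


def frame_params(sample_rate: int, size_ms: float, shift_ms: float) -> Tuple[int, int]:
    """Convert window size and shift from milliseconds to samples."""
    size = int(round(sample_rate * size_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    if size <= 0 or shift <= 0:
        raise ValueError("window size and shift must be at least one sample")
    return size, shift


def num_frames(num_samples: int, size: int, shift: int) -> int:
    """Count frames lying entirely inside the signal; partial frames are dropped."""
    if num_samples < size:
        return 0
    return 1 + (num_samples - size) // shift


def hamming(size: int) -> List[float]:
    """Symmetric Hamming taper (an assumption, not Kaldi's default window)."""
    if size == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (size - 1)) for n in range(size)]


def stft_magnitude(signal: List[float], sample_rate: int,
                   size_ms: float, shift_ms: float) -> List[List[float]]:
    """Magnitude spectrum of each windowed frame, bins 0..size//2 (naive DFT)."""
    size, shift = frame_params(sample_rate, size_ms, shift_ms)
    win = hamming(size)
    frames = []
    for f in range(num_frames(len(signal), size, shift)):
        start = f * shift
        seg = [signal[start + n] * win[n] for n in range(size)]
        spectrum = []
        for k in range(size // 2 + 1):
            # Correlate the windowed frame with a complex exponential at bin k.
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / size)
                      for n in range(size))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


if __name__ == "__main__":
    sr = 16000
    one_second = sr  # number of samples in 1 s of audio
    for size_ms, shift_ms in [(25, 10), (25, 5), (40, 20)]:
        size, shift = frame_params(sr, size_ms, shift_ms)
        print(f"size={size_ms} ms shift={shift_ms} ms -> "
              f"{1000.0 / shift_ms:.0f} frames/s, "
              f"{num_frames(one_second, size, shift)} frames in 1 s, "
              f"bin spacing {sr / size:.1f} Hz")
```

Halving the shift roughly doubles the number of feature vectors the HMM has to align for the same utterance, while lengthening the window narrows the spectral bin spacing (sample_rate / size) at the cost of time resolution. Which trade-off suits a given speaker with dysarthria is exactly what the experiments explore.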



Author information

Correspondence to Marco Marini.


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Marini, M., Meoni, G., Mulfari, D., Vanello, N., Fanucci, L. (2021). Enabling Smart Home Voice Control for Italian People with Dysarthria: Preliminary Analysis of Frame Rate Effect on Speech Recognition. In: Saponara, S., De Gloria, A. (eds) Applications in Electronics Pervading Industry, Environment and Society. ApplePies 2020. Lecture Notes in Electrical Engineering, vol 738. Springer, Cham. https://doi.org/10.1007/978-3-030-66729-0_13


  • DOI: https://doi.org/10.1007/978-3-030-66729-0_13


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66728-3

  • Online ISBN: 978-3-030-66729-0

  • eBook Packages: Engineering, Engineering (R0)
