FPGA Implementation of a Configurable Vocal Feature Extraction Embedded System for Dysarthric Speech Recognition

  • Conference paper
  • In: Applications in Electronics Pervading Industry, Environment and Society (ApplePies 2021)

Abstract

In recent years, users have increasingly demanded hands-free interaction with their digital devices. This kind of technology is even more valuable for people with disabilities, as it can improve their quality of life. In particular, speech-impaired users (e.g. dysarthric speakers) pose a significant challenge for Automatic Speech Recognition (ASR) systems, because standard approaches are ineffective for them. New speech analysis algorithms are therefore being developed, but they are generally evaluated on off-line datasets, and their performance can differ in a real-world setting; hence the need for an easy way to validate them in a real scenario. This paper presents a highly configurable off-line embedded system, equipped with an on-board microphone, for extracting both MFCCs and Mel filterbanks. The results show that the system performs well in a real scenario in terms of both power consumption and word error rate.
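The feature pipeline named in the abstract (Mel filterbank and MFCC extraction) can be illustrated with a short sketch. The NumPy code below is not the authors' FPGA implementation; it is a minimal floating-point reference of the standard computation, and every parameter (16 kHz sampling, 25 ms frames, 512-point FFT, 26 filters, 13 cepstral coefficients) is an illustrative assumption rather than a value taken from the paper.

    # Minimal sketch of Mel filterbank / MFCC extraction for one audio frame.
    # Assumed parameters only; not the paper's FPGA design.
    import numpy as np

    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    def mel_filterbank(n_filters, n_fft, sample_rate):
        """Triangular filters equally spaced on the Mel scale."""
        mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
        bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
        fbank = np.zeros((n_filters, n_fft // 2 + 1))
        for i in range(1, n_filters + 1):
            left, center, right = bins[i - 1], bins[i], bins[i + 1]
            for k in range(left, center):                      # rising edge
                fbank[i - 1, k] = (k - left) / max(center - left, 1)
            for k in range(center, right):                     # falling edge
                fbank[i - 1, k] = (right - k) / max(right - center, 1)
        return fbank

    def frame_features(frame, sample_rate=16000, n_fft=512, n_filters=26, n_ceps=13):
        """Return (log Mel filterbank energies, MFCCs) for one frame."""
        windowed = frame * np.hamming(len(frame))              # Hamming window
        spectrum = np.abs(np.fft.rfft(windowed, n_fft)) ** 2   # power spectrum
        fbank = mel_filterbank(n_filters, n_fft, sample_rate)
        mel_energies = np.log(fbank @ spectrum + 1e-10)        # log Mel filterbanks
        # Unnormalized DCT-II of the log energies gives the MFCCs (keep first n_ceps).
        n = np.arange(n_filters)
        dct_basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
        mfcc = dct_basis @ mel_energies
        return mel_energies, mfcc

    # Example: features for 25 ms of synthetic audio at 16 kHz.
    frame = np.random.randn(400)
    mel_energies, mfcc = frame_features(frame)
    print(mel_energies.shape, mfcc.shape)                      # (26,) (13,)

On an FPGA, the same steps would typically be realized with fixed-point arithmetic and a hardware FFT, but the data flow is the same: window, power spectrum, triangular Mel filters, logarithm, and (for MFCCs only) a DCT.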




Author information

Corresponding author

Correspondence to Iacopo Casalini.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Casalini, I., Marini, M., Fanucci, L. (2022). FPGA Implementation of a Configurable Vocal Feature Extraction Embedded System for Dysarthric Speech Recognition. In: Saponara, S., De Gloria, A. (eds) Applications in Electronics Pervading Industry, Environment and Society. ApplePies 2021. Lecture Notes in Electrical Engineering, vol 866. Springer, Cham. https://doi.org/10.1007/978-3-030-95498-7_31

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-95498-7_31

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95497-0

  • Online ISBN: 978-3-030-95498-7

  • eBook Packages: Engineering, Engineering (R0)
