FPGA Implementation of a Configurable Vocal Feature Extraction Embedded System for Dysarthric Speech Recognition

  • Conference paper
  • In: Applications in Electronics Pervading Industry, Environment and Society (ApplePies 2021)

Abstract

In recent years, users have increasingly demanded hands-free interaction with their digital devices. This kind of technology is even more valuable for people with disabilities, as it can improve their quality of life. In particular, speech-impaired users (e.g. dysarthric speakers) pose a significant challenge for Automatic Speech Recognition (ASR) systems, because standard approaches are ineffective for them. New speech analysis algorithms are therefore being developed, but they are generally evaluated on off-line datasets, and their performance can differ in a real-world setting; hence the need for an easy way to validate them in a real scenario. This paper presents a highly configurable off-line embedded system, equipped with an on-board microphone, for extracting both MFCCs and Mel filterbanks. The results show that the system performs well in a real scenario in terms of both power consumption and word error rate.
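The feature pipeline named in the abstract (Mel filterbank and MFCC extraction) can be illustrated with a short sketch. The NumPy code below is not the authors' FPGA implementation; it is a minimal floating-point reference of the standard computation, and every parameter (16 kHz sampling, 25 ms frames, 512-point FFT, 26 filters, 13 cepstral coefficients) is an illustrative assumption rather than a value taken from the paper.

    # Minimal sketch of Mel filterbank / MFCC extraction for one audio frame.
    # Assumed parameters only; not the paper's FPGA design.
    import numpy as np

    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    def mel_filterbank(n_filters, n_fft, sample_rate):
        """Triangular filters equally spaced on the Mel scale."""
        mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
        bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
        fbank = np.zeros((n_filters, n_fft // 2 + 1))
        for i in range(1, n_filters + 1):
            left, center, right = bins[i - 1], bins[i], bins[i + 1]
            for k in range(left, center):                      # rising edge
                fbank[i - 1, k] = (k - left) / max(center - left, 1)
            for k in range(center, right):                     # falling edge
                fbank[i - 1, k] = (right - k) / max(right - center, 1)
        return fbank

    def frame_features(frame, sample_rate=16000, n_fft=512, n_filters=26, n_ceps=13):
        """Return (log Mel filterbank energies, MFCCs) for one frame."""
        windowed = frame * np.hamming(len(frame))              # Hamming window
        spectrum = np.abs(np.fft.rfft(windowed, n_fft)) ** 2   # power spectrum
        fbank = mel_filterbank(n_filters, n_fft, sample_rate)
        mel_energies = np.log(fbank @ spectrum + 1e-10)        # log Mel filterbanks
        # Unnormalized DCT-II of the log energies gives the MFCCs (keep first n_ceps).
        n = np.arange(n_filters)
        dct_basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
        mfcc = dct_basis @ mel_energies
        return mel_energies, mfcc

    # Example: features for 25 ms of synthetic audio at 16 kHz.
    frame = np.random.randn(400)
    mel_energies, mfcc = frame_features(frame)
    print(mel_energies.shape, mfcc.shape)                      # (26,) (13,)

On an FPGA, the same steps would typically be realized with fixed-point arithmetic and a hardware FFT, but the data flow is the same: window, power spectrum, triangular Mel filters, logarithm, and (for MFCCs only) a DCT.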




Author information

Corresponding author

Correspondence to Iacopo Casalini.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Casalini, I., Marini, M., Fanucci, L. (2022). FPGA Implementation of a Configurable Vocal Feature Extraction Embedded System for Dysarthric Speech Recognition. In: Saponara, S., De Gloria, A. (eds) Applications in Electronics Pervading Industry, Environment and Society. ApplePies 2021. Lecture Notes in Electrical Engineering, vol 866. Springer, Cham. https://doi.org/10.1007/978-3-030-95498-7_31

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-95498-7_31

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95497-0

  • Online ISBN: 978-3-030-95498-7

  • eBook Packages: Engineering, Engineering (R0)
