Abstract
Few prosthetic hand models designed for children are described in the literature, and none of them was found to use speech commands as a control method. Voice control based on Keyword Spotting (KWS) is a non-invasive method that offers many advantages over other approaches. KWS systems based on Deep Learning models have proved to be the most accurate, but implementing them on microcontrollers (MCUs) is challenging because of the limited hardware resources of MCUs. In this paper, a robust KWS model based on log-Mel spectrograms and CNNs is presented for deployment on MCUs. The model is trained to recognize 5 keywords using the Multilingual Spoken Words Corpus and UrbanSound8k datasets; a large number of non-keywords and background noise samples are included in training to provide robustness. Several popular MCU platforms are evaluated for implementing the model, and STM32 was chosen for its advantages. Inference time simulations were run for several STM32 boards compatible with the model.
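As an illustration of the log-Mel spectrogram front end mentioned above, the following is a minimal NumPy sketch. The abstract does not state the feature parameters, so the values used here (16 kHz sampling rate, 512-point FFT, 160-sample hop, 40 Mel bands) and all function names are assumptions chosen for illustration, not the configuration used by the authors.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style Hz-to-Mel conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    # n_mels + 2 edge points, equally spaced on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sample_rate=16000, n_fft=512, hop=160, n_mels=40):
    """Return a (n_frames, n_mels) log-Mel spectrogram (assumed parameters)."""
    # Slice the signal into overlapping frames and apply a Hann window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # Power spectrum of each frame, projected onto the Mel filters.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    # A small offset avoids log(0) on silent frames.
    return np.log(mel + 1e-6)

# Example: one second of a synthetic 1 kHz tone at 16 kHz.
t = np.arange(16000) / 16000.0
features = log_mel_spectrogram(np.sin(2 * np.pi * 1000.0 * t))
```

A two-dimensional array of this kind (frames by Mel bands) is the sort of input a small CNN classifier can consume; on an MCU, the same steps would typically be carried out with fixed-size buffers and an optimized FFT routine.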
Acknowledgements
The authors would like to thank the Academic Interchange and Mobility Program (PIMA) held by the University of Cadiz, which made the authors' collaboration on this research possible. This research was partially funded by the FEDER research project “Sistemas multimodales avanzados para prótesis robóticas de miembro superior (PROBOTHAND)” (FEDER-UCA18-108407) from Junta de Andalucía, Spain.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Perdomo-Campos, A., Ramírez-Beltrán, J., Morgado-Estevez, A. (2024). Robust MCU Oriented KWS Model for Children Robotic Prosthetic Hand Control. In: Hernández Heredia, Y., Milián Núñez, V., Ruiz Shulcloper, J. (eds) Progress in Artificial Intelligence and Pattern Recognition. IWAIPR 2023. Lecture Notes in Computer Science, vol 14335. Springer, Cham. https://doi.org/10.1007/978-3-031-49552-6_25
DOI: https://doi.org/10.1007/978-3-031-49552-6_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49551-9
Online ISBN: 978-3-031-49552-6
eBook Packages: Computer Science (R0)