Abstract
The growing demand for complex computations on edge devices requires algorithms and hardware accelerators that are powerful yet energy-efficient. Spiking neural networks are a possible solution: they have been shown to be energy-efficient in several data processing and classification tasks when executed on specialized neuromorphic hardware. In speech processing, their strong temporal affinity makes them especially suited for the online classification of audio streams. So far, however, little emphasis has been placed on small-scale networks that will ultimately fit into restricted neuromorphic implementations. We propose resonating neurons as the input layer of spiking neural networks for online audio classification, enabling an end-to-end solution. With our approach, spiking neural networks can operate directly on the audio stream without additional preprocessing, making them suitable for simple, continuous, low-power analysis. We compare the classification accuracy of our architecture against network architectures using the established method of mel-frequency-based spectral features in a keyword spotting benchmark to demonstrate the performance of our approach.
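The resonating input neurons referred to above are resonate-and-fire units: each holds a complex state that rotates at a fixed resonant frequency and fires when the driven oscillation exceeds a threshold. The following is a minimal sketch of that encoding idea only; the `resonate_and_fire` helper, its parameter values, and the test signal are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def resonate_and_fire(signal, omega, dt, b=-0.1, threshold=1.0):
    """Single resonate-and-fire neuron, integrated in discrete time.

    The complex state z follows dz/dt = (b + i*omega) * z + I(t):
    a damped rotation at the resonant frequency omega, driven by the
    input. A spike is emitted when Im(z) crosses the threshold,
    after which the state is reset.
    """
    z = 0.0 + 0.0j
    spikes = []
    decay = np.exp(dt * (b + 1j * omega))  # exact one-step propagator
    for t, i_t in enumerate(signal):
        z = z * decay + i_t  # rotate/decay the state, then add the input
        if z.imag >= threshold:
            spikes.append(t)
            z = 0.0 + 0.0j  # reset after spiking
    return spikes

# A bank of such neurons, each tuned to a different omega, can stand in
# for a mel-spectrogram front end: a neuron fires most when the input
# carries energy near its resonant frequency.
fs = 16000                                 # sample rate in Hz
t = np.arange(0, 0.1, 1 / fs)
tone = 0.05 * np.sin(2 * np.pi * 440 * t)  # 440 Hz test tone
on_res = resonate_and_fire(tone, omega=2 * np.pi * 440, dt=1 / fs)
off_res = resonate_and_fire(tone, omega=2 * np.pi * 4000, dt=1 / fs)
print(len(on_res), len(off_res))
```

Because the encoder emits spikes directly, its output can feed a spiking network without a separate FFT/mel preprocessing stage, which is the point of the end-to-end pipeline described in the abstract.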
Acknowledgments
We thank Infineon Technologies AG for supporting this research. The work is partly conducted within the KI-ASIC project, which is funded by the German Federal Ministry of Education and Research (Grant Number 16ES0992K).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Auge, D., Hille, J., Kreutz, F., Mueller, E., Knoll, A. (2021). End-to-End Spiking Neural Network for Speech Recognition Using Resonating Input Neurons. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science(), vol 12895. Springer, Cham. https://doi.org/10.1007/978-3-030-86383-8_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86382-1
Online ISBN: 978-3-030-86383-8