
End-to-End Spiking Neural Network for Speech Recognition Using Resonating Input Neurons

  • Conference paper
  • First Online:
Artificial Neural Networks and Machine Learning – ICANN 2021 (ICANN 2021)

Abstract

The growing demand for complex computation on edge devices calls for algorithms and hardware accelerators that are powerful yet energy-efficient. One possible solution is spiking neural networks, which have been shown to be energy-efficient in several data processing and classification tasks when executed on specialized neuromorphic hardware. In the field of speech processing, their strong temporal affinity makes them especially well suited to the online classification of audio streams. So far, however, little emphasis has been placed on small-scale networks that will ultimately fit into restricted neuromorphic implementations. We propose using resonating neurons as the input layer of spiking neural networks for online audio classification, enabling an end-to-end solution. We compare different architectures against the established method of using mel-frequency-based spectral features. With our approach, spiking neural networks can be used directly, without additional preprocessing, making them suitable for simple, continuous, low-power analysis of audio streams. We evaluate the classification accuracy of different network architectures and ours on a keyword spotting benchmark to demonstrate the performance of our approach.
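The resonating input neurons described above are commonly modeled as resonate-and-fire units: damped complex oscillators that are driven by the raw signal and spike when their state crosses a threshold. The following is a minimal sketch of that idea, not the authors' implementation; the function name, parameter values (decay rate, threshold), and the reset-to-zero rule are illustrative assumptions.

```python
import numpy as np

def resonate_and_fire(signal, freq_hz, fs, decay=50.0, threshold=0.005):
    """One resonate-and-fire neuron: a damped complex oscillator z driven
    by the input signal; it emits a spike when Im(z) crosses the threshold,
    after which the state is reset."""
    dt = 1.0 / fs
    # Exact update factor for the homogeneous dynamics
    # z' = (-decay + i*2*pi*freq) * z, stable for any resonant frequency.
    step = np.exp(dt * (-decay + 2j * np.pi * freq_hz))
    z = 0.0 + 0.0j
    spikes = []
    for t, x in enumerate(signal):
        z = z * step + dt * x       # rotate and decay, then add input drive
        if z.imag > threshold:
            spikes.append(t)        # spike time in samples
            z = 0.0 + 0.0j          # reset after the spike
    return spikes

# A bank of such neurons tuned to different frequencies can serve as a
# spiking front end in place of a mel-spectrogram: each neuron fires
# only when the signal carries energy near its resonant frequency.
fs = 16000
t = np.arange(0, 0.1, 1.0 / fs)
tone = np.sin(2 * np.pi * 440.0 * t)
matched = resonate_and_fire(tone, freq_hz=440.0, fs=fs)   # spikes
detuned = resonate_and_fire(tone, freq_hz=4000.0, fs=fs)  # few or none
```

Because the oscillators resonate directly on the waveform, no windowing or FFT stage is needed, which is what makes the approach attractive for continuous low-power audio analysis.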




Acknowledgments

We thank Infineon Technologies AG for supporting this research. This work was partly conducted within the KI-ASIC project, funded by the German Federal Ministry of Education and Research (Grant Number 16ES0992K).

Author information

Corresponding author

Correspondence to Daniel Auge.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Auge, D., Hille, J., Kreutz, F., Mueller, E., Knoll, A. (2021). End-to-End Spiking Neural Network for Speech Recognition Using Resonating Input Neurons. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science(), vol 12895. Springer, Cham. https://doi.org/10.1007/978-3-030-86383-8_20

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-86383-8_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86382-1

  • Online ISBN: 978-3-030-86383-8

  • eBook Packages: Computer Science, Computer Science (R0)
