
FARSE-CNN: Fully Asynchronous, Recurrent and Sparse Event-Based CNN

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Event cameras are neuromorphic image sensors that respond to per-pixel brightness changes, producing a stream of asynchronous and spatially sparse events. Currently, the most successful algorithms for event cameras convert batches of events into dense image-like representations that are synchronously processed by deep learning models from frame-based computer vision. These methods discard the inherent properties of events, leading to high latency and computational costs. Following a recent line of work, we propose a model for efficient asynchronous event processing that exploits sparsity. We design the Fully Asynchronous, Recurrent and Sparse Event-Based CNN (FARSE-CNN), a novel multi-layered architecture which combines the mechanisms of recurrent and convolutional neural networks. To build efficient deep networks, we propose compression modules that enable learning hierarchical features both in space and time. We theoretically derive the complexity of all components in our architecture, and experimentally validate our method on tasks for object recognition, object detection and gesture recognition. FARSE-CNN achieves similar or better performance than the state of the art among asynchronous methods, with low computational complexity and without relying on a fixed-length history of events. Our code is released at https://github.com/AIRLab-POLIMI/farse-cnn.
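To make the abstract's core idea concrete, the sketch below illustrates event-driven sparse recurrent processing in general terms: hidden states are kept only for pixels that have actually fired, and each incoming event updates only its own site, so compute scales with event count rather than frame size. This is an illustrative toy, not the authors' FARSE-CNN implementation; the class name, weights, and input features (polarity and inter-event time) are assumptions chosen for the example.

```python
import numpy as np

class SparseEventRNN:
    """Toy asynchronous recurrent layer over a 2D pixel grid.

    Hidden states live in a dict keyed by (x, y), so memory and compute
    scale with the number of *active* pixels, not with the full frame.
    """

    def __init__(self, hidden_size, seed=0):
        self.hidden_size = hidden_size
        self.states = {}        # (x, y) -> hidden state vector
        self.last_t = {}        # (x, y) -> timestamp of previous event there
        rng = np.random.default_rng(seed)
        # Input features per event: (polarity, time since last event at pixel).
        self.w_in = rng.standard_normal((hidden_size, 2)) * 0.1
        self.w_rec = rng.standard_normal((hidden_size, hidden_size)) * 0.1

    def step(self, x, y, t, p):
        """Process one event asynchronously; only pixel (x, y) is touched."""
        key = (x, y)
        h_prev = self.states.get(key, np.zeros(self.hidden_size))
        dt = t - self.last_t.get(key, t)   # 0.0 for a pixel's first event
        u = np.array([float(p), dt])
        h = np.tanh(self.w_in @ u + self.w_rec @ h_prev)
        self.states[key] = h
        self.last_t[key] = t
        return h

# Feed a short stream of (x, y, t, polarity) events one at a time.
layer = SparseEventRNN(hidden_size=8)
events = [(3, 5, 0.00, 1), (3, 5, 0.01, -1), (7, 2, 0.02, 1)]
outputs = [layer.step(*ev) for ev in events]
print(len(layer.states))  # only 2 pixels ever became active
```

The dict-of-states pattern is what distinguishes this from dense frame processing: an idle pixel costs nothing, matching the sparsity argument made in the abstract.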



Acknowledgements

This paper is supported by the FAIR (Future Artificial Intelligence Research) project, funded by the NextGenerationEU program within the PNRR-PE-AI scheme (M4C2, investment 1.3, line on Artificial Intelligence).

Author information

Corresponding author

Correspondence to Riccardo Santambrogio.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 751 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Santambrogio, R., Cannici, M., Matteucci, M. (2025). FARSE-CNN: Fully Asynchronous, Recurrent and Sparse Event-Based CNN. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15112. Springer, Cham. https://doi.org/10.1007/978-3-031-72949-2_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72948-5

  • Online ISBN: 978-3-031-72949-2

  • eBook Packages: Computer Science, Computer Science (R0)
