DOI: 10.1145/3581783.3612054

Recognizing High-Speed Moving Objects with Spike Camera

Published: 27 October 2023

Abstract

Spike camera is a novel bio-inspired vision sensor that mimics the sampling mechanism of the primate fovea. It offers high temporal resolution and a high dynamic range, showing great potential for high-speed moving object recognition, a task that has not been fully explored in the Multimedia community due to the lack of data and annotations. This paper contributes the first large-scale High-Speed Spiking Recognition (HSSR) dataset, built by recording high-speed moving objects with a spike camera. The HSSR dataset contains 135,000 indoor objects annotated with ImageNet labels and 3,100 outdoor objects collected from real-world scenarios. Furthermore, we propose an original spiking recognition framework that employs long-term spike stream features to supervise feature learning from short-term spike streams. This framework improves recognition accuracy while substantially decreasing recognition latency, allowing our method to accurately recognize moving objects at an equivalent speed of 514 km/h using only 1 ms of spike stream. Experimental results show that the proposed method achieves 76.5% accuracy for recognizing 100 fine-grained indoor objects and 84.3% accuracy for recognizing 8 outdoor objects using 1 ms of spike streams. Resources will be available at https://github.com/Evin-X/HSSR.
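
The abstract describes a framework in which features from long-term spike streams supervise feature learning from short-term (about 1 ms) spike streams. The sketch below is a hypothetical illustration of that idea, not the authors' released code: it assumes spike streams binned into a (batch, time_bins, height, width) tensor, a simple convolutional encoder, and a feature-alignment loss combined with cross-entropy; the names (SpikeEncoder, recognition_loss) and all hyperparameters are illustrative.

```python
# Hypothetical sketch (not the authors' released implementation): supervising a
# short-term spike-stream branch with features from a long-term spike-stream
# branch. Spike streams are assumed to be binned into a binary tensor of shape
# (batch, time_bins, height, width).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpikeEncoder(nn.Module):
    """Maps a binned spike tensor (B, T, H, W) to a feature vector (B, feat_dim)."""
    def __init__(self, t_bins, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(t_bins, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, spikes):
        # Time bins are treated as the input channels of a 2-D CNN.
        return self.net(spikes.float())

def recognition_loss(long_enc, short_enc, classifier,
                     long_spikes, short_spikes, labels, alpha=1.0):
    """Cross-entropy on the short-term branch plus an alignment term that pulls
    short-term features toward (detached) long-term features."""
    with torch.no_grad():                 # long-term branch serves as the teacher
        f_long = long_enc(long_spikes)
    f_short = short_enc(short_spikes)     # short-term branch, e.g. ~1 ms of spikes
    ce = F.cross_entropy(classifier(f_short), labels)
    align = F.mse_loss(f_short, f_long)
    return ce + alpha * align
```

Under these assumptions, only the short-term encoder and the classifier would be needed at inference time, which is consistent with the low-latency recognition the abstract reports.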


Cited By

  • (2025) FlyCount: High-Speed Counting of Black Soldier Flies Using Neuromorphic Sensors. IEEE Sensors Journal, 25(2), 2861-2869. DOI: 10.1109/JSEN.2024.3504289. Online publication date: 15-Jan-2025.
  • (2024) Unifying Spike Perception and Prediction: A Compact Spike Representation Model Using Multi-scale Correlation. Proceedings of the 32nd ACM International Conference on Multimedia, 2341-2349. DOI: 10.1145/3664647.3681448. Online publication date: 28-Oct-2024.


    Information

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN: 9798400701085
    DOI: 10.1145/3581783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 October 2023


    Author Tags

    1. neuromorphic vision
    2. object recognition
    3. representation learning

    Qualifiers

    • Research-article


    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa, ON, Canada

    Acceptance Rates

    Overall acceptance rate: 2,145 of 8,556 submissions, 25%


    Article Metrics

    • Downloads (last 12 months): 134
    • Downloads (last 6 weeks): 11
    Reflects downloads up to 05 Mar 2025

