Abstract
Although advances in event-based machine vision algorithms have demonstrated unparalleled capabilities in performing some of the most demanding tasks, implementing them under the stringent real-time and power constraints of edge systems remains a major challenge. In this work, a reconfigurable hardware-software architecture called REMOT, which performs real-time event-based multi-object tracking on FPGAs, is presented. REMOT performs vision tasks by defining a set of actions over attention units (AUs). These actions allow each AU to track an object candidate autonomously by adjusting its region of attention, and allow the information gathered by each AU to be used for making algorithmic-level decisions. Taking advantage of this modular structure, algorithm-architecture codesign can be performed by implementing different parts of the algorithm in either hardware or software for different tradeoffs. Results show that REMOT can process 0.43–2.91 million events per second at 1.75–5.45 W. Compared with the software baseline, our implementation achieves up to 44 times higher throughput and 35.4 times higher power efficiency. Migrating the Merge operation to hardware further makes the worst-case latency 95 times shorter than that of the software baseline. By varying the AU configuration and operation, a power reduction of 0.59–0.77 mW per AU on the programmable logic has also been demonstrated.
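To make the idea of attention units more concrete, the following is a minimal software sketch, not the REMOT implementation. It assumes that an AU holds a rectangular region of attention that grows to cover nearby events, and that a Merge step fuses AUs whose regions overlap. The names `AttentionUnit`, `process`, `merge_all`, the `margin` parameter, and the event layout `(x, y, timestamp)` are all hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Assumed event layout: (x, y, timestamp_us). Polarity is omitted for brevity.
Event = Tuple[int, int, int]


@dataclass
class AttentionUnit:
    """Hypothetical AU: a rectangular region of attention plus an event count."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    events: int = 0
    margin: int = 2  # assumed slack (in pixels) around the region

    def accepts(self, x: int, y: int) -> bool:
        # An event is claimed if it falls inside the region or within the margin.
        return (self.x_min - self.margin <= x <= self.x_max + self.margin
                and self.y_min - self.margin <= y <= self.y_max + self.margin)

    def update(self, x: int, y: int) -> None:
        # Adjust the region of attention so that it covers the new event.
        self.x_min, self.x_max = min(self.x_min, x), max(self.x_max, x)
        self.y_min, self.y_max = min(self.y_min, y), max(self.y_max, y)
        self.events += 1

    def overlaps(self, other: "AttentionUnit") -> bool:
        return not (self.x_max < other.x_min or other.x_max < self.x_min
                    or self.y_max < other.y_min or other.y_max < self.y_min)

    def merge(self, other: "AttentionUnit") -> None:
        # Absorb another AU: take the bounding box of both and sum the counts.
        self.x_min, self.x_max = min(self.x_min, other.x_min), max(self.x_max, other.x_max)
        self.y_min, self.y_max = min(self.y_min, other.y_min), max(self.y_max, other.y_max)
        self.events += other.events


def merge_all(aus: Iterable[AttentionUnit]) -> List[AttentionUnit]:
    """Repeatedly fuse overlapping AUs until no two regions overlap."""
    aus = list(aus)
    changed = True
    while changed:
        changed = False
        for i in range(len(aus)):
            for j in range(i + 1, len(aus)):
                if aus[i].overlaps(aus[j]):
                    aus[i].merge(aus[j])
                    del aus[j]
                    changed = True
                    break
            if changed:
                break
    return aus


def process(events: Iterable[Event], margin: int = 2) -> List[AttentionUnit]:
    """Assign each event to the first AU that accepts it, or start a new AU."""
    aus: List[AttentionUnit] = []
    for x, y, _t in events:
        owner = next((au for au in aus if au.accepts(x, y)), None)
        if owner is None:
            aus.append(AttentionUnit(x, y, x, y, events=1, margin=margin))
        else:
            owner.update(x, y)
    return merge_all(aus)
```

In this sketch every AU only inspects its own state when deciding whether to claim an event, which is the kind of independence that makes such units natural candidates for parallel hardware. The pairwise Merge loop, by contrast, compares AUs against each other and is the more expensive step in software.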
Index Terms
- A Reconfigurable Architecture for Real-time Event-based Multi-Object Tracking