research-article

A Reconfigurable Architecture for Real-time Event-based Multi-Object Tracking

Authors:
Yizhao Gao, University of Hong Kong, Hong Kong (ORCID: 0000-0001-5673-3746)
Song Wang, University of Hong Kong, Hong Kong (ORCID: 0000-0002-1813-5865)
Hayden Kwok-Hay So, University of Hong Kong, Hong Kong (ORCID: 0000-0002-6514-0237)

ACM Transactions on Reconfigurable Technology and Systems, Volume 16, Issue 4, Article No. 58, pp. 1–26. https://doi.org/10.1145/3593587

Published: 01 September 2023

Abstract

Although advances in event-based machine vision algorithms have demonstrated unparalleled capabilities in some of the most demanding tasks, implementing them under the stringent real-time and power constraints of edge systems remains a major challenge. This work presents REMOT, a reconfigurable hardware-software architecture that performs real-time event-based multi-object tracking on FPGAs. REMOT performs vision tasks by defining a set of actions over attention units (AUs). These actions allow each AU to track an object candidate autonomously by adjusting its region of attention, and allow the information gathered by each AU to be used for algorithmic-level decisions. Taking advantage of this modular structure, algorithm-architecture codesign can be performed by implementing different parts of the algorithm in either hardware or software to obtain different tradeoffs. Results show that REMOT can process 0.43–2.91 million events per second at 1.75–5.45 W. Compared with the software baseline, our implementation achieves up to 44 times higher throughput and 35.4 times higher power efficiency. Migrating the Merge operation to hardware further reduces the worst-case latency, making it 95 times shorter than that of the software baseline. By varying the AU configuration and operation, a reduction of 0.59–0.77 mW per AU on the programmable logic has also been demonstrated.
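
The abstract does not spell out how an AU adjusts its region of attention, so the following is only an illustrative sketch of the general idea, not the REMOT design. It assumes a rectangular region, a simple exponential-smoothing update of the region centre, and an overlap test of the kind that an algorithmic-level decision such as Merge might consume. The names `AttentionUnit`, `dispatch`, `overlaps`, and the parameter `alpha` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AttentionUnit:
    """Hypothetical AU: a rectangular region of attention around one object candidate."""
    cx: float          # region centre, x
    cy: float          # region centre, y
    half_w: float      # half of the region width
    half_h: float      # half of the region height
    alpha: float = 0.1 # assumed smoothing factor (not taken from the paper)
    hits: int = 0      # number of events absorbed so far

    def contains(self, x: float, y: float) -> bool:
        """Return True if the event coordinate lies inside the region."""
        return abs(x - self.cx) <= self.half_w and abs(y - self.cy) <= self.half_h

    def update(self, x: float, y: float) -> bool:
        """Absorb an event that falls inside the region and shift the centre toward it."""
        if not self.contains(x, y):
            return False
        self.cx += self.alpha * (x - self.cx)
        self.cy += self.alpha * (y - self.cy)
        self.hits += 1
        return True


def overlaps(a: AttentionUnit, b: AttentionUnit) -> bool:
    """Axis-aligned overlap test between two regions (a possible input to a merge decision)."""
    return (abs(a.cx - b.cx) <= a.half_w + b.half_w
            and abs(a.cy - b.cy) <= a.half_h + b.half_h)


def dispatch(events, units):
    """Offer each (x, y) event to the AUs in order; the first AU that contains it absorbs it.

    Returns the events that no AU claimed.
    """
    unclaimed = []
    for x, y in events:
        if not any(u.update(x, y) for u in units):
            unclaimed.append((x, y))
    return unclaimed


if __name__ == "__main__":
    au = AttentionUnit(cx=10.0, cy=10.0, half_w=5.0, half_h=5.0, alpha=0.5)
    leftover = dispatch([(12.0, 10.0), (100.0, 100.0)], [au])
    print(au.cx, au.cy, au.hits, leftover)
```

In this sketch each AU keeps only local state, which is what would make it plausible to place per-event updates in hardware while leaving decisions over several AUs in software; the actual partitioning in REMOT is described in the full article.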

Index Terms

A Reconfigurable Architecture for Real-time Event-based Multi-Object Tracking
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Reconfigurable computing
  2. Real-time systems
    1. Real-time system architecture
2. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        1. Tracking

Published in
ACM Transactions on Reconfigurable Technology and Systems, Volume 16, Issue 4
December 2023
343 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3615981
Editor: Deming Chen, University of Illinois, Urbana-Champaign, USA
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 September 2023
- Online AM: 21 April 2023
- Accepted: 4 April 2023
- Revised: 3 February 2023
- Received: 14 September 2022
Author Tags
REMOT
Dynamic Vision Sensors
multi-object tracking
event sensors
event camera
hardware/software co-design
attention unit
FPGA
HOTA
Qualifiers
- research-article