research-article

A Reconfigurable Architecture for Real-time Event-based Multi-Object Tracking

Published: 01 September 2023

Abstract

Although advances in event-based machine vision algorithms have demonstrated unparalleled capabilities in performing some of the most demanding tasks, their implementation under stringent real-time and power constraints in edge systems remains a major challenge. In this work, a reconfigurable hardware-software architecture called REMOT, which performs real-time event-based multi-object tracking on FPGAs, is presented. REMOT performs vision tasks by defining a set of actions over attention units (AUs). These actions allow each AU to track an object candidate autonomously by adjusting its region of attention, and they allow the information gathered by each AU to be used for making algorithmic-level decisions. Taking advantage of this modular structure, algorithm-architecture codesign can be performed by implementing different parts of the algorithm in either hardware or software for different tradeoffs. Results show that REMOT can process 0.43–2.91 million events per second at 1.75–5.45 W. Compared with the software baseline, our implementation achieves up to 44 times higher throughput and 35.4 times higher power efficiency. Migrating the Merge operation to hardware further reduces the worst-case latency, making it 95 times shorter than that of the software baseline. By varying the AU configuration and operation, a reduction of 0.59–0.77 mW per AU on the programmable logic has also been demonstrated.
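
To make the idea of attention units more concrete, the following is a minimal software sketch, not the REMOT implementation. It assumes a rectangular region of attention, a centroid-based recentering rule, and a bounding-box merge. None of these details are specified in the abstract, and the names `AttentionUnit`, `accept`, `recenter`, `overlaps`, and `merge` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AttentionUnit:
    """Hypothetical AU: a rectangular region of attention (inclusive bounds)."""
    x0: int
    y0: int
    x1: int
    y1: int
    count: int = 0   # events accepted since the last recentering
    sum_x: int = 0   # running sum of accepted x coordinates
    sum_y: int = 0   # running sum of accepted y coordinates

    def accept(self, x: int, y: int) -> bool:
        """Accumulate an event if it falls inside the region; report acceptance."""
        if not (self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1):
            return False
        self.count += 1
        self.sum_x += x
        self.sum_y += y
        return True

    def recenter(self) -> None:
        """Assumed update rule: shift the region so its center follows the event centroid."""
        if self.count == 0:
            return
        cx, cy = self.sum_x // self.count, self.sum_y // self.count
        dx = cx - (self.x0 + self.x1) // 2
        dy = cy - (self.y0 + self.y1) // 2
        self.x0, self.x1 = self.x0 + dx, self.x1 + dx
        self.y0, self.y1 = self.y0 + dy, self.y1 + dy
        self.count = self.sum_x = self.sum_y = 0

    def overlaps(self, other: "AttentionUnit") -> bool:
        """True if the two regions share at least one pixel."""
        return not (self.x1 < other.x0 or other.x1 < self.x0
                    or self.y1 < other.y0 or other.y1 < self.y0)


def merge(a: AttentionUnit, b: AttentionUnit) -> AttentionUnit:
    """Assumed merge rule: replace two AUs with one covering both regions."""
    return AttentionUnit(min(a.x0, b.x0), min(a.y0, b.y0),
                         max(a.x1, b.x1), max(a.y1, b.y1))


au = AttentionUnit(0, 0, 10, 10)
au.accept(8, 8)
au.accept(9, 9)
au.accept(20, 20)   # outside the region, so it is ignored
au.recenter()       # region shifts toward the accepted events
other = AttentionUnit(12, 12, 20, 20)
merged = merge(au, other) if au.overlaps(other) else None
```

In this sketch each unit only needs its own bounds and running sums to follow its candidate, while a decision such as merging looks across units. That split loosely mirrors the abstract's distinction between autonomous per-AU tracking and algorithmic-level decisions.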


          • Published in

          ACM Transactions on Reconfigurable Technology and Systems, Volume 16, Issue 4
          December 2023
          343 pages
          ISSN: 1936-7406
          EISSN: 1936-7414
          DOI: 10.1145/3615981
          • Editor: Deming Chen

          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 September 2023
          • Online AM: 21 April 2023
          • Accepted: 4 April 2023
          • Revised: 3 February 2023
          • Received: 14 September 2022

          Qualifiers

          • research-article
