Elsevier

Pattern Recognition

Volume 121, January 2022, 108205
Tracking more than 100 arbitrary objects at 25 FPS through deep learning

https://doi.org/10.1016/j.patcog.2021.108205
Under a Creative Commons license
open access

Highlights

  • A real-time multiple visual object tracker (MVOT) for motion estimation is proposed.

  • Design of the first RoI operator able to work with backbones without padding.

  • Definition of a novel pairwise cross-correlation operator for identity matching.

  • Quality of our method is superior to that of its predecessor, with a 10-fold speedup.

Abstract

Most video analytics applications rely on object detectors to localize objects in frames. However, when real-time operation is required, running the detector on every frame is usually not possible. This is partly circumvented by instantiating visual object trackers between detector calls, but that approach does not scale with the number of objects. To tackle this problem, we present SiamMT, a new deep learning multiple visual object tracking solution that applies single-object tracking principles to multiple arbitrary objects in real time. To achieve this, SiamMT reuses feature computations, implements a novel crop-and-resize operator, and defines a new and efficient pairwise similarity operator. SiamMT naturally scales up to several dozen targets, reaching 25 fps with 122 simultaneous objects for VGA videos, or up to 100 simultaneous objects in HD720 video. SiamMT has been validated on five large real-time benchmarks, achieving leading performance against current state-of-the-art trackers.
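As an illustration of the pairwise similarity idea mentioned above, the following is a minimal NumPy sketch, not the SiamMT implementation. It assumes that each tracked object already has an exemplar feature crop and a search-area feature crop taken from shared backbone features, and it correlates exemplar i only with search area i instead of every exemplar with every search area. The function name `pairwise_cross_correlation` and all tensor shapes are hypothetical choices made for this example.

```python
import numpy as np


def pairwise_cross_correlation(exemplars, search_areas):
    """Correlate exemplar i with search area i only (hypothetical helper).

    exemplars:    (N, C, h, w) feature crops, one per tracked object.
    search_areas: (N, C, H, W) feature crops, one per tracked object.
    Returns an (N, H - h + 1, W - w + 1) array of similarity maps.
    """
    n, c, h, w = exemplars.shape
    n_s, c_s, _, _ = search_areas.shape
    if (n, c) != (n_s, c_s):
        raise ValueError("exemplars and search areas must pair one-to-one")
    # Sliding windows over each search area: (N, C, H-h+1, W-w+1, h, w).
    windows = np.lib.stride_tricks.sliding_window_view(
        search_areas, (h, w), axis=(2, 3)
    )
    # Sum over channels and kernel positions. The object index n is shared
    # by both operands, so no cross-object products are ever computed.
    return np.einsum("ncijkl,nckl->nij", windows, exemplars)


# Usage with random data: 3 objects, 4 channels, 2x2 exemplars, 5x5 search areas.
rng = np.random.default_rng(0)
exemplars = rng.standard_normal((3, 4, 2, 2))
search_areas = rng.standard_normal((3, 4, 5, 5))
scores = pairwise_cross_correlation(exemplars, search_areas)
print(scores.shape)  # (3, 4, 4)
```

In this sketch the cost grows linearly with the number of objects N, whereas correlating every exemplar with every search area would grow with N squared. That linear behavior is the kind of scaling the abstract refers to, although the operator in the paper may differ in its details.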

Keywords

Multiple visual object tracking
Motion estimation
Deep learning
Siamese networks

Lorenzo Vaquero is a Ph.D. student at the CiTIUS of the Universidade de Santiago de Compostela, Spain. He received the B.S. degree in Computer Science in 2018 and the M.S. degree in Big Data in 2019. His research interests are visual object tracking and deep learning for autonomous vehicles.

Víctor M. Brea is an Associate Professor at CiTIUS, Universidade de Santiago de Compostela, Spain. His main research interest lies in Computer Vision, both in deep learning algorithms and in the design of efficient architectures and CMOS solutions. He has authored more than 100 scientific papers in these fields of research.

Manuel Mucientes is an Associate Professor at the CiTIUS of the University of Santiago de Compostela, Spain. His main research interest is artificial intelligence applied to computer vision for object detection and tracking, machine learning, and process mining. He has authored more than 100 scientific papers in these fields of research.