Abstract
Most existing multi-object trackers follow the tracking-by-detection (TBD) paradigm and have achieved remarkable success. However, mainstream methods run their detection networks in isolation and do not exploit the information produced by tracking, so the detection and tracking processes cannot benefit from each other. In this paper, we strengthen tracking performance in complex scenarios by using the rich temporal information derived from the tracking process to enhance the critical features of the current frame. Specifically, we first propose a critical feature capturing network (CFCN) that extracts receptive-field-adaptive discriminative features from each frame. We then design a temporal-aware feature aggregation module (TFAM) that propagates critical features from previous frames, leveraging temporal information to alleviate the degradation in detection quality that occurs when visual cues weaken. Extensive experimental comparisons and analyses on the popular and challenging MOT16, MOT17, and MOT20 benchmarks demonstrate the effectiveness of the proposed method. The results show that our tracker achieves state-of-the-art tracking performance, e.g., an IDF1 of 75.2% on MOT16 and a MOTA of 80.4% on MOT17.
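For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how MOTA and IDF1 are computed from aggregate error counts, using the standard CLEAR MOT and identity-metric definitions. The function names and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper or from any benchmark toolkit.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy (CLEAR MOT definition).

    MOTA = 1 - (FN + FP + IDSW) / GT, where all counts are summed over
    every frame of the sequence and GT is the number of ground-truth boxes.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1 score.

    IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN), where the identity
    true/false positives and false negatives come from the optimal
    one-to-one matching between predicted and ground-truth trajectories.
    """
    denominator = 2 * idtp + idfp + idfn
    if denominator == 0:
        # Assumption for this sketch: report 0.0 when there is nothing to score.
        return 0.0
    return 2 * idtp / denominator


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not results from the paper).
    print(f"MOTA = {mota(200, 80, 20, 1000):.3f}")  # 1 - 300/1000 = 0.700
    print(f"IDF1 = {idf1(400, 100, 100):.3f}")      # 800/1000 = 0.800
```

MOTA mainly reflects detection quality, since false negatives and false positives usually dominate the error sum, whereas IDF1 rewards keeping the same identity on a target over time. This is why the two are commonly reported together.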

Data availability
The data used to support the findings of this study are available from the corresponding author upon request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61571394 and 62001149) and the Key R&D Program of Zhejiang Province (Grant No. 2020C03098).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Wu, H., Nie, J., Zhu, Z. et al. Leveraging temporal-aware fine-grained features for robust multiple object tracking. J Supercomput 79, 2910–2931 (2023). https://doi.org/10.1007/s11227-022-04776-x
DOI: https://doi.org/10.1007/s11227-022-04776-x