Leveraging temporal-aware fine-grained features for robust multiple object tracking

Wu, Han; Nie, Jiahao; Zhu, Ziming; He, Zhiwei; Gao, Mingyu

doi:10.1007/s11227-022-04776-x

Published: 26 August 2022

Volume 79, pages 2910–2931 (2023)

The Journal of Supercomputing

Han Wu¹, Jiahao Nie¹, Ziming Zhu¹, Zhiwei He¹,² (ORCID: orcid.org/0000-0001-7264-2019) & Mingyu Gao¹,²

Abstract

Existing multi-object trackers mainly apply the tracking-by-detection (TBD) paradigm and have achieved remarkable success. However, mainstream methods run their detection networks in isolation, without exploiting the information derived from tracking, so the detection and tracking processes cannot benefit from each other. In this paper, we strengthen tracking performance in complex scenarios by using the rich temporal information derived from the tracking process to enhance the critical features at the current moment. Specifically, we first propose a critical feature capturing network (CFCN) that extracts receptive-field-adaptive discriminative features for each frame. Then, we design a temporal-aware feature aggregation module (TFAM) that propagates previous critical features, leveraging temporal information to alleviate the degradation in detection quality that occurs when visual cues decrease. Extensive experimental comparisons and analyses on the popular and challenging MOT16, MOT17, and MOT20 benchmarks demonstrate the superiority and effectiveness of the proposed method. The experimental results show that our tracker achieves state-of-the-art tracking performance, e.g., an IDF1 of 75.2% and, on MOT17, a MOTA of 80.4%.
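For readers unfamiliar with the two scores quoted in the abstract, the following is a minimal sketch of how MOTA and IDF1 are commonly defined from aggregate error counts. It is not the authors' evaluation code: the function names and all counts in the usage lines are hypothetical, chosen only for illustration. Published benchmark numbers are normally produced by the official MOTChallenge evaluation tooling, which also performs the matching between predictions and ground truth that this sketch takes as given.

```python
def mota(false_negatives, false_positives, id_switches, gt_boxes):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if gt_boxes <= 0:
        raise ValueError("gt_boxes must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / gt_boxes


def idf1(idtp, idfp, idfn):
    """Identity F1: 2*IDTP / (2*IDTP + IDFP + IDFN), from identity-level matches."""
    denominator = 2 * idtp + idfp + idfn
    if denominator == 0:
        raise ValueError("at least one count must be non-zero")
    return 2 * idtp / denominator


# Hypothetical counts, for illustration only (not taken from the paper).
example_mota = mota(false_negatives=150, false_positives=40, id_switches=10, gt_boxes=1000)
example_idf1 = idf1(idtp=750, idfp=250, idfn=250)
print(f"MOTA = {example_mota:.3f}, IDF1 = {example_idf1:.3f}")
```

MOTA penalizes missed targets, false alarms, and identity switches together, so it mostly reflects detection quality, whereas IDF1 measures how consistently each ground-truth identity is preserved over time. This is why trackers are usually reported with both.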

Data availability

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61571394 and 62001149) and the Key R&D Program of Zhejiang Province (Grant No. 2020C03098).

Author information

Authors and Affiliations

The School of Electronic Information, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China
Han Wu, Jiahao Nie, Ziming Zhu, Zhiwei He & Mingyu Gao
Zhejiang Province Key Lab of Equipment Electronics, Hangzhou, 2019E10009, Zhejiang, China
Zhiwei He & Mingyu Gao

Corresponding author

Correspondence to Zhiwei He.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, H., Nie, J., Zhu, Z. et al. Leveraging temporal-aware fine-grained features for robust multiple object tracking. J Supercomput 79, 2910–2931 (2023). https://doi.org/10.1007/s11227-022-04776-x

Accepted: 12 August 2022
Published: 26 August 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s11227-022-04776-x
