Leveraging temporal-aware fine-grained features for robust multiple object tracking

The Journal of Supercomputing

Abstract

Existing multi-object trackers mainly follow the tracking-by-detection (TBD) paradigm and have achieved remarkable success. However, mainstream methods run their detection networks in isolation and do not fully exploit the information derived from tracking, so the detection and tracking processes cannot benefit from each other. In this paper, we strengthen tracking performance in complex scenarios by using the rich temporal information derived from the tracking process to enhance the critical features of the current frame. Specifically, we first propose a critical feature capturing network (CFCN) that extracts discriminative features with adaptive receptive fields for each frame. We then design a temporal-aware feature aggregation module (TFAM) that propagates critical features from previous frames, leveraging temporal information to alleviate the degradation in detection quality that occurs when visual cues weaken. Extensive experimental comparisons and analyses on the popular and challenging MOT16, MOT17, and MOT20 benchmarks demonstrate the effectiveness of the proposed method. The results show that our tracker achieves state-of-the-art tracking performance, e.g., IDF1 of 75.2% on IDF and MOTA of 80.4% on MOT17.
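For readers unfamiliar with the two metrics quoted in the abstract, the sketch below shows how MOTA and IDF1 are conventionally defined from error counts (CLEAR MOT and identity-based measures, respectively). It is a minimal illustration, not the authors' evaluation code: the function names and the counts in the usage lines are hypothetical, and a real evaluation first needs a matching step between predictions and ground truth to obtain these counts.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with all counts summed over frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN), from identity-level matches."""
    denominator = 2 * idtp + idfp + idfn
    # With no detections and no ground truth the score is undefined; return 0.0.
    return 2 * idtp / denominator if denominator else 0.0


# Hypothetical counts, chosen only to illustrate the arithmetic.
print(f"MOTA: {mota(10, 5, 5, 100):.3f}")
print(f"IDF1: {idf1(75, 25, 25):.3f}")
```

MOTA penalizes detection errors and identity switches together, whereas IDF1 emphasizes how consistently each identity is preserved over time, which is why trackers are usually reported on both.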

Data availability

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grants No. 61571394 and No. 62001149) and the Key R&D Program of Zhejiang Province (Grant No. 2020C03098).

Author information

Corresponding author

Correspondence to Zhiwei He.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, H., Nie, J., Zhu, Z. et al. Leveraging temporal-aware fine-grained features for robust multiple object tracking. J Supercomput 79, 2910–2931 (2023). https://doi.org/10.1007/s11227-022-04776-x

  • DOI: https://doi.org/10.1007/s11227-022-04776-x
