
Learning task-specific discriminative representations for multiple object tracking

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

One-shot multiple object tracking (MOT), which learns object detection and identity embedding in a single unified network, has attracted increasing attention because of its low complexity and high tracking speed. However, most one-shot trackers overlook the fact that detection and re-identification (ReID) require different feature representations. This inherent difference between the two subtasks creates conflicting optimization objectives during training, which can lead to suboptimal tracking performance. To alleviate this conflict, we propose a novel dual-path transformation network (DTN) that decouples the shared features into detection-specific and ReID-specific representations. By learning task-specific features, this module satisfies the distinct requirements of both subtasks. Moreover, we observe that previous trackers generally rely on local information to distinguish targets and ignore global semantic relations, which are crucial for tracking. We therefore design a pyramid non-local network (PNN) that allows our network to model pixel-to-pixel relations with a global receptive field. PNN also incorporates scale information to improve robustness to scale variations. Extensive experiments on three benchmarks, MOT16, MOT17, and MOT20, demonstrate the superiority of our tracker, DPTrack, which achieves state-of-the-art performance, e.g., a MOTA of 77.1% and an IDF1 of 74.9% on MOT17. DPTrack runs at 14.9 FPS, and our lightweight version runs at 26.6 FPS with only a slight drop in accuracy.
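The abstract describes PNN only at a high level. As a rough illustration of what a non-local operation with a global receptive field computes, the sketch below (plain Python, no deep-learning framework) lets every position of a small feature map attend to every position of the map and adds the result back residually. It is not the authors' PNN: the learned embedding projections of a standard non-local block are replaced by identity maps, and the "pyramid" is approximated by averaging attention results computed against keys average-pooled at hypothetical scales 1 and 2. Both choices, and the helper names `avg_pool`, `softmax`, and `non_local`, are assumptions made for illustration.

```python
import math
from typing import List, Sequence

# A feature map is stored as H x W x C nested lists of floats.
FeatureMap = List[List[List[float]]]


def avg_pool(fmap: FeatureMap, s: int) -> List[List[float]]:
    """Average-pool an H x W x C map with an s x s window and stride s.

    Returns the pooled positions flattened into a list of C-dim vectors.
    Assumes H and W are divisible by s (a simplification).
    """
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    pooled = []
    for i in range(0, h, s):
        for j in range(0, w, s):
            acc = [0.0] * c
            for di in range(s):
                for dj in range(s):
                    for k in range(c):
                        acc[k] += fmap[i + di][j + dj][k]
            pooled.append([v / (s * s) for v in acc])
    return pooled


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def non_local(fmap: FeatureMap, scales: Sequence[int] = (1, 2)) -> FeatureMap:
    """Residual non-local block: every pixel attends to all (pooled) pixels.

    Hypothetical simplifications: identity embeddings instead of learned
    projections, and the per-scale attention outputs are averaged.
    """
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[list(px) for px in row] for row in fmap]  # start from the residual x
    for s in scales:
        keys = avg_pool(fmap, s)  # keys and values share the pooled map
        for i in range(h):
            for j in range(w):
                q = fmap[i][j]
                # scaled dot-product similarity between this query and every key
                weights = softmax(
                    [sum(a * b for a, b in zip(q, k)) / math.sqrt(c) for k in keys]
                )
                for ch in range(c):
                    agg = sum(wt * k[ch] for wt, k in zip(weights, keys))
                    out[i][j][ch] += agg / len(scales)
    return out


if __name__ == "__main__":
    x = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
    y = non_local(x)
    print(len(y), len(y[0]), len(y[0][0]))
```

Because each query attends to every pooled position, the receptive field covers the whole map regardless of its size; the cost per scale grows with (HW)·(HW/s²), which is why pooling the keys is a common way to keep non-local attention affordable on high-resolution tracking features.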


Figs. 1–12 (not reproduced in this preview)


Data availability statements

The data used to support the findings of this study are available from the corresponding author upon request.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61571394 and 62001149) and the Key Research and Development Program of Zhejiang Province (Grant No. 2020C03098).

Author information

Corresponding author: Zhiwei He.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wu, H., Nie, J., Zhu, Z. et al. Learning task-specific discriminative representations for multiple object tracking. Neural Comput & Applic 35, 7761–7777 (2023). https://doi.org/10.1007/s00521-022-08079-3


  • DOI: https://doi.org/10.1007/s00521-022-08079-3
