Abstract
One-shot multiple object tracking (MOT), which learns object detection and identity embedding in a unified network, has attracted increasing attention due to its low complexity and high tracking speed. However, most one-shot trackers ignore that detection and re-identification (ReID) require different representations of features. The inherent difference between these two subtasks leads to optimization contradictions in the training procedure. This issue would result in suboptimal tracking performance. To alleviate this contradiction, we propose a novel dual-path transformation network (DTN) that decouples the shared features into detection-specific and ReID-specific representations. By learning task-specific features, this module satisfies the different requirements of both subtasks. Moreover, we observe that previous trackers generally utilize local information to distinguish targets and ignore global semantic relations, which are crucial for tracking. Therefore, we design a pyramid non-local network (PNN) that allows our network to explore pixel-to-pixel relations with a global receptive field. Meanwhile, PNN considers the scale information to enhance the robustness to scale variations. Extensive experiments conducted on three benchmarks, i.e., MOT16, MOT17, and MOT20, demonstrate the superiority of our tracker, namely DPTrack. The experimental results reveal that DPTrack achieves state-of-the-art performance, e.g., MOTA of 77.1\(\%\) and IDF1 of 74.9\(\%\) on MOT17. Moreover, DPTrack runs at 14.9FPS, and our lightweight version runs at 26.6FPS with only a slight performance decay.












Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Data availability statements
The data used to support the findings of this study are available from the corresponding author upon request.
References
He Y, Wei X, Hong X, Ke W, Gong Y (2022) Identity-quantity harmonic multi-object tracking. IEEE Trans Image Process 31:2201–2215
Harun S, Ertugrul B, Numan C (2022) Similarity based person re-identification for multi-object tracking using deep Siamese network. Neural Comput Appl 34:18171–18182
Gao T, Pan H, Wang Z, Gao H (2022) A CRF-based framework for tracklet inactivation in online multi-object tracking. IEEE Trans Multimed 24:995–1007
Lu Z, Rathod V, Votel R, Huang J (2020) Retinatrack: Online single stage joint detection and tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14668–14678
Cao Y, Liu S, Zhou X, Yang Y (2021) Real-time stage-wise object tracking in traffic scenes: an online tracker selection method via deep reinforcement learning. Neural Comput Appl 33:16831–16846
Tian W, Lauer M, Chen L (2020) Online multi-object tracking using joint domain information in traffic scenarios. IEEE Trans Intell Transp Syst 21:374–384
Guo S, Wang J, Wang X, Tao D (2021) Online multiple object tracking with cross-task synergy. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8136–8145
Sun Z, Chen J, Chao L, Ruan W, Mukherjee M (2021) A survey of multiple pedestrian tracking based on tracking-by-detection framework. IEEE Trans Circuits Syst Video Technol 31:1819–1833
Yuan D, Chang X, Li Z, He Z (2022) Learning adaptive spatial-temporal context-aware correlation filters for UAV tracking. ACM Trans Multimed Comput Commun Appl 18:1–18
Yang K, He Z, Pei W, Zhou Z, Li X, Yuan D, Zhang H (2021) Siamcorners: siamese corner networks for visual tracking. IEEE Trans Multimed 24:1956–1967
Duan K, Bai S, Xie L, Qi H, Huang Q, Tian Q (2019) Centernet: Keypoint triplets for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6569–6578
He K, Gkioxari G, Dollar P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 2961–2969
Redmon J, Farhadi A (2017) Yolo9000: Better, faster, stronger. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7263–7271
Wojke N, Bewley A, Paulus D (2017) Simple online and realtime tracking with a deep association metric. In: Proceedings of the IEEE International Conference on Image Processing (ICIP), pp. 3645–3649
He J, Huang Z, Wang N, Zhang Z (2021) Learnable graph matching: incorporating graph partitioning with deep feature learning for multiple object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5299–5309
Dai P, Weng R, Choi W, Zhang C, He Z, Ding W (2021) Learning a proposal classifier for multiple object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2443–2452
Bewley A, Ge Z, Ott L, Ramos F, Upcroft B (2016) Simple online and realtime tracking. In: Proceedings of the IEEE International Conference on Image Processing (ICIP), pp. 3464–3468
Kalman RE (1960) A new approach to linear filtering and prediction problems. J Basic Eng 82:35–45
Zhou H, Ouyang W, Cheng J, Wang X, Li H (2019) Deep continuous conditional random fields with asymmetric inter-object constraints for online multi-object tracking. IEEE Trans Circuits Syst Video Technol 29:1011–1022
Peng J, Wang C, Wan F, Wu Y, Wang Y, Tai Y, Wang C, Li J, Huang F, Fu Y (2020) Chained-tracker: Chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 145–161
Wu J, Cao J, Song L, Wang Y, Yang M, Yuan J (2021) Track to detect and segment: An online multi-object tracker. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12352–12361
Zhou X, Koltun V, Philipp K (2020) Tracking objects as points. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 474–490
Zagoruyko S, Komodakis N (2016) Wide residual networks. Preprint at arXiv:1605.07146
He Z, Li X, You X, Tao D, Tang Y (2016) Connected component model for multi-object tracking. IEEE Trans Image Process 25:3698–3711
Yu F, Li W, Li Q, Liu Y, Shi X, Yan J (2017) Poi: Multiple object tracking with high performance detection and appearance feature. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 2961–2969
Liu H, Yang X, Latecki JL, Yan S (2012) Dense neighborhoods on affinity graph. Int J Comput Vis 98:65–82
Luhn WH (2005) The Hungarian method for the assignment problem. Nav Res Logist Q 52:7–21
Chen L, Ai H, Zhuang Z, Shang C (2018) Real-time multiple people tracking with deeply learned candidate selection and person re-identification. In: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6
Sun S, Akhtar N, Song H, Mian A, Shah M (2021) Deep affinity network for multiple object tracking. IEEE Trans Pattern Anal Mach Intell 43:104–119
Voigtlaender P, Krause M, Osep A, Luiten J, Sekar BBG, Geiger A, Leibe B (2019) Mots: multi-object tracking and segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7942–7951
Wang Z, Zheng L, Liu Y, Li Y, Wang S (2020) Towards real-time multi-object tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 107–122
Zhang Y, Wang C, Wang X, Zeng W, Liu W (2021) Fairmot: on the fairness of detection and re-identification in multiple object tracking. Int J Comput Vis 129:3069–3087
Liu S, Li X, Lu H, He Y (2022) Multi-object tracking meets moving UAV. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8876–8885
Wan X, Zhou S, Wang J, Meng R (2021) Multiple object tracking by trajectory map regression with temporal priors embedding. In: Proceedings of the 29th ACM International Conference on Multimedia (ACMMM), pp. 1377–1386
Liu X, Luo Y, Yan K, Chen J, Lei Z (2021) Part-MOT: a multi-object tracking method with instance part-based embedding. IET Image Process 15:2521–2531
Li Z, Wang H, Swistek T, Chen W, Li Y, Wang H (2021) Enabling the Network to Surf the Internet. Preprint at arXiv:2102.12205
Xu B, Wang N, Chen T, Li M (2015) Empirical evaluation of rectified activations in convolutional network. Preprint at arXiv:1505.00853
Ultralytics: YOLOv5. Available at https://github.com/ultralytics/yolov5 (2021)
Wang Y, Kitani K, Weng X (2021) Joint object detection and multi-object tracking with graph neural networks. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 13708–13715
Anton M, Laura LT, Lu Y, Ian DR, Stefan R, Konrad S (2016) MOT16: A benchmark for multi-object tracking. Preprint at arXiv:1603.00831
Patrick D, Aljosa O, Anton M, Konrad S, Daniel C, Ian R, Stefan R, Laura LT (2021) Motchallenge: a benchmark for single-camera multiple target tracking. Int J Comput Vis 129:845–881
Patrick D, Hamid R, Anton M, Javen S, Daniel C, Ian R, Stefan R, Konrad S, Laura LT (2020) MOT20: a benchmark for multi object tracking in crowded scenes. Preprint at arXiv:2003.09003
Bernardin K, Stiefelhagen R (2016) Evaluating multiple object tracking performance: the clear mot metrics. EURASIP J Image Video Process 2008:17–35
Ristani E, Solera F, Zou R, Cucchiara R, Tomasi C (2017) Performance measures and a data set for multi-target, multi-camera tracking. In: European Conference on Computer Vision Workshops (ECCVW), pp. 1367–1376
Wu B, Nevatia R (2007) Detection and tracking of multiple, partially occluded humans by bayesian combination of edgelet based part detectors. Int J Comput Vis 75:247–266
Shao S, Zhao Z, Li B, Xiao T, Yu G, Zhang X, Sun J (2018) CrowdHuman: A benchmark for detecting human in a crowd. Preprint at arXiv:1805.00123
Liang C, Zhang Z, Zhou X, Li B, Zhu S, Hu W (2022) Rethinking the competition between detection and ReID in multiobject tracking. IEEE Trans Image Process 31:3182–3196
Dollar P, Wojek C, Schiele B, P P (2009) Pedestrian detection: A benchmark. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 304–311
Zhang S, Benenson R, Schiele B (2017) Citypersons: A diverse dataset for pedestrian detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3213–3221
Xiao T, Li S, Wang B, Lin L, Wang X (2017) Joint detection and identification feature learning for person search. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3415–3424
Zheng L, Zhang H, Sun S, Chandraker M, Yang Y, Tian Q (2017) Person re-identification in the wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1367–1376
Ess A, Leibe B, Schindler K, Van Gool L (2008) A mobile vision system for robust multi-person tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8
Fang K, Xiang Y, Li X, Savarese S (2018) Recurrent autoregressive networks for online multi-object tracking. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 466–475
Pang B, Li Y, Zhang Y, Li M, Lu C (2020) Tubetk: adopting tubes to track multi-object in a one-step training model. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6308–6318
Braso G, Leal-Taixe L (2020) Learning a neural solver for multiple object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 466–475
Zou Z, Huang J, Luo P (2022) Compensation tracker: reprocessing lost object for multi-object tracking. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 307–317
Hornakova A, Kaiser T, Swoboda P, Rolinek M, Rosenhahn B, Henschel R (2021) Making higher order mot scalable: an efficient approximate solver for lifted disjoint paths. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6330–6340
Acknowledgements
This paper was supported by the National Natural Science Foundation of China [grants No.61571394, No.62001149]; the Key Research and Development Program of Zhejiang Province [grants No.2020C03098].
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, H., Nie, J., Zhu, Z. et al. Learning task-specific discriminative representations for multiple object tracking. Neural Comput & Applic 35, 7761–7777 (2023). https://doi.org/10.1007/s00521-022-08079-3
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00521-022-08079-3