Learning task-specific discriminative representations for multiple object tracking

Wu, Han; Nie, Jiahao; Zhu, Ziming; He, Zhiwei; Gao, Mingyu

doi:10.1007/s00521-022-08079-3

Original Article
Published: 07 December 2022

Neural Computing and Applications, Volume 35, pages 7761–7777 (2023)

Han Wu¹,
Jiahao Nie¹,
Ziming Zhu¹,
Zhiwei He¹,² (ORCID: orcid.org/0000-0001-7264-2019) &
Mingyu Gao¹,²


Abstract

One-shot multiple object tracking (MOT), which learns object detection and identity embedding in a unified network, has attracted increasing attention due to its low complexity and high tracking speed. However, most one-shot trackers ignore that detection and re-identification (ReID) require different feature representations. This inherent difference between the two subtasks causes conflicting optimization objectives during training, which results in suboptimal tracking performance. To alleviate this conflict, we propose a novel dual-path transformation network (DTN) that decouples the shared features into detection-specific and ReID-specific representations. By learning task-specific features, this module satisfies the different requirements of both subtasks. Moreover, we observe that previous trackers generally use local information to distinguish targets and ignore global semantic relations, which are crucial for tracking. Therefore, we design a pyramid non-local network (PNN) that allows our network to explore pixel-to-pixel relations with a global receptive field. PNN also incorporates scale information to improve robustness to scale variations. Extensive experiments on three benchmarks, i.e., MOT16, MOT17, and MOT20, demonstrate the superiority of our tracker, namely DPTrack. The experimental results show that DPTrack achieves state-of-the-art performance, e.g., a MOTA of 77.1% and an IDF1 of 74.9% on MOT17. Moreover, DPTrack runs at 14.9 FPS, and our lightweight version runs at 26.6 FPS with only a slight performance decay.
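For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how MOTA and IDF1 are conventionally defined, using the standard CLEAR MOT and identity-measure formulas. The function names and the example counts are hypothetical and chosen only for illustration; they are not taken from the article and do not reproduce its reported results.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy (CLEAR MOT definition).

    MOTA = 1 - (FN + FP + IDSW) / GT, where the counts are summed
    over all frames and GT is the total number of ground-truth boxes.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1 score.

    IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN), computed after the
    identity-level matching between predicted and true trajectories.
    """
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(f"MOTA = {mota(10, 5, 5, 100):.3f}")
    print(f"IDF1 = {idf1(75, 25, 25):.3f}")
```

MOTA is dominated by detection errors (missed and spurious boxes), while IDF1 rewards keeping the same identity over time, which is why the two scores are usually reported together for trackers that combine detection and ReID.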

Data availability statements

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (grant Nos. 61571394 and 62001149) and the Key Research and Development Program of Zhejiang Province (grant No. 2020C03098).

Author information

Authors and Affiliations

The School of Electronic Information, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China
Han Wu, Jiahao Nie, Ziming Zhu, Zhiwei He & Mingyu Gao
Zhejiang Province Key Lab of Equipment Electronics, Hangzhou, 2019E10009, Zhejiang, China
Zhiwei He & Mingyu Gao

Corresponding author

Correspondence to Zhiwei He.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, H., Nie, J., Zhu, Z. et al. Learning task-specific discriminative representations for multiple object tracking. Neural Comput & Applic 35, 7761–7777 (2023). https://doi.org/10.1007/s00521-022-08079-3

Received: 26 May 2022
Accepted: 22 November 2022
Published: 07 December 2022
Issue Date: April 2023
DOI: https://doi.org/10.1007/s00521-022-08079-3
