Abstract
With the surge in object detection, Multi-Object Tracking (MOT) research has recently witnessed significant advancements. However, most previous studies have focused on benchmarks in which targets have distinguishable appearances and move linearly. In scenarios involving non-linear motion and similar appearances, these methods suffer a drastic drop in performance. To address this issue, we propose WDTtrack, which simultaneously incorporates spatiotemporal proximity, velocity orientation, and appearance similarity. First, we employ the Centroid Triplet Loss (CTL) ReID model to extract high-quality appearance embeddings. Second, we introduce the Wider Bounding Box (W-BBox) and the Direction Bank (DB) to capture abundant, credible, and discriminative motion cues. Finally, we devise the Tracklet Recovery Mechanism (TRM) to maintain long-term tracking. Extensive empirical results demonstrate that WDTtrack outperforms other trackers on the DanceTrack and SportsMOT datasets, highlighting its effectiveness and potential for further development. Specifically, WDTtrack achieves 66.8 HOTA, 72.8 IDF1, and 55.9 AssA on DanceTrack, and 73.8 HOTA, 80.5 IDF1, and 64.3 AssA on SportsMOT, substantially surpassing other non-Transformer algorithms.
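To illustrate how the three cues named above might be combined into a single association score, the following is a minimal sketch and not the formulation used by WDTtrack. It assumes that widening means padding each box horizontally by a fraction of its width, that direction agreement and appearance similarity are both measured by cosine similarity, and that the cues are fused by a weighted sum. The function names and the parameters `ratio`, `w_dir`, and `w_app` are hypothetical.

```python
import math


def widen(box, ratio):
    """Pad an (x1, y1, x2, y2) box horizontally by `ratio` of its width per side.
    Assumption: horizontal-only padding stands in for a wider bounding box."""
    x1, y1, x2, y2 = box
    pad = ratio * (x2 - x1)
    return (x1 - pad, y1, x2 + pad, y2)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    nu = math.sqrt(sum(p * p for p in u))
    nv = math.sqrt(sum(q * q for q in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(p * q for p, q in zip(u, v)) / (nu * nv)


def association_score(track_box, det_box, track_dir, det_dir,
                      track_emb, det_emb, ratio=0.3, w_dir=0.2, w_app=0.5):
    """Hypothetical weighted sum of spatial, direction, and appearance cues."""
    spatial = iou(widen(track_box, ratio), widen(det_box, ratio))
    direction = cosine(track_dir, det_dir)
    appearance = cosine(track_emb, det_emb)
    return spatial + w_dir * direction + w_app * appearance


# Two boxes that do not overlap until they are widened.
a = (0.0, 0.0, 10.0, 20.0)
b = (12.0, 0.0, 22.0, 20.0)
plain_iou = iou(a, b)
wide_iou = iou(widen(a, 0.3), widen(b, 0.3))
same_way = association_score(a, b, (1.0, 0.0), (1.0, 0.0), (1.0, 0.0), (1.0, 0.0))
opposite = association_score(a, b, (1.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (1.0, 0.0))
```

In this sketch, widening gives a fast-moving target a non-zero overlap with its previous position, and the direction term lowers the score of a candidate moving the opposite way. In a full tracker such scores would typically fill a track-by-detection matrix that is passed to an assignment solver.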
Availability of Data and Materials
The DanceTrack [4] dataset is available at https://github.com/DanceTrack/DanceTrack. The SportsMOT [36] dataset is available at https://github.com/MCG-NJU/SportsMOT.
References
Xia X, Meng Z, Han X et al (2023) An automated driving systems data acquisition and analytics platform. Transp Res C Emerg Technol 151:104120. https://doi.org/10.1016/j.trc.2023.104120
Gloudemans D, Work DB (2021) Fast vehicle turning-movement counting using localization-based tracking. In: 2021 IEEE/CVF Conference on computer vision and pattern recognition workshops (CVPRW), pp 4150–4159. https://doi.org/10.1109/cvprw53098.2021.00469
Cioppa A, Giancola S, Deliege A et al (2022) Soccernet-tracking: multiple object tracking dataset and benchmark in soccer videos. In: 2022 IEEE/CVF Conference on computer vision and pattern recognition workshops (CVPRW), pp 3490–3501. https://doi.org/10.1109/cvprw56347.2022.00393
Sun P, Cao J, Jiang Y et al (2022) Dancetrack: multi-object tracking in uniform appearance and diverse motion. In: 2022 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 20961–20970. https://doi.org/10.1109/cvpr52688.2022.02032
Li X, Zhao Z, Wu J et al (2022) Y-bgd: broiler counting based on multi-object tracking. Comput Electron Agric 202:107347. https://doi.org/10.1016/j.compag.2022.107347
Du Y, Zhao Z, Song Y et al (2023) Strongsort: make deepsort great again. IEEE Trans Multimed 25:8725–8737. https://doi.org/10.1109/tmm.2023.3240881
Maggiolino G, Ahmad A, Cao J et al (2023) Deep oc-sort: multi-pedestrian tracking by adaptive re-identification. In: 2023 IEEE International conference on image processing (ICIP), pp 3025–3029. https://doi.org/10.1109/ICIP49359.2023.10222576
Yang F, Odashima S, Masui S et al (2023) Hard to track objects with irregular motions and similar appearances? make it easier by buffering the matching space. In: 2023 IEEE/CVF Winter conference on applications of computer vision (WACV), pp 4788–4797. https://doi.org/10.1109/WACV56688.2023.00478
Bergmann P, Meinhardt T, Leal-Taixé L (2019) Tracking without bells and whistles. In: 2019 IEEE/CVF International conference on computer vision (ICCV), pp 941–951. https://doi.org/10.1109/ICCV.2019.00103
Pang J, Qiu L, Li X et al (2021) Quasi-dense similarity learning for multiple object tracking. In: 2021 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 164–173. https://doi.org/10.1109/CVPR46437.2021.00023
Cao J, Pang J, Weng X et al (2023) Observation-centric sort: rethinking sort for robust multi-object tracking. In: 2023 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 9686–9696. https://doi.org/10.1109/CVPR52729.2023.00934
Liu Z, Wang X, Wang C et al (2023) Sparsetrack: multi-object tracking by performing scene decomposition based on pseudo-depth. arXiv:2306.05238
He K, Gkioxari G, Dollár P et al (2020) Mask r-cnn. IEEE Trans Pattern Anal Mach Intell 42(2):386–397. https://doi.org/10.1109/TPAMI.2018.2844175
Carion N, Massa F, Synnaeve G et al (2020) End-to-end object detection with transformers. In: Computer vision – ECCV 2020. Springer International Publishing, Cham, pp 213–229. https://doi.org/10.1007/978-3-030-58452-8_13
Zhu X, Su W, Lu L et al (2021) Deformable detr: deformable transformers for end-to-end object detection. In: 2021 International conference on learning representations (ICLR). OpenReview.net. https://openreview.net/forum?id=gZ9hCDWe6ke
Wang CY, Bochkovskiy A, Liao HYM (2023) Yolov7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: 2023 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 7464–7475. https://doi.org/10.1109/CVPR52729.2023.00721
Redmon J, Divvala S, Girshick R et al (2016) You only look once: unified, real-time object detection. In: 2016 IEEE Conference on computer vision and pattern recognition (CVPR), pp 779–788. https://doi.org/10.1109/CVPR.2016.91
Wojke N, Bewley A, Paulus D (2017) Simple online and realtime tracking with a deep association metric. In: 2017 IEEE International conference on image processing (ICIP), pp 3645–3649. https://doi.org/10.1109/ICIP.2017.8296962
Bewley A, Ge Z, Ott L et al (2016) Simple online and realtime tracking. In: 2016 IEEE International conference on image processing (ICIP), pp 3464–3468. https://doi.org/10.1109/ICIP.2016.7533003
Xiao C, Cao Q, Zhong Y et al (2023) Motiontrack: learning motion predictor for multiple object tracking. arXiv:2306.02585
Welch G, Bishop G (1995) An introduction to the kalman filter. Tech. rep, USA
Zhang Y, Sun P, Jiang Y et al (2022) Bytetrack: multi-object tracking by associating every detection box. In: Computer vision – ECCV 2022, Cham, pp 1–21. https://doi.org/10.1007/978-3-031-20047-2_1
Aharon N, Orfaig R, Bobrovsky BZ (2022) Bot-sort: robust associations multi-pedestrian tracking. arXiv:2206.14651
Wang Z, Zheng L, Liu Y et al (2020) Towards real-time multi-object tracking. In: Computer vision – ECCV 2020. Springer International Publishing, Cham, pp 107–122. https://doi.org/10.1007/978-3-030-58621-8_7
Zhou X, Koltun V, Krähenbühl P (2020) Tracking objects as points. In: Computer vision – ECCV 2020. Springer International Publishing, Cham, pp 474–490. https://doi.org/10.1007/978-3-030-58548-8_28
Zhang Y, Wang C, Wang X et al (2021) Fairmot: on the fairness of detection and re-identification in multiple object tracking. Int J Comput Vis 129(11):3069–3087. https://doi.org/10.1007/s11263-021-01513-4
Yan B, Jiang Y, Sun P et al (2022) Towards grand unification of object tracking. In: Computer vision – ECCV 2022, Cham, pp 733–751. https://doi.org/10.1007/978-3-031-19803-8_43
Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. In: Guyon I, Luxburg UV, Bengio S et al (eds) Advances in neural information processing systems, vol 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
Sun P, Zhang R, Jiang Y et al (2021) Sparse r-cnn: end-to-end object detection with learnable proposals. In: 2021 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 14449–14458. https://doi.org/10.1109/CVPR46437.2021.01422
Sun P, Cao J, Jiang Y et al (2020) Transtrack: multiple object tracking with transformer. arXiv:2012.15460
Zeng F, Dong B, Zhang Y et al (2022) Motr: end-to-end multiple-object tracking with transformer. In: Computer vision – ECCV 2022. Springer Nature Switzerland, Cham, pp 659–675. https://doi.org/10.1007/978-3-031-19812-0_38
Meinhardt T, Kirillov A, Leal-Taixé L et al (2022) Trackformer: multi-object tracking with transformers. In: 2022 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 8834–8844. https://doi.org/10.1109/cvpr52688.2022.00864
Gao R, Wang L (2023) Memotr: long-term memory-augmented transformer for multi-object tracking. In: 2023 IEEE/CVF International conference on computer vision (ICCV), pp 9867–9876. https://doi.org/10.1109/ICCV51070.2023.00908
Zhang Y, Wang T, Zhang X (2023) Motrv2: bootstrapping end-to-end multi-object tracking by pretrained object detectors. In: 2023 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 22056–22065. https://doi.org/10.1109/CVPR52729.2023.02112
Wieczorek M, Rychalska B, Dąbrowski J (2021) On the unreasonable effectiveness of centroids in image retrieval. In: Neural information processing. Springer International Publishing, Cham, pp 212–223. https://doi.org/10.1007/978-3-030-92273-3_18
Cui Y, Zeng C, Zhao X et al (2023) Sportsmot: a large multi-object tracking dataset in multiple sports scenes. In: 2023 IEEE/CVF International conference on computer vision (ICCV), pp 9887–9897. https://doi.org/10.1109/ICCV51070.2023.00910
Luiten J, Ošep A, Dendorfer P et al (2021) Hota: a higher order metric for evaluating multi-object tracking. Int J Comput Vis 129(2):548–578. https://doi.org/10.1007/s11263-020-01375-2
Ristani E, Solera F, Zou R et al (2016) Performance measures and a data set for multi-target, multi-camera tracking. In: Computer vision – ECCV 2016 Workshops. Springer International Publishing, Cham, pp 17–35. https://doi.org/10.1007/978-3-319-48881-3_2
Bernardin K, Stiefelhagen R (2008) Evaluating multiple object tracking performance: the clear mot metrics. EURASIP J Image Vid Process 1–10. https://doi.org/10.1155/2008/246309
Yan F, Luo W, Zhong Y et al (2023) Bridging the gap between end-to-end and non-end-to-end multi-object tracking. arXiv:2305.12724
Luo R, Song Z, Ma L et al (2024) Diffusiontrack: diffusion model for multi-object tracking. In: 2024 Proceedings of the AAAI conference on artificial intelligence, pp 3991–3999. https://doi.org/10.1609/AAAI.V38I5.28192
Girbau A, Marqués F, Satoh S (2022) Multiple object tracking from appearance by hierarchically clustering tracklets. In: 2022 British machine vision conference (BMVC), p 362. https://bmvc2022.mpi-inf.mpg.de/362/
Wu J, Cao J, Song L et al (2021) Track to detect and segment: an online multi-object tracker. In: 2021 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 12347–12356. https://doi.org/10.1109/CVPR46437.2021.01217
Zhou X, Yin T, Koltun V et al (2022) Global tracking transformers. In: 2022 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 8761–8770. https://doi.org/10.1109/CVPR52688.2022.00857
Rezatofighi H, Tsoi N, Gwak J et al (2019) Generalized intersection over union: a metric and a loss for bounding box regression. In: 2019 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 658–666. https://doi.org/10.1109/CVPR.2019.00075
Zheng Z, Wang P, Liu W et al (2020) Distance-iou loss: faster and better learning for bounding box regression. Proc AAAI Conf Artif Intell 34:12993–13000. https://doi.org/10.1609/aaai.v34i07.6999
Acknowledgements
This research was funded by the National Key Research and Development Program of China, grant number 2018YFC0823002, and the Fundamental Research Fund for the Central Universities of China, grant numbers FRF-TP-20-10B and FRF-GF-19-010A.
Author information
Contributions
Zeyong Zhao: Conceptualization, Methodology, Software, Visualization, Writing - original draft. Jingyi Wu: Methodology, Writing - review. Ruicong Zhi: Supervision, Conceptualization, Writing - review.
Ethics declarations
Competing Interests
The authors declare that there are no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical and Informed Consent for Data Used
Both the DanceTrack and SportsMOT datasets are open-source and are used only for non-commercial research purposes.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhao, Z., Wu, J. & Zhi, R. WDTtrack: tracking multiple objects with indistinguishable appearance and irregular motion. Appl Intell 54, 10018–10038 (2024). https://doi.org/10.1007/s10489-024-05682-w
DOI: https://doi.org/10.1007/s10489-024-05682-w