
Joint Detection and Association for End-to-End Multi-object Tracking

Published in: Neural Processing Letters

Abstract

Multi-object tracking (MOT) detects and tracks objects across multiple cameras and is widely applied in intelligent video surveillance and intelligent security. The MOT pipeline generally involves three important parts: feature extraction, multi-task learning, and object matching. Unfortunately, existing methods still have some drawbacks. First, the feature extraction module cannot effectively fuse shallow and deep features. Moreover, the multi-task learning module cannot strike a good balance between detection and re-identification. In addition, the object matching module associates pedestrians with a hand-crafted rule rather than a trained model. To address these problems, we propose joint detection and association (JDA), an end-to-end multi-object tracking network that combines multi-scale feature extraction with learnable object association. It first combines a feature extraction backbone based on multi-scale feature fusion with a point-based multi-task object detection branch to handle feature extraction and object detection. Then, a learnable object motion association module is embedded, which uses information from historical frames to infer the position of each object and to associate object identities between previous and subsequent frames. Because detection and matching are handled jointly, JDA can be trained end to end. The proposed JDA is evaluated through a series of experiments on MOT16 and MOT17. The results show that JDA outperforms existing methods in terms of the precision and stability of MOT.
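The abstract only outlines the architecture, so the following minimal PyTorch-style sketch is offered purely to fix ideas about the three named components: a backbone that fuses shallow and deep features, point-based multi-task heads, and an identity-association step. Every module name, layer size, and the greedy cosine matcher are hypothetical stand-ins, not the paper's actual implementation (in particular, the paper's association module is learnable and motion-based, whereas the matcher below is a simple appearance-similarity placeholder).

import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionBackbone(nn.Module):
    """Illustrative fusion of a shallow (high-resolution) and a deep (semantic) feature map."""
    def __init__(self, ch=64):
        super().__init__()
        self.shallow = nn.Conv2d(3, ch, 3, stride=2, padding=1)
        self.deep = nn.Conv2d(ch, ch, 3, stride=2, padding=1)
        self.fuse = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, x):
        s = F.relu(self.shallow(x))                 # shallow, fine-grained features
        d = F.relu(self.deep(s))                    # deeper, coarser features
        d_up = F.interpolate(d, size=s.shape[-2:])  # upsample deep map to match
        return self.fuse(torch.cat([s, d_up], 1))   # fused multi-scale feature map

class PointHeads(nn.Module):
    """Point-based multi-task heads: center heatmap, box size, identity embedding."""
    def __init__(self, ch=64, emb_dim=128):
        super().__init__()
        self.heatmap = nn.Conv2d(ch, 1, 1)      # object-center heatmap
        self.size = nn.Conv2d(ch, 2, 1)         # width/height regressed at each center
        self.embed = nn.Conv2d(ch, emb_dim, 1)  # per-location identity embedding

    def forward(self, f):
        return torch.sigmoid(self.heatmap(f)), self.size(f), self.embed(f)

def associate(prev_emb, curr_emb):
    """Greedy association by cosine similarity (placeholder for the learnable
    motion-association module described in the abstract)."""
    sim = F.normalize(prev_emb, dim=1) @ F.normalize(curr_emb, dim=1).T
    return sim.argmax(dim=1)  # best current-frame match for each previous track

# Usage on a dummy frame: detect, embed, then match against 5 existing tracks.
backbone, heads = FusionBackbone(), PointHeads()
frame = torch.randn(1, 3, 128, 128)
hm, wh, emb = heads(backbone(frame))
prev = torch.randn(5, 128)          # embeddings of 5 tracked identities
curr = emb[0].flatten(1).T[:10]     # embeddings of 10 candidate detections
print(associate(prev, curr))        # matched detection index per track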



Funding

This work was supported in part by the Young People Fund of Xinjiang Science and Technology Department (No. 2022D01B05).

Author information

Corresponding authors

Correspondence to Xinzhong Wang, Guangqiang Yin or Zhiguo Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest or competing interests. The data that support the findings of this study are available online; these datasets were derived from the following public-domain resources: MOT16 and MOT17.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, Y., Luo, X., Shi, J. et al. Joint Detection and Association for End-to-End Multi-object Tracking. Neural Process Lett 55, 11823–11844 (2023). https://doi.org/10.1007/s11063-023-11397-9

