A systematic survey on recent deep learning-based approaches to multi-object tracking

Multimedia Tools and Applications

Abstract

This survey presents an in-depth review of state-of-the-art research on Multi-Object Tracking (MOT), drawing on research articles published in 2019 or later in top-tier journals and conferences. We group existing MOT research into nine broad categories and discuss the workflow and limitations of each. This classification helps readers understand research trends in the different sub-domains of the MOT problem and identify research gaps. To the best of our knowledge, existing surveys on MOT place little emphasis on the tracking step of MOT, which we address here. Our survey also highlights the progress made in MOT by employing recent Deep Learning models such as Transformers and Graph Neural Networks, which other surveys have likewise not covered. It further discusses the challenges that trackers face due to a variety of extrinsic and intrinsic factors. In addition, we describe the available public datasets, benchmarks, and metrics used to evaluate the performance of an MOT model, and we compare the important results reported in previous studies on some popular MOT datasets. This survey gives readers an extensive overview of state-of-the-art MOT algorithms and their shortcomings, which should help them design and develop newer and better MOT algorithms.

Data Availability

Not Applicable

References

  1. Chandrajit M, Girisha R, Vasudev T (2016) Multiple objects tracking in surveillance video using color and hu moments. arXiv:1608.06148

  2. Xie D, Hu W, Tan T, Peng J (2004) A multi-object tracking system for surveillance video analysis. Proceedings of the 17th international conference on pattern recognition, 2004. ICPR 2004, vol 4. pp 767–7704

  3. Gebregziabher B (2023) Multi-object tracking for predictive collision avoidance. arXiv:2307.02161

  4. Liu D (2021) Multi-object tracking and segmentation for autonomous driving: A flow guided association approach. PhD thesis, Purdue University Graduate School

  5. Luo C, Yang X, Yuille AL (2021) Exploring simple 3d multi-object tracking for autonomous driving. 2021 IEEE/CVF international conference on computer vision (ICCV), pp 10468-10477

  6. Li M (2016) Detecting, segmenting and tracking bio-medical objects. PhD thesis, Missouri University of Science and Technology

  7. Smal I, Meijering EHW, Draegestein K, Galjart N, Grigoriev I, Akhmanova A, van Royen ME, Houtsmuller AB, Niessen WJ (2008) Multiple object tracking in molecular bioimaging by rao-blackwellized marginal particle filtering. Med Image Anal 12:6

    Article  Google Scholar 

  8. Park Y, Dang LM, Lee S, Han D, Moon H (2021) Multiple object tracking in deep learning approaches: A survey. Electronics

  9. Ciaparrone G, Sánchez FL, Tabik S, Troiano L, Tagliaferri R, Herrera F (2019) Deep learning in video multi-object tracking: A survey. Neurocomputing 381:61–88

    Article  Google Scholar 

  10. Xu Y, Zhou X, Chen S, Li F (2019) Deep learning for multiple object tracking: a survey. IET Comput Vis 13:355–368

    Article  Google Scholar 

  11. Wang G, Song M, Hwang J-N (2022) Recent advances in embedding methods for multi-object tracking: A survey. arXiv:2205.10766

  12. Dai Y, Hu Z-Y, Zhang S, Liu L (2022) A survey of detection-based video multi-object tracking. Displays 75:102317

    Article  Google Scholar 

  13. Pal SK, Pramanik A, Maiti J, Mitra P (2021) Deep learning in multi-object detection and tracking: state of the art. Appl Intell 51:6400–6429

    Article  Google Scholar 

  14. Fan L, Wang Z-L, Cai B-G, Tao C, Zhang Z, Wang Y, Li S, Huang F, Fu S, Zhang F (2016) A survey on multiple object tracking algorithm. 2016 IEEE international conference on information and automation (ICIA), pp 1855-1862

  15. Emami P, Pardalos PM, Elefteriadou L, Ranka S (2018) Machine learning methods for solving assignment problems in multi-target tracking. arXiv:1802.06897

  16. Zhou X, Koltun V, Krähenbühl P (2020) Tracking objects as points. arXiv:2004.01177

  17. Weng X, Wang J, Held D, Kitani K (2019) 3d multi-object tracking: A baseline and new evaluation metrics. 2020 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 10359–10366

  18. Pang Z, Li Z, Wang N (2021) Simpletrack: Understanding and rethinking 3d multi-object tracking. arXiv:2111.09621

  19. Luo W, Xing J, Milan A, Zhang X, Liu W, Zhao X, Kim T-K (2014) Multiple object tracking: A literature review. Artif Intell 293:103448

    Article  MathSciNet  Google Scholar 

  20. Bashar M, Islam S, Hussain KK, Hasan MB, Rahman ABMA, Kabir MH (2022) Multiple object tracking in recent times: A literature review. arXiv:2209.04796

  21. Luo W, Xing J, Milan A, Zhang X, Liu W, Zhao X, Kim T-K (2021) Multiple object tracking: A literature review. Artif Intell 293:103448

    Article  MathSciNet  Google Scholar 

  22. Xu Z, Zhang W, Tan X, Yang W, Huang H, Wen S, Ding E, Huang, L (2020) Segment as points for efficient online multi-object tracking and segmentation. In: ECCV

  23. Bras’o G, Leal-Taix’e L (2020) Learning a neural solver for multiple object tracking. 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6246–6256

  24. Miah M, Bilodeau G-A, Saunier N (2021) Multi-object tracking and segmentation with a space-time memory network. arXiv:2110.11284

  25. Ristani E, Tomasi C (2018) Features for multi-target multi-camera tracking and re-identification. 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 6036–6046

  26. Peri N, Khorramshahi P, Rambhatla SS, Shenoy V, Rawat S, Chen J-C, Chellappa R (2020) Towards real-time systems for vehicle re-identification, multi-camera tracking, and anomaly detection. 2020 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), pp 2648–2657

  27. Chu P, Ling H (2019) Famnet: Joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking. 2019 IEEE/CVF international conference on computer vision (ICCV), pp 6171–6180

  28. Weng X, Yuan Y, Kitani K (2021) Ptp: Parallelized tracking and prediction with graph neural networks and diversity sampling. IEEE Robot Autom Lett 6:4640–4647

    Article  Google Scholar 

  29. Jiang X, Li P, Li Y, Zhen X (2019) Graph neural based end-to-end data association framework for online multiple-object tracking. arXiv:1907.05315

  30. Lusardi C, Taufique AMN, Savakis AE (2021) Robust multi-object tracking using re-identification features and graph convolutional networks. 2021 IEEE/CVF international conference on computer vision workshops (ICCVW), pp 3861–3870

  31. Al-Shakarji NM, Ufuktepe E, Bunyak F, Aliakbarpour H, Seetharaman G, Palaniappan K (2020) Semi-automatic system for rapid annotation of moving objects in surveillance videos using deep detection and multi-object tracking techniques. 2020 IEEE applied imagery pattern recognition workshop (AIPR), pp 1–6

  32. Ghasemi A, Ravikumar CN (2015) Multi object tracking algorithm use in video surveillance systems. Int J Sci Res Educ 3

  33. Gani MHH, Khalifa OO, Gunawan TS, Shamsan EA (2017) Traffic intensity monitoring using multiple object detection with traffic surveillance cameras. 2017 IEEE 4th international conference on smart instrumentation, measurement and application (ICSIMA), pp 1–5

  34. Khorramshahi P, Shenoy V, Pack ML, Chellappa R (2022) Scalable and real-time multi-camera vehicle detection, re-identification, and tracking. arXiv:2204.07442

  35. Wu M, Qian Y, Wang C, Yang M (2021) A multi-camera vehicle tracking system based on city-scale vehicle re-id and spatial-temporal information. 2021 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), pp 4072–4081

  36. Chiu H-K, Prioletti A, Li J, Bohg J (2020) Probabilistic 3d multi-object tracking for autonomous driving. arXiv:2001.05673

  37. Lu Z, Rathod V, Votel R, Huang J (2020) Retinatrack: Online single stage joint detection and tracking. 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 14656–14666

  38. Zhao D, Fu H, Xiao L, Wu T, Dai B (2018) Multi-object tracking with correlation filter for autonomous vehicle. Sensors (Basel, Switzerland) 18

  39. Ning G, Huang H (2020) Lighttrack: A generic framework for online top-down human pose tracking. 2020 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), pp 4456–4465

  40. Gade R, Moeslund TB (2017) Constrained multi-target tracking for team sports activities. IPSJ Trans Comput Vision Appl 10:1–11

    Google Scholar 

  41. Kim K, Cao M, Rao S, Xu J, Medasani SS, Owechko Y (2011) Multi-object detection and behavior recognition from motion 3d data. CVPR 2011 workshops, pp 37–42

  42. Musaev A, Wang J, Zhu L, Li C, Chen Y, Liu J, Zhang W, Mei J, Wang D (2020) Towards in-store multi-person tracking using head detection and track heatmaps. arXiv:2005.08009

  43. Patel AS, Vyas R, Vyas OP, Ojha M, Tiwari V (2022) Motion-compensated online object tracking for activity detection and crowd behavior analysis. The Visual Computer, pp 1–21

  44. Voigtlaender P, Krause M, Osep A, Luiten J, Sekar BBG, Geiger A, Leibe B (2019) Mots: Multi-object tracking and segmentation. 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), 7934-7943

  45. Leal-Taixé L, Milan A, Reid ID, Roth S, Schindler K (2015) Motchallenge 2015: Towards a benchmark for multi-target tracking. arXiv:1504.01942

  46. Milan A, Leal-Taixé L, Reid ID, Roth S, Schindler K (2016) Mot16: A benchmark for multi-object tracking. arXiv:1603.00831

  47. Dendorfer P, Rezatofighi H, Milan A, Shi JQ, Cremers D, Reid ID, Roth S, Schindler K, Leal-Taix’e L (2020) Mot20: A benchmark for multi object tracking in crowded scenes. arXiv:2003.09003

  48. Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? the kitti vision benchmark suite. 2012 IEEE conference on computer vision and pattern recognition, pp 3354–3361

  49. Dave A, Khurana T, Tokmakov P, Schmid C, Ramanan D (2020) Tao: A large-scale benchmark for tracking any object. In: ECCV

  50. Dollár P, Wojek C, Schiele B, Perona P (2009) Pedestrian detection: A benchmark. In: CVPR

  51. Caesar H, Bankiti V, Lang AH, Vora S, Liong VE, Xu Q, Krishnan A, Pan Y, Baldan G, Beijbom O (2020) nuscenes: A multi-modal dataset for autonomous driving. 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 11618–11628

  52. Wu B, Nevatia R (2006) Detection and tracking of multiple, partially occluded humans by bayesian combination of edgelet based part detectors. Int J Comput Vision 75:247–266

    Article  Google Scholar 

  53. Bernardin K, Stiefelhagen R (2008) Evaluating multiple object tracking performance: The clear mot metrics. EURASIP J Image Video Process 2008:1–10

    Article  Google Scholar 

  54. Ristani E, Solera F, Zou RS, Cucchiara R, Tomasi C (2016) Performance measures and a data set for multi-target, multi-camera tracking. arXiv:1609.01775

  55. Weng X, Wang J, Held D, Kitani K (2020) 3d multi-object tracking: A baseline and new evaluation metrics. 2020 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 10359–10366

  56. Kim C, Li F, Alotaibi M, Rehg JM (2021) Discriminative appearance modeling with multi-track pooling for real-time multi-object tracking. 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 9548–9557

  57. Xu J, Cao Y, Zhang Z, Hu H (2019) Spatial-temporal relation networks for multi-object tracking. 2019 IEEE/CVF international conference on computer vision (ICCV), pp 3987–3997

  58. Wang C, Wang Y, Wang Y, Wu C-T, Yu G (2019) mussp: Efficient min-cost flow algorithm for multi-object tracking. In: NeurIPS

  59. Zhang L, Li Y, Nevatia R (2008) Global data association for multi-object tracking using network flows. 2008 IEEE conference on computer vision and pattern Recognition, pp 1–8

  60. Wang C, Wang Y, Yu G (2020) Efficient global multi-object tracking under minimum-cost circulation framework. IEEE Trans Pattern Anal Mach Intell

  61. Chen J, Sheng H, Zhang Y, Xiong Z (2017) Enhancing detection model for multiple hypothesis tracking. 2017 IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp 2143–2152

  62. Bergmann P, Meinhardt T, Leal-Taixé L (2019) Tracking without bells and whistles. 2019 IEEE/CVF international conference on computer vision (ICCV), pp 941–951

  63. Pang B, Li Y, Zhang Y, Li M, Lu C (2020) Tubetk: Adopting tubes to track multi-object in a one-step training model. 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6307–6317

  64. Weng X, Wang Y, Man Y, Kitani K (2020) Gnn3dmot: Graph neural network for 3d multi-object tracking with 2d-3d multi-feature learning. 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6498–6507

  65. Wu J, Cao J, Song L, Wang Y, Yang M, Yuan J (2021) Track to detect and segment: An online multi-object tracker. 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 12347–12356

  66. Sun S, Akhtar N, Song X, Song H, Mian AS, Shah M (2020) Simultaneous detection and tracking with motion modelling for multiple object tracking. arXiv:2008.08826

  67. Wang G, Wang Y, Zhang H, Gu R, Hwang J-N (2019) Exploit the connectivity: Multi-object tracking with trackletnet. Proceedings of the 27th ACM international conference on multimedia

  68. Zhang W, Zhou H, Sun S, Wang Z, Shi J, Loy CC (2019) Robust multi-modality multi-object tracking. 2019 IEEE/CVF international conference on computer vision (ICCV), pp 2365–2374

  69. Xu Y, Osep A, Ban Y, Horaud R, Leal-Taixé L, Alameda-Pineda X (2020) How to train your deep multi-object tracker. 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6786–6795

  70. Chaabane M, Zhang P, Beveridge JR, O’Hara S (2021) Deft: Detection embeddings for tracking. arXiv:2102.02267

  71. Shuai B, Berneshawi AG, Li X, Modolo D, Tighe J (2021) Siammot: Siamese multi-object tracking. 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 12367–12377

  72. Saleh FS, Aliakbarian MS, Salzmann M, Gould S (2020) Artist: Autoregressive trajectory inpainting and scoring for tracking. arXiv:2004.07482

  73. Chu Q, Ouyang W, Li H, Wang X, Liu B, Yu N (2017) Online multi-object tracking using cnn-based single object tracker with spatial-temporal attention mechanism. 2017 IEEE International Conference on Computer Vision (ICCV), pp 4846–4855

  74. Zhu J, Yang H, Liu N, Kim M, Zhang W, Yang M-H (2018) Online multi-object tracking with dual matching attention networks. In: ECCV

  75. Yin J, Wang W, Meng Q, Yang R, Shen J (2020) A unified object motion and affinity model for online multi-object tracking. 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6767–6776

  76. Ma C, Li Y, Yang F, Zhang Z, Zhuang Y, Jia H, Xie X (2019) Deep association: End-to-end graph-based learning for multiple object tracking with conv-graph neural network. Proceedings of the 2019 on international conference on multimedia retrieval

  77. Choi W (2015) Near-online multi-target tracking with aggregated local flow descriptor. 2015 IEEE international conference on computer vision (ICCV), pp 3029–3037

  78. Fagot-Bouquet L, Audigier R, Dhome Y, Lerasle F (2016) Improving multi-frame data association with sparse representations for robust near-online multi-object tracking. In: ECCV

  79. Henschel R, Zou Y, Rosenhahn B (2019) Multiple people tracking using body and joint detections. 2019 IEEE/CVF Conference on computer vision and pattern recognition workshops (CVPRW), pp 770–779

  80. Feichtenhofer C, Pinz A, Zisserman A (2017) Detect to track and track to detect. 2017 IEEE international conference on computer vision (ICCV), pp 3057–3065

  81. Zhang Y, Sun P, Jiang Y, Yu D, Yuan Z, Luo P, Liu W, Wang X (2021) Bytetrack: Multi-object tracking by associating every detection box. In: European conference on computer vision

  82. Zhang Y, Wang C, Wang X, Zeng W, Liu W (2021) Fairmot: On the fairness of detection and re-identification in multiple object tracking. Int J Comput Vis 129:3069–3087

    Article  Google Scholar 

  83. Zheng L, Tang M, Chen Y, Zhu G, Wang J, Lu H (2021) Improving multiple object tracking with single object tracking. 2021 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 2453–2462

  84. Buchner M, Valada A (2022) 3d multi-object tracking using graph neural networks with cross-edge modality attention. IEEE Robot Autom Lett 7:9707–9714

    Article  Google Scholar 

  85. Bewley A, Ge Z, Ott L, Ramos FT, Upcroft B (2016) Simple online and realtime tracking. 2016 IEEE International conference on image processing (ICIP), pp 3464–3468

  86. Meinhardt, T., Kirillov, A., Leal-Taixé, L., Feichtenhofer, C (2022) Track-former: Multi-object tracking with transformers. 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 8834–8844

  87. Milan A, Rezatofighi SH, Dick AR, Reid ID, Schindler K (2017) Online multi-target tracking using recurrent neural networks. In: AAAI

  88. Wojke N, Bewley A, Paulus D (2017) Simple online and realtime tracking with a deep association metric. 2017 IEEE international conference on image processing (ICIP), pp 3645–3649

  89. Chu P, Wang J, You Q, Ling H, Liu Z (2021) Transmot: Spatial-temporal graph transformer for multiple object tracking. arXiv:2104.00194

  90. Gao X, Shen Z, Yang Y (2022) Multi-object tracking with siamese-rpn and adaptive matching strategy. Signal Image Video Process 16:965–973

    Article  Google Scholar 

  91. Vaquero L, Brea VM, Mucientes M (2022) Real-time siamese multiple object tracker with enhanced proposals. arXiv:2202.04966

  92. Cai J, Xu M, Li W, Xiong Y, Xia W, Tu Z, Soatto S (2022) Memot: Multi-object tracking with memory. 2022 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 8080–8090

  93. Fang K, Xiang Y, Li X, Savarese S (2018) Recurrent autoregressive networks for online multi-object tracking. 2018 IEEE winter conference on applications of computer vision (WACV), pp 466-475

  94. Sadeghian A, Alahi A, Savarese S (2017) Tracking the untrackable: Learning to track multiple cues with long-term dependencies. 2017 IEEE international conference on computer vision (ICCV), pp 300–311

  95. Zhou X, Yin T, Koltun V, Krähenbühl, P (2022) Global tracking transformers. 2022 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 8761–8770

  96. Pang Z, Li J, Tokmakov P, Chen D, Zagoruyko S, Wang Y-X (2023) Standing between past and future: Spatio-temporal modeling for multi-camera 3d multi-object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 17928–17938

  97. Cao J, Pang J, Weng X, Khirodkar R, Kitani K (2023) Observation-centric sort: Rethinking sort for robust multi-object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9686–9696

  98. Wu D, Han W, Wang T, Dong X, Zhang X, Shen J (2023) Referring multi-object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14633–14642

  99. Qin Z, Zhou S, Wang L, Duan J, Hua G, Tang W (2023) Motiontrack: Learning robust short-term and long-term motions for multi-object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 17939–17948

  100. Huang K, Lertniphonphan K, Chen F, Li J, Wang Z (2023) Multi-object tracking by self-supervised learning appearance model. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3162–3168

  101. Yang F, Odashima S, Masui S, Jiang S (2023) Hard to track objects with irregular motions and similar appearances? make it easier by buffering the matching space. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 4799–4808

  102. Seidenschwarz J, Brasó G, Serrano VC, Elezi I, Leal-Taixé L (2023) Simple cues lead to a strong multi-object tracker. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13813–13823

  103. Cao J, Weng X, Khirodkar R, Pang J, Kitani K (2022) Observation-centric sort: Rethinking sort for robust multi-object tracking. arXiv:2203.14360

  104. Wang L, Xu L, Kim MY, Rigazico L, Yang M-H (2017) Online multiple object tracking via flow and convolutional features. 2017 IEEE international conference on image processing (ICIP), pp 3630–3634

  105. Zheng L, Shen L, Tian L, Wang S, Wang J, Tian Q (2015) Scalable person re-identification: A benchmark. In: IEEE international conference on computer vision

  106. Beyer L, Breuers S, Kurin V, Leibe B (2017) Towards a principled integration of multi-camera re-identification and tracking through optimal bayes filters. 2017 IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp 1444–1453

  107. Xu J, Zhao R, Zhu F, Wang H, Ouyang W (2018) Attention-aware compositional network for person re-identification. 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 2119–2128

  108. García ROC, Aycard O (2016) Multiple sensor fusion and classification for moving object detection and tracking. IEEE Trans Intell Transp Syst 17:525–534

    Article  Google Scholar 

  109. Vaswani A, Shazeer NM, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. arXiv:1706.03762

  110. Khan SH, Naseer M, Hayat M, Zamir SW, Khan FS, Shah M (2022) Transformers in vision: A survey. ACM Comput Surv 54:1–41

    Article  Google Scholar 

  111. Rubin J, Erkamp R, Naidu RS, Thodiyil AO, Chen AI (2021) Attention distillation for detection transformers: Application to real-time video object detection in ultrasound. In: ML4H@NeurIPS

  112. Zhu X, Su W, Lu L, Li B, Wang X, Dai J (2021) Deformable detr: Deformable transformers for end-to-end object detection. arXiv:2010.04159

  113. Sun P, Jiang Y, Zhang R, Xie E, Cao J, Hu X, Kong T, Yuan Z, Wang C, Luo P (2020) Transtrack: Multiple-object tracking with transformer. arXiv:2012.15460

  114. Xu Y, Ban Y, Delorme G, Gan C, Rus D, Alameda-Pineda X (2022) Transcenter: Transformers with dense representations for multiple-object tracking. IEEE Trans Pattern Anal Mach Intell

  115. Galor A, Orfaig R, Bobrovsky B-Z (2022) Strong-transcenter: Improved multi-object tracking based on transformers with dense representations. arXiv:2210.13570

  116. Zeng F, Dong B, Wang T, Chen C, Zhang X, Wei Y (2022) Motr: End-to-end multiple-object tracking with transformer. In: ECCV

  117. Zhu T, Hiller M, Ehsanpour M, Ma R, Drummond T, Rezatofighi H (2022) Looking beyond two frames: End-to-end multi-object tracking using spatial and temporal transformers. IEEE Trans Pattern Anal Mach Intell

  118. Willes J, Reading C, Waslander SL (2022) Intertrack: Interaction transformer for 3d multi-object tracking. arXiv:2208.08041

  119. Liu Y, Bai T, Tian Y, Wang Y, Wang J, Wang X, Wang F-Y (2022) Segdq: Segmentation assisted multi-object tracking with dynamic query-based transformers. Neurocomputing 481:91–101

    Article  Google Scholar 

  120. Yang J, Ge H-W, Su S, Liu G (2022) Transformer-based two-source motion model for multi-object tracking. Appl Intell 52:9967–9979

    Article  Google Scholar 

  121. Xu X, Feng Z, Cao C, Yu C, Li M, Wu Z, Ye S, Shang Y (2022) Stn-track: Multiobject tracking of unmanned aerial vehicles by swin transformer neck and new data association method. IEEE J Sel Top Appl Earth Obs Remote Sens 15:8734–8743

    Article  Google Scholar 

  122. Li Y, Lu C (2022) Modeling human memory in multi-object tracking with transformers. ICASSP 2022 - 2022 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 2849–2853

  123. Tsai C-Y, Shen G, Nisar H (2023) Swin-jde: Joint detection and embedding multi-object tracking in crowded scenes based on swin-transformer. Eng Appl Artif Intell 119:105770

    Article  Google Scholar 

  124. Tang Z, Naphade MR, Liu M-Y, Yang X, Birchfield S, Wang S, Kumar R, Anastasiu D, Hwang J-N (2019) Cityflow: A city-scale benchmark for multi-target multi-camera vehicle tracking and re-identification. 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 8789–8798

  125. Held D, Thrun S, Savarese S (2016) Learning to track at 100 fps with deep regression networks. In: ECCV

  126. Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PHS (2016) Fully-convolutional siamese networks for object tracking. In: ECCV workshops

  127. Tao R, Gavves E, Smeulders AWM (2016) Siamese instance search for tracking. 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 1420–1429

  128. Valmadre J, Bertinetto L, Henriques JF, Vedaldi A, Torr PHS (2017) End-to-end representation learning for correlation filter based tracking. 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 5000–5008

  129. Zhang J, Sun J, Wang J, Li Z, Chen X (2022) An object tracking framework with recapture based on correlation filters and siamese networks. Comput Electr Eng 98:107730

    Article  Google Scholar 

  130. Pan G, Chen G, Kang W, Hou J (2019) Correlation filter tracker with siamese: A robust and real-time object tracking framework. Neurocomputing 358:33–43

    Article  Google Scholar 

  131. Li B, Yan J, Wu W, Zhu Z, Hu X (2018) High performance visual tracking with siamese region proposal network. 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 8971–8980

  132. Zhu Z, Wang Q, Li B, Wu W, Yan J, Hu W (2018) Distractor-aware siamese networks for visual object tracking. In: ECCV

  133. Li B, Wu W, Wang Q, Zhang F, Xing J, Yan J (2019) Siamrpn++: Evolution of siamese visual tracking with very deep networks. 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 4277–4286

  134. Fan H, Ling H (2019) Siamese cascaded region proposal networks for realtime visual tracking. 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 7944–7953

  135. Rahul MV, Revanur A, Shobha G (2017) Siamese network for underwater multiple object tracking. Proceedings of the 9th international conference on machine learning and computing

  136. Wang Q, Teng Z, Xing J, Gao J, Hu W, Maybank SJ (2018) Learning attentions: Residual attentional siamese network for high performance online visual tracking. 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 4854–4863

  137. Zhu Z, Wu W, Zou W, Yan J (2018) End-to-end flow correlation tracking with spatial-temporal attention. 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 548–557

  138. Yu Y, Xiong Y, Huang W, Scott MR (2020) Deformable siamese attention networks for visual object tracking. 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6727–6736

  139. Gao J, Zhang T, Xu C (2019) Graph convolutional tracking. 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 4644–4654

  140. Wang B, Wang G, Chan KL, Wang L (2017) Tracklet association by online target-specific metric learning and coherent dynamics estimation. IEEE Trans Pattern Anal Mach Intell 39:589–602

    Article  Google Scholar 

  141. Chari V, Lacoste-Julien S, Laptev I, Sivic J (2015) On pairwise costs for network flow multi-object tracking. 2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 5537–5545

  142. Schulter S, Vernaza P, Choi W, Chandraker M (2017) Deep network flow for multi-object tracking. 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 2730–2739

  143. Li J, Gao X, Jiang T (2020) Graph networks for multiple object tracking. 2020 IEEE winter conference on applications of computer vision (WACV), pp 708–717

  144. Wang Y, Kitani K, Weng X (2021) Joint object detection and multi-object tracking with graph neural networks. 2021 IEEE international conference on robotics and automation (ICRA), pp 13708–13715

  145. He J, Huang Z, Wang N, Zhang Z (2021) Learnable graph matching: Incorporating graph partitioning with deep feature learning for multiple object tracking. 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 5295–5305

  146. Papakis I, Sarkar A, Karpatne A (2020) Gcnnmatch: Graph convolutional neural networks for multi-object tracking via sinkhorn normalization. arXiv:2010.00067

  147. Rangesh A, Maheshwari P, Gebre M, Mhatre S, Ramezani VR, Trivedi MM (2021) Trackmpnn: A message passing graph neural architecture for multi-object tracking. arXiv:2101.04206

  148. Zaech J-N, Dai D, Liniger A, Danelljan M, Gool LV (2022) Learnable online graph representations for 3d multi-object tracking. IEEE Robot Autom Lett 1

  149. Dai P, Weng R, Choi W, Zhang C, He Z, Ding W (2021) Learning a proposal classifier for multiple object tracking. 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 2443–2452

  150. Lee J, Jeong M, Ko B (2021) Graph convolution neural network-based data association for online multi-object tracking. IEEE Access 9:114535–114546

    Article  Google Scholar 

  151. Weng X, Kitani K (2020) Autoselect: Automatic and dynamic detection selection for 3d multi-object tracking. arXiv:2012.05894

  152. Wang Y, Weng X, Kitani K (2020) Joint detection and multi-object tracking with graph neural networks. arXiv:2006.13164

  153. Marinello N, Proesmans M, Gool LV (2022) Triplettrack: 3d object tracking using triplet embeddings and lstm. 2022 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), pp 4499–4509

  154. Wan X, Wang J, Zhou S (2018) An online and flexible multi-object tracking framework using long short-term memory. 2018 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), pp 1311–13118

  155. Kim C, Li F, Rehg JM (2018) Multi-object tracking with neural gating using bilinear lstm. In: ECCV

  156. Ondruska P, Posner I (2016) Deep tracking: Seeing beyond seeing using recurrent neural networks. In: AAAI

  157. Tokmakov P, Li J, Burgard W, Gaidon A (2021) Learning to track with object permanence. 2021 IEEE/CVF international conference on computer vision (ICCV), pp 10840–10849

  158. Yu F, Wang D, Darrell T (2018) Deep layer aggregation. 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 2403–2412

  159. Zhou X, Wang D, Krähenbühl P (2019) Objects as points. arXiv:1904.07850

  160. Song Y, Zhang P, Huang W, Zha Y, You T, Zhang Y (2021) Multiple object tracking based on multi-task learning with strip attention. IET Image Process 15:3661–3673

    Article  Google Scholar 

  161. Wang Q, Zheng Y, Pan P, Xu Y (2021) Multiple object tracking with correlation learning. 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 3875–3885

  162. Mostafa R, Baraka H, Bayoumi A (2022) Lmot: Efficient light-weight detection and tracking in crowds. IEEE Access 10:83085–83095

    Article  Google Scholar 

  163. Shuai B, Berneshawi AG, Wang M, Liu C, Modolo D, Li X, Tighe J (2020) Application of multi-object tracking with siamese track-rcnn to the human in events dataset. Proceedings of the 28th ACM international conference on multimedia

  164. McKee DW, Shuai B, Berneshawi AG, Wang M, Modolo D, Lazebnik S, Tighe J (2021) Multi-object tracking with hallucinated and unlabeled videos. arXiv:2108.08836

  165. Li J, Ding Y, Wei H-L (2022) Simpletrack: Rethinking and improving the jde approach for multi-object tracking. Sensors (Basel, Switzerland) 22

  166. Liu S, Li X, Lu H, He Y (2022) Multi-object tracking meets moving uav. 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 8866–8875

  167. Nalaie K, Xu R, Zheng R (2022) Deepscale: Online frame size adaptation for multi-object tracking on smart cameras and edge servers. 2022 IEEE/ACM seventh international conference on internet-of-things design and implementation (IoTDI), pp 67–79

  168. Wang S, Sheng H, Zhang Y, Wu Y, Xiong Z (2021) A general recurrent tracking framework without real data. 2021 IEEE/CVF international conference on computer vision (ICCV), pp 13199–13208

  169. Pang J, Qiu L, Li X, Chen H, Li Q, Darrell T, Yu F (2021) Quasi-dense similarity learning for multiple object tracking. 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 164–173

  170. Stadler D, Beyerer J (2021) On the performance of crowd-specific detectors in multi-pedestrian tracking. 2021 17th IEEE international conference on advanced video and signal based surveillance (AVSS), pp 1–12

  171. Liu J, Hou Q, Cheng M-M, Wang C, Feng J (2020) Improving convolutional networks with self-calibrated convolutions. 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 10093–10102

  172. Peng J, Wang C, Wan F, Wu Y, Wang Y, Tai Y, Wang C, Li J, Huang F, Fu Y (2020) Chained-tracker: Chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking. arXiv:2007.14557

  173. Hornáková A, Kaiser TB, Swoboda P, Rolinek M, Rosenhahn B, Henschel R (2021) Making higher order mot scalable: An efficient approximate solver for lifted disjoint paths. 2021 IEEE/CVF international conference on computer vision (ICCV), pp 6310–6320

  174. Stadler DS, Beyerer J (2021) Improving multiple pedestrian tracking by track management and occlusion handling. 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 10953–10962

  175. Tang S, Andriluka M, Andres B, Schiele B (2017) Multiple people tracking by lifted multicut and person re-identification. 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 3701–3710

  176. Psalta A, Tsironis V, Karantzalos K (2022) Transformer-based assignment decision network for multiple object tracking. arXiv:2208.03571

  177. Zhang Y, Sheng H, Wu Y, Wang S, Ke W, Xiong Z (2020) Multiplex labeling graph for near-online tracking in crowded scenes. IEEE Internet Things J 7:7892–7902

  178. Ren S, He K, Girshick RB, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39:1137–1149

  179. Felzenszwalb PF, Girshick RB, McAllester DA, Ramanan D (2010) Object detection with discriminatively trained part based models. IEEE Trans Pattern Anal Mach Intell 32:1627–1645

  180. Yang F, Choi W, Lin Y (2016) Exploit all the layers: Fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers. 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 2129–2137

  181. Xu Y, Ban Y, Delorme G, Gan C, Rus D, Alameda-Pineda X (2022) Transcenter: Transformers with dense representations for multiple-object tracking. IEEE Trans Pattern Anal Mach Intell 45(6):7820–7835

  182. Wang W, Xie E, Li X, Fan D-P, Song K, Liang D, Lu T, Luo P, Shao L (2022) Pvt v2: Improved baselines with pyramid vision transformer. Comput Vis Media 8(3):415–424

Acknowledgements

The authors acknowledge SERB-DST, Government of India for supporting their work with a project grant (ref. no. CRG/2020/005465).

Author information

Contributions

Agrawal, H. and Halder, A. contributed equally to this work. They studied the research articles, categorized the MOT approaches, compiled results from existing research articles, and drafted the survey. Chattopadhyay, P. supervised the overall progress, proofread the paper, and helped improve the presentation of the manuscript.

Corresponding author

Correspondence to Pratik Chattopadhyay.

Ethics declarations

Conflict of interest

The paper is original and is not simultaneously under consideration for publication in any other journal or conference proceedings. There is also no potential conflict of interest to disclose, such as employment, financial or non-financial interest.

About this article

Cite this article

Agrawal, H., Halder, A. & Chattopadhyay, P. A systematic survey on recent deep learning-based approaches to multi-object tracking. Multimed Tools Appl 83, 36203–36259 (2024). https://doi.org/10.1007/s11042-023-16910-9
