Abstract
This survey presents an in-depth review of state-of-the-art research on Multi-Object Tracking (MOT), drawing on articles published in 2019 or later in top-tier journals and conferences. We group existing MOT research into nine broad categories and discuss the workflow and limitations of each. This classification helps readers understand research trends in the different sub-domains of the MOT problem and identify research gaps. To the best of our knowledge, existing surveys on MOT place little emphasis on the tracking step of MOT, which we address here. Our survey also highlights the progress made in MOT through recent Deep Learning models such as Transformers and Graph Neural Networks, which other surveys have not covered. It further discusses the challenges that trackers face due to a variety of extrinsic and intrinsic factors. In addition, we describe the public datasets, benchmarks, and metrics used to evaluate MOT models, and we compare the key results reported in previous studies on some popular MOT datasets. This survey gives readers an extensive overview of state-of-the-art MOT algorithms and their shortcomings, which should help them design and develop better MOT algorithms.
Data Availability
Not Applicable
Acknowledgements
The authors acknowledge SERB-DST, Government of India for supporting their work with a project grant (ref. no. CRG/2020/005465).
Author information
Contributions
Agrawal, H. and Halder, A. contributed equally to the work. They studied the research articles, worked on the categorization of MOT approaches, compiled results from existing research articles, and drafted the survey. Chattopadhyay, P. supervised the overall progress, proofread the paper, and helped improve the presentation of the manuscript.
Ethics declarations
Conflict of interest
The paper is original and is not simultaneously under consideration for publication in any other journal or conference proceedings. There is also no potential conflict of interest to disclose, such as employment, financial or non-financial interest.
About this article
Cite this article
Agrawal, H., Halder, A. & Chattopadhyay, P. A systematic survey on recent deep learning-based approaches to multi-object tracking. Multimed Tools Appl 83, 36203–36259 (2024). https://doi.org/10.1007/s11042-023-16910-9
DOI: https://doi.org/10.1007/s11042-023-16910-9