Abstract
Recent advances in deep learning have greatly improved the performance of multi-object tracking algorithms based on deep neural networks. However, most methods split their functional modules across multiple networks and train each network independently on its own task. When these modules are combined directly, they are poorly matched to one another and poorly adapted to the multi-object tracking task, which degrades tracking performance. Therefore, a network structure is designed that integrates the regression of objects between frames and the extraction of appearance features into a single model, improving the coordination between the functional modules of multi-object tracking. To better fit the multi-object tracking task, an end-to-end training method is also proposed. It simulates the tracking process during training and expands the training data by combining the historical positions of each target with the predictions of a motion model. In addition, a metric loss that exploits the historical appearance features of each target is used to train the appearance feature extraction module, improving the temporal correlation of the extracted features. Evaluation results on the MOTChallenge benchmark datasets show that the proposed approach achieves state-of-the-art performance.
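The abstract does not state the form of the metric loss. As a minimal sketch of the general idea, assuming a triplet-style hinge loss with Euclidean distance, the current appearance feature of a target can be pulled toward that target's stored historical features and pushed away from the stored features of other targets. The function name `history_triplet_loss`, the hardest-example mining, and the `margin` value are illustrative assumptions, not the paper's formulation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def history_triplet_loss(anchor, own_history, other_histories, margin=0.3):
    """Hinge-style triplet loss against stored appearance features.

    anchor          : current-frame feature of one target
    own_history     : features of the same target from earlier frames
    other_histories : features of different targets from earlier frames
    margin          : hypothetical margin value, not taken from the paper
    """
    # Hardest positive: the same target's historical feature farthest from the anchor.
    hardest_pos = max(euclidean(anchor, f) for f in own_history)
    # Hardest negative: another target's historical feature closest to the anchor.
    hardest_neg = min(euclidean(anchor, f) for f in other_histories)
    # Zero loss once the negative is farther than the positive by at least the margin.
    return max(0.0, hardest_pos - hardest_neg + margin)


# Toy 2-D features: the target's own history is close, the distractor is far.
anchor = [1.0, 0.0]
own_history = [[1.0, 0.0], [0.8, 0.6]]
easy_loss = history_triplet_loss(anchor, own_history, [[0.0, 1.0]])
hard_loss = history_triplet_loss(anchor, own_history, [[0.9, 0.1]])
```

In this toy case the distant distractor yields zero loss, while the distractor that lies closer to the anchor than the target's own history yields a positive loss. That is the behaviour a loss of this kind would use to keep a target's features consistent over time.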
Acknowledgements
This work was supported by the Graduate Innovation Foundation of Jiangsu Province [grant No. KYLX16_0781]; the Natural Science Foundation of Jiangsu Province [grant No. BK20181340]; the 111 Project [grant No. B12018]; PAPD of Jiangsu Higher Education Institutions; the National Natural Science Foundation of China [grant No. 61806006]; and the China Postdoctoral Science Foundation [grant No. 2019M660149].
About this article
Cite this article
Yang, J., Ge, H., Yang, J. et al. Online multi-object tracking using multi-function integration and tracking simulation training. Appl Intell 52, 1268–1288 (2022). https://doi.org/10.1007/s10489-021-02457-5
DOI: https://doi.org/10.1007/s10489-021-02457-5