Abstract
Multi-object tracking (MOT) is one of the most challenging tasks in computer vision. Most MOT methods struggle to handle pedestrian characteristics such as size and appearance, which easily leads to missed detections and to failures under occlusion. To address this, we propose an end-to-end multi-target tracking network with feature fusion and feature enhancement. The framework integrates feature extraction, object detection, and data association in a single network. Each input chain node consists of two adjacent frames, and the backbone is based on Inception convolution, whose dedicated pre-trained weights enlarge the network's receptive field for multiple targets. In addition, the feature fusion module stacks a weighted bidirectional feature pyramid three times, which lets the network focus on key features and improves its adaptability to target deformation. To cope with crowding in complex scenes, context-sensitive prediction modules with deeper and wider convolutions are added to enhance the key information between targets. The resulting features feed three branches, each with its own loss: the classification branch and the identity branch jointly produce an attention map that is multiplied with the regression branch to improve regression accuracy. On MOT16 and MOT17, the model reaches MOTA scores of 67.9 and 67.7, respectively, runs at up to 30 FPS on a single GPU, and yields better visual tracking results than Chained-Tracker.
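The weighted bidirectional pyramid described above appears to follow the BiFPN design of EfficientDet (Tan et al. 2020, listed in the references). The abstract does not state the fusion rule, so the sketch below assumes EfficientDet's fast normalized fusion, O = Σ_i w_i I_i / (ε + Σ_j w_j) with each w_i clipped to be non-negative, applied in one top-down and one bottom-up pass and stacked three times. It is a toy illustration, not the authors' implementation: feature maps are 1-D Python lists, resizing uses nearest-neighbour upsampling and 2× average pooling, the convolutions that normally follow each fusion node are omitted, and every name (`fast_normalized_fusion`, `bifpn_layer`, `stacked_bifpn`) is hypothetical.

```python
"""Toy sketch of stacked weighted bidirectional (BiFPN-style) feature fusion.

Assumptions, not taken from the paper: EfficientDet-style fast normalized fusion,
1-D feature maps stored as Python lists (finest level first, each level half the
length of the one below it), nearest-neighbour upsampling, 2x average pooling,
and no convolutions after each fusion node. All names here are hypothetical.
"""
from typing import List, Sequence

EPS = 1e-4  # small constant that keeps the normalization numerically stable


def fast_normalized_fusion(inputs: Sequence[List[float]], weights: Sequence[float]) -> List[float]:
    """O = sum_i (w_i / (EPS + sum_j w_j)) * I_i, with every w_i clipped to >= 0 (ReLU)."""
    w = [max(0.0, x) for x in weights]
    norm = EPS + sum(w)
    length = len(inputs[0])
    if any(len(x) != length for x in inputs):
        raise ValueError("all inputs must be resized to the same length before fusion")
    return [sum(wi * x[k] for wi, x in zip(w, inputs)) / norm for k in range(length)]


def upsample(x: List[float]) -> List[float]:
    """Nearest-neighbour 2x upsampling."""
    return [v for v in x for _ in range(2)]


def downsample(x: List[float]) -> List[float]:
    """2x average pooling; assumes an even length."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def bifpn_layer(feats: List[List[float]], w_td: List[List[float]], w_bu: List[List[float]]) -> List[List[float]]:
    """One top-down pass followed by one bottom-up pass over len(feats) >= 2 levels."""
    n = len(feats)
    # Top-down path (coarse -> fine): each level fuses its input with the upsampled coarser level.
    td: List[List[float]] = [[] for _ in range(n)]
    td[-1] = feats[-1]
    for i in range(n - 2, -1, -1):
        td[i] = fast_normalized_fusion([feats[i], upsample(td[i + 1])], w_td[i])
    # Bottom-up path (fine -> coarse): intermediate levels also receive their original
    # input through a skip connection, so they fuse three tensors; the top level fuses two.
    out: List[List[float]] = [[] for _ in range(n)]
    out[0] = td[0]
    for i in range(1, n):
        if i < n - 1:
            inputs = [feats[i], td[i], downsample(out[i - 1])]
        else:
            inputs = [feats[i], downsample(out[i - 1])]
        out[i] = fast_normalized_fusion(inputs, w_bu[i - 1])
    return out


def stacked_bifpn(feats: List[List[float]], repeats: int = 3) -> List[List[float]]:
    """Apply `repeats` BiFPN-style layers; the abstract says the structure is stacked three times."""
    n = len(feats)
    for _ in range(repeats):
        # Learnable in a real network; fixed to 1.0 here purely for illustration.
        w_td = [[1.0, 1.0] for _ in range(n - 1)]
        w_bu = [[1.0, 1.0, 1.0] if i < n - 1 else [1.0, 1.0] for i in range(1, n)]
        feats = bifpn_layer(feats, w_td, w_bu)
    return feats


if __name__ == "__main__":
    pyramid = [[1.0] * 8, [2.0] * 4, [3.0] * 2]  # three toy levels, finest first
    for level in stacked_bifpn(pyramid):
        print([round(v, 3) for v in level])
```

Under this assumption, the learned weights let each pyramid level decide how much to trust its own input versus its resized neighbours, and the ReLU-plus-normalization keeps every fusion coefficient in [0, 1) without the cost of a softmax.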
Data Availability
Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.
References
Adame BO, Salau AO, Subbanna BC, Tirupal T, Sultana SF (2020) Multimodal medical image fusion based on intuitionistic fuzzy sets. In: 2020 IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON-ECE), IEEE, pp 131–134
Aharon N, Orfaig R, Bobrovsky BZ (2022) Bot-sort: Robust associations multi-pedestrian tracking. arXiv preprint arXiv:2206.14651
Badal T, Nain N, Ahmed M (2018) Online multi-object tracking: multiple instance based target appearance model. Multimedia Tools and Applications 77(19):25199–25221
Bergmann P, Meinhardt T, Leal-Taixe L (2019) Tracking without bells and whistles. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 941–951
Bewley A, Ge Z, Ott L, Ramos F, Upcroft B (2016) Simple online and realtime tracking. In: 2016 IEEE international conference on image processing (ICIP), IEEE, pp 3464–3468
Bochinski E, Eiselein V, Sikora T (2017) High-speed tracking-by-detection without using image information. In: 2017 14th IEEE international conference on advanced video and signal based surveillance (AVSS), IEEE, pp 1–6
Bouraffa T, Feng Z, Yan L, Xia Y, Xiao B (2022) Multi-feature fusion tracking algorithm based on peak-context learning. Image Vis Comput 123:104468
Brasó G, Leal-Taixé L (2020) Learning a neural solver for multiple object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6247–6257
Chen L, Lou J, Xu F, Ren M (2020) Grid-based multi-object tracking with siamese cnn based appearance edge and access region mechanism. Multimedia Tools and Applications 79(47):35333–35351
Chu P, Wang J, You Q, Ling H, Liu Z (2023) Transmot: Spatial-temporal graph transformer for multiple object tracking. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 4870–4880
Elayaperumal D, Joo YH (2021) Robust visual object tracking using context-based spatial variation via multi-feature fusion. Inf Sci 577:467–482
Fang K, Xiang Y, Li X, Savarese S (2018) Recurrent autoregressive networks for online multi-object tracking. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, pp 466–475
Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D (2010) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645
Fu Lh, Ding Y, Du YB, Zhang B, Wang LY, Wang D (2020) Siammn: Siamese modulation network for visual object tracking. Multimedia Tools and Applications 79(43):32623–32641
Gao X, Shen Z, Yang Y (2022) Multi-object tracking with siamese-rpn and adaptive matching strategy. SIViP 16(4):965–973
Guo S, Wang J, Wang X, Tao D (2021) Online multiple object tracking with cross-task synergy. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 8136–8145
Hornakova A, Henschel R, Rosenhahn B, Swoboda P (2020) Lifted disjoint paths with application in multiple object tracking. In: International conference on machine learning, PMLR, pp 4364–4375
Jain S, Salau AO (2021) Multimodal image fusion employing discrete cosine transform. In: 2021 IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON-ECE), IEEE, pp 5–8
Karunasekera H, Wang H, Zhang H (2019) Multiple object tracking with attention to appearance, structure, motion and size. IEEE Access 7:104423–104434
Kim C, Li F, Ciptadi A, Rehg JM (2015) Multiple hypothesis tracking revisited. In: Proceedings of the IEEE international conference on computer vision, pp 4696–4704
Kim C, Li F, Rehg JM (2018) Multi-object tracking with neural gating using bilinear lstm. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 200–215
Kim C, Fuxin L, Alotaibi M, Rehg JM (2021) Discriminative appearance modeling with multi-track pooling for real-time multi-object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 9553–9562
Kim DY, Vo BN, Vo BT, Jeon M (2019) A labeled random finite set online multi-object tracker for video data. Pattern Recogn 90:377–389
Li J, Gao X, Jiang T (2020) Graph networks for multiple object tracking. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 719–728
Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
Liu J, Li C, Liang F, Lin C, Sun M, Yan J, Ouyang W, Xu D (2021) Inception convolution with efficient dilation search. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 11486–11495
Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8759–8768
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: Single shot multibox detector. In: European conference on computer vision, Springer, pp 21–37
Lu Z, Rathod V, Votel R, Huang J (2020) Retinatrack: Online single stage joint detection and tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14668–14678
Mahmoudi N, Ahadi SM, Rahmati M (2019) Multi-target tracking using cnn-based features: Cnnmtt. Multimedia Tools and Applications 78(6):7077–7096
Pang B, Li Y, Zhang Y, Li M, Lu C (2020a) Tubetk: Adopting tubes to track multi-object in a one-step training model. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 6308–6318
Pang Y, Li F, Qiao X, Gilman A (2020b) Real-time tracking based on deep feature fusion. Multimedia Tools and Applications 79(37):27229–27255
Peng J, Wang C, Wan F, Wu Y, Wang Y, Tai Y, Wang C, Li J, Huang F, Fu Y (2020) Chained-tracker: Chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking. In: European conference on computer vision, Springer, pp 145–161
Qin W, Du H, Zhang X, Ma Z, Ren X, Luo T (2021) Joint prediction and association for deep feature multiple object tracking. In: Journal of Physics: Conference Series, IOP Publishing, p 012021
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems 28
Salau AO, Jain S, Eneh JN (2021) A review of various image fusion types and transform. Indonesian Journal of Electrical Engineering and Computer Science 24(3):1515–1522
Sanchez-Matilla R, Poiesi F, Cavallaro A (2016) Online multi-target tracking with strong and weak detections. In: European Conference on Computer Vision, Springer, pp 84–99
Shuai B, Berneshawi A, Li X, Modolo D, Tighe J (2021) Siammot: Siamese multi-object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12372–12382
Song Ym, Jeon M (2016) Online multiple object tracking with the hierarchically adopted gm-phd filter using motion and appearance. In: 2016 IEEE International conference on consumer electronics-Asia (ICCE-Asia), IEEE, pp 1–4
Sun S, Akhtar N, Song H, Mian A, Shah M (2019) Deep affinity network for multiple object tracking. IEEE Trans Pattern Anal Mach Intell 43(1):104–119
Takala V, Pietikainen M (2007) Multi-object tracking using color, texture and motion. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp 1–7
Tan M, Pang R, Le QV (2020) Efficientdet: Scalable and efficient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10781–10790
Tang X, Du DK, He Z, Liu J (2018) Pyramidbox: A context-assisted single shot face detector. In: Proceedings of the European conference on computer vision (ECCV), pp 797–813
Tokmakov P, Li J, Burgard W, Gaidon A (2021) Learning to track with object permanence. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 10860–10869
Wan J, Zhang H, Zhang J, Ding Y, Yang Y, Li Y, Li X (2022) Dsrrtracker: Dynamic search region refinement for attention-based siamese multi-object tracking. arXiv preprint arXiv:2203.10729
Wang L, Xu L, Kim MY, et al (2017) Online multiple object tracking via flow and convolutional features. In: 2017 IEEE International Conference on Image Processing (ICIP), IEEE, pp 3630–3634
Wang Y, Kitani K, Weng X (2021) Joint object detection and multi-object tracking with graph neural networks. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 13708–13715
Wang Z, Zheng L, Liu Y, et al (2020) Towards real-time multi-object tracking. In: European Conference on Computer Vision, Springer, pp 107–122
Wojke N, Bewley A, Paulus D (2017) Simple online and real-time tracking with a deep association metric. In: 2017 IEEE international conference on image processing (ICIP), IEEE, pp 3645–3649
Xing D, Evangeliou N, Tsoukalas A, Tzes A (2022) Siamese transformer pyramid networks for real-time uav tracking. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 2139–2148
Xu J, Cao Y, Zhang Z, Hu H (2019) Spatial-temporal relation networks for multi-object tracking. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 3988–3998
Yang F, Choi W, Lin Y (2016) Exploit all the layers: Fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2129–2137
Yang M, Jia Y (2016) Temporal dynamic appearance modeling for online multi-person tracking. Comput Vis Image Underst 153:16–28
Yu F, Li W, Li Q, Liu Y, Shi X, Yan J (2016) Poi: Multiple object tracking with high performance detection and appearance feature. In: European Conference on Computer Vision, Springer, pp 36–42
Zeng F, Dong B, Wang T, Chen C, Zhang X, Wei Y (2021) Motr: End-to-end multiple-object tracking with transformer. arXiv preprint arXiv:2105.03247
Zhang T, Sun R, Wan Y, et al (2023) Msffal: Few-shot object detection via multi-scale feature fusion and attentive learning. Sensors 23(7):3609
Zhang Y, Sun P, Jiang Y, Yu D, Weng F, Yuan Z, Luo P, Liu W, Wang X (2022) Bytetrack: Multi-object tracking by associating every detection box. In: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXII, Springer, pp 1–21
Zhou X, Koltun V, Krähenbühl P (2020) Tracking objects as points. In: European Conference on Computer Vision, Springer, pp 474–490
Zhou Z, Xing J, Zhang M, Hu W (2018) Online multi-target tracking with tensor-based high-order graph matching. In: 2018 24th International Conference on Pattern Recognition (ICPR), IEEE, pp 1809–1814
Zou Z, Huang J, Luo P (2022) Compensation tracker: reprocessing lost object for multi-object tracking. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 307–317
Ethics declarations
Competing interests
The authors state that they have no conflicting financial interests or personal connections that may have influenced the work reported in this paper.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhou, Y., Chen, J., Wang, D. et al. Multi-object tracking using context-sensitive enhancement via feature fusion. Multimed Tools Appl 83, 19465–19484 (2024). https://doi.org/10.1007/s11042-023-16027-z
DOI: https://doi.org/10.1007/s11042-023-16027-z