Multi-object tracking using context-sensitive enhancement via feature fusion

Multimedia Tools and Applications

Abstract

Multi-object tracking (MOT) is one of the most challenging tasks in computer vision. Most MOT methods handle pedestrian features such as size and appearance poorly, which readily leads to missed detections and failures under occlusion. To address this, an end-to-end multi-target tracking network with feature fusion and feature enhancement is proposed. The network framework integrates feature extraction, object detection, and data association. It takes two adjacent frames as an input chain node and uses Inception convolution as the backbone network, whose special pre-training weights enlarge the network's receptive field for multiple targets. In addition, the feature fusion module stacks a weighted bidirectional pyramid structure three times, which focuses more on key features and improves adaptability to target deformation. To cope with crowding in complex scenes, context-sensitive prediction modules are added, which contain deeper and wider convolutions to enhance the key information between targets. After this processing, three loss-function branches are formed, where the classification branch and the identity branch together form an attention that is multiplied by the regression branch to ensure regression accuracy. In experiments on the MOT16 and MOT17 datasets, our model reaches MOTA scores of 67.9 and 67.7, respectively, with frame rates up to 30 FPS on a single GPU, and its visualization results improve on those of Chained-Tracker.
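
Two of the mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fusion rule below is the fast normalized fusion commonly used in weighted bidirectional pyramids (as in EfficientDet), and the attention form (an element-wise product of sigmoid-activated classification and identity scores) is an assumption made for illustration. The function names and the toy values are hypothetical.

```python
import math


def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature vectors with non-negative, normalized weights.

    Assumed rule (EfficientDet-style): out = sum(w_i * f_i) / (eps + sum(w_i)),
    where each w_i is clipped at zero so the weights act like a soft selection.
    """
    clipped = [max(0.0, w) for w in weights]  # ReLU on the learnable weights
    total = sum(clipped) + eps                # eps avoids division by zero
    length = len(features[0])
    return [
        sum(w * f[k] for w, f in zip(clipped, features)) / total
        for k in range(length)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_gated_regression(cls_logits, id_logits, reg_features):
    """Scale regression features by a joint attention map.

    Assumption: the attention is the product of the sigmoid-activated
    classification and identity scores at each position.
    """
    return [
        sigmoid(c) * sigmoid(i) * r
        for c, i, r in zip(cls_logits, id_logits, reg_features)
    ]


# Toy example: two pyramid inputs with equal weights, then gating.
top_down = [1.0, 2.0, 3.0]
lateral = [3.0, 2.0, 1.0]
fused = fast_normalized_fusion([top_down, lateral], weights=[1.0, 1.0])
gated = attention_gated_regression([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], fused)
```

With equal weights the fused vector is close to the element-wise mean of the inputs, and zero logits give an attention value of 0.25 at every position, so the gated output is a quarter of the fused features. In a real network the inputs would be multi-channel feature maps and the weights would be learned.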

Data Availability

Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Author information

Corresponding author

Correspondence to Yan Zhou.

Ethics declarations

Competing interests

The authors state that they have no conflicting financial interests or personal connections that may have influenced the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhou, Y., Chen, J., Wang, D. et al. Multi-object tracking using context-sensitive enhancement via feature fusion. Multimed Tools Appl 83, 19465–19484 (2024). https://doi.org/10.1007/s11042-023-16027-z

  • DOI: https://doi.org/10.1007/s11042-023-16027-z
