
MultiBSP: multi-branch and multi-scale perception object tracking framework based on siamese CNN

  • Original Article
Neural Computing and Applications

Abstract

Object tracking has achieved impressive performance in computer vision, yet many challenges remain in complex real-world scenarios. Mainstream trackers mostly locate the object with a two-branch structure, which limits their ability to fully mine the similarity between the template and the search region. In this paper, we propose a multi-branch and multi-scale perception object tracking framework based on Siamese convolutional neural networks (MultiBSP), in which the multi-branch tracking framework is built on the idea of relation mining, and a tower-structured relation network is designed for each branch to learn a non-linear relation function between the template and the search region. Through branch combination, the branches can verify each other's predictions, which benefits robust tracking. Besides, in order to sense the scale and aspect ratio of the object in advance, a multi-scale perception module is designed using dilated convolutions at five scales, which strengthens the tracker's ability to handle scale variation. In addition, we propose an information enhancement module that emphasizes important features and suppresses unnecessary ones along the spatial and channel dimensions. Extensive experiments on six visual tracking benchmarks (OTB100, VOT2018, VOT2019, UAV123, GOT-10k, and LaSOT) demonstrate that our MultiBSP achieves robust tracking with state-of-the-art performance. Finally, ablation experiments verify the effectiveness of each module, and qualitative and quantitative analyses confirm the tracking stability.
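The multi-scale perception module is described above only at a high level. As an illustration of the underlying mechanism, the following is a minimal NumPy sketch of a "same"-padded 2-D dilated convolution applied at five dilation rates and fused; the specific rates (1–5), the 3×3 kernel, and the sum-fusion are assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def dilated_conv2d(x, w, dilation):
    """'Same'-padded 2-D dilated convolution (single channel, stride 1)."""
    k = w.shape[0]
    eff = dilation * (k - 1) + 1          # effective kernel extent
    pad = (eff - 1) // 2                  # padding that preserves spatial size
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # sample the padded input on a dilated grid around (i, j)
            patch = xp[i:i + eff:dilation, j:j + eff:dilation]
            out[i, j] = np.sum(patch * w)
    return out

def multi_scale_perception(x, w, dilations=(1, 2, 3, 4, 5)):
    """Fuse responses from five dilation rates (sum-fusion assumed)."""
    return sum(dilated_conv2d(x, w, d) for d in dilations)

feat = np.arange(49, dtype=float).reshape(7, 7)   # toy single-channel feature map
kernel = np.ones((3, 3)) / 9.0                    # 3x3 averaging kernel
fused = multi_scale_perception(feat, kernel)
print(fused.shape)                                # (7, 7): spatial size preserved
```

Because the padding grows with the dilation rate, every branch covers a different receptive field while producing the same output size, which is what allows the five scales to be fused element-wise.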




Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 61671002. The experiments in this paper were conducted on the High Performance Computing Platform of Beihang University and the Supercomputing Platform of the School of Mathematical Sciences.

Author information


Corresponding author

Correspondence to Xiaoyuan Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest in relation to the work reported in this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Jiang, J., Yang, X., Li, Z. et al. MultiBSP: multi-branch and multi-scale perception object tracking framework based on siamese CNN. Neural Comput & Applic 34, 18787–18803 (2022). https://doi.org/10.1007/s00521-022-07420-0

