
MultiBSP: multi-branch and multi-scale perception object tracking framework based on siamese CNN

  • Original Article
Neural Computing and Applications

Abstract

Object tracking has achieved impressive performance in computer vision, yet many challenges remain in complex real-world scenarios. Mainstream trackers mostly locate the object with a two-branch structure, which limits their ability to fully mine the similarity between the template and the search region. In this paper, we propose a multi-branch and multi-scale perception object tracking framework based on Siamese convolutional neural networks (MultiBSP), in which the multi-branch tracking framework is built on the idea of relation mining, and a tower-structured relation network is designed for each branch to learn a non-linear relation function between the template and the search region. Through branch combination, the branches can verify each other's predictions, which benefits robust tracking. Besides, in order to sense the scale and aspect ratio of the object in advance, a multi-scale perception module is designed using dilated convolutions at five scales, which strengthens the tracker's ability to handle scale variation. In addition, we propose an information enhancement module that emphasizes important features and suppresses unnecessary ones along the spatial and channel dimensions. Extensive experiments on six visual tracking benchmarks (OTB100, VOT2018, VOT2019, UAV123, GOT-10k, and LaSOT) demonstrate that our MultiBSP achieves robust tracking with state-of-the-art performance. Finally, ablation experiments verify the effectiveness of each module, and qualitative and quantitative analyses confirm the tracking stability.
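The multi-scale perception module is described above only at a high level. As an illustration of the underlying mechanism, the following is a minimal NumPy sketch of a "same"-padded 2-D dilated convolution applied at five dilation rates and fused; the specific rates (1–5), the 3×3 kernel, and the sum-fusion are assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def dilated_conv2d(x, w, dilation):
    """'Same'-padded 2-D dilated convolution (single channel, stride 1)."""
    k = w.shape[0]
    eff = dilation * (k - 1) + 1          # effective kernel extent
    pad = (eff - 1) // 2                  # padding that preserves spatial size
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # sample the padded input on a dilated grid around (i, j)
            patch = xp[i:i + eff:dilation, j:j + eff:dilation]
            out[i, j] = np.sum(patch * w)
    return out

def multi_scale_perception(x, w, dilations=(1, 2, 3, 4, 5)):
    """Fuse responses from five dilation rates (sum-fusion assumed)."""
    return sum(dilated_conv2d(x, w, d) for d in dilations)

feat = np.arange(49, dtype=float).reshape(7, 7)   # toy single-channel feature map
kernel = np.ones((3, 3)) / 9.0                    # 3x3 averaging kernel
fused = multi_scale_perception(feat, kernel)
print(fused.shape)                                # (7, 7): spatial size preserved
```

Because the padding grows with the dilation rate, every branch covers a different receptive field while producing the same output size, which is what allows the five scales to be fused element-wise.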




Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 61671002. The experiments in this paper were conducted on the High Performance Computing Platform of Beihang University and the Supercomputing Platform of the School of Mathematical Sciences.

Author information


Corresponding author

Correspondence to Xiaoyuan Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest in relation to the work reported in this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Jiang, J., Yang, X., Li, Z. et al. MultiBSP: multi-branch and multi-scale perception object tracking framework based on siamese CNN. Neural Comput & Applic 34, 18787–18803 (2022). https://doi.org/10.1007/s00521-022-07420-0

