Abstract
With the rapid development of Siamese-network-based trackers, a series of related methods has delivered considerable performance improvements. However, tracking results are often disturbed by background noise in the template image and by background distractor objects in the search image. In this paper, we present an elegant background-aware Siamese tracker for online single-object visual tracking. Specifically, a new basic tracking framework is first proposed to perform target localization, bounding box regression, and IoU prediction with offline multi-task learning. For the online tracking stage, we design a novel background-aware tracker with two strategies. First, a spatial mask is introduced to reduce the impact of background noise from the template image. Second, we predict a background-aware salient map to discover and suppress distractor features in the search image. To validate its effectiveness, we conduct extensive experiments and exhaustive comparisons on the OTB2013, OTB2015, VOT2019, UAV123, and GOT10k tracking datasets. Experimental results demonstrate that the proposed tracker, dubbed BaSiamIoU, achieves state-of-the-art performance while running at over 50 FPS.
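The first strategy, masking the template before matching, can be illustrated with a toy single-channel example. This is only a sketch under stated assumptions, not the BaSiamIoU implementation: the abstract does not say how the spatial mask is built, so a hand-picked centered binary mask is assumed here, and the helper names (`center_mask`, `apply_mask`, `cross_correlate`, `argmax2d`) are hypothetical. The matching step is the plain sliding-window cross-correlation commonly used in Siamese trackers.

```python
def center_mask(height, width, keep_h, keep_w):
    """Binary mask: 1 inside a centered keep_h x keep_w box, 0 elsewhere (assumed mask shape)."""
    top = (height - keep_h) // 2
    left = (width - keep_w) // 2
    return [
        [1.0 if top <= r < top + keep_h and left <= c < left + keep_w else 0.0
         for c in range(width)]
        for r in range(height)
    ]


def apply_mask(feature, mask):
    """Element-wise product of a 2D feature map and a mask of the same size."""
    return [[f * m for f, m in zip(f_row, m_row)] for f_row, m_row in zip(feature, mask)]


def cross_correlate(search, template):
    """Slide the template over the search map and sum the element-wise products."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    return [
        [sum(search[r + i][c + j] * template[i][j] for i in range(th) for j in range(tw))
         for c in range(sw - tw + 1)]
        for r in range(sh - th + 1)
    ]


def argmax2d(response):
    """Return the (row, col) of the largest response value."""
    _, row, col = max((v, r, c) for r, line in enumerate(response) for c, v in enumerate(line))
    return row, col


# Template: target response 5 at the center, background noise 3 in the top-left corner.
template = [
    [3.0, 0.0, 0.0],
    [0.0, 5.0, 0.0],
    [0.0, 0.0, 0.0],
]

# Search map: the target (5) sits at (3, 3); bright background clutter (9) sits at (0, 0).
search = [
    [9.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 5.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

# Without the mask, the template's background corner matches the clutter.
raw_peak = argmax2d(cross_correlate(search, template))

# With the mask, only the central target cell of the template contributes.
masked_template = apply_mask(template, center_mask(3, 3, 1, 1))
masked_peak = argmax2d(cross_correlate(search, masked_template))

print("peak without mask:", raw_peak)
print("peak with mask:", masked_peak)
```

In this toy case the unmasked template peaks on the clutter because its noisy corner contributes to the response, while the masked template peaks at the window centered on the target. A real tracker would operate on multi-channel deep features and would not necessarily use a fixed binary mask.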
Acknowledgements
This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 52127809 and 51625501.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Tan, K., Xu, TB. & Wei, Z. Online visual tracking via background-aware Siamese networks. Int. J. Mach. Learn. & Cyber. 13, 2825–2842 (2022). https://doi.org/10.1007/s13042-022-01564-0
DOI: https://doi.org/10.1007/s13042-022-01564-0