Abstract
Siamese-based single-target trackers estimate the position of a target in the subsequent frames of a video. In complex scenes, obtaining an accurate response map is the key to improving tracking performance, and most trackers lack robustness when the template is not updated. To address these issues, a simple yet better tracking network (SiamSYB) is proposed. SiamSYB integrates an attention mechanism and a template update module. The attention mechanism makes the network focus more on the target, and the template update module makes the network more robust to challenges such as background clutter, similar objects and object deformation. A multi-stage offline training strategy is applied to obtain a more accurate model. SiamSYB achieves state-of-the-art results on three official test datasets: VOT2016, VOT2019 and OTB100. It achieves an EAO of 0.391 on VOT2016 and 0.237 on VOT2019, and a precision score of 0.853 and an AUC score of 0.642 on OTB100. SiamSYB runs at 90 FPS, well above the real-time threshold of 25 FPS.
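The two ideas the abstract relies on, a response map and a template update, can be illustrated with a toy sketch. It is not the SiamSYB network: it works on small 2-D lists instead of learned features, omits the attention mechanism entirely, and uses a plain linear blend as a stand-in for the template update module. The function names and the `rate` parameter are assumptions made for illustration only.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` (2-D lists) and return the response map."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            # Similarity score of the template against the window at (y, x).
            score = sum(
                search[y + i][x + j] * template[i][j]
                for i in range(th)
                for j in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def peak(response):
    """Return (row, col) of the highest value in the response map."""
    positions = (
        (y, x) for y in range(len(response)) for x in range(len(response[0]))
    )
    return max(positions, key=lambda p: response[p[0]][p[1]])


def update_template(template, new_patch, rate=0.5):
    """Hypothetical update rule: linear blend of old template and new patch."""
    return [
        [(1 - rate) * t + rate * n for t, n in zip(t_row, n_row)]
        for t_row, n_row in zip(template, new_patch)
    ]


# Toy data: the target (a diagonal pattern) sits at offset (1, 1) in the search area.
template = [[1.0, 0.0], [0.0, 1.0]]
search = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

response = cross_correlate(search, template)
row, col = peak(response)

# Cut out the matched window and blend it into the template.
patch = [r[col:col + 2] for r in search[row:row + 2]]
new_template = update_template(template, patch, rate=0.5)
```

Here the peak of the response map gives the estimated target position, and the blended template drifts toward the target's current appearance, which is the general motivation for updating the template during tracking.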
Acknowledgements
This work was supported by the National Key Research and Development Program of China (No. 2018YFB1702300), the National Natural Science Foundation of China (No. 62003296), the Natural Science Foundation of Hebei (No. F2020203031) and the Hebei Youth Fund (No. E2018203162).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wei, L., Xi, Z., Hu, Z. et al. SiamSYB: simple yet better methods to enhance Siamese tracking. Multimed Tools Appl 81, 26245–26264 (2022). https://doi.org/10.1007/s11042-022-12569-w
DOI: https://doi.org/10.1007/s11042-022-12569-w