Abstract
Discriminative correlation filters (DCFs) played a dominant role in visual tracking in its early years. With the recent development of deep learning, however, Siamese-based networks have come to prevail. Unlike DCF trackers, most Siamese-network-based tracking methods take only the first frame as the reference and ignore the information in subsequent frames. As a result, these methods may fail under unforeseeable conditions (e.g., target scale changes, illumination variation, and occlusion). Meanwhile, other deep-learning-based tracking methods learn discriminative filters online, with training samples extracted from a few fixed frames with predictable labels. However, these methods share the limitations of Siamese-based trackers, and their training samples are prone to cumulative errors, which ultimately lead to tracking loss. To address this, we propose SiamET, a Siamese-based network that uses ResNet-50 as its backbone and adds an enhanced template module. Unlike existing methods, our templates are derived from all historical frames. Extensive experiments on popular datasets verify the effectiveness of our method. Our tracker outperforms state-of-the-art methods on four challenging benchmarks: OTB100, VOT2018, VOT2019, and LaSOT. Specifically, we achieve an EAO score of 0.480 on VOT2018 at 31 FPS. Code is available at https://github.com/yu-1238/SiamET
Cite this article
Zhou, Y., Zhang, Y. SiamET: a Siamese based visual tracking network with enhanced templates. Appl Intell 52, 9782–9794 (2022). https://doi.org/10.1007/s10489-021-03057-z
DOI: https://doi.org/10.1007/s10489-021-03057-z