
SiamET: a Siamese based visual tracking network with enhanced templates

Applied Intelligence

Abstract

Discriminative correlation filters (DCFs) dominated visual tracking in its early years. With the recent development of deep learning, however, Siamese-based networks have begun to prevail. Unlike DCF trackers, most Siamese-based tracking methods use the first frame as the reference and ignore information from subsequent frames. As a result, they may fail in unforeseen situations (e.g., target scale/size changes, illumination variations, occlusions). Meanwhile, other deep-learning-based tracking methods learn discriminative filters online, using training samples extracted from a few fixed frames with predicted labels. However, these methods suffer from the same limitations as Siamese-based trackers, and their training samples are prone to cumulative errors that ultimately lead to tracking failure. To address these problems, we propose SiamET, a Siamese-based network with a ResNet-50 backbone and an enhanced template module. Unlike existing methods, SiamET acquires its templates from all historical frames. Extensive experiments on popular datasets verify the effectiveness of our method: our tracker outperforms state-of-the-art methods on four challenging benchmarks, namely OTB100, VOT2018, VOT2019 and LaSOT. Specifically, it achieves an expected average overlap (EAO) score of 0.480 on VOT2018 while running at 31 FPS. Code is available at https://github.com/yu-1238/SiamET
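
Siamese trackers in the fully-convolutional style score every position of a search region by cross-correlating a template feature map with the search feature map; the peak of the resulting response map gives the target location. The sketch below illustrates that matching step and contrasts a first-frame-only template with one built from all historical frames. It is a minimal sketch under stated assumptions: features are single-channel 2D lists rather than ResNet-50 feature maps, and the historical template is an equal-weight running mean. That running mean is a hypothetical stand-in chosen for simplicity, not SiamET's enhanced template module. The function names `cross_correlate`, `peak` and `running_mean_template` are likewise hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def cross_correlate(template: Matrix, search: Matrix) -> Matrix:
    """Slide `template` over `search` without padding and return the response map.

    Each entry is the sum of element-wise products at that offset, i.e. a
    'valid' 2D cross-correlation as used for matching in Siamese trackers.
    """
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    if th > sh or tw > sw:
        raise ValueError("template must not be larger than the search region")
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            score = 0.0
            for i in range(th):
                for j in range(tw):
                    score += template[i][j] * search[y + i][x + j]
            row.append(score)
        response.append(row)
    return response


def peak(response: Matrix) -> Tuple[int, int]:
    """Return (row, col) of the highest response, i.e. the predicted target offset."""
    return max(
        ((y, x) for y in range(len(response)) for x in range(len(response[0]))),
        key=lambda p: response[p[0]][p[1]],
    )


def running_mean_template(mean: Matrix, new: Matrix, n: int) -> Matrix:
    """Fold frame n's template features into the mean of frames 1..n-1.

    Hypothetical aggregation (assumption, not the paper's module): every
    historical frame gets equal weight, updated as mean + (new - mean) / n.
    """
    return [
        [m + (v - m) / n for m, v in zip(mean_row, new_row)]
        for mean_row, new_row in zip(mean, new)
    ]


# Toy demo: the target's appearance changes after the first frame.
first = [[1.0, 0.0], [0.0, 1.0]]  # template features from frame 1
later = [[0.0, 1.0], [1.0, 0.0]]  # features observed in frames 2-4
template = first
for n in range(2, 5):
    template = running_mean_template(template, later, n)
# template is now approximately [[0.25, 0.75], [0.75, 0.25]]

search = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]  # the target, in its later appearance, sits at offset (1, 2)

print(peak(cross_correlate(template, search)))  # (1, 2)
print(cross_correlate(first, search)[1][2])     # 0.0: the first-frame template misses it
```

The incremental update avoids storing every past feature map while still weighting all historical frames equally. In this toy run, the historical template locates the target after its appearance changes, whereas the first-frame template scores zero at the true location. SiamET's actual template enhancement is described in the paper and may differ substantially from this plain mean.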

Author information

Corresponding author

Correspondence to Yi Zhang.

About this article

Cite this article

Zhou, Y., Zhang, Y. SiamET: a Siamese based visual tracking network with enhanced templates. Appl Intell 52, 9782–9794 (2022). https://doi.org/10.1007/s10489-021-03057-z

  • DOI: https://doi.org/10.1007/s10489-021-03057-z
