Siamese network with transformer and saliency encoder for object tracking

Liu, Lei; Kong, Guangqian; Duan, Xun; Long, Huiyun; Wu, Yun

doi:10.1007/s10489-022-03352-3


Published: 06 May 2022

Volume 53, pages 2265–2279, (2023)

Applied Intelligence

Lei Liu¹,
Guangqian Kong¹,² (ORCID: orcid.org/0000-0001-6662-2564),
Xun Duan¹,²,
Huiyun Long¹,² &
Yun Wu¹,²


Abstract

Siamese network-based tracking algorithms tend to lose both the semantic information and the detailed features of objects when they apply correlation operations. They also lack global modeling capability, which makes it difficult to track objects across multiple complex scenarios. To address these problems, this paper proposes a feature repair strategy that combines a Transformer with a saliency encoder to repair the feature loss caused by the correlation operation. First, we add a saliency encoder branch in parallel with the Siamese network. It supplies more detailed features and semantic information to the regression and classification stages and reduces interference from invalid objects. Second, we fuse the correlation response map with the encoded saliency features and pass the fused feature map through the encoder part of the Transformer, which strengthens its nonlinear representation and captures global contextual information. The integrated and enhanced feature map effectively improves the classification and localization capabilities of our algorithm. Finally, the DIoU loss function is used to continuously optimize the generation of bounding boxes during training. The proposed algorithm achieves advanced performance in experiments on five publicly available datasets: GOT-10k, LaSOT, UAV123, DTB70, and TrackingNet.
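
The DIoU loss named in the abstract can be illustrated with a minimal sketch based on the standard Distance-IoU definition: the loss is 1 − IoU plus the squared distance between the two box centers divided by the squared diagonal of the smallest box enclosing both. The function name `diou_loss`, the `(x1, y1, x2, y2)` corner format, and the `eps` stabilizer are assumptions made for illustration; this is not the authors' training code.

```python
def diou_loss(pred, target, eps=1e-9):
    """Distance-IoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative sketch only: box format and eps are assumptions.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)
    iou = inter / (area_p + area_t - inter + eps)

    # Squared distance between the two box centers.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    enc_w = max(px2, tx2) - min(px1, tx1)
    enc_h = max(py2, ty2) - min(py1, ty1)
    c2 = enc_w ** 2 + enc_h ** 2

    return 1.0 - iou + rho2 / (c2 + eps)


same = diou_loss((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
apart = diou_loss((0.0, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0))
```

For identical boxes the loss is approximately zero. For the two disjoint boxes it exceeds 1, because the center-distance penalty remains informative even when the IoU term is zero, which is the usual motivation for preferring DIoU over a plain IoU loss in box regression.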



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China [2018] under Grant No. 61741124, in part by the Science Planning Project of Guizhou Province under Grant No. QKHPTRC[2018]5781 and in part by the Guizhou Province Graduate Research Foundation under Grant No. YJSCXJH[2020]054.

Author information

Authors and Affiliations

College of Computer Science and Technology, Guizhou University, Guiyang, 550025, China
Lei Liu, Guangqian Kong, Xun Duan, Huiyun Long & Yun Wu
State Key Laboratory of Public Big Data, Guiyang, 550025, China
Guangqian Kong, Xun Duan, Huiyun Long & Yun Wu


Corresponding author

Correspondence to Guangqian Kong.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Liu, L., Kong, G., Duan, X. et al. Siamese network with transformer and saliency encoder for object tracking. Appl Intell 53, 2265–2279 (2023). https://doi.org/10.1007/s10489-022-03352-3


Accepted: 07 February 2022
Published: 06 May 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s10489-022-03352-3
