Abstract
Siamese-based RGBT trackers have attracted wide attention because of their high efficiency. However, they lack an effective multimodal fusion module and sufficient information interaction between the search and template regions, which limits their performance. To address this problem, and inspired by the global information modeling capability of the transformer, we construct a Siamese transformer RGBT tracker built around a single unified transformer module. Specifically, we propose a unified transformer fusion module that performs both feature extraction and global information interaction in the Siamese RGBT tracker, namely the interaction between the search and template regions and the interaction between different modalities. The module consists of self-attention and cross-attention, which handle feature extraction and cross-branch information interaction, respectively. In addition, to alleviate the impact of multimodal fusion on the efficiency of template update during tracking, we propose a feature-level template update strategy, which effectively improves tracking efficiency. To verify the effectiveness of our tracker, we evaluate it on five benchmark datasets, including GTOT, RGBT210, RGBT234, LasHeR, and VTUAV; the results show that our tracker achieves excellent performance compared with state-of-the-art methods.
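The abstract only sketches the architecture, so the snippet below is a minimal PyTorch sketch of how a unified attention block of this kind might look: self-attention mixes the concatenated template and search tokens within one modality, and cross-attention exchanges information between the RGB and thermal streams. All names (UnifiedFusionBlock, tokens_rgb, tokens_tir) and design details (shared weights across modalities, pre-norm residuals, feed-forward width) are illustrative assumptions, not the authors' implementation, and the feature-level template update strategy is not modeled here.

```python
import torch
import torch.nn as nn


class UnifiedFusionBlock(nn.Module):
    """Hypothetical block combining intra-modal self-attention with
    inter-modal cross-attention; weights are shared across modalities
    purely to keep the sketch short."""

    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, tokens_rgb, tokens_tir):
        # tokens_*: (B, N, C); N spans the template tokens plus search tokens
        # of one modality, so self-attention already couples template and search.
        def self_block(x):
            y = self.norm1(x)
            return x + self.self_attn(y, y, y, need_weights=False)[0]

        tokens_rgb = self_block(tokens_rgb)
        tokens_tir = self_block(tokens_tir)

        # Cross-attention: each modality queries the other to exchange
        # complementary RGB / thermal information.
        q_rgb, q_tir = self.norm2(tokens_rgb), self.norm2(tokens_tir)
        tokens_rgb = tokens_rgb + self.cross_attn(q_rgb, q_tir, q_tir, need_weights=False)[0]
        tokens_tir = tokens_tir + self.cross_attn(q_tir, q_rgb, q_rgb, need_weights=False)[0]

        # Feed-forward refinement with residual connections.
        tokens_rgb = tokens_rgb + self.ffn(self.norm3(tokens_rgb))
        tokens_tir = tokens_tir + self.ffn(self.norm3(tokens_tir))
        return tokens_rgb, tokens_tir


if __name__ == "__main__":
    block = UnifiedFusionBlock(dim=256, num_heads=8)
    # e.g. an 8x8 template grid plus a 16x16 search grid of tokens per modality
    rgb = torch.randn(2, 64 + 256, 256)
    tir = torch.randn(2, 64 + 256, 256)
    rgb, tir = block(rgb, tir)
    print(rgb.shape, tir.shape)  # torch.Size([2, 320, 256]) for each stream
```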
Acknowledgements
This work is jointly supported by the Natural Science Foundation for the Higher Education Institutions of Anhui Province (No. KJ2021A0044), Hefei Natural Science Foundation (No. HZ22ZK001), the University Synergy Innovation Program of Anhui Province (No. GXXT-2021-038, GXXT-2022-042), and the National Natural Science Foundation of China (No. 62076003).
Ethics declarations
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, F., Wang, W., Liu, L. et al. Siamese transformer RGBT tracking. Appl Intell 53, 24709–24723 (2023). https://doi.org/10.1007/s10489-023-04741-y
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-023-04741-y