
Deep Triply Attention Network for RGBT Tracking

Cognitive Computation

Abstract

RGB-Thermal (RGBT) tracking has gained significant attention in computer vision due to its wide range of applications in video surveillance, autonomous driving, and human-computer interaction. This paper focuses on achieving a robust fusion of different modalities for RGBT tracking through attention modeling. We propose an effective triply attentive network for robust RGBT tracking, consisting of a local attention module, a cross-modality co-attention module, and a global attention module. The local attention module, whose attention maps are generated by backpropagating the score map with respect to the RGB and thermal image pair, enables the tracker to focus on target regions while accounting for background interference. To enhance the interaction of the two modalities during feature learning, we introduce a co-attention module that simultaneously selects more discriminative features for both the visible (RGB) and thermal modalities. To compensate for the limitations of local sampling, we incorporate a global attention module that exploits multi-modal information to compute high-quality global proposals; this module not only complements the local search strategy but also re-tracks lost targets when they reappear in view. Extensive experiments on three RGBT tracking datasets demonstrate that our method outperforms other RGBT trackers. Specifically, on the LasHeR dataset, the precision rate, normalized precision rate, and success rate reach 57.5%, 51.6%, and 41.0%, respectively. These state-of-the-art results confirm the effectiveness of our method in exploiting the complementary advantages of the two modalities to achieve robust visual tracking.
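As a concrete illustration of the first two attention mechanisms, the following is a minimal PyTorch sketch, not the authors' implementation: the backbone, layer sizes, and names (`TinyBackbone`, `local_attention`, `CrossModalCoAttention`) are hypothetical stand-ins. It shows (i) a local attention map obtained by backpropagating the target score with respect to the RGB and thermal inputs, and (ii) a channel-level co-attention that reweights each modality's features conditioned on both modalities, in the spirit of squeeze-and-excitation.

```python
# Minimal sketch of the attention ideas described in the abstract.
# NOT the authors' code: backbone, layer sizes, and names are hypothetical.
import torch
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Stand-in CNN mapping an image crop to (background, target) scores."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.score = nn.Linear(32, 2)

    def forward(self, x):
        return self.score(self.features(x).flatten(1))


def local_attention(net_rgb, net_t, rgb, thermal):
    """Gradient-based local attention: backpropagate the target score with
    respect to both input images; channel-pooled absolute gradients serve as
    attention maps highlighting the regions the score depends on."""
    rgb = rgb.clone().requires_grad_(True)
    thermal = thermal.clone().requires_grad_(True)
    target_score = net_rgb(rgb)[:, 1].sum() + net_t(thermal)[:, 1].sum()
    target_score.backward()

    def to_map(grad):  # B x 3 x H x W gradients -> normalized B x 1 x H x W
        a = grad.abs().mean(dim=1, keepdim=True)
        a = a - a.amin(dim=(2, 3), keepdim=True)
        return a / (a.amax(dim=(2, 3), keepdim=True) + 1e-8)

    return to_map(rgb.grad), to_map(thermal.grad)


class CrossModalCoAttention(nn.Module):
    """Channel co-attention sketch: channel weights for BOTH modalities are
    predicted jointly from their concatenated global descriptors, so each
    modality's reweighting is conditioned on the other."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, 2 * channels // reduction), nn.ReLU(),
            nn.Linear(2 * channels // reduction, 2 * channels), nn.Sigmoid(),
        )

    def forward(self, f_rgb, f_t):
        g = torch.cat([f_rgb.mean(dim=(2, 3)), f_t.mean(dim=(2, 3))], dim=1)
        w = self.fc(g)
        c = f_rgb.shape[1]
        w_rgb, w_t = w[:, :c, None, None], w[:, c:, None, None]
        return f_rgb * w_rgb, f_t * w_t


if __name__ == "__main__":
    rgb = torch.rand(2, 3, 107, 107)      # MDNet-style crop size (assumed)
    thermal = torch.rand(2, 3, 107, 107)  # thermal replicated to 3 channels
    a_rgb, a_t = local_attention(TinyBackbone(), TinyBackbone(), rgb, thermal)
    co = CrossModalCoAttention(channels=32)
    f_rgb, f_t = co(torch.rand(2, 32, 25, 25), torch.rand(2, 32, 25, 25))
    print(a_rgb.shape, a_t.shape, f_rgb.shape, f_t.shape)
```

The global attention module is not sketched; per the abstract, it uses multi-modal information to generate whole-image proposals that complement local sampling and enable re-detection of lost targets.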


Data Availability

Data is available on request from the authors.


Funding

This work was supported by the Major Project for New Generation of AI (Grant No. 2018AAA0100400) and the National Natural Science Foundation of China (Grant Nos. 62202002 and 62102205).

Author information


Corresponding author

Correspondence to Yabin Zhu.

Ethics declarations

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed Consent

Informed consent was obtained from all individual participants included in the study.

Conflict of Interest

Rui Yang is a Master's graduate of Anhui University and is currently employed at Arm China. Xiao Wang previously served as a postdoctoral fellow at Pengcheng Laboratory and is presently a faculty member at Anhui University. Yabin Zhu is currently a postdoctoral fellow at Anhui University. Jin Tang is a professor at Anhui University. Beyond these affiliations, the authors declare no conflicts of interest with external entities.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, R., Wang, X., Zhu, Y. et al. Deep Triply Attention Network for RGBT Tracking. Cogn Comput 15, 1934–1946 (2023). https://doi.org/10.1007/s12559-023-10158-z
