External-attention dual-modality fusion network for RGBT tracking

Yan, Kaixiang; Mei, Jiatian; Zhou, Dongming; Zhou, Lifen

doi:10.1007/s11227-023-05329-6

Published: 05 May 2023

Volume 79, pages 17020–17041 (2023)
The Journal of Supercomputing

Kaixiang Yan¹, Jiatian Mei¹, Dongming Zhou¹ & Lifen Zhou¹,²


Abstract

Owing to the unique complementarity of RGB and thermal (RGBT) images, RGBT tracking has become an important area of research. A key issue for robust RGBT tracking is how to exploit both local and global information. Inspired by the external-attention mechanism, we design an external-attention dual-modality fusion network (EDFNet) equipped with an external-attention guided module (EGM). Based on two external memory units, the EGM generates external attention maps that help reallocate feature weights according to the correlations. To avoid feature deterioration, EDFNet introduces shortcuts as detours and fuses the features from the detours and from the external-attention branch with adaptive weights. Furthermore, considering the differences between RGB and thermal images, we design an asymmetric feature enhancement approach consisting of detailed information guidance (DiG) and structural information enhancement (SiE). DiG optimizes the detailed and textural features of the RGB feature through axial detail optimization. SiE leverages the accumulated-addition feature to enhance the structural features. In addition, we deploy a loss function named partial weight enhanced loss in EDFNet to accommodate this new architecture. Evaluation results on the RGBT234 and GTOT datasets show that EDFNet achieves better tracking performance than the other trackers compared.
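The external-attention mechanism that the EGM is built on (Guo et al., 2022, "Beyond self-attention: External attention using two linear layers for visual tasks") can be sketched as follows. This is a minimal NumPy sketch of generic external attention, not the authors' EGM: the memory size S, the function names, and the softmax-weighted shortcut fusion in the hypothetical fuse_with_shortcut helper are illustrative assumptions, since the abstract does not say how EDFNet computes its adaptive weights. The two shared memories M_k and M_v stand in for the two external memory units, and the attention map is double-normalized (softmax over tokens, then l1 over memory slots) as in Guo et al.

```python
import numpy as np


def external_attention(F, M_k, M_v):
    """Generic external attention (after Guo et al., 2022).

    F:   (N, d) input tokens, e.g. a flattened H*W feature map
    M_k: (S, d) external key memory, shared across all inputs
    M_v: (S, d) external value memory, shared across all inputs
    Returns the (N, d) output and the (N, S) attention map.
    """
    logits = F @ M_k.T  # (N, S) token-to-memory affinities
    # Double normalization, step 1: softmax over the token axis (axis 0).
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    A = np.exp(logits)
    A = A / A.sum(axis=0, keepdims=True)
    # Step 2: l1-normalize each token's row over the memory slots (axis 1).
    A = A / (A.sum(axis=1, keepdims=True) + 1e-9)
    return A @ M_v, A


def fuse_with_shortcut(F, M_k, M_v, w_logits):
    """Hypothetical adaptive fusion of a shortcut (detour) and the attention branch.

    w_logits: (2,) learnable scalars; a softmax turns them into weights summing to 1.
    This weighting scheme is an assumption, not the paper's formulation.
    """
    ea_out, _ = external_attention(F, M_k, M_v)
    w = np.exp(w_logits - w_logits.max())
    w = w / w.sum()
    return w[0] * F + w[1] * ea_out


# Usage with random features: 64 tokens of dimension 32, 16 memory slots.
rng = np.random.default_rng(0)
F = rng.standard_normal((64, 32))
M_k = rng.standard_normal((16, 32))
M_v = rng.standard_normal((16, 32))
out, A = external_attention(F, M_k, M_v)
fused = fuse_with_shortcut(F, M_k, M_v, np.zeros(2))
print(out.shape, fused.shape)  # (64, 32) (64, 32)
```

Because M_k and M_v are shared across all inputs rather than computed from the input, the cost grows linearly with the number of tokens N (O(N·S·d)), instead of quadratically as in self-attention; the shortcut term keeps the original features available if the attention output degrades them.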

Availability of data and material

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Funding

This work was primarily supported by the National Natural Science Foundation of China under Grants 62066047 and 61966037.

Author information

Kaixiang Yan and Jiatian Mei have contributed equally to this work.

Authors and Affiliations

School of Information and Engineering, Yunnan University, Kunming, 650500, Yunnan, China
Kaixiang Yan, Jiatian Mei, Dongming Zhou & Lifen Zhou
College of Information Engineering, QuJing Normal University, Qujing, 530300, Yunnan, China
Lifen Zhou

Contributions

KY and JM wrote the main manuscript text, and LZ prepared Figs. 1–5. All authors reviewed the manuscript.

Corresponding author

Correspondence to Dongming Zhou.

Ethics declarations

Conflict of interest

To the best of our knowledge, the named authors have no conflict of interest, financial or otherwise.

Ethical approval

No human or animal experiments are involved in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yan, K., Mei, J., Zhou, D. et al. External-attention dual-modality fusion network for RGBT tracking. J Supercomput 79, 17020–17041 (2023). https://doi.org/10.1007/s11227-023-05329-6

Accepted: 20 April 2023
Published: 05 May 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s11227-023-05329-6
