
RGBT Tracking based on modality feature enhancement

  • Published in: Multimedia Tools and Applications

Abstract

Fusion tracking based on visible and thermal infrared images can boost tracking performance under adverse conditions such as low illumination and bad weather. Existing RGBT tracking methods mainly focus on estimating the reliability weights of the two modalities to achieve effective multi-modal fusion. However, these algorithms do not substantially enhance the discriminability of multimodal features at either the channel level or the spatial level, which limits their tracking performance. We propose a novel Modality Feature Enhancement Network for RGBT tracking. Specifically, we design a modality feature enhancement module composed of a channel feature enhancement module and a spatial feature enhancement module. The channel feature enhancement module adaptively adjusts the importance of different channels, improving the channel discriminability of multimodal features, while the spatial feature enhancement module improves their spatial discriminability. Through the collaboration of these two modules, our network can effectively handle partial occlusion. In addition, the modality feature enhancement module shares parameters between the two modalities to exploit modality-shared cues. To address tracking failures caused by sudden camera motion, we introduce a re-sampling strategy that improves tracking robustness. Extensive experiments on three RGBT tracking benchmark datasets show that our method outperforms other state-of-the-art tracking algorithms.
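The channel and spatial enhancement described in the abstract resemble standard attention mechanisms (channel gating via pooled descriptors, spatial gating via pooled maps). The following is a minimal NumPy sketch of that idea, not the authors' exact design: the bottleneck projection weights `w1`/`w2`, the choice of average plus max pooling, and the parameter-sharing wrapper are all illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_enhance(feat, w1, w2):
    """Channel feature enhancement: reweight channels by learned importance.
    feat: (C, H, W) feature map; w1: (C//r, C) and w2: (C, C//r) are
    bottleneck projection weights (hypothetical, for illustration)."""
    # Global average pooling gives a per-channel descriptor of shape (C,)
    desc = feat.mean(axis=(1, 2))
    # Two-layer bottleneck + sigmoid produces channel weights in (0, 1)
    weights = sigmoid(w2 @ np.maximum(w1 @ desc, 0.0))
    return feat * weights[:, None, None]

def spatial_enhance(feat):
    """Spatial feature enhancement: emphasize informative locations
    using channel-wise average and max pooling to build a spatial gate."""
    avg_map = feat.mean(axis=0)          # (H, W)
    max_map = feat.max(axis=0)           # (H, W)
    gate = sigmoid(avg_map + max_map)    # simple fusion of the two maps
    return feat * gate[None, :, :]

def modality_feature_enhance(rgb_feat, tir_feat, w1, w2):
    """Apply the same (parameter-shared) enhancement to both modalities,
    mirroring the modality-shared design mentioned in the abstract."""
    return [spatial_enhance(channel_enhance(f, w1, w2))
            for f in (rgb_feat, tir_feat)]
```

Because both gates lie in (0, 1), the enhancement can only attenuate feature magnitudes, steering the representation toward the channels and locations the gates deem informative; the paper's actual modules are learned end-to-end within the tracking network.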



Acknowledgements

This work is supported in part by the Natural Science Research Project of Anhui Education Department (Grant No. KJ2019A0005), the Open Project of the School of Mathematical Sciences, Anhui University (Grant No. KF2019A03), and the National Natural Science Foundation of China (Grant No. 62076003).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Lei Liu.

Ethics declarations

Ethics Approval and Consent to Participate

This article does not contain any studies with animals performed by any of the authors.

Competing Interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhai, S., Wu, Y., Liu, L. et al. RGBT Tracking based on modality feature enhancement. Multimed Tools Appl 83, 29311–29330 (2024). https://doi.org/10.1007/s11042-023-16418-2
