Abstract
Fusion tracking based on visible and thermal infrared images can boost tracking performance under adverse conditions such as low illumination and bad weather. Existing RGBT tracking methods mainly focus on estimating the reliability weights of the two modalities to achieve effective multi-modal fusion. However, these algorithms do not substantially enhance the discriminability of multimodal features at either the channel level or the spatial level, which limits their tracking performance. We propose a novel Modality Feature Enhancement Network for RGBT tracking. Specifically, we design a modality feature enhancement module composed of a channel feature enhancement module and a spatial feature enhancement module. The channel feature enhancement module adaptively adjusts the importance of different channels, improving the channel discriminability of multimodal features, while the spatial feature enhancement module improves their spatial discriminability. Through the collaboration of these two modules, our network can effectively handle partial occlusion. In addition, the modality feature enhancement module shares parameters between the two modalities to exploit modality-shared cues. To address tracking failures caused by sudden camera motion, we introduce a re-sampling strategy that improves tracking robustness. Extensive experiments on three RGBT tracking benchmark datasets show that our method outperforms other state-of-the-art tracking algorithms.
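The abstract does not give implementation details, but the channel- and spatial-level enhancement it describes can be sketched as squeeze-and-excitation-style gating: a channel branch that reweights feature channels from globally pooled statistics, and a spatial branch that reweights locations from a channel-pooled map. The sketch below is a minimal illustration under that assumption; the function names (`channel_enhance`, `spatial_enhance`), the weights `w1`/`w2`, and the reduction ratio are hypothetical and are not taken from the paper:

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_enhance(feat, w1, w2):
    """Reweight channels of feat (C, H, W) via SE-style gating.

    w1: (C // r, C) squeeze weights, w2: (C, C // r) excitation weights,
    where r is a hypothetical channel-reduction ratio.
    """
    z = feat.mean(axis=(1, 2))          # global average pooling -> (C,)
    s = np.maximum(w1 @ z, 0.0)         # bottleneck FC + ReLU -> (C // r,)
    a = _sigmoid(w2 @ s)                # FC + sigmoid gate in (0, 1) -> (C,)
    return feat * a[:, None, None]      # scale each channel by its gate

def spatial_enhance(feat):
    """Reweight spatial locations of feat (C, H, W) via a channel-pooled map."""
    m = feat.mean(axis=0)               # channel-wise mean -> (H, W)
    a = _sigmoid(m)                     # spatial gate in (0, 1)
    return feat * a[None, :, :]         # scale every channel by the same map
```

Because both gates lie in (0, 1), enhancement here means suppressing less informative channels and locations relative to the rest; applying the same `w1`/`w2` to the RGB and thermal feature maps would correspond to the parameter sharing across modalities mentioned in the abstract.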
Acknowledgements
This work was supported in part by the Natural Science Research Project of the Anhui Education Department (Grant No. KJ2019A0005), the Open Project of the School of Mathematical Sciences, Anhui University (Grant No. KF2019A03), and the National Natural Science Foundation of China (Grant No. 62076003).
Ethics declarations
Ethics Approval and Consent to Participate
This article does not contain any studies with animals performed by any of the authors.
Competing Interests
The authors declare that there is no conflict of interest regarding the publication of this paper.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally to this work.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhai, S., Wu, Y., Liu, L. et al. RGBT Tracking based on modality feature enhancement. Multimed Tools Appl 83, 29311–29330 (2024). https://doi.org/10.1007/s11042-023-16418-2