research-article

Robust RGB-T Tracking via Adaptive Modality Weight Correlation Filters and Cross-modality Learning

Published: 11 December 2023

Abstract

RGB-T tracking is gaining popularity because it delivers reliable results under a wide range of weather and illumination conditions. However, existing models that directly fuse the correlation-filter responses do not fully exploit the specificity and complementarity of the two modalities' features, which limits tracker performance. In this article, we propose correlation filters with adaptive modality weights and cross-modality learning (AWCM) for multimodality tracking. First, we fuse the thermal infrared and visible modalities through weighted activation, and the resulting fusion modality serves as an auxiliary modality that suppresses noise and strengthens the learning of shared modal features. Second, we derive the modality weights from average peak-to-correlation energy (APCE) coefficients to improve model reliability. Third, we use the fusion modality as an intermediate variable for joint consistency learning, which increases tracker robustness through interactive cross-modal learning. Finally, we apply the alternating direction method of multipliers (ADMM) to obtain a closed-form solution, and extensive experiments on the RGBT234, VOT-TIR2019, and GTOT tracking benchmarks demonstrate the superior performance of the proposed AWCM compared with existing tracking algorithms. The code developed in this study is publicly available.
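The second step of the method—deriving adaptive modality weights from average peak-to-correlation energy—can be illustrated with a short sketch. This is not the authors' code: the APCE formula below is the standard definition from the correlation-filter literature, and the simple sum normalization of the two weights is an assumption made here for illustration only.

```python
import numpy as np

def apce(response: np.ndarray) -> float:
    """Average peak-to-correlation energy of a correlation-filter response map."""
    f_max, f_min = response.max(), response.min()
    return (f_max - f_min) ** 2 / (np.mean((response - f_min) ** 2) + 1e-12)

def modality_weights(resp_rgb: np.ndarray, resp_tir: np.ndarray) -> tuple:
    """Turn per-modality APCE scores into fusion weights (illustrative sum normalization;
    the paper's actual weighting scheme may differ)."""
    a_rgb, a_tir = apce(resp_rgb), apce(resp_tir)
    total = a_rgb + a_tir
    return a_rgb / total, a_tir / total

# Illustrative use: weight and fuse two response maps of the same size.
rgb_resp = np.random.rand(50, 50)   # placeholder RGB response map
tir_resp = np.random.rand(50, 50)   # placeholder thermal response map
w_rgb, w_tir = modality_weights(rgb_resp, tir_resp)
fused = w_rgb * rgb_resp + w_tir * tir_resp
```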




    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 4
    April 2024, 676 pages
    EISSN: 1551-6865
    DOI: 10.1145/3613617
    Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 December 2023
    Online AM: 25 October 2023
    Accepted: 21 October 2023
    Revised: 27 August 2023
    Received: 10 April 2023
    Published in TOMM Volume 20, Issue 4


    Author Tags

    1. Correlation filters
    2. RGBT tracking
    3. adaptive modality weight
    4. cross-modality learning

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Chongqing Talent
    • Joint Equipment Pre Research and Key Fund Project of the Ministry of Education
    • Natural Science Foundation of Chongqing, China
    • Human Resources and Social Security Bureau Project of Chongqing
    • Guangdong Oppo Mobile Telecommunications Corporation Ltd.


    Cited By

    • (2025) Enhanced YOLOv10 Framework Featuring DPAM and DALSM for Real-Time Underwater Object Detection. IEEE Access 13, 8691–8708. DOI: 10.1109/ACCESS.2025.3527315
    • (2025) Real-Time Long-Range Object Tracking Based on Ensembled Model. IEEE Access 13, 2679–2693. DOI: 10.1109/ACCESS.2024.3517711
    • (2025) Estimation of the orientation of potatoes and detection bud eye position using potato orientation detection you only look once with fast and accurate features for the movement strategy of intelligent cutting robots. Engineering Applications of Artificial Intelligence 142:C. DOI: 10.1016/j.engappai.2024.109923. Online publication date: 15-Feb-2025
    • (2024) Enhanced Multi-Object Tracking: Inferring Motion States of Tracked Objects. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3699960. Online publication date: 11-Oct-2024
    • (2024) Asymmetric Deformable Spatio-temporal Framework for Infrared Object Tracking. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3678882. Online publication date: 19-Jul-2024
    • (2024) ASIFusion: An Adaptive Saliency Injection-Based Infrared and Visible Image Fusion Network. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3665893. Online publication date: 23-May-2024
    • (2024) A new efficient image processing circuit based on a nano-scale median filter and quantum-dot cellular automata. International Journal of General Systems, 1–19. DOI: 10.1080/03081079.2024.2429593. Online publication date: 23-Nov-2024
