DOI: 10.1145/3581783.3612341

Quality-Aware RGBT Tracking via Supervised Reliability Learning and Weighted Residual Guidance

Published: 27 October 2023

Abstract

RGB and thermal infrared (TIR) data have different visual properties, which make their fusion essential for effective object tracking in diverse environments and scenes. Existing RGBT tracking methods commonly use attention mechanisms to generate reliability weights for multi-modal feature fusion. However, without explicit supervision, these weights may be unreliably estimated, especially in complex scenarios. To address this problem, we propose a novel Quality-Aware RGBT Tracker (QAT) for robust RGBT tracking. QAT learns reliable weights for each modality in a supervised manner and performs weighted residual guidance to extract and leverage useful features from both modalities. We address the issue of the lack of labels for reliability learning by designing an efficient three-branch network that generates reliable pseudo labels, and a simple binary classification scheme that estimates high-accuracy reliability weights, mitigating the effect of noisy pseudo labels. To propagate useful features between modalities while reducing the influence of noisy modal features on the migrated information, we design a weighted residual guidance module based on the estimated weights and residual connections. We evaluate our proposed QAT on five benchmark datasets, including GTOT, RGBT210, RGBT234, LasHeR, and VTUAV, and demonstrate its excellent performance compared to state-of-the-art methods. Experimental results show that QAT outperforms existing RGBT tracking methods in various challenging scenarios, demonstrating its efficacy in improving the reliability and accuracy of RGBT tracking.
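The abstract describes two ideas at a high level: per-modality reliability weights estimated via a binary classification scheme, and a weighted residual guidance step that propagates features from one modality to the other, scaled by the estimated weight so that a noisy modality contributes less. The paper's actual network is not reproduced on this page; the following is only a minimal NumPy sketch of the general fusion idea, with all function names, shapes, and the sigmoid weighting invented for illustration.

```python
import numpy as np

def sigmoid(x):
    """Squash a binary-classifier logit into a (0, 1) reliability weight."""
    return 1.0 / (1.0 + np.exp(-x))

def weighted_residual_guidance(feat_main, feat_other, logit_other):
    """Fuse the main modality's features with a residual from the other
    modality, scaled by that modality's estimated reliability weight.
    A low-reliability (noisy) modality thus contributes little to the
    migrated information. Hypothetical sketch, not the paper's module.
    """
    w_other = sigmoid(logit_other)          # reliability weight in (0, 1)
    return feat_main + w_other * feat_other, w_other

# Toy example: fuse a stand-in RGB feature map with a TIR residual.
rgb = np.ones((4, 4))       # stand-in RGB feature map
tir = np.full((4, 4), 2.0)  # stand-in TIR feature map
fused, w = weighted_residual_guidance(rgb, tir, logit_other=0.0)
# sigmoid(0) = 0.5, so fused = 1 + 0.5 * 2 = 2 everywhere
```

With a strongly negative logit (an unreliable modality), the weight approaches zero and the fusion degenerates to the main modality's features alone, which is the behavior the abstract attributes to the weighted residual design.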


Cited By

  • RGB-T Tracking With Template-Bridged Search Interaction and Target-Preserved Template Updating. IEEE Transactions on Pattern Analysis and Machine Intelligence 47, 1 (Jan 2025), 634–649. DOI: 10.1109/TPAMI.2024.3475472
  • SiamTFA: Siamese Triple-Stream Feature Aggregation Network for Efficient RGBT Tracking. IEEE Transactions on Intelligent Transportation Systems 26, 2 (Feb 2025), 1900–1913. DOI: 10.1109/TITS.2024.3512551
  • Prototype-based cross-modal object tracking. Information Fusion 118 (Jun 2025), 102941. DOI: 10.1016/j.inffus.2025.102941
  • Simplifying Cross-modal Interaction via Modality-Shared Features for RGBT Tracking. In Proceedings of the 32nd ACM International Conference on Multimedia (Oct 2024), 1573–1582. DOI: 10.1145/3664647.3681564
  • Breaking Modality Gap in RGBT Tracking: Coupled Knowledge Distillation. In Proceedings of the 32nd ACM International Conference on Multimedia (Oct 2024), 9291–9300. DOI: 10.1145/3664647.3680878
  • Review and Analysis of RGBT Single Object Tracking Methods: A Fusion Perspective. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 8 (Jul 2024), 1–27. DOI: 10.1145/3651308
  • RGBT Tracking via Challenge-Based Appearance Disentanglement and Interaction. IEEE Transactions on Image Processing 33 (2024), 1753–1767. DOI: 10.1109/TIP.2024.3371355
  • A Comprehensive Review of RGBT Tracking. IEEE Transactions on Instrumentation and Measurement 73 (2024), 1–23. DOI: 10.1109/TIM.2024.3436098
  • Visible–Infrared Dual-Sensor Tracking Based on Transformer via Progressive Feature Enhancement and Fusion. IEEE Sensors Journal 24, 9 (May 2024), 14519–14528. DOI: 10.1109/JSEN.2024.3372991
  • MCSSAFNet: A multi-scale state-space attention fusion network for RGBT tracking. Optics Communications (Dec 2024), 131394. DOI: 10.1016/j.optcom.2024.131394


Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023, 9913 pages
    ISBN: 9798400701085
    DOI: 10.1145/3581783

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. reliability learning
    2. residual connection
    3. rgbt tracking
    4. weighted guidance

    Qualifiers

    • Research-article

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 – November 3, 2023
    Ottawa, ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

