DOI: 10.1145/3581783.3612341

Quality-Aware RGBT Tracking via Supervised Reliability Learning and Weighted Residual Guidance

Published: 27 October 2023

Abstract

RGB and thermal infrared (TIR) data have different visual properties, which make their fusion essential for effective object tracking in diverse environments and scenes. Existing RGBT tracking methods commonly use attention mechanisms to generate reliability weights for multi-modal feature fusion. However, without explicit supervision, these weights may be unreliably estimated, especially in complex scenarios. To address this problem, we propose a novel Quality-Aware RGBT Tracker (QAT) for robust RGBT tracking. QAT learns reliable weights for each modality in a supervised manner and performs weighted residual guidance to extract and leverage useful features from both modalities. We address the issue of the lack of labels for reliability learning by designing an efficient three-branch network that generates reliable pseudo labels, and a simple binary classification scheme that estimates high-accuracy reliability weights, mitigating the effect of noisy pseudo labels. To propagate useful features between modalities while reducing the influence of noisy modal features on the migrated information, we design a weighted residual guidance module based on the estimated weights and residual connections. We evaluate our proposed QAT on five benchmark datasets, including GTOT, RGBT210, RGBT234, LasHeR, and VTUAV, and demonstrate its excellent performance compared to state-of-the-art methods. Experimental results show that QAT outperforms existing RGBT tracking methods in various challenging scenarios, demonstrating its efficacy in improving the reliability and accuracy of RGBT tracking.
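The abstract describes two ideas at a high level: per-modality reliability weights estimated via a binary classification scheme, and a weighted residual guidance step that propagates features from one modality to the other, scaled by the estimated weight so that a noisy modality contributes less. The paper's actual network is not reproduced on this page; the following is only a minimal NumPy sketch of the general fusion idea, with all function names, shapes, and the sigmoid weighting invented for illustration.

```python
import numpy as np

def sigmoid(x):
    """Squash a binary-classifier logit into a (0, 1) reliability weight."""
    return 1.0 / (1.0 + np.exp(-x))

def weighted_residual_guidance(feat_main, feat_other, logit_other):
    """Fuse the main modality's features with a residual from the other
    modality, scaled by that modality's estimated reliability weight.
    A low-reliability (noisy) modality thus contributes little to the
    migrated information. Hypothetical sketch, not the paper's module.
    """
    w_other = sigmoid(logit_other)          # reliability weight in (0, 1)
    return feat_main + w_other * feat_other, w_other

# Toy example: fuse a stand-in RGB feature map with a TIR residual.
rgb = np.ones((4, 4))       # stand-in RGB feature map
tir = np.full((4, 4), 2.0)  # stand-in TIR feature map
fused, w = weighted_residual_guidance(rgb, tir, logit_other=0.0)
# sigmoid(0) = 0.5, so fused = 1 + 0.5 * 2 = 2 everywhere
```

With a strongly negative logit (an unreliable modality), the weight approaches zero and the fusion degenerates to the main modality's features alone, which is the behavior the abstract attributes to the weighted residual design.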


Cited By

  • RGB-T Tracking With Template-Bridged Search Interaction and Target-Preserved Template Updating. IEEE Transactions on Pattern Analysis and Machine Intelligence 47, 1 (Jan 2025), 634–649. DOI: 10.1109/TPAMI.2024.3475472
  • SiamTFA: Siamese Triple-Stream Feature Aggregation Network for Efficient RGBT Tracking. IEEE Transactions on Intelligent Transportation Systems 26, 2 (Feb 2025), 1900–1913. DOI: 10.1109/TITS.2024.3512551
  • Prototype-based cross-modal object tracking. Information Fusion 118 (Jun 2025), 102941. DOI: 10.1016/j.inffus.2025.102941
  • Simplifying Cross-modal Interaction via Modality-Shared Features for RGBT Tracking. In Proceedings of the 32nd ACM International Conference on Multimedia (Oct 2024), 1573–1582. DOI: 10.1145/3664647.3681564
  • Breaking Modality Gap in RGBT Tracking: Coupled Knowledge Distillation. In Proceedings of the 32nd ACM International Conference on Multimedia (Oct 2024), 9291–9300. DOI: 10.1145/3664647.3680878
  • Review and Analysis of RGBT Single Object Tracking Methods: A Fusion Perspective. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 8 (Jul 2024), 1–27. DOI: 10.1145/3651308
  • RGBT Tracking via Challenge-Based Appearance Disentanglement and Interaction. IEEE Transactions on Image Processing 33 (2024), 1753–1767. DOI: 10.1109/TIP.2024.3371355
  • A Comprehensive Review of RGBT Tracking. IEEE Transactions on Instrumentation and Measurement 73 (2024), 1–23. DOI: 10.1109/TIM.2024.3436098
  • Visible–Infrared Dual-Sensor Tracking Based on Transformer via Progressive Feature Enhancement and Fusion. IEEE Sensors Journal 24, 9 (May 2024), 14519–14528. DOI: 10.1109/JSEN.2024.3372991
  • MCSSAFNet: A multi-scale state-space attention fusion network for RGBT tracking. Optics Communications (Dec 2024), 131394. DOI: 10.1016/j.optcom.2024.131394


Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023, 9913 pages
    ISBN: 9798400701085
    DOI: 10.1145/3581783

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. reliability learning
    2. residual connection
    3. rgbt tracking
    4. weighted guidance

    Qualifiers

    • Research-article

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 – November 3, 2023
    Ottawa, ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

