DOI: 10.1145/3581783.3612240

Unambiguous Object Tracking by Exploiting Target Cues

Published: 27 October 2023

Abstract

Siamese tracking exploits template and search-region features to adaptively locate arbitrary objects during tracking. A noteworthy issue is that foreground and background are mixed in the template, so a tracker must learn both what the target is and which pixels belong to it. However, existing trackers cannot effectively exploit the template information; the resulting deficiency of target information leaves the tracker uncertain about which pixels belong to the target. To alleviate this issue, we propose UTrack, a simple and effective algorithm for unambiguous object tracking. UTrack utilizes long-term contextual information to propagate the appearance state of the target, thereby explicitly modeling the target's appearance. By leveraging these target cues, UTrack can also resist changes in the target's appearance. Moreover, the proposed method uses the refined template to obtain more detailed information about the target and to better determine which pixels belong to it. Extensive experiments and comparisons with competitive trackers on challenging large-scale benchmarks show that our tracker achieves state-of-the-art performance while running in real time. In particular, UTrack achieves 77.7% AO on GOT-10k.
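The abstract only outlines the mechanism, so below is a minimal Python sketch of the general idea, not the authors' implementation: a Siamese tracker scores search-region positions by cross-correlating them with a template feature map, and an exponential-moving-average update (a hypothetical stand-in for UTrack's long-term appearance-state propagation and template refinement) blends in features from confident predictions. All names, shapes, and the confidence-thresholded EMA update are illustrative assumptions.

```python
import numpy as np

def xcorr(template_feat, search_feat):
    """Slide the template over the search-region feature map and score each
    position by the summed element-wise product (naive cross-correlation)."""
    _, th, tw = template_feat.shape
    _, sh, sw = search_feat.shape
    scores = np.zeros((sh - th + 1, sw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            scores[y, x] = np.sum(template_feat * search_feat[:, y:y + th, x:x + tw])
    return scores

class EMATemplate:
    """Hypothetical appearance-state propagation: blend the template with
    features from confident predictions so it follows appearance changes."""
    def __init__(self, init_feat, momentum=0.99):
        self.state = init_feat.astype(np.float64)
        self.momentum = momentum

    def update(self, new_feat, confidence, threshold=0.6):
        # Only propagate the appearance state when the prediction is reliable.
        if confidence >= threshold:
            self.state = self.momentum * self.state + (1.0 - self.momentum) * new_feat
        return self.state

# Toy usage: correlate the (refined) template against a search region and
# take the response-map peak as the predicted target location.
rng = np.random.default_rng(0)
template = EMATemplate(rng.standard_normal((64, 7, 7)))
search = rng.standard_normal((64, 31, 31))
response = xcorr(template.state, search)
y, x = np.unravel_index(np.argmax(response), response.shape)
template.update(search[:, y:y + 7, x:x + 7], confidence=0.8)
```

In a real tracker the features would come from a learned backbone and the matching would be a learned correlation or attention; the point of the sketch is that propagating a confidence-gated appearance state keeps the template informative about which pixels belong to the target as its appearance drifts.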


Cited By

  • (2024) Siamese Tracking Network with Multi-attention Mechanism. Neural Processing Letters 56(5). DOI: 10.1007/s11063-024-11670-5. Online publication date: 23 Aug 2024.


    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. target cues
    2. tracking
    3. unambiguous tracking

    Qualifiers

    • Research-article

    Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

    Acceptance Rates

Overall acceptance rate: 2,145 of 8,556 submissions (25%)


Article Metrics

    • Downloads (last 12 months): 85
    • Downloads (last 6 weeks): 3

    Reflects downloads up to 05 Mar 2025
