DOI: 10.1145/3581783.3612240

Unambiguous Object Tracking by Exploiting Target Cues

Published: 27 October 2023

Abstract

Siamese tracking exploits template and search-region features to adaptively locate arbitrary objects during tracking. A noteworthy issue is that foreground and background are mixed in the template, so a tracker must learn both what the target is and which pixels belong to it. However, existing trackers cannot effectively exploit the template information; the resulting deficiency of target information leaves the tracker uncertain about which pixels belong to the target. To alleviate this issue, we propose UTrack, a simple and effective algorithm for unambiguous object tracking. UTrack utilizes long-term contextual information to propagate the appearance state of the target, thereby explicitly modeling the target's appearance. By leveraging these target cues, UTrack can also resist changes in the target's appearance. Moreover, the proposed method uses the refined template to obtain more detailed information about the target and to better determine which pixels belong to it. Extensive experiments and comparisons with competitive trackers on challenging large-scale benchmarks show that our tracker achieves state-of-the-art performance while running in real time. In particular, UTrack achieves 77.7% AO on GOT-10k.
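The abstract only outlines the mechanism, so below is a minimal Python sketch of the general idea, not the authors' implementation: a Siamese tracker scores search-region positions by cross-correlating them with a template feature map, and an exponential-moving-average update (a hypothetical stand-in for UTrack's long-term appearance-state propagation and template refinement) blends in features from confident predictions. All names, shapes, and the confidence-thresholded EMA update are illustrative assumptions.

```python
import numpy as np

def xcorr(template_feat, search_feat):
    """Slide the template over the search-region feature map and score each
    position by the summed element-wise product (naive cross-correlation)."""
    _, th, tw = template_feat.shape
    _, sh, sw = search_feat.shape
    scores = np.zeros((sh - th + 1, sw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            scores[y, x] = np.sum(template_feat * search_feat[:, y:y + th, x:x + tw])
    return scores

class EMATemplate:
    """Hypothetical appearance-state propagation: blend the template with
    features from confident predictions so it follows appearance changes."""
    def __init__(self, init_feat, momentum=0.99):
        self.state = init_feat.astype(np.float64)
        self.momentum = momentum

    def update(self, new_feat, confidence, threshold=0.6):
        # Only propagate the appearance state when the prediction is reliable.
        if confidence >= threshold:
            self.state = self.momentum * self.state + (1.0 - self.momentum) * new_feat
        return self.state

# Toy usage: correlate the (refined) template against a search region and
# take the response-map peak as the predicted target location.
rng = np.random.default_rng(0)
template = EMATemplate(rng.standard_normal((64, 7, 7)))
search = rng.standard_normal((64, 31, 31))
response = xcorr(template.state, search)
y, x = np.unravel_index(np.argmax(response), response.shape)
template.update(search[:, y:y + 7, x:x + 7], confidence=0.8)
```

In a real tracker the features would come from a learned backbone and the matching would be a learned correlation or attention; the point of the sketch is that propagating a confidence-gated appearance state keeps the template informative about which pixels belong to the target as its appearance drifts.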


Cited By

  • (2024) Siamese Tracking Network with Multi-attention Mechanism. Neural Processing Letters 56(5). DOI: 10.1007/s11063-024-11670-5. Online publication date: 23 Aug 2024.


    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. target cues
    2. tracking
    3. unambiguous tracking

    Qualifiers

    • Research-article

    Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

    Acceptance Rates

Overall acceptance rate: 2,145 of 8,556 submissions (25%)


Article Metrics

    • Downloads (last 12 months): 85
    • Downloads (last 6 weeks): 3

    Reflects downloads up to 05 Mar 2025
