GTTrack: Gaussian Transformer Tracker for Visual Tracking

ABSTRACT
Recently, Transformer-based visual object tracking methods have achieved impressive advances and significantly improved tracking performance. In these methods, the Transformer comprises two modules: self-attention and cross-attention. However, this design raises two problems. First, self-attention considers only the pairwise relations between elements when establishing global associations, and thus cannot highlight the essential regions of the tracked target. Second, cross-attention relies solely on feature similarity to locate the target, so interference from similar objects remains a challenge. In this paper, we propose a new Transformer tracking method, GTTrack, built on a Gaussian Attention (GA) and an Adaptive Focusing Module (AFM). The GA introduces a Gaussian prior to generate a semantic template with robust object features, where the prior concentrates attention on the central region of the tracked target. The AFM computes the similarity between the current frame and the template by combining appearance features with position features. The position features are defined by an adaptive Gaussian prior derived from the target area in the previous frame; introducing them enhances the contrast between the tracked target and similar distractors. Extensive experiments demonstrate that GTTrack outperforms many state-of-the-art trackers and achieves leading performance. Code will be available.
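The idea of biasing attention with a Gaussian prior centered on the target can be illustrated with a minimal sketch. This is our own simplified illustration, not the paper's exact formulation: the function names, the choice of an isotropic Gaussian, and the log-space additive bias are all assumptions.

```python
import numpy as np

def gaussian_prior(h, w, center, sigma):
    """2D Gaussian map over an h x w feature grid, peaked at `center` = (y, x)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def gaussian_attention(q, k, h, w, center, sigma):
    """Dot-product attention whose logits are biased toward a Gaussian prior.

    q: (n, d) queries, k: (n, d) keys, with n = h * w grid positions.
    The prior is added in log-space, so keys near the target center
    receive systematically more attention than peripheral ones.
    """
    logits = q @ k.T / np.sqrt(q.shape[-1])              # (n, n) similarity
    prior = gaussian_prior(h, w, center, sigma).ravel()  # (n,) spatial bias
    logits = logits + np.log(prior + 1e-6)               # bias toward center
    logits -= logits.max(axis=-1, keepdims=True)         # numerically stable softmax
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)
```

An adaptive variant, as the abstract describes for the AFM, would re-center the prior (and could rescale `sigma`) from the target box predicted in the previous frame, so the position bias tracks the target rather than staying fixed.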