DOI: 10.1145/3573942.3574038

Lightweight Transformer Tracker: Compact and Effective Neural Network for Object Tracking with Long-Short Range Attention

Published: 16 May 2023

Abstract

In recent years, the attention mechanism has been widely used in computer vision tasks such as object detection, tracking, and recognition. Many studies show that attention-based models outperform CNN-based or RNN-based networks. However, this improved performance comes at a cost that cannot be neglected: more complicated structures and more complex algorithms. In this paper, a lightweight transformer tracker named LTT is proposed. Unlike the Transformer Tracking network (TransT), LTT adopts three lightweight design choices: first, the YOLO-nano Darknet is used as the feature extraction backbone; second, the template image is scaled to 1/4 of its original size and the self-attention layer is removed; finally, a combined convolution and cross-attention layer (long-short range attention) is used for feature fusion. Experiments show that our tracker runs at roughly 70 fps on a GPU with no significant loss of accuracy compared with networks such as SiamFC and SiamRPN++. Moreover, the model size of our tracker is only 4.58 M.
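To make the long-short range attention fusion described above concrete, the following is a minimal PyTorch sketch, not the authors' released code: half of the channels pass through a convolution branch for short-range, local context, and the other half through a cross-attention branch that attends from the search-region features to the (downscaled) template features for long-range context. All module names, channel counts, and the exact channel split are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LongShortRangeFusion(nn.Module):
    """Sketch of a long-short range attention fusion block (assumed layout)."""

    def __init__(self, dim=128, num_heads=4, kernel_size=3):
        super().__init__()
        assert dim % 2 == 0
        half = dim // 2
        # Short-range branch: a plain convolution over the search feature map.
        self.local_conv = nn.Conv2d(half, half, kernel_size, padding=kernel_size // 2)
        # Long-range branch: cross-attention with queries from the search region
        # and keys/values from the template feature.
        self.cross_attn = nn.MultiheadAttention(half, num_heads, batch_first=True)
        # 1x1 projection to mix the two branches after concatenation.
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, search_feat, template_feat):
        # search_feat:   (B, C, Hs, Ws) features of the search region
        # template_feat: (B, C, Ht, Wt) features of the (1/4-scaled) template
        B, C, Hs, Ws = search_feat.shape
        half = C // 2
        s_local, s_global = search_feat[:, :half], search_feat[:, half:]
        t_global = template_feat[:, half:]

        # Short-range: convolution keeps local spatial detail.
        local_out = self.local_conv(s_local)

        # Long-range: flatten to token sequences and attend search -> template.
        q = s_global.flatten(2).transpose(1, 2)    # (B, Hs*Ws, half)
        kv = t_global.flatten(2).transpose(1, 2)   # (B, Ht*Wt, half)
        attn_out, _ = self.cross_attn(q, kv, kv)
        attn_out = attn_out.transpose(1, 2).reshape(B, half, Hs, Ws)

        # Fuse the two ranges back into one feature map.
        return self.proj(torch.cat([local_out, attn_out], dim=1))


if __name__ == "__main__":
    fuse = LongShortRangeFusion(dim=128, num_heads=4)
    search = torch.randn(1, 128, 32, 32)    # search-region feature map
    template = torch.randn(1, 128, 8, 8)    # downscaled template feature map
    print(fuse(search, template).shape)     # torch.Size([1, 128, 32, 32])
```

Splitting the channels between a convolutional path and an attention path follows the long-short range attention idea of Lite Transformer [3], and the cross-attention direction (search queries, template keys/values) mirrors the template-search fusion used in TransT [1]; the specific block above is only an assumed instantiation of that design.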

References

[1] Chen X, Yan B, Zhu J, et al. Transformer Tracking. arXiv:2103.15436, 2021. http://arxiv.org/abs/2103.15436
[2] Vaswani A, Shazeer N, Parmar N, et al. Attention Is All You Need. arXiv:1706.03762, 2017. http://arxiv.org/abs/1706.03762
[3] Wu Z, Liu Z, Lin J, et al. Lite Transformer with Long-Short Range Attention. arXiv:2004.11886, 2020. https://arxiv.org/abs/2004.11886
[4] Ge Z, Liu S, Wang F, et al. YOLOX: Exceeding YOLO Series in 2021. arXiv:2107.08430, 2021. https://arxiv.org/abs/2107.08430
[5] Huang L, Zhao X, Huang K. GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5): 1562-1577. https://doi.org/10.1109/TPAMI.2019.2957464
[6] Wu Y, Lim J, Yang M H. Object Tracking Benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015. https://ieeexplore.ieee.org/document/7001050
[7] Mueller M, Smith N, Ghanem B. A Benchmark and Simulator for UAV Tracking. In: Computer Vision – ECCV 2016. Springer, 2016. https://link.springer.com/chapter/10.1007/978-3-319-46448-0_27
[8] Xing D, Evangeliou N, Tsoukalas A, et al. Siamese Transformer Pyramid Networks for Real-Time UAV Tracking. arXiv:2110.08822, 2021. http://arxiv.org/abs/2110.08822
[9] Wang N, Zhou W, Wang J, et al. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking. arXiv:2103.11681, 2021. https://arxiv.org/abs/2103.11681
[10] Carion N, Massa F, Synnaeve G, et al. End-to-End Object Detection with Transformers. arXiv:2005.12872, 2020. http://arxiv.org/abs/2005.12872
[11] Fan H, Lin L, Yang F, et al. LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. https://ieeexplore.ieee.org/document/8954084/
[12] Li B, Wu W, Wang Q, et al. SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks. arXiv:1812.11703, 2018. https://arxiv.org/abs/1812.11703
[13] Bertinetto L, Valmadre J, Henriques J F, et al. Fully-Convolutional Siamese Networks for Object Tracking. In: Computer Vision – ECCV 2016 Workshops. Cham: Springer International Publishing, 2016: 850-865. http://link.springer.com/10.1007/978-3-319-48881-3_56
[14] Zhu Z, Wang Q, Li B, et al. Distractor-Aware Siamese Networks for Visual Object Tracking. In: Computer Vision – ECCV 2018. Springer, 2018. https://link.springer.com/chapter/10.1007/978-3-030-01240-3_7
[15] Chen Q, Wang Y, Yang T, et al. You Only Look One-level Feature. arXiv:2103.09460, 2021. https://arxiv.org/abs/2103.09460
[16] Danelljan M, Van Gool L, Timofte R. Probabilistic Regression for Visual Tracking. arXiv:2003.12565, 2020. http://arxiv.org/abs/2003.12565
[17] Bhat G, Danelljan M, Van Gool L, et al. Learning Discriminative Model Prediction for Tracking. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE, 2019: 6181-6190. https://ieeexplore.ieee.org/document/9010649/
[18] Danelljan M, Bhat G, Khan F S, et al. ATOM: Accurate Tracking by Overlap Maximization. arXiv:1811.07628, 2019. http://arxiv.org/abs/1811.07628
[19] Danelljan M, Bhat G, Khan F S, et al. ECO: Efficient Convolution Operators for Tracking. arXiv:1611.09224, 2017. http://arxiv.org/abs/1611.09224
[20] Nam H, Han B. Learning Multi-Domain Convolutional Neural Networks for Visual Tracking. arXiv:1510.07945, 2016. http://arxiv.org/abs/1510.07945
[21] Chen Z, Zhong B, Li G, et al. Siamese Box Adaptive Network for Visual Tracking. arXiv:2003.06761, 2020. http://arxiv.org/abs/2003.06761
[22] Guo D, Wang J, Cui Y, et al. SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking. arXiv:1911.07241, 2019. http://arxiv.org/abs/1911.07241
[23] Li B, Yan J, Wu W, et al. High Performance Visual Tracking with Siamese Region Proposal Network. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, UT: IEEE, 2018: 8971-8980. https://ieeexplore.ieee.org/document/8579033/
[24] Wang Q, Zhang L, Bertinetto L, et al. Fast Online Object Tracking and Segmentation: A Unifying Approach. arXiv:1812.05050, 2019. http://arxiv.org/abs/1812.05050

    Published In

    AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition
    September 2022
    1221 pages
    ISBN:9781450396899
    DOI:10.1145/3573942

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Attention mechanism
    2. Lightweight tracker
    3. Object tracking

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • National Science Foundation of China
    • Shaanxi Industrial Development Key Project

    Conference

    AIPR 2022
