
Learning deep convolutional descriptor aggregation for efficient visual tracking

  • Original Article
  • Neural Computing and Applications

Abstract

Visual trackers have achieved high-level performance with deep features, but significant limitations remain. Online trackers run slowly when deep features are used for parameter updating, while deep trackers trained offline are data-hungry. To meet these challenges, our work mines the target representation capability of a pre-trained model and presents deep convolutional descriptor aggregation (DCDA) for visual tracking. Based on spatial and semantic priors, we propose an edge-aware selection (EAS) method and a central-aware selection (CAS) method to aggregate accuracy-aware and robustness-aware features, respectively. To make full use of the scene context, our method builds on one-shot learning, with a dedicated regression process that predicts a discriminative model in a few iterations. By combining robustness-aware feature aggregation, accuracy-aware feature aggregation, and discriminative regression, our DCDA with a Siamese tracking architecture not only enhances target prediction capacity but also achieves low-cost reuse of the pre-trained model. Comprehensive experiments on OTB-100, VOT2016, VOT2017, VOT2020, NFS30, and NFS240 show that our DCDA tracker achieves state-of-the-art performance at a high running speed of 65 FPS. The source code and all experimental results will be made public at https://github.com/Gitlyz007/DCDA_Tracker.
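To make the abstract's pipeline concrete, the sketch below illustrates the two descriptor-selection steps and the few-iteration regression. It is a minimal illustration, not the authors' implementation: the torchvision VGG-16 backbone, the choice of layers (relu3_3 and relu5_3), the Gaussian spatial priors standing in for EAS and CAS, and the five-step ridge regression are all assumptions made for exposition.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

def central_mask(h, w, sigma=0.3):
    # Gaussian spatial prior: keeps central, semantically stable
    # descriptors (a stand-in for the robustness-aware CAS branch).
    ys = torch.linspace(-1, 1, h).view(-1, 1).expand(h, w)
    xs = torch.linspace(-1, 1, w).view(1, -1).expand(h, w)
    return torch.exp(-(xs ** 2 + ys ** 2) / (2 * sigma ** 2))

def edge_mask(h, w):
    # Complementary prior: keeps boundary descriptors carrying fine
    # localization cues (a stand-in for the accuracy-aware EAS branch).
    return 1.0 - central_mask(h, w)

def select(feat, mask):
    # Spatially re-weight a (C, H, W) feature map by an (H, W) mask.
    return feat * mask.to(feat.device)

# Frozen pre-trained backbone, reused without fine-tuning; VGG-16 from
# torchvision is an assumption, not necessarily the paper's model.
backbone = vgg16(weights="IMAGENET1K_V1").features.eval()

x = torch.randn(1, 3, 224, 224)          # stands in for a search-region crop
feats, out = {}, x
with torch.no_grad():
    for i, layer in enumerate(backbone):
        out = layer(out)
        if i in (15, 29):                # relu3_3 (shallow), relu5_3 (deep)
            feats[i] = out

# Accuracy-aware aggregation from the shallow layer (EAS-style); in the
# full tracker this branch would refine localization.
shallow = select(feats[15][0], edge_mask(*feats[15].shape[2:]))
# Robustness-aware aggregation from the deep layer (CAS-style).
deep = select(feats[29][0], central_mask(*feats[29].shape[2:]))

# Few-iteration discriminative regression: fit per-channel weights w so
# that a linear response over the aggregated deep features matches a
# Gaussian label centered on the target. Five gradient steps of ridge
# regression mimic "predicting a discriminative model in a few iterations".
C, H, W = deep.shape
y = central_mask(H, W, sigma=0.1)        # desired response map
w = torch.zeros(C, 1, 1, requires_grad=True)
opt = torch.optim.SGD([w], lr=1e-2)
for _ in range(5):
    opt.zero_grad()
    resp = (deep * w).sum(dim=0)         # response map, shape (H, W)
    loss = F.mse_loss(resp, y) + 1e-4 * (w ** 2).sum()  # ridge penalty
    loss.backward()
    opt.step()
```

The design point the sketch preserves is the abstract's low-cost reuse: the backbone stays frozen, and only the small weight tensor w is fitted online, so the per-frame update cost is limited to a handful of gradient steps.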



Notes

  1. The pre-trained models are obtained from: https://www.vlfeat.org/matconvnet/pretrained/.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 61972097, 61672159, U1705262, and 61672158; in part by the Technology Guidance Project of Fujian Province under Grant 2017H0015; in part by the Natural Science Foundation of Fujian Province under Grants 2021J01612, 2018J1798, and 2018J07005; and in part by the Major Project of Fujian Province under Grant 2021HZ022007.

Author information


Corresponding author

Correspondence to Wenzhong Guo.

Ethics declarations

Conflicts of interest

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



About this article


Cite this article

Ke, X., Li, Y., Guo, W. et al. Learning deep convolutional descriptor aggregation for efficient visual tracking. Neural Comput & Applic 34, 3745–3765 (2022). https://doi.org/10.1007/s00521-021-06638-8

