Abstract
The discriminative model prediction (DiMP) tracker is an excellent end-to-end tracking framework and achieved state-of-the-art results at the time of its publication. However, DiMP exhibits two problems in practice: (1) it is prone to interference from objects similar to the target during tracking, and (2) it requires a large amount of labeled data for training. In this paper, we propose two methods to enhance robustness to interference from similar objects in target tracking: multi-scale region search and Gaussian convolution-based response map processing. To tackle the need for large amounts of labeled training data, we implement self-supervised training based on forward-backward tracking for the DiMP tracking method. Furthermore, a new consistency loss function is designed to better support self-supervised training. Extensive experiments show that these enhancements to the DiMP tracking framework bolster its robustness, and that the tracker trained in a self-supervised manner delivers outstanding tracking performance.
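To make two of the abstract's ideas concrete, the sketch below illustrates (a) smoothing a tracker's response map with a Gaussian kernel so that isolated distractor peaks are suppressed relative to the broad true-target peak, and (b) a forward-backward consistency penalty that compares the starting box with the box recovered after tracking forward and then backward through a clip. This is a minimal illustration under our own simplifying assumptions (NumPy arrays, a mean-squared-error penalty); it is not the paper's actual loss or implementation, and the function names are hypothetical.

```python
import numpy as np


def smooth_response(response, sigma=1.0):
    """Convolve a 2-D response map with a Gaussian kernel.

    Sharp single-pixel distractor peaks are attenuated more than the
    broad peak produced by the true target.
    """
    radius = int(3 * sigma)
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    kernel /= kernel.sum()  # normalize so overall response energy is preserved

    # Plain sliding-window convolution with edge padding (no SciPy needed).
    padded = np.pad(response, radius, mode="edge")
    out = np.zeros_like(response, dtype=float)
    h, w = response.shape
    for i in range(h):
        for j in range(w):
            window = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            out[i, j] = np.sum(window * kernel)
    return out


def forward_backward_consistency(start_box, fb_box):
    """Toy consistency penalty: mean squared error between the initial
    box and the box obtained after forward-then-backward tracking.
    For a perfectly consistent tracker the penalty is zero."""
    start_box = np.asarray(start_box, dtype=float)
    fb_box = np.asarray(fb_box, dtype=float)
    return float(np.mean((start_box - fb_box) ** 2))
```

For example, a response map with a single peak keeps its maximum at the same location after smoothing, while `forward_backward_consistency([0, 0, 10, 10], [0, 0, 10, 10])` returns `0.0`, reflecting a tracker that returned exactly to its starting box.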
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This research was supported by the National Natural Science Foundation of China under Grant Nos. 62202362, 62302073 and 62172126, by the Guangzhou Key Laboratory of Scene Understanding and Intelligent Interaction under Grant No. 202201000001, by the Fundamental Research Funds for the Central Universities under Grant No. XJS222503, by the China Postdoctoral Science Foundation under Grant Nos. 2022TQ0247 and 2023M742742, and by the Science and Technology Projects in Guangzhou under Grant No. 2023A04J0397.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yuan, D., Geng, G., Shu, X. et al. Self-supervised discriminative model prediction for visual tracking. Neural Comput & Applic 36, 5153–5164 (2024). https://doi.org/10.1007/s00521-023-09348-5