Abstract
Siamese-like trackers have recently achieved strong performance. Most of these methods exploit classification scores and quality assessment scores to estimate the target. However, their classification scores correlate weakly with target localization, and quality scores produced by a simple strategy improve this correlation only marginally, which degrades tracking performance. To alleviate this problem, we propose a simple Siamese target estimation method for object tracking that exploits image contents (luminance, contrast, and structure). Specifically, we first employ the image contents covering the target to generate similarity scores with the Structural Similarity Index Measure (SSIM) in a similarity branch, which aids the classification branch in improving target estimation by taking the whole target context into account. Second, we assign different weights to the classification and similarity branches during inference to ease the low correlation, which provides more flexibility for target localization. Our tracker achieves competitive performance at real-time speed on three challenging benchmarks, OTB100, GOT-10k, and TrackingNet, demonstrating the effectiveness of our method. In particular, our tracker outperforms the leading baseline by over 6.0% in SR\(_{0.75}\) on the GOT-10k benchmark while running at 67 FPS.
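The two ingredients described above, an SSIM-style similarity score built from luminance, contrast, and structure, and a weighted fusion of the classification and similarity branches at inference, can be illustrated with a short sketch. The NumPy code below is not the authors' implementation: the patch-level (non-windowed) SSIM, the branch weights `w_cls` and `w_sim`, and the function names are illustrative assumptions.

```python
import numpy as np

def ssim_score(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Patch-level SSIM between two equally sized grayscale patches.

    Combines the three image-content terms named in the abstract:
    luminance, contrast, and structure (Wang et al., 2004), computed
    globally over the patch rather than with a sliding window.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    c3 = c2 / 2.0

    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(), y.std()
    sigma_xy = ((x - mu_x) * (y - mu_y)).mean()

    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sigma_x * sigma_y + c2) / (sigma_x ** 2 + sigma_y ** 2 + c2)
    structure = (sigma_xy + c3) / (sigma_x * sigma_y + c3)
    return luminance * contrast * structure

def fuse_scores(cls_scores, sim_scores, w_cls=0.6, w_sim=0.4):
    """Weighted fusion of the two branches at inference time.

    The weights are illustrative placeholders, not values from the paper.
    """
    return w_cls * np.asarray(cls_scores) + w_sim * np.asarray(sim_scores)

# Toy usage: score two candidate regions against the exemplar patch and
# pick the one with the highest fused score.
rng = np.random.default_rng(0)
exemplar = rng.integers(0, 256, size=(64, 64))
candidates = [rng.integers(0, 256, size=(64, 64)) for _ in range(2)]
cls_scores = [0.82, 0.74]                      # hypothetical classifier outputs
sim_scores = [ssim_score(exemplar, c) for c in candidates]
best = int(np.argmax(fuse_scores(cls_scores, sim_scores)))
print("selected candidate:", best)
```

At inference one would compute the similarity score between the exemplar and every candidate location on the score map and fuse it with the classification map in the same weighted fashion before taking the argmax.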
Data availability
All training and test datasets used in our experiments are publicly available and can be downloaded from their official websites. The results of all published algorithms can also be obtained from the websites provided by their respective authors.
Ethics declarations
Conflicts of interest
We declare that we have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Wang, S., Chen, X. & Yan, J. Accurate target estimation with image contents for visual tracking. Multimed Tools Appl 83, 90153–90175 (2024). https://doi.org/10.1007/s11042-024-18869-7