Abstract
Object tracking is an important topic in computer vision. Most existing trackers require an accurate initial position of the target. In real applications, however, the initial location may be inaccurate, which can cause tracking drift. To address this problem, we propose a simple deep-learning-based method, called Refiner, that produces an accurate object position given a rough location. Specifically, we design an end-to-end position refinement network consisting of a backbone network, a feature enhancement module, a feature fusion module, and a shape predictor; the shape predictor has two branches, one for bounding-box prediction and one for mask prediction. Correcting an inaccurate initial position improves the spatial robustness of existing trackers. The proposed method can also be applied during the tracking process to improve the accuracy of subsequent results. Extensive experiments on object tracking benchmarks verify its effectiveness and efficiency.
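The mask branch of the shape predictor implies a simple geometric step for recovering a corrected bounding box: take the tight box around the predicted segmentation. The sketch below illustrates only that step; the function name `tight_box` and the list-of-rows mask format are our own illustrative assumptions, not the paper's API, and the paper's predictor is a trained network rather than this stub.

```python
def tight_box(mask):
    """Tight (x, y, w, h) box around a binary mask given as a list of rows.

    Illustrative only: stands in for the step that turns the mask branch's
    output into a refined bounding box. Returns None for an empty mask.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None
    x, y = min(cols), min(rows)
    return (x, y, max(cols) - x + 1, max(rows) - y + 1)

# A rough initial box can then be replaced by the refined one:
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(tight_box(mask))  # (1, 1, 3, 2)
```

In the paper's setting, the same correction is applied both to the (possibly inaccurate) initial annotation and, optionally, to each frame's tracking output.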
Figs. 1–12 accompany the article; see the published version at https://doi.org/10.1007/s00521-023-09263-9.
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Funding
Funding was provided by Shaanxi Key Research and Development Program (Grant No. 2018ZDCXL-GY-04-03-02).
About this article
Cite this article
Wu, H., Zhao, B. & Liu, G. Refiner: a general object position refinement algorithm for visual tracking. Neural Comput & Applic 36, 3967–3981 (2024). https://doi.org/10.1007/s00521-023-09263-9