
Variable scale learning for visual object tracking

  • Original Research
  • Journal of Ambient Intelligence and Humanized Computing


Abstract

Deep learning has recently achieved competitive accuracy and robustness, and has dramatically improved target scale estimation through pre-trained, dedicated network branches. Yet fast and robust scale estimation remains a challenging problem in visual object tracking. Early correlation filter tracking algorithms estimate scale by a multiscale search with a fixed number of scale factors and an invariant aspect ratio, which is redundant for video frames with little or no scale change. Independent network branches for estimating the target scale have also been proposed, but training them requires abundant data, and their effect is often unstable on unseen target objects. To address the shortcomings of existing scale estimation solutions, we propose several variable scale learning methods that explicitly explore the scale change of the target. Firstly, we propose a variable scale factor learning method that frees us from the fixed scale factors of the commonly used multiscale search. Secondly, we use a multiscale aspect ratio solution to compensate for the invariant aspect ratio. Thirdly, we combine the first two methods into a variable scale aspect ratio estimation method. Finally, we embed the proposed scale estimation methods into the state-of-the-art ECO (Efficient Convolution Operators) and ATOM (Accurate Tracking by Overlap Maximization) trackers, replacing their original scale methods, to verify the effectiveness of our approach. Extensive experiments on the OTB100, UAV123, TC128 and LaSOT datasets demonstrate that the proposed scale methods effectively improve tracking performance.
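To make the fixed-factor limitation concrete, the sketch below contrasts the classic multiscale search (a geometric pyramid of scale factors with a constant aspect ratio, as used in early correlation filter trackers such as DSST) with candidate sizes that also vary the aspect ratio. This is an illustrative sketch, not the paper's method or the ECO/ATOM implementation; the function names and parameter values (33 scales, step 1.02) are assumptions based on common correlation filter settings.

```python
import numpy as np

def scale_pyramid_sizes(w, h, num_scales=33, scale_step=1.02):
    """Candidate target sizes for the classic fixed-factor multiscale search.

    Every frame is searched over the same geometric pyramid of scale
    factors, and the aspect ratio w/h stays constant -- redundant when
    the target's scale barely changes between frames.
    """
    # Symmetric exponents around 0, e.g. [-2, -1, 0, 1, 2] for 5 scales
    exponents = np.arange(num_scales) - (num_scales - 1) / 2
    factors = scale_step ** exponents
    return [(w * f, h * f) for f in factors]

def aspect_ratio_candidates(w, h, ratio_step=1.05, num_ratios=3):
    """Candidate sizes that also vary the aspect ratio.

    Width is stretched by r while height shrinks by r, so the box area
    w*h stays constant while the ratio w/h varies (illustrative only;
    parameters are assumptions, not values from the paper).
    """
    exponents = np.arange(num_ratios) - (num_ratios - 1) / 2
    return [(w * r, h / r) for r in ratio_step ** exponents]
```

For a 100×50 target, `scale_pyramid_sizes` keeps the ratio 2:1 in every candidate, while `aspect_ratio_candidates` produces boxes of equal area but different shapes; a variable scale scheme would adapt both the factors and the ratio to the observed motion instead of fixing them in advance.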




Availability of data and materials

We used four publicly available datasets to illustrate and test our methods. The OTB dataset is available at http://cvlab.hanyang.ac.kr/tracker_benchmark/index.html. The UAV123 dataset is available at https://ivul.kaust.edu.sa/Pages/pub-benchmark-simulator-uav.aspx. The TC128 dataset is available at http://www.dabi.temple.edu/~hbling/data/TColor-128/TColor-128.html. The LaSOT dataset is available at https://cis.temple.edu/lasot/download.html.

References

  • Bertinetto L, Valmadre J, Golodetz S, Miksik O, Torr PHS (2016a) Staple: Complementary learners for real-time tracking. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1401–1409

  • Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PHS (2016b) Fully-convolutional siamese networks for object tracking. In: Proceedings of the European Conference on Computer Vision, pp 850–865

  • Bhat G, Johnander J, Danelljan M, Khan FS, Felsberg M (2018) Unveiling the power of deep tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 493–509

  • Bhat G, Danelljan M, Van Gool L, Timofte R (2019) Learning discriminative model prediction for tracking. In: Proceedings of the IEEE International Conference on Computer Vision, pp 6182–6191

  • Bhat G, Danelljan M, Van Gool L, Timofte R (2020) Know your surroundings: Exploiting scene information for object tracking. In: Proceedings of the European Conference on Computer Vision, pp 205–221

  • Bolme DS, Beveridge JR, Draper BA, Lui YM (2010) Visual object tracking using adaptive correlation filters. In: Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 2544–2550

  • Chen X, Yan B, Zhu J, Wang D, Yang X, Lu H (2021) Transformer tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 8126–8135

  • Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceedings of the 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05), pp 886–893

  • Danelljan M, Häger G, Khan F, Felsberg M (2014a) Accurate scale estimation for robust visual tracking. In: Proceedings of the British Machine Vision Conference, pp 1–11

  • Danelljan M, Khan F S, Felsberg M, Weijer JVD (2014b) Adaptive color attributes for real-time visual tracking. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp 1090–1097

  • Danelljan M, Häger G, Khan F S, Felsberg M (2015) Learning spatially regularized correlation filters for visual tracking. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp 4310–4318

  • Danelljan M, Robinson A, Shahbaz Khan F, Felsberg M (2016) Beyond correlation filters: Learning continuous convolution operators for visual tracking. In: Proceedings of the European Conference on Computer Vision, pp 472–488

  • Danelljan M, Bhat G, Khan FS, Felsberg M (2017a) ECO: Efficient convolution operators for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 6638–6646

  • Danelljan M, Häger G, Khan FS, Felsberg M (2017b) Discriminative scale space tracking. IEEE Trans Pattern Anal Mach Intell 39(8):1561–1575


  • Danelljan M, Bhat G, Khan FS, Felsberg M (2019) ATOM: Accurate tracking by overlap maximization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4660–4669

  • Danelljan M, Van Gool L, Timofte R (2020) Probabilistic regression for visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7183–7192

  • Fan H, Lin L, Yang F, Chu P, Deng G, Yu S, Bai H, Xu Y, Liao C, Ling H (2019) LaSOT: A high-quality benchmark for large-scale single object tracking. In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 5369–5378

  • Henriques J F, Caseiro R, Martins P, Batista J (2012) Exploiting the circulant structure of tracking-by-detection with kernels. In: Proceedings of the European conference on computer vision, pp 702–715

  • Henriques JF, Caseiro R, Martins P, Batista J (2015) High-speed tracking with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell 37(3):583–596


  • Hu B, Zhao H, Yang Y, Zhou B, Raj ANJ (2020) Multiple faces tracking using feature fusion and neural network in video. Intell Autom Soft Comput 26(6):1549–1560


  • Huang D, Gu P, Feng H-M, Lin Y, Zheng L (2020) Robust visual tracking models designs through kernelized correlation filters. Intell Autom Soft Comput 26(2):313–322


  • Jiang B, Luo R, Mao J, Xiao T, Jiang Y (2018) Acquisition of localization confidence for accurate object detection. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 784–799

  • Kristan M, Leonardis A, Matas J, Felsberg M, Pflugfelder R, Čehovin Zajc L, Vojir T, Bhat G, Lukezic A, Eldesokey A (2018) The sixth visual object tracking VOT2018 challenge results. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops

  • Li Y, Zhu J (2014) A scale adaptive kernel correlation filter tracker with feature integration. In: Proceedings of the European conference on computer vision, pp 254–265

  • Li B, Yan J, Wu W, Zhu Z, Hu X (2018) High performance visual tracking with siamese region proposal network. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 8971–8980

  • Li B, Wu W, Wang Q, Zhang F, Xing J, Yan J (2019) SiamRPN++: Evolution of siamese visual tracking with very deep networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 4282–4291

  • Liang P, Blasch E, Ling H (2015) Encoding color information for visual tracking: Algorithms and benchmark. IEEE Trans Image Process 24(12):5630–5644


  • Liu Z, Wang XA, Sun C, Lu K (2019) Implementation system of human eye tracking algorithm based on FPGA. CMC-Comput Mat Contin 58(3):653–664


  • Ma H, Lin Z, Acton ST (2020) Fast: Fast and accurate scale estimation for tracking. IEEE Signal Process Lett 27:161–165


  • Ma C, Huang J, Yang X, Yang M (2015) Hierarchical convolutional features for visual tracking. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp 3074–3082

  • Marvasti-Zadeh SM, Cheng L, Ghanei-Yakhdan H, Kasaei S (2021) Deep learning for visual tracking: A comprehensive survey. IEEE Trans Intell Transp Syst. https://doi.org/10.1109/TITS.2020.3046478


  • Mueller M, Smith N, Ghanem B (2016) A benchmark and simulator for UAV tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 445–461

  • Nam H, Han B (2016) Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4293–4302

  • Ren S, He K, Girshick R, Sun J (2016) Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149


  • Santhosh P, Kaarthick B (2019) An automated player detection and tracking in basketball game. CMC-Comput Mat Contin 58(3):625–639


  • Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Proceedings of the International Conference on Learning Representations, pp 1–14

  • Valmadre J, Bertinetto L, Henriques J, Vedaldi A, Torr PH (2017) End-to-end representation learning for correlation filter based tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2805–2813

  • Voigtlaender P, Luiten J, Torr PHS, Leibe B (2020) Siam R-CNN: Visual tracking by re-detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 6578–6588

  • Wu Y, Lim J, Yang M (2015) Object tracking benchmark. IEEE Trans Pattern Anal Mach Intell 37(9):1834–1848


  • Xu Y, Wang Z, Li Z, Ye Y, Yu G (2020) SiamFC++: Towards robust and accurate visual tracking with target estimation guidelines. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 12549–12556

  • Zhang J, Jin X, Sun J, Wang J, Li K (2019a) Dual model learning combined with multiple feature selection for accurate visual tracking. IEEE Access 7:43956–43969


  • Zhang J, Wu Y, Feng W, Wang J (2019b) Spatially attentive visual tracking using multi-model adaptive response fusion. IEEE Access 7:83873–83887


  • Zhang L, Gonzalez-Garcia A, Weijer JVD, Danelljan M, Khan FS (2019c) Learning the model update for siamese trackers. In: Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp 4009–4018

  • Zhang J, Jin X, Sun J, Wang J, Sangaiah AK (2020a) Spatial and semantic convolutional features for robust visual object tracking. Multimed Tools Appl 79(21):15095–15115


  • Zhang J, Sun J, Wang J, Yue X-G (2020b) Visual object tracking based on residual network and cascaded correlation filters. J Ambient Intell Humaniz Comput 12:8427–8440


  • Zhang J, Liu Y, Liu H, Wang J (2021) Learning local–global multiple correlation filters for robust visual tracking with kalman filter redetection. Sensors 21(4):1129


  • Zhao S, Xu T, Wu X-J, Zhu X-F (2021) Adaptive feature fusion for visual object tracking. Pattern Recognit 111:107679


  • Zhao H, Yang G, Wang D, Lu H (2021) Deep mutual learning for visual object tracking. Pattern Recognit 112:107796


  • Zhu Z, Wang Q, Li B, Wu W, Yan J, Hu W (2018) Distractor-aware siamese networks for visual object tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 101–117


Acknowledgements

The authors would like to thank the anonymous reviewers for their useful and constructive comments, which helped improve the quality of this paper.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 62176272), Guangzhou Science and Technology Fund (Grant No. 201803010072), Science, Technology & Innovation Commission of Shenzhen Municipality (JCYL 20170818165305521), and China Medical University Hospital (DMR-107-067, DMR-108-132, DMR-110-097). We also acknowledge the start-up funding from SYSU “Hundred Talent Program”.

Author information

Authors and Affiliations

Authors

Contributions

Xuedong He completed the experiments and analyzed the results. Xuedong He, Lu Zhao and Calvin Yu-Chian Chen wrote the manuscript together.

Corresponding author

Correspondence to Calvin Yu-Chian Chen.

Ethics declarations

Conflicts of interest

The authors report no conflicts of interest in this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

He, X., Zhao, L. & Chen, C.Y.-C. Variable scale learning for visual object tracking. J Ambient Intell Human Comput 14, 3315–3330 (2023). https://doi.org/10.1007/s12652-021-03469-2

