Abstract
Recently, deep learning has achieved competitive accuracy and robustness and has dramatically improved target scale estimation through pre-trained, dedicated network branches. Yet fast and robust scale estimation remains a challenging problem in visual object tracking. Early correlation filter trackers estimate scale with a multiscale search that uses a fixed number of scale factors and an invariant aspect ratio, which is redundant for video frames with little or no scale change. Alternatively, an independent network branch can be trained for the target scale state, but such a branch requires abundant training data and its performance is not very stable on unseen target objects. To address these problems of existing scale estimation solutions, we propose several variable scale learning methods to explore the scale change of the target. First, we propose a variable scale factor learning method, which frees us from the commonly used multiscale search and its fixed scale factors. Second, we use a multiscale aspect ratio solution to compensate for the invariant aspect ratio. Third, we combine the first and second methods into a variable scale aspect ratio estimation method. Finally, the proposed scale estimation methods are embedded into the state-of-the-art ECO (Efficient Convolution Operators) and ATOM (Accurate Tracking by Overlap Maximization) trackers, replacing their original scale methods, to verify the effectiveness of our approach. Extensive experiments on the OTB100, UAV123, TC128 and LaSOT datasets demonstrate that the proposed scale methods effectively improve tracking performance.
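To make the contrast concrete, the following is a minimal Python sketch, not the authors' implementation: `fixed_scale_candidates` reproduces the conventional multiscale search criticized above, where the same geometrically spaced scale factors are tested in every frame with an unchanged aspect ratio, while `variable_scale_candidates` only illustrates the general idea of adapting the scale step per frame. The step base, the number of scales and the adaptation rule are assumptions chosen for illustration.

```python
import numpy as np

def fixed_scale_candidates(w, h, S=5, a=1.02):
    # Conventional multiscale search: the same S scale factors a^n are
    # tested in every frame and the aspect ratio w/h never changes.
    exps = np.arange(S) - (S - 1) / 2
    return [(w * a ** e, h * a ** e) for e in exps]

def variable_scale_candidates(w, h, prev_change, S=5):
    # Illustrative "variable" alternative (hypothetical rule): widen or
    # narrow the scale step according to how much the scale changed in
    # recent frames, so near-static frames waste fewer candidates on
    # large scale jumps.
    a = 1.0 + 0.02 * max(0.25, min(4.0, abs(prev_change)))
    exps = np.arange(S) - (S - 1) / 2
    return [(w * a ** e, h * a ** e) for e in exps]

if __name__ == "__main__":
    print(fixed_scale_candidates(64, 48))
    print(variable_scale_candidates(64, 48, prev_change=0.1))
```

In the fixed scheme the candidate set is identical whether the target is shrinking rapidly or perfectly static; removing this redundancy is the motivation behind the variable scale factor learning proposed in the paper.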
Availability of data and materials
We used four publicly available datasets to illustrate and test our methods. The OTB dataset can be found at http://cvlab.hanyang.ac.kr/tracker_benchmark/index.html. The UAV123 dataset can be found at https://ivul.kaust.edu.sa/Pages/pub-benchmark-simulator-uav.aspx. The TC128 dataset can be found at http://www.dabi.temple.edu/~hbling/data/TColor-128/TColor-128.html. The LaSOT dataset can be found at https://cis.temple.edu/lasot/download.html.
References
Bertinetto L, Valmadre J, Golodetz S, Miksik O, Torr PHS (2016a) Staple: Complementary learners for real-time tracking. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1401–1409
Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PHS (2016b) Fully-convolutional siamese networks for object tracking. In: Proceedings of the European Conference on Computer Vision, pp 850–865
Bhat G, Johnander J, Danelljan M, Khan FS, Felsberg M (2018) Unveiling the power of deep tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 493–509
Bhat G, Danelljan M, Van Gool L, Timofte R (2019) Learning discriminative model prediction for tracking. In: Proceedings of the IEEE International Conference on Computer Vision, pp 6182–6191
Bhat G, Danelljan M, Van Gool L, Timofte R (2020) Know your surroundings: Exploiting scene information for object tracking. In: Proceedings of the European Conference on Computer Vision, pp 205–221
Bolme DS, Beveridge JR, Draper BA, Lui YM (2010) Visual object tracking using adaptive correlation filters. In: Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 2544–2550
Chen X, Yan B, Zhu J, Wang D, Yang X, Lu H (2021) Transformer tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 8126–8135
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceedings of the 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05), pp 886–893
Danelljan M, Häger G, Khan FS, Felsberg M (2014a) Accurate scale estimation for robust visual tracking. In: Proceedings of the British Machine Vision Conference, pp 1–11
Danelljan M, Khan FS, Felsberg M, Weijer JVD (2014b) Adaptive color attributes for real-time visual tracking. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp 1090–1097
Danelljan M, Häger G, Khan FS, Felsberg M (2015) Learning spatially regularized correlation filters for visual tracking. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp 4310–4318
Danelljan M, Robinson A, Shahbaz Khan F, Felsberg M (2016) Beyond correlation filters: Learning continuous convolution operators for visual tracking. In: Proceedings of the European Conference on Computer Vision, pp 472–488
Danelljan M, Bhat G, Khan FS, Felsberg M (2017a) ECO: Efficient convolution operators for tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6638–6646
Danelljan M, Häger G, Khan FS, Felsberg M (2017b) Discriminative scale space tracking. IEEE Trans Pattern Anal Mach Intell 39(8):1561–1575
Danelljan M, Bhat G, Khan FS, Felsberg M (2019) ATOM: Accurate tracking by overlap maximization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4660–4669
Danelljan M, Van Gool L, Timofte R (2020) Probabilistic regression for visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7183–7192
Fan H, Lin L, Yang F, Chu P, Deng G, Yu S, Bai H, Xu Y, Liao C, Ling H (2019) LaSOT: A high-quality benchmark for large-scale single object tracking. In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 5369–5378
Henriques JF, Caseiro R, Martins P, Batista J (2012) Exploiting the circulant structure of tracking-by-detection with kernels. In: Proceedings of the European conference on computer vision, pp 702–715
Henriques JF, Caseiro R, Martins P, Batista J (2015) High-speed tracking with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell 37(3):583–596
Hu B, Zhao H, Yang Y, Zhou B, Raj ANJ (2020) Multiple faces tracking using feature fusion and neural network in video. Intell Autom Soft Comput 26(6):1549–1560
Huang D, Gu P, Feng H-M, Lin Y, Zheng L (2020) Robust visual tracking models designs through kernelized correlation filters. Intell Autom Soft Comput 26(2):313–322
Jiang B, Luo R, Mao J, Xiao T, Jiang Y (2018) Acquisition of localization confidence for accurate object detection. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 784–799
Kristan M, Leonardis A, Matas J, Felsberg M, Pflugfelder R, Čehovin Zajc L, Vojir T, Bhat G, Lukezic A, Eldesokey A (2018) The sixth visual object tracking VOT2018 challenge results. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pp 0–0
Li Y, Zhu J (2014) A scale adaptive kernel correlation filter tracker with feature integration. In: Proceedings of the European conference on computer vision, pp 254–265
Li B, Yan J, Wu W, Zhu Z, Hu X (2018) High performance visual tracking with siamese region proposal network. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 8971–8980
Li B, Wu W, Wang Q, Zhang F, Xing J, Yan J (2019) SiamRPN++: Evolution of siamese visual tracking with very deep networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 4282–4291
Liang P, Blasch E, Ling H (2015) Encoding color information for visual tracking: Algorithms and benchmark. IEEE Trans Image Process 24(12):5630–5644
Liu Z, Wang XA, Sun C, Lu K (2019) Implementation system of human eye tracking algorithm based on FPGA. CMC-Comput Mat Contin 58(3):653–664
Ma H, Lin Z, Acton ST (2020) FAST: Fast and accurate scale estimation for tracking. IEEE Signal Process Lett 27:161–165
Ma C, Huang J, Yang X, Yang M (2015) Hierarchical convolutional features for visual tracking. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp 3074–3082
Marvasti-Zadeh SM, Cheng L, Ghanei-Yakhdan H, Kasaei S (2021) Deep learning for visual tracking: A comprehensive survey. IEEE Trans Intell Transp Syst. https://doi.org/10.1109/TITS.2020.3046478
Mueller M, Smith N, Ghanem B (2016) A benchmark and simulator for UAV tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 445–461
Nam H, Han B (2016) Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4293–4302
Ren S, He K, Girshick R, Sun J (2016) Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149
Santhosh P, Kaarthick B (2019) An automated player detection and tracking in basketball game. CMC-Comput Mat Contin 58(3):625–639
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Proceedings of the International Conference on Learning Representations, pp 1–14
Valmadre J, Bertinetto L, Henriques J, Vedaldi A, Torr PH (2017) End-to-end representation learning for correlation filter based tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2805–2813
Voigtlaender P, Luiten J, Torr PHS, Leibe B (2020) Siam R-CNN: Visual tracking by re-detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 6578–6588
Wu Y, Lim J, Yang M (2015) Object tracking benchmark. IEEE Trans Pattern Anal Mach Intell 37(9):1834–1848
Xu Y, Wang Z, Li Z, Ye Y, Yu G (2020) SiamFC++: Towards robust and accurate visual tracking with target estimation guidelines. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 12549–12556
Zhang J, Jin X, Sun J, Wang J, Li K (2019a) Dual model learning combined with multiple feature selection for accurate visual tracking. IEEE Access 7:43956–43969
Zhang J, Wu Y, Feng W, Wang J (2019b) Spatially attentive visual tracking using multi-model adaptive response fusion. IEEE Access 7:83873–83887
Zhang L, Gonzalez-Garcia A, Weijer JVD, Danelljan M, Khan FS (2019c) Learning the model update for siamese trackers. In: Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp 4009–4018
Zhang J, Jin X, Sun J, Wang J, Sangaiah AK (2020a) Spatial and semantic convolutional features for robust visual object tracking. Multimed Tools Appl 79(21):15095–15115
Zhang J, Sun J, Wang J, Yue X-G (2020b) Visual object tracking based on residual network and cascaded correlation filters. J Ambient Intell Humaniz Comput 12:8427–8440
Zhang J, Liu Y, Liu H, Wang J (2021) Learning local–global multiple correlation filters for robust visual tracking with kalman filter redetection. Sensors 21(4):1129
Zhao S, Xu T, Wu X-J, Zhu X-F (2021) Adaptive feature fusion for visual object tracking. Pattern Recognit 111:107679
Zhao H, Yang G, Wang D, Lu H (2021) Deep mutual learning for visual object tracking. Pattern Recognit 112:107796
Zhu Z, Wang Q, Li B, Wu W, Yan J, Hu W (2018) Distractor-aware siamese networks for visual object tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 101–117
Acknowledgements
The authors would like to thank the anonymous reviewers for their useful and constructive comments, which helped improve the quality of this paper.
Funding
This work was supported by the National Natural Science Foundation of China (Grant No. 62176272), Guangzhou Science and Technology Fund (Grant No. 201803010072), Science, Technology & Innovation Commission of Shenzhen Municipality (JCYL 20170818165305521), and China Medical University Hospital (DMR-107-067, DMR-108-132, DMR-110-097). We also acknowledge the start-up funding from SYSU “Hundred Talent Program”.
Author information
Authors and Affiliations
Contributions
Xuedong He completed the experiments and analyzed the results. Xuedong He, Lu Zhao and Calvin Yu-Chian Chen wrote the manuscript together.
Corresponding author
Ethics declarations
Conflicts of interest
The authors report no conflicts of interest in this work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
He, X., Zhao, L. & Chen, C.YC. Variable scale learning for visual object tracking. J Ambient Intell Human Comput 14, 3315–3330 (2023). https://doi.org/10.1007/s12652-021-03469-2