Visual object tracking (VOT) is a fundamental and complex problem in computer vision. In the past few years, the research focus has shifted from template matching to deep learning models. In particular, Siamese networks, which take the first frame as the reference and perform object detection and localization in subsequent frames, have dominated the tracking domain in recent years. However, most of them cannot capture changes in the target because they lack strong feature representation abilities. To address this issue, we propose an advanced tracking network based on recurrent historical localization information. Unlike traditional symmetric structures, we use two convolution layers to perform target classification, which predicts the initial target center. We then apply a gated recurrent unit that fuses multi-resolution features with historical localization information to yield the final optimized target position. Extensive experiments on six mainstream datasets (OTB100, GOT-10k, TrackingNet, LaSOT, VOT2018 and NFS) show that our tracker achieves state-of-the-art performance.
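
The two-stage localization summarized in the abstract (a coarse center from a classification score map, then a recurrent refinement that uses localization history) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `GRUCell`, `argmax_center`, `refine_center` and `readout` are hypothetical, the GRU is the textbook formulation without biases, the weights are random rather than trained, and a short feature vector stands in for the fused multi-resolution features.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(weights, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a trained tracker would learn these instead."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Textbook GRU cell without biases (assumption: not the paper's exact unit)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        n = input_size + hidden_size
        self.w_update = random_matrix(hidden_size, n, rng)
        self.w_reset = random_matrix(hidden_size, n, rng)
        self.w_cand = random_matrix(hidden_size, n, rng)

    def step(self, x, h):
        z = [sigmoid(a) for a in matvec(self.w_update, x + h)]  # update gate
        r = [sigmoid(a) for a in matvec(self.w_reset, x + h)]   # reset gate
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in matvec(self.w_cand, x + gated_h)]
        # Blend the previous state with the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def argmax_center(score_map):
    """Return (x, y) of the highest classification score."""
    _, y, x = max(
        (score, row, col)
        for row, scores in enumerate(score_map)
        for col, score in enumerate(scores)
    )
    return (x, y)


def refine_center(cell, readout, h, score_map, features, prev_center):
    """One tracking step: coarse center from scores, then a GRU-predicted offset."""
    init = argmax_center(score_map)
    # GRU input: stand-in features, previous position (history), coarse center.
    x = list(features) + [float(c) for c in list(prev_center) + list(init)]
    h = cell.step(x, h)
    dx, dy = matvec(readout, h)  # linear readout of the hidden state
    return (init[0] + dx, init[1] + dy), h


# Toy usage: three frames that happen to share the same score map and features.
hidden = 4
cell = GRUCell(input_size=3 + 4, hidden_size=hidden)
readout = random_matrix(2, hidden, random.Random(1))
h = [0.0] * hidden
center = (1.0, 1.0)
score_map = [[0.1, 0.2, 0.1], [0.0, 0.3, 0.9], [0.1, 0.2, 0.4]]
for _ in range(3):
    center, h = refine_center(cell, readout, h, score_map, [0.5, -0.2, 0.1], center)
print(center)
```

The hidden state `h` is carried from frame to frame, which is how earlier positions can influence the current estimate in this sketch. With untrained weights the predicted offset is small and carries no meaning, so the example shows only the data flow.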

Data availability
The raw/processed data required to reproduce these findings will be shared once this paper has been accepted.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Meng, F., Gong, X. & Zhang, Y. RHL-track: visual object tracking based on recurrent historical localization. Neural Comput & Applic 35, 12611–12625 (2023). https://doi.org/10.1007/s00521-023-08422-2
DOI: https://doi.org/10.1007/s00521-023-08422-2