RHL-track: visual object tracking based on recurrent historical localization

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Visual object tracking (VOT) is a fundamental yet challenging problem in computer vision. Over the past few years, the research focus has shifted from template matching to deep learning models. In particular, Siamese networks have dominated the tracking domain in recent years; they take the first frame as the reference and perform object detection and localization in the following frames. However, most of them cannot capture target changes because they lack strong feature representation abilities. To address this issue, we propose an advanced tracking network based on recurrent historical localization information. Unlike traditional symmetric structures, we use two convolution layers to perform target classification, which predicts the initial target center. We then apply a gated recurrent unit that fuses multi-resolution features with historical localization information to yield the final, optimized target position. Extensive experiments on six mainstream datasets (OTB100, GOT-10k, TrackingNet, LaSOT, VOT2018 and NFS) show that our tracker achieves state-of-the-art performance.
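The abstract describes a recurrence in which a gated recurrent unit fuses per-frame features with historical localization to refine an initial target center. The paper itself supplies the actual architecture; the NumPy sketch below (all names, dimensions and the linear output head are hypothetical, not taken from the paper) only illustrates the general idea of feeding the previous predicted center back into a GRU alongside the current frame's features:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell: h_t = (1 - z) * h_{t-1} + z * tanh(Wh [x, r * h_{t-1}])."""
    def __init__(self, in_dim, hid_dim):
        scale = 0.1
        self.Wz = rng.normal(0, scale, (hid_dim, in_dim + hid_dim))  # update gate
        self.Wr = rng.normal(0, scale, (hid_dim, in_dim + hid_dim))  # reset gate
        self.Wh = rng.normal(0, scale, (hid_dim, in_dim + hid_dim))  # candidate state

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)
        r = sigmoid(self.Wr @ xh)
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_tilde

feat_dim, hid_dim = 8, 16
cell = GRUCell(feat_dim + 2, hid_dim)        # input: frame features + previous (x, y)
W_out = rng.normal(0, 0.1, (2, hid_dim))     # toy head mapping hidden state to an offset

h = np.zeros(hid_dim)
center = np.array([0.5, 0.5])                # initial center from the classification branch
for t in range(5):
    feats = rng.normal(size=feat_dim)        # stand-in for fused multi-resolution features
    h = cell.step(np.concatenate([feats, center]), h)
    center = center + W_out @ h              # refined (x, y) estimate, shape (2,)
```

The hidden state carries localization history across frames, so the refinement at frame t can depend on where the target was in all earlier frames, which is the property the abstract attributes to the recurrent design.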



Data availability

The raw/processed data required to reproduce these findings will be shared once this paper has been accepted.


Author information


Corresponding author

Correspondence to Yi Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Meng, F., Gong, X. & Zhang, Y. RHL-track: visual object tracking based on recurrent historical localization. Neural Comput & Applic 35, 12611–12625 (2023). https://doi.org/10.1007/s00521-023-08422-2

