
A location-aware siamese network for high-speed visual tracking

Published in: Applied Intelligence

Abstract

Accurately locating the target position is a challenging task in high-speed visual tracking. Most Siamese trackers based on shallow networks can maintain a fast speed, but they have poor positioning performance. The underlying reason is that the appearance features extracted from the shallow network are not effective enough, making it difficult to accurately locate the target against a complex background. We therefore present a location-aware Siamese network to address this issue. Specifically, we propose a novel context enhancement module (CEM), which contributes to capturing distinctive object information at both the local and the global level. At the local level, the features of local image blocks contain more discriminative information that is conducive to locating the target. At the global level, global context information can effectively model long-range dependencies, meaning that our tracker can better understand the tracking scene. We then construct a well-designed feature fusion network (F-net) to make full use of context information at different scales, where the location block can dynamically adjust the convolution direction according to the geometry of the target. Finally, the Distance-IoU (DIoU) loss is employed to guide the tracker to a more accurate estimate of the target position. Extensive experiments on seven benchmarks (VOT2016, VOT2018, VOT2019, OTB50, OTB100, UAV123 and LaSOT) demonstrate that our tracker achieves competitive results while running at over 200 frames per second (FPS).
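The DIoU loss mentioned in the abstract is the standard Distance-IoU loss (1 − IoU plus a normalized center-distance penalty). The following is a minimal Python sketch of that standard formulation, not the authors' implementation; the function name and the (x1, y1, x2, y2) box format are illustrative assumptions.

```python
def diou_loss(box1, box2):
    """Standard Distance-IoU loss between two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
    Loss = 1 - IoU + rho^2 / c^2, where rho is the distance between
    box centers and c is the diagonal of the smallest enclosing box.
    """
    x1, y1, x2, y2 = box1
    X1, Y1, X2, Y2 = box2

    # Intersection area (zero if the boxes do not overlap)
    iw = max(0.0, min(x2, X2) - max(x1, X1))
    ih = max(0.0, min(y2, Y2) - max(y1, Y1))
    inter = iw * ih

    # Union area and IoU
    area1 = (x2 - x1) * (y2 - y1)
    area2 = (X2 - X1) * (Y2 - Y1)
    iou = inter / (area1 + area2 - inter)

    # Squared distance between the two box centers
    rho2 = ((x1 + x2 - X1 - X2) ** 2 + (y1 + y2 - Y1 - Y2) ** 2) / 4.0

    # Squared diagonal length of the smallest enclosing box
    cw = max(x2, X2) - min(x1, X1)
    ch = max(y2, Y2) - min(y1, Y1)
    c2 = cw ** 2 + ch ** 2

    return 1.0 - iou + rho2 / c2
```

Unlike plain 1 − IoU, the distance term keeps a useful gradient even when predicted and ground-truth boxes do not overlap, which is what makes the loss helpful for guiding box regression toward the target position.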


Notes

  1. The results come from the official website.

  2. The tracking results come from the paper.


Acknowledgements

This work was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission (No.KJZD-K201900601 and No.KJQN-202100627), the National Natural Science Foundation of China (No.62036007, No.62050175 and No.62102057), the National Natural Science Foundation of Chongqing (No.cstc2019jcyj-msxmX0461), Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering (No.2020SDSJ01), the Construction fund for Hubei Key Laboratory of Intelligent Vision-Based Monitoring for Hydroelectric Engineering (No.2019ZYYD007), Chongqing Excellent Scientist Project (No.cstc2021ycjh-bgzxm0339) and the Graduate Scientific Research and Innovation Foundation of Chongqing (No.CYS21304 and No.CYS20267).

Author information

Corresponding author

Correspondence to Lifang Zhou.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhou, L., Ding, X., Li, W. et al. A location-aware siamese network for high-speed visual tracking. Appl Intell 53, 4431–4447 (2023). https://doi.org/10.1007/s10489-022-03636-8

