Abstract
Accurately locating the target is a challenging task in high-speed visual tracking. Most Siamese trackers built on shallow networks maintain a fast speed but localize poorly, because the appearance features extracted by a shallow network are not discriminative enough to separate the target from a complex background. We therefore present a location-aware Siamese network to address this issue. Specifically, we propose a novel context enhancement module (CEM) that captures discriminative object information at both the local and the global level. At the local level, the features of local image blocks carry discriminative information that is conducive to locating the target. At the global level, global context information effectively models long-range dependencies, so our tracker can better understand the tracking scene. We then construct a well-designed feature fusion network (F-net) to make full use of context information at different scales, in which the location block dynamically adjusts the convolution direction according to the geometry of the target. Finally, the Distance-IoU (DIoU) loss is employed to guide the tracker toward a more accurate estimate of the target position. Extensive experiments on seven benchmarks, including VOT2016, VOT2018, VOT2019, OTB50, OTB100, UAV123 and LaSOT, demonstrate that our tracker achieves competitive results while running at over 200 frames per second (FPS).
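The DIoU loss mentioned above follows the standard formulation DIoU = IoU − ρ²(b, b_gt)/c², where ρ is the distance between the two box centers and c is the diagonal of the smallest box enclosing both. The following minimal sketch (plain Python, not the paper's implementation; the function name `iou_and_diou` and the (x1, y1, x2, y2) box convention are our assumptions) illustrates the computation:

```python
def iou_and_diou(box_a, box_b):
    """Compute IoU and Distance-IoU for two axis-aligned boxes (x1, y1, x2, y2).

    DIoU = IoU - rho^2 / c^2, where rho is the distance between the box
    centers and c is the diagonal of the smallest box enclosing both boxes.
    """
    # Intersection area
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    # Union area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    iou = inter / (area_a + area_b - inter)

    # Squared distance between box centers (rho^2)
    cxa, cya = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cxb, cyb = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    rho2 = (cxa - cxb) ** 2 + (cya - cyb) ** 2

    # Squared diagonal of the smallest enclosing box (c^2)
    ex1, ey1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    ex2, ey2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2

    return iou, iou - rho2 / c2
```

The training objective is then 1 − DIoU; unlike plain IoU, the center-distance penalty still provides a gradient when the predicted and ground-truth boxes do not overlap, which helps regression toward the correct position.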
Acknowledgements
This work was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission (No.KJZD-K201900601 and No.KJQN-202100627), the National Natural Science Foundation of China (No.62036007, No.62050175 and No.62102057), the National Natural Science Foundation of Chongqing (No.cstc2019jcyj-msxmX0461), Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering (No.2020SDSJ01), the Construction fund for Hubei Key Laboratory of Intelligent Vision-Based Monitoring for Hydroelectric Engineering (No.2019ZYYD007), Chongqing Excellent Scientist Project (No.cstc2021ycjh-bgzxm0339) and the Graduate Scientific Research and Innovation Foundation of Chongqing (No.CYS21304 and No.CYS20267).
Cite this article
Zhou, L., Ding, X., Li, W. et al. A location-aware siamese network for high-speed visual tracking. Appl Intell 53, 4431–4447 (2023). https://doi.org/10.1007/s10489-022-03636-8