Abstract
Accurately locating the target is a challenging task in high-speed visual tracking. Most Siamese trackers built on shallow networks maintain a fast speed but localize poorly, because the appearance features extracted by a shallow network are not discriminative enough to separate the target from a complex background. We therefore present a location-aware Siamese network to address this issue. Specifically, we propose a novel context enhancement module (CEM) that captures discriminative object information at both the local and the global level. At the local level, the features of local image blocks carry discriminative information that is conducive to locating the target. At the global level, global context information effectively models long-range dependencies, so our tracker can better understand the tracking scene. We then construct a well-designed feature fusion network (F-net) to make full use of context information at different scales, in which the location block dynamically adjusts the convolution direction according to the geometry of the target. Finally, the Distance-IoU (DIoU) loss is employed to guide the tracker toward a more accurate estimate of the target position. Extensive experiments on seven benchmarks, including VOT2016, VOT2018, VOT2019, OTB50, OTB100, UAV123 and LaSOT, demonstrate that our tracker achieves competitive results while running at over 200 frames per second (FPS).
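The DIoU loss mentioned above follows the standard formulation DIoU = IoU − ρ²(b, b_gt)/c², where ρ is the distance between the two box centers and c is the diagonal of the smallest box enclosing both. The following minimal sketch (plain Python, not the paper's implementation; the function name `iou_and_diou` and the (x1, y1, x2, y2) box convention are our assumptions) illustrates the computation:

```python
def iou_and_diou(box_a, box_b):
    """Compute IoU and Distance-IoU for two axis-aligned boxes (x1, y1, x2, y2).

    DIoU = IoU - rho^2 / c^2, where rho is the distance between the box
    centers and c is the diagonal of the smallest box enclosing both boxes.
    """
    # Intersection area
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    # Union area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    iou = inter / (area_a + area_b - inter)

    # Squared distance between box centers (rho^2)
    cxa, cya = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cxb, cyb = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    rho2 = (cxa - cxb) ** 2 + (cya - cyb) ** 2

    # Squared diagonal of the smallest enclosing box (c^2)
    ex1, ey1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    ex2, ey2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2

    return iou, iou - rho2 / c2
```

The training objective is then 1 − DIoU; unlike plain IoU, the center-distance penalty still provides a gradient when the predicted and ground-truth boxes do not overlap, which helps regression toward the correct position.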
Acknowledgements
This work was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission (No.KJZD-K201900601 and No.KJQN-202100627), the National Natural Science Foundation of China (No.62036007, No.62050175 and No.62102057), the National Natural Science Foundation of Chongqing (No.cstc2019jcyj-msxmX0461), Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering (No.2020SDSJ01), the Construction fund for Hubei Key Laboratory of Intelligent Vision-Based Monitoring for Hydroelectric Engineering (No.2019ZYYD007), Chongqing Excellent Scientist Project (No.cstc2021ycjh-bgzxm0339) and the Graduate Scientific Research and Innovation Foundation of Chongqing (No.CYS21304 and No.CYS20267).
Cite this article
Zhou, L., Ding, X., Li, W. et al. A location-aware siamese network for high-speed visual tracking. Appl Intell 53, 4431–4447 (2023). https://doi.org/10.1007/s10489-022-03636-8