
Learning bi-grained cross-correlation siamese networks for visual tracking

Published in Applied Intelligence

Abstract

Siamese network based trackers measure the similarity between a target template and a search region by computing their cross-correlation. Specifically, Siamese trackers regard the target template as a spatial filter that convolves the search region, emphasizing the coarse-grained semantic abstraction of the target in the spatial domain. Despite the demonstrated success of Siamese trackers, little attention has been paid to fine-grained spatial details in the cross-correlation computation, which are crucial for precise target localization. In this paper, we propose to learn point-wise cross-correlation Siamese networks for visual tracking. By sketching the contour of the target, the proposed point-wise cross-correlation module helps Siamese networks become aware of the distinctive boundary between the target and the background. In conjunction with traditional depth-wise cross-correlation, the proposed Siamese network exploits both coarse-grained semantic abstraction and fine-grained detail to precisely locate the target. Extensive experiments demonstrate the effectiveness and efficiency of the proposed tracker, which achieves new state-of-the-art results on five visual tracking benchmarks, VOT2016, VOT2018, VOT2019, OTB100, and LaSOT, at a speed of 38 FPS. As an extra benefit, our tracker can output a segmentation mask for the target. We demonstrate the favorable performance of our tracker on video object segmentation datasets in comparison with the state of the art.
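The two correlation granularities contrasted in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation, only an assumed formulation of the two standard operations: depth-wise cross-correlation slides each template channel over the matching search-region channel, while point-wise (pixel-wise) cross-correlation treats every spatial position of the template as a 1x1 filter, preserving fine spatial detail. The function names and tensor shapes are illustrative assumptions.

```python
import numpy as np

def depthwise_xcorr(z, x):
    """Depth-wise cross-correlation: each channel of the template z
    (C, Hz, Wz) slides over the same channel of the search region x
    (C, Hx, Wx), giving a coarse (C, Hx-Hz+1, Wx-Wz+1) response."""
    C, Hz, Wz = z.shape
    _, Hx, Wx = x.shape
    out = np.zeros((C, Hx - Hz + 1, Wx - Wz + 1))
    for c in range(C):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                # inner product of the template channel with a window
                out[c, i, j] = np.sum(z[c] * x[c, i:i + Hz, j:j + Wz])
    return out

def pointwise_xcorr(z, x):
    """Point-wise cross-correlation: every spatial position of z acts
    as a 1x1 filter, so each search location is compared against each
    template point's full feature vector, giving a fine-grained
    (Hz*Wz, Hx, Wx) response."""
    C, Hz, Wz = z.shape
    _, Hx, Wx = x.shape
    kernels = z.reshape(C, Hz * Wz)   # (C, P): P template points
    feats = x.reshape(C, Hx * Wx)     # (C, N): N search points
    return (kernels.T @ feats).reshape(Hz * Wz, Hx, Wx)
```

Note the trade-off the paper exploits: the depth-wise response pools a whole template window into one score per channel (semantic abstraction), while the point-wise response keeps one map per template point, which is what allows boundary-level detail to survive into the matching stage.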




Author information


Corresponding author

Correspondence to Dandan Zhu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhao, D., Ma, C., Zhu, D. et al. Learning bi-grained cross-correlation siamese networks for visual tracking. Appl Intell 52, 12175–12190 (2022). https://doi.org/10.1007/s10489-021-03015-9

