Feature-comparison network for visual tracking

Cui, Zhiyan; Lu, Na

doi:10.1007/s10489-023-04466-y

Published: 26 January 2023

Volume 53, pages 18263–18276, (2023)

Applied Intelligence

Abstract

Siamese networks are widely used for visual tracking because of their strong performance. However, that performance depends on several hyperparameters, including the cosine window weight and the target scale penalty. Inappropriate parameter selection leads to biased target localization and unsteady tracking, and the selection itself is dataset-specific and time-consuming. These parameters are needed because the target response map is diffused and subject to background interference. In addition, Siamese networks compare the target template with candidates through a simple inner product, which is linear, unbounded, and covariate shifted, and which does not benefit the learning of target-background discriminant features. To address these issues, a novel feature-comparison network (FCNet) is developed, which combines a feature extraction network and a feature comparison network. First, an RoIAlign layer is incorporated for efficient target proposal generation. Then, the Siamese structure is borrowed to form the feature extraction network, but with a different network architecture. Instead of the simple inner product, a feature concatenation and comparison structure, composed of several convolutional and fully-connected layers, is adopted to evaluate sample feature similarity. The comparison network, which is nonlinear, bounded, and covariate unshifted, performs more efficient correlation computation and provides similarity feedback for learning target-background discriminant features with stronger representation and generalization. FCNet produces a more compact and target-dominant response map, which ensures robust and steady tracking. Experiments on the OTB2013, OTB2015, VOT2016, and UAV123 benchmarks show that FCNet achieves state-of-the-art real-time tracking performance at 30 FPS. The code and models will be available on GitHub.
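The contrast the abstract draws between an inner-product comparison and a learned, bounded comparison can be illustrated with a minimal sketch. The code below is not the FCNet architecture: the function names, the layer sizes, the random untrained weights, and the use of a single hidden layer with a sigmoid output are all assumptions made for illustration. It shows only that a dot-product score grows with the magnitude of the features, while a score computed from concatenated features and passed through a sigmoid stays within [0, 1].

```python
import math
import random


def inner_product(template, candidate):
    """Siamese-style similarity: a plain dot product (linear and unbounded)."""
    return sum(t * c for t, c in zip(template, candidate))


def sigmoid(z):
    """Numerically stable logistic function; the result lies in [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_comparator(feature_dim, hidden_dim=8, seed=0):
    """Build a toy comparison head (hypothetical; weights are random, not trained)."""
    rng = random.Random(seed)
    in_dim = 2 * feature_dim  # template and candidate features are concatenated
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
          for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
    b2 = 0.0

    def compare(template, candidate):
        x = list(template) + list(candidate)  # feature concatenation
        # One fully-connected layer followed by a ReLU nonlinearity.
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(w1, b1)]
        # Scalar output squashed by a sigmoid, so the score is bounded.
        logit = sum(w * h for w, h in zip(w2, hidden)) + b2
        return sigmoid(logit)

    return compare


# Illustrative feature vectors (made-up values, not taken from the paper).
template = [0.2, -0.1, 0.4, 0.3]
candidate = [0.1, 0.0, 0.5, 0.2]
compare = make_comparator(feature_dim=4)

for scale in (1, 10, 100):
    t = [scale * v for v in template]
    c = [scale * v for v in candidate]
    # The dot product scales quadratically; the comparator score stays in [0, 1].
    print(scale, inner_product(t, c), compare(t, c))
```

In a trained tracker the comparison weights would be learned jointly with the feature extractor, which is what allows the similarity score to feed back into discriminant feature learning; the random weights here only demonstrate the boundedness property.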

Acknowledgements

This work is supported by the National Natural Science Foundation of China under grant 61876147.

Author information

Authors and Affiliations

Systems Engineering Institute, School of Automation Science and Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
Zhiyan Cui & Na Lu

Corresponding author

Correspondence to Na Lu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cui, Z., Lu, N. Feature-comparison network for visual tracking. Appl Intell 53, 18263–18276 (2023). https://doi.org/10.1007/s10489-023-04466-y

Accepted: 09 January 2023
Published: 26 January 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s10489-023-04466-y
