Abstract
Siamese networks have been widely applied to visual tracking because of their good performance. However, that performance depends on several hyperparameters, including the cosine window weight and the target scale penalty. Inappropriate parameter selection leads to biased target localization and unsteady tracking, and the selection itself is dataset-specific and time-consuming. These parameters are needed because the target response map is diffused and subject to background interference. In addition, Siamese networks compare the target template with candidates by a simple inner product, which is linear, unbounded, and subject to covariate shift, and which does not benefit the learning of target-background discriminant features. To address these issues, a novel feature-comparison network (FCNet) is developed, which combines a feature extraction network and a feature comparison network. First, an RoIAlign layer is incorporated for efficient target proposal generation. Then, the Siamese structure is borrowed to form the feature extraction network, but with a different network architecture. Instead of the simple inner product used in Siamese networks, a feature concatenation and comparison structure is adopted to evaluate sample feature similarity; it combines several convolutional and fully-connected layers for similarity computation. The comparison network, which is nonlinear, bounded, and free of covariate shift, performs more efficient correlation computation and provides similarity feedback for learning target-background discriminant features with stronger representation and generalization. FCNet yields a more compact and target-dominant response map, which ensures robust and steady tracking. Experiments on the OTB2013, OTB2015, VOT2016 and UAV123 benchmarks show that FCNet achieves state-of-the-art real-time tracking performance at 30 FPS. The code and models will be available on GitHub.
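
The contrast between an inner-product comparison and a concatenation-based comparison can be illustrated with a toy sketch. This is not the FCNet implementation: the function names, the layer sizes, the random untrained weights, and the sigmoid output used to bound the score are all assumptions made for illustration, and a single hidden fully-connected layer stands in for the convolutional and fully-connected layers described above.

```python
import math
import random


def inner_product(template, candidate):
    """Siamese-style similarity: linear in each input, with no upper bound."""
    return sum(t * c for t, c in zip(template, candidate))


def dense(x, weights, bias):
    """Fully-connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise nonlinearity applied to the hidden layer."""
    return [max(0.0, v) for v in x]


def sigmoid(z):
    """Numerically stable logistic function; output lies in [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def random_params(feature_dim, hidden_dim, seed=0):
    """Hypothetical, untrained weights for the toy comparison head."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(rows)]

    return {
        "w1": matrix(hidden_dim, 2 * feature_dim),
        "b1": [0.0] * hidden_dim,
        "w2": matrix(1, hidden_dim),
        "b2": [0.0],
    }


def comparison_score(template, candidate, params):
    """Concatenate both feature vectors, then score them with a small MLP."""
    joint = list(template) + list(candidate)
    hidden = relu(dense(joint, params["w1"], params["b1"]))
    logit = dense(hidden, params["w2"], params["b2"])[0]
    return sigmoid(logit)  # assumed bounding function, not taken from the paper


if __name__ == "__main__":
    template = [1.0, 2.0, 3.0]
    candidate = [0.5, -1.0, 2.0]
    scaled = [100.0 * v for v in candidate]
    params = random_params(feature_dim=3, hidden_dim=4)

    # The inner product grows with the feature magnitude.
    print(inner_product(template, candidate), inner_product(template, scaled))
    # The learned comparison stays within [0, 1] for any input scale.
    print(comparison_score(template, candidate, params),
          comparison_score(template, scaled, params))
```

Scaling the candidate features by 100 scales the inner product by the same factor, whereas the output of the comparison head stays bounded. With trained weights, such a head can also model nonlinear relations between the two feature vectors.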









Acknowledgements
This work is supported by the National Natural Science Foundation of China under grant 61876147.
About this article
Cite this article
Cui, Z., Lu, N. Feature-comparison network for visual tracking. Appl Intell 53, 18263–18276 (2023). https://doi.org/10.1007/s10489-023-04466-y
DOI: https://doi.org/10.1007/s10489-023-04466-y