Skip to main content

Advertisement

Log in

Feature-comparison network for visual tracking

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

Siamese networks for visual tracking have been widely applied due to their good performance. However, the performance of Siamese networks relies on the selection of several hyperparameters, including the cosine window weight and target scale penalty. Inappropriate parameter selection will lead to biased target localization and unsteady tracking. The parameter selection is dataset-specific and time-consuming. The necessity of these parameters is caused by the diffused and background-interfered target response map. In addition, the comparison between the target template and candidates in Siamese networks is performed by a simple inner product, which is linear, unbounded, covariate shifted, and cannot benefit the learning of target-background discriminant features. To address the above issues, a novel feature-comparison network (FCNet) has been developed, which combines a feature extraction network and a feature comparison network. First, an RoIAlign layer is incorporated for efficient target proposal generation. Then, the Siamese structure is borrowed to form the feature extraction network but with a different network architecture. Instead of the simple inner product in Siamese networks, a feature concatenation and comparison structure have been adopted for sample feature similarity evaluation, which has combined several convolutional and fully-connected layers for similarity computation. The comparison network, which is nonlinear, bounded and covariate unshifted, performs more efficient correlation computation and provides similarity feedback for target-background discriminant feature learning with stronger representation and generalization. A more compact and target-dominant response map has been obtained by FCNet, which assures robust and steady tracking. Experiments on benchmarks OTB2013, OTB2015, VOT2016 and UAV123 show that FCNet has obtained state-of-the-art real-time tracking performance with 30 FPS. The code and models will be available on GitHub.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

References

  1. Wu Y, Lim J, Yang M-H (2013) Online object tracking: a benchmark. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2411–2418

  2. Wu Y, Lim J, Yang M (2015) Object tracking benchmark. IEEE Trans Pattern Anal Mach Intell 37(9):1834–1848

    Article  Google Scholar 

  3. Lu N, Wu Y, Feng L, Song J (2018) Deep learning for fall detection: three-dimensional cnn combined with lstm on video kinematic data. IEEE J Biomed Health Inform 23(1):314–323

    Article  Google Scholar 

  4. Shi X, Lu N, Cui Z (2019) Smoke detection based on dark channel and convolutional neural networks. In: 2019 5th international conference on big data and information analytics (BigDIA), IEEE, pp 23–28

  5. Zhang T, Sun X, Li X, Yi Z (2021) Image generation and constrained two-stage feature fusion for person re-identification. Appl Intell :1–11

  6. Weng Y, Sun Y, Jiang D, Tao B, Liu Y, Yun J, Zhou D (2021) Enhancement of real-time grasp detection by cascaded deep convolutional neural networks. Concurr Comput Pract Experience 33 (5):5976

    Article  Google Scholar 

  7. Gao Q, Liu J, Ju Z, Zhang X (2019) Dual-hand detection for human–robot interaction by a parallel network based on hand detection and body pose estimation. IEEE Trans Ind Electron 66(12):9663–9672

    Article  Google Scholar 

  8. Cui Z, Wang Q, Guo J, Lu N (2022) Few-shot classification of faċade defects based on extensible classifier and contrastive learning. Autom Constr 141:104381

    Article  Google Scholar 

  9. Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PH (2016) Fully-convolutional siamese networks for object tracking. In: European conference on computer vision, Springer, pp 850–865

  10. Zhang Z, Peng H (2019) Deeper and wider siamese networks for real-time visual tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4591–4600

  11. Pal SK, Pramanik A, Maiti J, Mitra P (2021) Deep learning in multi-object detection and tracking: state of the art. Appl Intell :1–30

  12. Cui Z, Lu N, Wang W (2022) Pseudo loss active learning for deep visual tracking. Pattern Recognition :108773

  13. He A, Luo C, Tian X, Zeng W (2018) A twofold siamese network for real-time object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4834–4843

  14. Li T, Wu P, Ding F, Yang W (2020) Parallel dual networks for visual object tracking. Appl Intell 50(12):4631–4646

    Article  Google Scholar 

  15. Li B, Yan J, Wu W, Zhu Z, Hu X (2018) High performance visual tracking with siamese region proposal network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8971–8980

  16. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

  17. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  18. Zagoruyko S, Komodakis N (2015) Learning to compare image patches via convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4353–4361

  19. Sung F, Yang Y, Zhang L, Xiang T, Torr PH, Hospedales TM (2018) Learning to compare: relation network for few-shot learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1199–1208

  20. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969

  21. Cui Z, Lu N, Jing X, Shi X (2018) Fast dynamic convolutional neural networks for visual tracking. In: Asian conference on machine learning, pp 770–785

  22. Bolme DS, Beveridge JR, Draper BA, Lui YM (2010) Visual object tracking using adaptive correlation filters. In: 2010 IEEE computer society conference on computer vision and pattern recognition, IEEE, pp 2544–2550

  23. Henriques JF, Caseiro R, Martins P, Batista J (2012) Exploiting the circulant structure of tracking-by-detection with kernels. In: European conference on computer vision, Springer, pp 702– 715

  24. Henriques JF, Caseiro R, Martins P, Batista J (2015) High-speed tracking with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell 37(3):583–596

    Article  Google Scholar 

  25. Danelljan M, Robinson A, Khan FS, Felsberg M (2016) Beyond correlation filters: Learning continuous convolution operators for visual tracking. In: European conference on computer vision, Springer, pp 472–488

  26. Danelljan M, Bhat G, Shahbaz Khan F, Felsberg M (2017) Eco: efficient convolution operators for tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6638–6646

  27. Touil DE, Terki N, Medouakh S (2018) Learning spatially correlation filters based on convolutional features via pso algorithm and two combined color spaces for visual tracking. Appl Intell 48(9):2837–2846

    Article  Google Scholar 

  28. Valmadre J, Bertinetto L, Henriques J, Vedaldi A, Torr PH (2017) End-to-end representation learning for correlation filter based tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2805–2813

  29. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

  30. Gao J, Zhang T, Xu C (2019) Graph convolutional tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4649–4659

  31. Meng Y, Deng Z, Zhao K, Xu Y, Liu H (2021) Hierarchical correlation siamese network for real-time object tracking. Appl Intell 51(6):3202–3211

    Article  Google Scholar 

  32. Huang W, Gu J, Ma X, Li Y (2020) End-to-end multitask siamese network with residual hierarchical attention for real-time object tracking. Appl Intell 50(6):1908–1921

    Article  Google Scholar 

  33. Nam H, Han B (2016) Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4293–4302

  34. Jung I, Son J, Baek M, Han B (2018) Real-time mdnet. In: Proceedings of the European conference on computer vision (ECCV), pp 83–98

  35. Cui Z, Lu N (2021) Feature selection accelerated convolutional neural networks for visual tracking. Appl Intell :1–15

  36. Song Y, Ma C, Gong L, Zhang J, Lau RW, Yang M-H (2017) Crest: convolutional residual learning for visual tracking. In: Proceedings of the IEEE international conference on computer vision, pp 2555–2564

  37. Held D, Thrun S, Savarese S (2016) Learning to track at 100 fps with deep regression networks. In: European conference on computer vision, Springer, pp 749–765

  38. Danelljan M, Bhat G, Khan FS, Felsberg M (2019) Atom: accurate tracking by overlap maximization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4660–4669

  39. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252

    Article  MathSciNet  Google Scholar 

  40. Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366

    Article  MATH  Google Scholar 

  41. Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 2(4):303–314

    Article  MathSciNet  MATH  Google Scholar 

  42. Jarrett K, Kavukcuoglu K, Ranzato M, LeCun Y (2009) What is the best multi-stage architecture for object recognition?. In: 2009 IEEE 12th international conference on computer vision, IEEE, pp 2146–2153

  43. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning, PMLR, pp 448–456

  44. Luo C, Zhan J, Xue X, Wang L, Ren R, Yang Q (2018) Cosine normalization: using cosine similarity instead of dot product in neural networks. In: International conference on artificial neural networks, Springer, pp 382–391

  45. Wang H, Wang Y, Zhou Z, Ji X, Gong D, Zhou J, Li Z, Liu W (2018) Cosface: large margin cosine loss for deep face recognition. In: Proceedings of the ieee conference on computer vision and pattern recognition, pp 5265–5274

  46. Tao R, Gavves E, Smeulders AW (2016) Siamese instance search for tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1420–1429

  47. Danelljan M, Hager G, Shahbaz Khan F, Felsberg M (2015) Learning spatially regularized correlation filters for visual tracking. In: Proceedings of the IEEE international conference on computer vision, pp 4310–4318

  48. Bertinetto L, Valmadre J, Golodetz S, Miksik O, Torr PH (2016) Staple: complementary learners for real-time tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1401–1409

  49. Danelljan M, Häger G., Khan FS, Felsberg M (2016) Discriminative scale space tracking. IEEE Trans Pattern Anal Mach Intell 39(8):1561–1575

    Article  Google Scholar 

  50. Kristan M, Leonardis A, Matas J, Felsberg M, Pflugfelder R, Čehovin Zajc L, Vojír T, Häger G, Lukežič A, Fernandez Dominguez G, Gupta A, Petrosino A, Memarmoghadam A, Garcia-Martin A, Montero A, Vedaldi A, Robinson A, Ma A, Varfolomieiev A, Chi Z (2016) The visual object tracking vot2016 challenge results. 9914: 777–823

  51. Mueller M, Smith N, Ghanem B (2016) A benchmark and simulator for uav tracking. In: European conference on computer vision, Springer, pp 445–461

  52. Zhang J, Ma S, Sclaroff S (2014) Meem: robust tracking via multiple experts using entropy minimization. In: European conference on computer vision, Springer, pp 188–203

  53. Li Y, Zhu J (2014) A scale adaptive kernel correlation filter tracker with feature integration. In: European conference on computer vision, Springer, pp 254–265

  54. Hong Z, Chen Z, Wang C, Mei X, Prokhorov D, Tao D (2015) Multi-store tracker (muster): a cognitive psychology inspired approach to object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 749–758

  55. Hare S, Golodetz S, Saffari A, Vineet V, Cheng M-M, Hicks SL, Torr PH (2015) Struck: structured output tracking with kernels. IEEE Trans Pattern Anal Mach Intell 38(10):2096–2109

    Article  Google Scholar 

  56. Jia X, Lu H, Yang M-H (2012) Visual tracking via adaptive structural local sparse appearance model. In: 2012 IEEE conference on computer vision and pattern recognition, IEEE, pp 1822–1829

  57. Kalal Z, Mikolajczyk K, Matas J (2011) Tracking-learning-detection. IEEE Trans Pattern Anal Mach Intell 34(7):1409–1422

    Article  Google Scholar 

  58. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations

Download references

Acknowledgements

This work is supported by National Natural Science Foundation of China grant 61876147.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Na Lu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Cui, Z., Lu, N. Feature-comparison network for visual tracking. Appl Intell 53, 18263–18276 (2023). https://doi.org/10.1007/s10489-023-04466-y

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-023-04466-y

Keywords