Ensemble siamese networks for object tracking

  • Original Article
  • Neural Computing and Applications

Abstract

Fully-Convolutional Siamese networks (SiamFC) are widely used in visual tracking because they balance accuracy and efficiency. Despite this success, SiamFC still discriminates poorly, especially in scenes where the target must be told apart from similar objects. The main reason is that, during training, it focuses on fitting the whole dataset rather than on learning to discriminate between similar objects. To address this issue, we propose Ensemble Siamese networks (ESiamFC), which introduce ensemble learning into SiamFC. First, we map the training dataset, ILSVRC2015, into an embedding space. Second, we cluster the video features with balanced k-means. Third, within each cluster, we apply transfer learning to SiamFC, which yields k base trackers, each with its own preferences. Finally, to leverage the diversity of the base trackers, we propose a Cluster Weight fusion module that automatically assigns fusion weights to the base trackers according to the semantic information of the tracked object. Extensive experiments on multiple benchmarks show that our tracker outperforms SiamFC in precision, with relative increases of 7.1%, 8.6%, and 6.7% on Tcolor128, DTB70, and LaSOT, respectively.
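The clustering and fusion steps can be illustrated with a minimal NumPy sketch. The abstract does not specify the balanced k-means variant or the internals of the Cluster Weight fusion module, so everything here is an assumption made for illustration: the greedy capacity-limited assignment, the softmax over distances to cluster centers, and the names `balanced_assign`, `balanced_kmeans`, `fusion_weights`, and `fuse_responses` are all hypothetical, and the random arrays stand in for real video embeddings and tracker response maps.

```python
import numpy as np

def balanced_assign(features, centers):
    """Greedy balanced assignment: no cluster receives more than ceil(n/k) points."""
    n, k = len(features), len(centers)
    cap = -(-n // k)  # ceiling division
    # Squared Euclidean distance from every point to every center, shape (n, k).
    d = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = np.full(n, -1)
    counts = np.zeros(k, dtype=int)
    # Visit (point, cluster) pairs from the closest pair to the farthest.
    for flat in np.argsort(d, axis=None):
        i, c = divmod(int(flat), k)
        if labels[i] == -1 and counts[c] < cap:
            labels[i] = c
            counts[c] += 1
    return labels

def balanced_kmeans(features, k, iters=10, seed=0):
    """Alternate balanced assignment and center updates (assumed variant)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = balanced_assign(features, centers)
        for c in range(k):
            members = features[labels == c]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[c] = members.mean(axis=0)
    return labels, centers

def fusion_weights(target_feature, centers, temperature=1.0):
    """Assumed stand-in for Cluster Weight fusion: softmax of negative distances."""
    d = ((centers - target_feature) ** 2).sum(axis=1)
    logits = -d / temperature
    logits = logits - logits.max()  # numerical stability
    w = np.exp(logits)
    return w / w.sum()

def fuse_responses(responses, weights):
    """Weighted sum of k response maps of shape (k, H, W) into one (H, W) map."""
    return np.tensordot(weights, responses, axes=1)

# Toy usage: 12 random "video embeddings" and 3 random "base tracker" responses.
rng = np.random.default_rng(0)
feats = rng.normal(size=(12, 4))
labels, centers = balanced_kmeans(feats, k=3)
weights = fusion_weights(feats[0], centers)
fused = fuse_responses(rng.normal(size=(3, 5, 5)), weights)
```

In this sketch, a target whose embedding lies close to one cluster center gives most of the fusion weight to the base tracker fine-tuned on that cluster, which is one plausible way to read "assigning fusion weights according to semantic information"; the published module may work differently.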



Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 61972307).

Author information

Corresponding author

Correspondence to Guixi Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Huang, H., Liu, G., Zhang, Y. et al. Ensemble siamese networks for object tracking. Neural Comput & Applic 34, 8173–8191 (2022). https://doi.org/10.1007/s00521-022-06911-4

  • DOI: https://doi.org/10.1007/s00521-022-06911-4
