Robust adaptive learning with Siamese network architecture for visual tracking

  • Original Article
The Visual Computer

Abstract

Correlation filters and deep learning are the two main research directions in visual object tracking, yet trackers from either family rarely balance accuracy and speed well at the same time. Siamese networks have brought substantial improvements on both fronts and are attracting a growing number of researchers. Building on the strengths of the Siamese network model, we study and improve current visual tracking algorithms to raise tracking performance. In this paper, we propose a robust adaptive learning visual tracking algorithm. HOG features, CN features and deep convolutional features are extracted from the template frame and the search-region frame, respectively; we analyze the merits of each feature and fuse them adaptively to improve the validity of the feature representation. We then update the two branch models with two learning change factors, yielding a closer match for locating the target. In addition, we propose a model update strategy that uses the average peak-to-correlation energy (APCE) to determine whether to update the learning change factors, which improves the accuracy of the tracking model and reduces tracking drift under tracking failure, deformation, background clutter and similar conditions. Extensive experiments on the benchmark datasets OTB-50, OTB-100 and VOT2016 demonstrate that the proposed visual tracking algorithm outperforms several state-of-the-art methods in accuracy and robustness.
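The APCE criterion mentioned in the abstract can be made concrete. Below is a minimal sketch, assuming the standard definition of average peak-to-correlation energy, APCE = |F_max − F_min|² / mean((F_{w,h} − F_min)²), computed over a tracker's response map; the function name and toy maps are illustrative, not from the paper.

```python
def apce(response):
    """Average peak-to-correlation energy of a 2-D response map
    (given as a list of rows of floats)."""
    flat = [v for row in response for v in row]
    f_max, f_min = max(flat), min(flat)
    # Mean squared deviation of every response value from the minimum.
    denom = sum((v - f_min) ** 2 for v in flat) / len(flat)
    return (f_max - f_min) ** 2 / denom

# A sharp, single-peaked response map scores a high APCE (confident
# detection); a flat or ambiguous map scores low, and an APCE-gated
# tracker would then skip updating its model.
peaky = [[0.0, 0.1, 0.0],
         [0.1, 1.0, 0.1],
         [0.0, 0.1, 0.0]]
flat_map = [[0.4, 0.5, 0.4],
            [0.5, 0.6, 0.5],
            [0.4, 0.5, 0.4]]
print(apce(peaky) > apce(flat_map))  # True: the peaked map scores higher
```

Gating updates on a high APCE (e.g., relative to its running average) is what limits model contamination when the target is occluded or the detection is unreliable.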



Funding

This work was supported by the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University (No. ZQN-PY518), by a grant from the National Natural Science Foundation of China (No. 61605048), in part by the Fujian Provincial Big Data Research Institute of Intelligent Manufacturing, in part by the Quanzhou scientific and technological planning projects (Nos. 2017G024, 2018N072S and 2019C099R), and in part by the Subsidized Project for Postgraduates' Innovative Fund in Scientific Research of Huaqiao University under Grant 17014084014.

Author information

Corresponding author

Correspondence to Peizhong Liu.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Zhang, W., Du, Y., Chen, Z. et al. Robust adaptive learning with Siamese network architecture for visual tracking. Vis Comput 37, 881–894 (2021). https://doi.org/10.1007/s00371-020-01839-z

