
Structural-appearance information fusion for visual tracking

  • Original article
  • Published in The Visual Computer

Abstract

In this work, we propose a visual tracking algorithm based on structural-appearance information fusion that aims to distinguish the target from distractors, both semantic and visual. It measures target similarity using both appearance information and structural information: the former is extracted by a Siamese network, and the latter is learned from the appearance information through a target-cross attention mechanism. The structural and appearance information are fused dynamically by a gated recurrent unit, which controls the fusion ratio between them. Additionally, we introduce a similarity matching loss function that explicitly guides feature extraction. The proposed method extracts discriminative features that facilitate identification of the target, thereby improving tracking performance, as confirmed by extensive experiments.
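The abstract describes fusing appearance and structural features with a gated recurrent unit that controls the fusion ratio. As a minimal illustration of that gating idea, the sketch below computes a per-channel sigmoid gate and blends the two feature vectors accordingly. This is not the authors' network: the weights `w_a`, `w_s`, `b` are illustrative assumptions standing in for learned GRU parameters, and real features would be high-dimensional tensors rather than short lists.

```python
import math

def sigmoid(x):
    """Logistic function, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(appearance, structural, w_a, w_s, b):
    """Blend appearance and structural features with a learned gate,
    in the spirit of a GRU update gate:
        z = sigmoid(w_a * a + w_s * s + b)   # fusion ratio in (0, 1)
        fused = z * a + (1 - z) * s
    All weights here are hypothetical placeholders, not trained values.
    """
    fused = []
    for a, s, wa, ws, bi in zip(appearance, structural, w_a, w_s, b):
        z = sigmoid(wa * a + ws * s + bi)
        fused.append(z * a + (1.0 - z) * s)
    return fused
```

When the gate saturates near 1 the fused feature follows the appearance branch; near 0 it follows the structural branch, so the network can adapt the mix per channel as the scene changes.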


Data Availability

The datasets generated or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 62072042.

Author information


Corresponding author

Correspondence to Bo Ma.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, Y., Yang, Z., Ma, B. et al. Structural-appearance information fusion for visual tracking. Vis Comput 40, 3103–3117 (2024). https://doi.org/10.1007/s00371-023-03013-7
