Abstract
In this work, we propose a visual tracking algorithm based on structural-appearance information fusion that aims to distinguish the target from both semantic and visual distractors. It measures target similarity using both appearance information and structural information: the former is extracted by a Siamese network, and the latter is learned from the appearance information through a target-cross attention mechanism. The structural and appearance information are dynamically fused by a gated recurrent unit, which controls the fusion ratio between them. Additionally, we introduce a similarity matching loss function to explicitly guide feature extraction. The proposed method extracts discriminative features that facilitate identification of the target, and extensive experimental results show that the proposed similarity feature extraction method improves tracking performance.
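The gated fusion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the single-gate formulation, the feature dimensionality, and all names (`gated_fusion`, `w_a`, `w_s`, etc.) are assumptions chosen to show how a GRU-style gate can control the fusion ratio between appearance and structural features.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(appearance, structural, w_a, w_s, b):
    """Fuse two feature vectors with a learned gate (illustrative sketch).

    The gate z is computed from both feature streams; the output is a
    convex combination, so z controls the fusion ratio between them.
    """
    z = sigmoid(appearance @ w_a + structural @ w_s + b)
    # z close to 1 favors appearance features; close to 0 favors structure.
    return z * appearance + (1.0 - z) * structural

# Toy example with random features and small random gate weights.
rng = np.random.default_rng(0)
d = 8
appearance = rng.standard_normal(d)
structural = rng.standard_normal(d)
w_a = rng.standard_normal((d, d)) * 0.1
w_s = rng.standard_normal((d, d)) * 0.1
b = np.zeros(d)

fused = gated_fusion(appearance, structural, w_a, w_s, b)
print(fused.shape)  # (8,)
```

Because the gate output lies in (0, 1), each fused component stays between the corresponding appearance and structural components; in the paper the gate weights would be learned jointly with the rest of the tracker.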
Data Availability
The datasets generated or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 62072042.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, Y., Yang, Z., Ma, B. et al. Structural-appearance information fusion for visual tracking. Vis Comput 40, 3103–3117 (2024). https://doi.org/10.1007/s00371-023-03013-7