Skip to main content
Log in

Learning attention modules for visual tracking

  • Original Paper
  • Published:
Signal, Image and Video Processing Aims and scope Submit manuscript

Abstract

Siamese networks have been widely used in visual tracking. However, it is difficult to deal with complex appearance variations when the discriminative background information is ignored and an offline training strategy is adopted. In this paper, we present a novel backbone network based on CNN model and attention mechanism in the Siamese framework. The attention mechanism is composed of a channel attention module and a spatial attention module. The channel attention module uses the learned global information to selectively focus on the convolution features, which enhances a network representation ability. Besides, the spatial attention module obtains more contextual information and semantic features of target candidates. The designed attention mechanism-based backbone is lightweight and has a real-time tracking performance. We utilize GOT-10K as a training set to offline adjust trained model parameters. The extensive experimental evaluations on OTB2015, VOT2016, VOT2018, GOT-10k and UAV123 datasets demonstrate that the proposed algorithm has excellent performances against state-of-the-art trackers.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

References

  1. Li, B., Yan, J., Wu, W., Zhu, Z., Hu, X.: High performance visual tracking with Siamese region proposal network. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 8971–8980 (2018)

  2. Fan, H., Ling, H.: Siamese cascaded region proposal networks for real-time visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7952–7961 (2019)

  3. Gomaa, A., Abdelwahab, M.M., Abo-Zahhad, M.: Efficient vehicle detection and tracking strategy in aerial videos by employing morphological operations and feature points motion analysis. In: Multimedia Tools and Applications, pp. 26023–26043 (2020)

  4. Gomaa, A., Abdelwahab, M.M., Abo-Zahhad, M.: Real-time algorithm for simultaneous vehicle detection and tracking in aerial view videos. In: IEEE 61st International Midwest Symposium on Circuits and Systems, pp. 222–225 (2018)

  5. Woo, S., Park, J., Lee, J.Y., Kweon, I.S.: Cbam: convolutional block attention module. In: Proceedings of the European conference on computer vision, pp. 3–19 (2018)

  6. Huang, L., Zhao, X., Huang, K.: Got-10k: a large high-diversity benchmark for generic object tracking in the wild. In: IEEE Transactions on Pattern Analysis and Machine Intelligence (2019)

  7. Danelljan, M., Hager, G., Shahbaz Khan, F., Felsberg, M.: Convolutional features for correlation filter based visual tracking. In: Proceedings of the IEEE international conference on computer vision workshops, pp. 58–66 (2015)

  8. Chu, Q., Ouyang, W., Li, H., Wang, X., Liu, B., Yu, N.: Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4836–4845 (2017)

  9. Zhu, Z., Wu, W., Zou, W., Yan, J.: End-to-end flow correlation tracking with spatial-temporal attention. In: Proceedings of the IEEE Conference on Computer Vision And Pattern Recognition, pp. 548–557 (2018)

  10. Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.: Fully-convolutional Siamese networks for object tracking. In: European conference on computer vision, pp. 850–865 (2016)

  11. Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., Hu, W.: Distractor-aware Siamese networks for visual object tracking. In: Proceedings of the European Conference on Computer Vision, pp. 101–117 (2018)

  12. He, A., Luo, C., Tian, X., Zeng, W.: A twofold Siamese network for real-time object tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4834–4843 (2018)

  13. Wang, Q., Teng, Z., Xing, J., Gao, J., Hu, W., Maybank, S.: Learning attentions: residual attentional Siamese network for high performance online visual tracking. In: IEEE conference on computer vision and pattern recognition, pp. 4854–4863 (2018)

  14. Yu, Y., Xiong, Y., Huang, W., Scott, M.R.: Deformable Siamese attention networks for visual object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 6728–6737 (2020)

  15. Wu, Y., Lim, J., Yang, M.H.: Object tracking benchmark. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1834–1848 (2015)

  16. Battistone, F., Santopietro, V., Petrosino, A.: The visual object tracking VOT2016 challenge results. In: The 14th European Conference on Computer Vision (2016)

  17. Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Cehovin Zajc, L., Vojir, T., Bhat, G., Lukezic, A., Eldesokey, A. et al.: The sixth visual object tracking VOT2018 challenge results. In: Proceedings of the European conference on computer vision (2018)

  18. Mueller, M., Smith, N., Ghanem, B.: A benchmark and simulator for UAV tracking. In: European conference on computer vision. Springer, pp. 445–461 (2016)

  19. Valmadre, J., Bertinetto, L., Henriques, J., Vedaldi, A., Torr, P.H.S.: End-to-end representation learning for correlation filter based tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5000–5008 (2017)

  20. Bertinetto, L., Valmadre, J., Golodetz, S., Miksik, O., Torr, P.H.S.: Staple: Complementary learners for real-time tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1401–1409 (2016)

  21. Wang, M., Liu, Y., Huang, Z.: Large margin object tracking with circulant feature maps. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4800–4808 (2017)

  22. Zhang, J., Ma, S., Sclaroff, S.: (2014) Meem: robust tracking via multiple experts using entropy minimization. In: European Conference on Computer Vision

  23. Chen, Z., Zhong, B., Li, G., Zhang, S., Ji, R.: Siamese box adaptive network for visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6668–6677 (2020)

  24. Possegger, H., Mauthner, T., Bischof, H.: In defense of color-based model-free tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2113–2120 (2015)

  25. Ma, C., Huang, J.B., Yang, X., Yang, M.H.: Hierarchical convolutional features for visual tracking. In: IEEE International Conference on Computer Vision (2016)

  26. Li, Y., Zhu, J.: A scale adaptive kernel correlation filter tracker with feature integration. In: European Conference on Computer Vision, pp. 254–265 (2014)

  27. Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 583–596 (2015)

  28. Danelljan, M., Häger, G., Khan, F.S., Felsberg, M.: Accurate scale estimation for robust visual tracking. In: British Machine Vision Conference (2014)

  29. Guo, Q., Feng, W., Zhou, C., Huang, R., Wan, L., Wang, S.: Learning dynamic Siamese network for visual object tracking. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1763–1771 (2017)

  30. Kiani Galoogahi, H., Fagg, A., Lucey, S.: Learning background-aware correlation filters for visual tracking. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1135–1143 (2017)

  31. Danelljan, M., Bhat, G., Shahbaz Khan, F., Felsberg, M.: Eco: efficient convolution operators for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6638–6646 (2017)

  32. Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: The IEEE Conference on Computer Vision and Pattern Recognition (2016)

  33. Li, Y., Fu, C., Ding, F., Huang, Z.: Autotrack: towards high-performance visual tracking for UAV with automatic spatio-temporal regularization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11923–11932 (2020)

  34. Li, F., Tian, C., Zuo, W., Zhang, L., Yang, M.-H.: Learning spatial-temporal regularized correlation filters for visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4904–4913 (2018)

  35. Abdelpakey, M.H., Shehata, M.S.: DP-Siam: dynamic policy Siamese network for robust object tracking. In: IEEE Transactions on Image Processing, pp. 1479–1492 (2019)

  36. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015)

    Article  MathSciNet  Google Scholar 

Download references

Acknowledgements

This work is supported by the Jiangxi Science and Technology Research Project of Education within the Department of China (No: GJJ190955, GJJ180939), and by the National Natural Science Foundation of China (No: 61861032, 61865012).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jun Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, J., Meng, C., Deng, C. et al. Learning attention modules for visual tracking. SIViP 16, 2149–2156 (2022). https://doi.org/10.1007/s11760-022-02177-4

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11760-022-02177-4

Keywords