STASiamRPN: visual tracking based on spatiotemporal and attention

  • Regular Paper
Multimedia Systems

A Correction to this article was published on 13 May 2022

This article has been updated

Abstract

Visual tracking is an important research topic in computer vision. Siamese network trackers based on the region proposal network have achieved promising results in both speed and accuracy. However, for fast-moving objects, these trackers focus mainly on the object's appearance and ignore information about its motion and change over time. A conventional 2D convolutional neural network cannot extract the spatiotemporal information of the tracked object, nor can it focus attention on the object's features. This paper proposes a new tracking method that extracts the spatiotemporal features of the tracked object by constructing a 3D convolutional neural network integrated with a cascade attention mechanism, and that distinguishes similar objects through background suppression and highlighting techniques. Experiments on the OTB2015, GOT-10K and UAV123 benchmark datasets show that the proposed tracker (STASiamRPN) performs comparably to other state-of-the-art methods.
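
To make the idea of spatiotemporal features concrete, the following is a minimal pure-Python sketch, not the STASiamRPN architecture. It assumes a toy clip of a single dot moving one pixel per frame, applies a hand-written 3D (time, height, width) kernel to it, and then rescales the resulting channel with a simple sigmoid gate. The function names (`make_moving_dot_clip`, `conv3d_valid`, `channel_attention`) and the parameter-free gate are illustrative assumptions; the paper's network learns its kernels and its cascade attention weights.

```python
import math


def make_moving_dot_clip(num_frames, size):
    """Toy clip (assumption): a single bright pixel on row 1 moving right by one column per frame."""
    clip = []
    for t in range(num_frames):
        frame = [[0.0] * size for _ in range(size)]
        frame[1][t % size] = 1.0
        clip.append(frame)
    return clip


def conv3d_valid(clip, kernel):
    """Unpadded 3D cross-correlation of a T x H x W clip with a kt x kh x kw kernel."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = 0.0
                # The kernel spans time as well as space, so each response mixes several frames.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


def channel_attention(feature_maps):
    """Rescale each channel by a sigmoid of its global mean (a stand-in for a learned attention gate)."""
    gated = []
    for fmap in feature_maps:
        values = [v for plane in fmap for row in plane for v in row]
        mean = sum(values) / len(values)
        gate = 1.0 / (1.0 + math.exp(-mean))
        gated.append([[[v * gate for v in row] for row in plane] for plane in fmap])
    return gated


# Temporal-difference kernel: next frame minus current frame at the same pixel.
motion_kernel = [[[-1.0]], [[1.0]]]
clip = make_moving_dot_clip(num_frames=3, size=4)
motion = conv3d_valid(clip, motion_kernel)
gated = channel_attention([motion])
```

In this sketch the temporal-difference kernel responds only where the clip changes between frames, which a kernel confined to a single frame cannot do. A static clip yields an all-zero motion channel.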

Acknowledgements

This work was supported by the New-Generation AI Major Scientific and Technological Special Project of Tianjin (18ZXZNGX00150) and the Special Foundation for Technology Innovation of Tianjin (21YDTPJC00250).

Author information

Corresponding authors

Correspondence to Xianbin Wen or Zhanlu Liu.

About this article

Cite this article

Wu, R., Wen, X., Liu, Z. et al. STASiamRPN: visual tracking based on spatiotemporal and attention. Multimedia Systems 28, 1543–1555 (2022). https://doi.org/10.1007/s00530-021-00845-y

  • DOI: https://doi.org/10.1007/s00530-021-00845-y
