Abstract
Visual tracking is an important research topic in computer vision. Siamese trackers based on the region proposal network have achieved promising results in both speed and accuracy. However, for fast-moving objects, such trackers focus mainly on appearance information and ignore motion and frame-to-frame change. A conventional 2D convolutional neural network can neither extract the spatiotemporal information of the tracked object nor attend to its salient features. This paper proposes a new tracking method that extracts spatiotemporal features of the tracked object by constructing a 3D convolutional neural network integrated with a cascade attention mechanism, and that distinguishes similar objects through background suppression and object highlighting. Experiments on the OTB2015, GOT-10K, and UAV123 benchmark datasets demonstrate that the proposed tracker, STASiamRPN, is highly competitive with other state-of-the-art methods.
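The two ingredients named in the abstract, 3D convolution over a stack of frames and channel-wise attention, can be illustrated with a minimal NumPy sketch. This is not the authors' STASiamRPN implementation; the kernels, shapes, and the squeeze-and-excitation-style gate below are illustrative assumptions, shown only to make the mechanism concrete.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive valid-mode 3D correlation: slide a (kt, kh, kw) kernel over a
    (T, H, W) frame stack, producing a spatiotemporal feature map."""
    T, H, W = volume.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(volume[t:t+kt, i:i+kh, j:j+kw] * kernel)
    return out

def channel_attention(features):
    """Squeeze-and-excitation-style reweighting: global-average-pool each
    channel, squash the pooled values to (0, 1) gates, rescale the channels."""
    pooled = features.mean(axis=(1, 2))       # one scalar per channel
    gates = 1.0 / (1.0 + np.exp(-pooled))     # sigmoid gate per channel
    return features * gates[:, None, None]

# Two hand-made "feature channels" from a 4-frame, 8x8 stack:
frames = np.random.default_rng(0).standard_normal((4, 8, 8))
temporal_edge = np.zeros((2, 3, 3))           # responds to change over time
temporal_edge[0], temporal_edge[1] = -1.0, 1.0
spatial_blur = np.ones((2, 3, 3)) / 18.0      # responds to appearance
feats = np.stack([conv3d_valid(frames, temporal_edge)[0],
                  conv3d_valid(frames, spatial_blur)[0]])
out = channel_attention(feats)
print(out.shape)  # (2, 6, 6)
```

The point of the sketch is the contrast with a 2D network: the temporal-edge kernel spans two frames, so its response encodes motion that no single-frame 2D filter can see, and the attention gate then emphasizes whichever channel is most active.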
Change history
13 May 2022
A Correction to this paper has been published: https://doi.org/10.1007/s00530-022-00924-8
Acknowledgements
This work was supported by the New-Generation AI Major Scientific and Technological Special Project of Tianjin (18ZXZNGX00150) and the Special Foundation for Technology Innovation of Tianjin (21YDTPJC00250).
Cite this article
Wu, R., Wen, X., Liu, Z. et al. STASiamRPN: visual tracking based on spatiotemporal and attention. Multimedia Systems 28, 1543–1555 (2022). https://doi.org/10.1007/s00530-021-00845-y