
Learning convolutional self-attention module for unmanned aerial vehicle tracking

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

Siamese network-based trackers have demonstrated excellent performance. Recently, visual tracking has been applied to unmanned aerial vehicle (UAV) tasks. However, UAV tracking remains challenging because of aspect-ratio changes, out-of-view targets, scale variation, etc. Some Siamese-based trackers ignore the context-related information generated in the time dimension across continuous frames, lose substantial foreground information, and produce redundant background information. In this paper, we propose a novel feature fusion network based on convolutional self-attention blocks, which combine ResNet bottleneck blocks with multi-head self-attention (MHSA) blocks. By replacing the spatial (\(3\times 3\)) convolution in the last-stage bottleneck blocks of ResNet with MHSA blocks, we remove the locality limitation of the convolution operator. The convolutional self-attention blocks capture global context-related information of the given target image and further improve the accuracy of global matching between the given target and a search region. We conduct extensive experimental evaluations on OTB2015 and four UAV benchmarks, i.e., UAV123, UAV20L, DTB70 and UAV123@10fps. The experimental results demonstrate that the proposed tracker achieves excellent performance against SOTA trackers for UAV tracking, with a real-time average tracking speed of 181 FPS on a single GPU.
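To illustrate the core idea of replacing the spatial \(3\times 3\) convolution with global attention, the following is a minimal NumPy sketch of multi-head self-attention applied to a flattened feature map. It is not the authors' implementation: the random projection weights stand in for learned parameters, and position encodings (which BoTNet-style blocks add as relative-position terms) are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mhsa(feat, num_heads=4, seed=0):
    """Multi-head self-attention over an H x W x C feature map.

    Conceptually replaces the spatial 3x3 convolution of a ResNet
    bottleneck: every spatial position attends to every other
    position, so the output at each location aggregates global
    context rather than a local 3x3 neighborhood.
    """
    H, W, C = feat.shape
    assert C % num_heads == 0
    d = C // num_heads
    rng = np.random.default_rng(seed)
    # Illustrative random projections (these are learned in a real model).
    Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
    x = feat.reshape(H * W, C)                       # HW tokens of dim C
    q = (x @ Wq).reshape(H * W, num_heads, d).transpose(1, 0, 2)
    k = (x @ Wk).reshape(H * W, num_heads, d).transpose(1, 0, 2)
    v = (x @ Wv).reshape(H * W, num_heads, d).transpose(1, 0, 2)
    # Scaled dot-product attention: heads x HW x HW weight matrix.
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))
    out = (attn @ v).transpose(1, 0, 2).reshape(H, W, C)
    return out

feat = np.random.default_rng(1).standard_normal((14, 14, 64))
out = mhsa(feat)
print(out.shape)  # (14, 14, 64): same spatial resolution, globally mixed
```

Because the attention matrix is \(HW \times HW\), this is only practical on the low-resolution last-stage feature maps, which is why the paper confines MHSA to the final bottleneck blocks of ResNet.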


Download references

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 61861032 and 61865012) and by the Science and Technology Research Project of the Education Department of Jiangxi Province, China (No. GJJ190955).

Author information

Corresponding author

Correspondence to Yuanyun Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, J., Meng, C., Deng, C. et al. Learning convolutional self-attention module for unmanned aerial vehicle tracking. SIViP 17, 2323–2331 (2023). https://doi.org/10.1007/s11760-022-02449-z

