High-performance UAVs visual tracking using deep convolutional feature

  • Original Article
Neural Computing and Applications

Abstract

Visual tracking for unmanned aerial vehicles (UAVs) is an important research direction. Although many existing UAV visual trackers exploit deep convolutional features to improve robustness, the target features extracted by a convolutional neural network (CNN) are difficult to distinguish under occlusion, illumination variation, viewpoint change, deformation, and scale variation. In particular, when distractors such as similar objects are present, these trackers cannot capture temporary appearance changes. In this work, we propose an efficient UAV visual tracker that alleviates the impact of occlusion, viewpoint change, and illumination variation. First, we widen the network to acquire rich target appearance features. Then, we design an attention information fusion module (AIFM) to enhance feature extraction; it establishes correspondences between long-range pixel pairs in the template frame and the detection frame. Suppressing the global background response in this way improves the tracker's ability to distinguish the target. Furthermore, we design a multi-spectral information fusion module (MSIFM) that dynamically learns the appearance features of the target in the detection frame that correspond to the template frame features, which improves the prediction accuracy of the bounding box. Finally, the distance intersection over union (DIoU) is employed to evaluate the object location and complete the prediction of the bounding box. Extensive experiments demonstrate that the proposed method achieves strong tracking performance across diverse UAV scenarios.
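To make the last step of the abstract concrete, the following is a minimal sketch of the distance intersection over union as it is commonly defined: the IoU of two boxes minus the squared distance between their centers, normalized by the squared diagonal of the smallest box enclosing both. The function name `diou` and the `(x1, y1, x2, y2)` corner format are assumptions made for illustration; the abstract does not specify how the authors implement this step.

```python
def diou(box_a, box_b):
    """Distance-IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; the box format is an assumption.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU: intersection area over union area.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centers.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both inputs.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
       + (max(ay2, by2) - min(ay1, by1)) ** 2
    if c2 == 0:
        return iou  # both boxes are the same single point

    return iou - rho2 / c2
```

Unlike plain IoU, this quantity still varies when the boxes do not overlap: it becomes more negative as the centers move apart, so it can rank candidate boxes even when none of them intersects the reference box.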

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 62006200), the Key Projects in High-tech Field of Sichuan Province (No. 2022YFG0117), and the Special Project of Science and Technology Strategic Cooperation between Nanchong City and Southwest Petroleum University (Nos. SXHZ026, SXJBS002, SXHZ053). We are very grateful to the anonymous reviewers for their efforts to help us improve our work.

Author information

Corresponding author

Correspondence to Jin Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest related to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yang, S., Xu, J., Chen, H. et al. High-performance UAVs visual tracking using deep convolutional feature. Neural Comput & Applic 34, 13539–13558 (2022). https://doi.org/10.1007/s00521-022-07181-w

  • DOI: https://doi.org/10.1007/s00521-022-07181-w
