Abstract
The application of visual tracking to unmanned aerial vehicles (UAVs) is an important research direction. Although many existing UAV visual trackers exploit deep convolutional features to improve robustness, the target features extracted by a convolutional neural network (CNN) are difficult to distinguish under occlusion, illumination variation, viewpoint change, deformation, and scale variation. In particular, when distractors such as similar objects are present, these trackers cannot capture temporary appearance changes. In this work, we propose an efficient UAV visual tracker that alleviates the impact of occlusion, viewpoint change, and illumination variation. First, we widen the network to acquire rich appearance features of the target. Then, we design an attention information fusion module (AIFM) to enhance feature extraction; it establishes correspondences between long-range pixel pairs in the template frame and the detection frame. Suppressing the global background response further improves the ability of the tracker to distinguish the target. Furthermore, we design a multi-spectral information fusion module (MSIFM) to dynamically learn the appearance features of the target in the detection frame that correspond to the template frame features, which improves the prediction accuracy of the bounding box. Finally, the distance intersection over union (DIoU) is employed to evaluate the object location and complete the prediction of the bounding box. Extensive experiments demonstrate that the proposed method achieves strong tracking performance in diverse UAV scenarios.
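To make the last step of the pipeline concrete, the following is a minimal sketch of the distance intersection over union between two axis-aligned boxes, following the commonly used definition: the IoU minus the squared distance between box centres divided by the squared diagonal of the smallest enclosing box. The function name `diou` and the `(x1, y1, x2, y2)` box format are assumptions made for illustration; the abstract does not specify how the authors implement or weight this term.

```python
def diou(box_a, box_b):
    """Distance-IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative sketch only; the box format and function name are assumptions.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Overlap area (zero when the boxes do not intersect).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centres.
    centre_dist_sq = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + \
                     ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose_diag_sq = enclose_w ** 2 + enclose_h ** 2

    if enclose_diag_sq == 0:
        return iou
    return iou - centre_dist_sq / enclose_diag_sq
```

Unlike plain IoU, this quantity still varies when the boxes do not overlap, because the centre-distance penalty remains informative; a regression loss is typically taken as one minus this value.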
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 62006200), the Key Projects in High-tech Field of Sichuan Province (No. 2022YFG0117), and the Special Project of Science and Technology Strategic Cooperation between Nanchong City and Southwest Petroleum University (Nos. SXHZ026, SXJBS002, SXHZ053). We are very grateful to the anonymous reviewers for their efforts to help us improve our work.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest regarding this work.
About this article
Cite this article
Yang, S., Xu, J., Chen, H. et al. High-performance UAVs visual tracking using deep convolutional feature. Neural Comput & Applic 34, 13539–13558 (2022). https://doi.org/10.1007/s00521-022-07181-w
DOI: https://doi.org/10.1007/s00521-022-07181-w