
Accurate UAV Tracking with Distance-Injected Overlap Maximization

Published: 12 October 2020

Abstract

UAV tracking is challenged by dual-dynamic disturbances that arise not only from diverse moving targets but also from camera motion, leading to a more serious model drift issue than in traditional visual tracking. In this work, we propose to alleviate this issue with distance-injected overlap maximization. Our idea is to improve the accuracy of target localization by deriving a conceptually simple target localization loss and a global feature recalibration scheme that reinforce each other. In particular, the target localization loss is designed by incorporating the normalized distance of the target offset into a generic semantic IoU loss, resulting in the distance-injected semantic IoU loss, whose minimal solution can alleviate the drift problem caused by camera motion. Moreover, the deep feature extractor is reconstructed and alternated with a feature recalibration network, which leverages global information to recalibrate significant features and suppress negligible features. With the recalibrated features then combined by multi-scale feature concatenation, the proposed tracker can improve the discriminative capability of the feature representation for UAV targets on the fly. Extensive experimental results on four benchmarks, i.e., UAV123, UAVDT, DTB70, and VisDrone, demonstrate the superiority of the proposed tracker over existing state-of-the-art methods for UAV tracking.
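To make the idea of injecting a normalized offset distance into an IoU loss concrete, the following is a minimal sketch and not the authors' formulation. It assumes axis-aligned boxes given as (x1, y1, x2, y2), uses the plain geometric IoU as a stand-in for the "semantic IoU" term, and assumes the offset is the squared distance between box centres normalized by the squared diagonal of the smallest enclosing box. The function name and the unit weighting of the two terms are hypothetical.

```python
def distance_injected_iou_loss(pred, target):
    """Illustrative loss: (1 - IoU) plus a normalized centre-offset penalty.

    Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
    This is an assumed form, not the loss defined in the paper.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Plain IoU: intersection area over union area.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union

    # Offset between the two box centres.
    dx = (px1 + px2) / 2.0 - (tx1 + tx2) / 2.0
    dy = (py1 + py2) / 2.0 - (ty1 + ty2) / 2.0

    # Normalize by the squared diagonal of the smallest enclosing box,
    # so the penalty stays in [0, 1) regardless of image scale.
    enc_w = max(px2, tx2) - min(px1, tx1)
    enc_h = max(py2, ty2) - min(py1, ty1)
    norm_dist = (dx * dx + dy * dy) / (enc_w * enc_w + enc_h * enc_h)

    return (1.0 - iou) + norm_dist


if __name__ == "__main__":
    box = (0.0, 0.0, 1.0, 1.0)
    near = (2.0, 0.0, 3.0, 1.0)
    far = (4.0, 0.0, 5.0, 1.0)
    # Both candidates have zero overlap with `box`, yet the loss still
    # distinguishes them by how far the centre has drifted.
    print(distance_injected_iou_loss(box, near))
    print(distance_injected_iou_loss(box, far))
```

Under these assumptions the distance term keeps the loss informative when the predicted and true boxes do not overlap at all, which is the situation a sudden camera motion can produce; a pure IoU loss would be flat there.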

Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN: 9781450379885
DOI: 10.1145/3394171

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. UAV tracking
  2. feature recalibration
  3. target refinement

Qualifiers

  • Research-article

Funding Sources

  • National Key Research and Development Program of China

Conference

MM '20

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 18
  • Downloads (last 6 weeks): 1

Reflects downloads up to 03 Mar 2025

Cited By

  • (2024) Non-Maximum Suppression Guided Label Assignment for Object Detection in Crowd Scenes. IEEE Transactions on Multimedia 26, 2207-2218. DOI: 10.1109/TMM.2023.3293333. Online publication date: 2024
  • (2024) SeGCN: A Semantic-Aware Graph Convolutional Network for UAV Geo-Localization. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 17, 6055-6066. DOI: 10.1109/JSTARS.2024.3370612. Online publication date: 2024
  • (2023) All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment. Proceedings of the 31st ACM International Conference on Multimedia, 5552-5561. DOI: 10.1145/3581783.3611803. Online publication date: 26-Oct-2023
  • (2023) WebUAV-3M: A Benchmark for Unveiling the Power of Million-Scale Deep UAV Tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1-18. DOI: 10.1109/TPAMI.2022.3232854. Online publication date: 2023
  • (2023) Multi-level Attention Network with Weather Suppression for All-Weather Action Detection in UAV Rescue Scenarios. Neural Information Processing, 540-557. DOI: 10.1007/978-981-99-8138-0_43. Online publication date: 26-Nov-2023
  • (2022) UAV-Satellite View Synthesis for Cross-View Geo-Localization. IEEE Transactions on Circuits and Systems for Video Technology 32:7, 4804-4815. DOI: 10.1109/TCSVT.2021.3121987. Online publication date: Jul-2022
  • (2021) The Ninth Visual Object Tracking VOT2021 Challenge Results. 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2711-2738. DOI: 10.1109/ICCVW54120.2021.00305. Online publication date: Oct-2021
