research-article

Accurate UAV Tracking with Distance-Injected Overlap Maximization

Authors:

Dan Zeng

MM '20: Proceedings of the 28th ACM International Conference on Multimedia

Pages 565 - 573

https://doi.org/10.1145/3394171.3413959

Published: 12 October 2020

Abstract

UAV tracking is usually challenged by dual-dynamic disturbances that arise not only from diverse moving targets but also from camera motion, leading to a more serious model drift problem than in traditional visual tracking. In this work, we propose to alleviate this issue with distance-injected overlap maximization. Our idea is to improve the accuracy of target localization by deriving a conceptually simple target localization loss and a global feature recalibration scheme in a mutually reinforcing way. In particular, the target localization loss is designed by simply combining the normalized distance of the target offset with the generic semantic IoU loss, resulting in the distance-injected semantic IoU loss, whose minimal solution can alleviate the drift problem caused by camera motion. Moreover, the deep feature extractor is reconstructed and alternated with a feature recalibration network, which leverages global information to recalibrate significant features and suppress negligible ones. Followed by multi-scale feature concatenation, the proposed tracker can improve the discriminative capability of the feature representation for UAV targets on the fly. Extensive experimental results on four benchmarks, i.e., UAV123, UAVDT, DTB70, and VisDrone, demonstrate the superiority of the proposed tracker over existing state-of-the-art methods for UAV tracking.
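To make the idea of injecting a normalized distance into an IoU loss concrete, the following is a minimal sketch, not the authors' formulation. The abstract does not give the exact loss, so this sketch assumes a form in which the squared distance between box centers, normalized by the squared diagonal of the smallest enclosing box, is added to the plain IoU loss. The function names, the `(x1, y1, x2, y2)` box layout, and the `weight` parameter are all hypothetical, and the "semantic" part of the loss is not modeled.

```python
def iou(box_a, box_b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def normalized_center_distance(box_a, box_b):
    """Squared center offset divided by the squared diagonal of the
    smallest box enclosing both inputs (an assumed normalization)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    dx = (ax1 + ax2) / 2.0 - (bx1 + bx2) / 2.0
    dy = (ay1 + ay2) / 2.0 - (by1 + by2) / 2.0
    enc_w = max(ax2, bx2) - min(ax1, bx1)
    enc_h = max(ay2, by2) - min(ay1, by1)
    diag_sq = enc_w ** 2 + enc_h ** 2
    return (dx ** 2 + dy ** 2) / diag_sq if diag_sq > 0 else 0.0


def distance_injected_iou_loss(pred, target, weight=1.0):
    """Assumed form: (1 - IoU) + weight * normalized center distance."""
    return 1.0 - iou(pred, target) + weight * normalized_center_distance(pred, target)


if __name__ == "__main__":
    target = (0.0, 0.0, 1.0, 1.0)
    near = (2.0, 0.0, 3.0, 1.0)  # no overlap, small offset
    far = (5.0, 0.0, 6.0, 1.0)   # no overlap, large offset
    print(distance_injected_iou_loss(near, target))
    print(distance_injected_iou_loss(far, target))
```

Under this assumed form, two predictions that both miss the target entirely still receive different losses: the plain IoU loss is 1 for both `near` and `far`, while the distance term penalizes the farther box more. That extra signal is the kind of information a distance term can add when camera motion pushes the predicted box off the target.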

Index Terms

Accurate UAV Tracking with Distance-Injected Overlap Maximization
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Tracking
    2. Knowledge representation and reasoning

Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia

October 2020

4889 pages

ISBN:9781450379885

DOI:10.1145/3394171

General Chairs: Chang Wen Chen (Chinese University of Hong Kong, Shenzhen, China), Rita Cucchiara (UNIMORE, Italy), Xian-Sheng Hua (Alibaba Group, China)

Program Chairs: Guo-Jun Qi (Futurewei Technologies, USA), Elisa Ricci (UNITN & Fondazione Bruno Kessler, Italy), Zhengyou Zhang (Tencent, China), Roger Zimmermann (National University of Singapore, Singapore)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Key Research and Development Program of China

Conference

MM '20: The 28th ACM International Conference on Multimedia

October 12 - 16, 2020

Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
