Research article
DOI: 10.1145/3581783.3612146

DANet: Multi-scale UAV Target Detection with Dynamic Feature Perception and Scale-aware Knowledge Distillation

Published: 27 October 2023

Abstract

Detecting multi-scale infrared unmanned aerial vehicle (UAV) targets (IRUTs) in dynamic scenarios remains challenging because of weak target features, varying shapes and poses, and complex background interference. Current detection methods struggle to address these issues both accurately and efficiently. In this paper, we design a dynamic attentive network (DANet) incorporating a scale-adaptive feature enhancement mechanism (SaFEM) and an attention-guided cross-weighting feature aggregator (ACFA). The SaFEM uses separable deformable convolution (SDC) to adaptively adjust the network's receptive fields at hierarchical network levels, which enhances the network's awareness of multi-scale IRUTs. The ACFA, modulated by two crossing attention mechanisms, strengthens structural and semantic properties on neighboring levels so that multi-scale IRUT features from different levels are represented accurately. A plug-and-play anti-distractor contrastive regularization (ADCR) is also imposed on DANet; it enforces similarity on features of targets and distractors from a new uncompressed feature projector (UFP) to increase the network's anti-distractor ability in complex backgrounds. To further improve the multi-scale UAV detection performance of DANet while preserving its efficiency advantage, we propose a novel scale-specific knowledge distiller (SSKD) based on a divide-and-conquer strategy. In the "divide" stage, we deliberately construct three task-oriented teachers that learn tailored knowledge for small-, medium-, and large-scale IRUTs. In the "conquer" stage, we propose a novel element-wise attentive distillation module (EADM), which employs a pixel-wise attention mechanism to highlight teacher and student IRUT features and incorporates IRUT-associated prior knowledge for the collaborative transfer of refined multi-scale IRUT features to DANet. Extensive experiments on real infrared UAV datasets demonstrate that DANet detects multi-scale UAVs with a satisfactory balance between accuracy and efficiency.
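The idea of attention-weighted feature distillation mentioned for the "conquer" stage can be illustrated with a small sketch. This is not the paper's EADM: it is a generic, simplified variant that assumes a single-channel feature map, derives the pixel-wise attention from the teacher only (a softmax over absolute activations), and omits the IRUT-associated prior knowledge. The function names and the `temperature` parameter are hypothetical.

```python
import math

def spatial_attention(feature_map, temperature=1.0):
    """Softmax over all pixels of |activation|, rescaled so the weights sum to H*W."""
    flat = [abs(v) / temperature for row in feature_map for v in row]
    peak = max(flat)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in flat]
    total = sum(exps)
    n = len(flat)
    width = len(feature_map[0])
    weights = [n * e / total for e in exps]
    return [weights[i:i + width] for i in range(0, n, width)]

def attentive_distillation_loss(student, teacher, temperature=1.0):
    """Squared student-teacher gap, weighted per pixel by teacher attention (assumed form)."""
    attention = spatial_attention(teacher, temperature)
    total = 0.0
    count = 0
    for s_row, t_row, a_row in zip(student, teacher, attention):
        for s, t, a in zip(s_row, t_row, a_row):
            total += a * (s - t) ** 2
            count += 1
    return total / count

# Toy 2x2 maps: the teacher responds strongly at one "target" pixel.
teacher = [[0.0, 0.0], [0.0, 4.0]]
student_misses_target = [[0.0, 0.0], [0.0, 3.0]]
student_misses_background = [[1.0, 0.0], [0.0, 4.0]]

loss_target = attentive_distillation_loss(student_misses_target, teacher)
loss_background = attentive_distillation_loss(student_misses_background, teacher)
```

Both toy students differ from the teacher by the same amount at a single pixel, but the error at the high-response pixel is penalized far more heavily, which is the intended effect of highlighting target features during distillation.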

Supplemental Material

MP4 File
This video describes our work "DANet: Multi-scale UAV Target Detection with Dynamic Feature Perception and Scale-aware Knowledge Distillation". Unmanned aerial vehicles (UAVs) are now widely used in many settings and can pose significant threats to public security. Infrared UAV detection remains challenging because of weak target features, variation in target scale, and distractors in complex backgrounds. To address these challenges, we propose a novel network, DANet. In DANet, we design SaFEM to dynamically highlight features according to target scale, ACFA to fuse multi-level features under attention guidance, and ADCR to discriminate between distractors and real UAVs by enforcing similarity computation. Furthermore, we introduce SSKD, which adopts three teachers responsible for small-, medium-, and large-scale UAVs, respectively, and uses them to improve the multi-scale detection performance of DANet. Extensive experiments verify the effectiveness of our work.
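As a rough illustration of how targets might be routed to the three scale-specific teachers, the sketch below groups bounding boxes by area. The thresholds are an assumption borrowed from the COCO evaluation convention (32x32 and 96x96 pixels); the paper may define small, medium, and large scales differently, and the function names are hypothetical.

```python
# Assumed area thresholds in square pixels, borrowed from the COCO convention.
SMALL_MAX_AREA = 32 * 32
MEDIUM_MAX_AREA = 96 * 96

def scale_bucket(box):
    """Return 'small', 'medium', or 'large' for a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    area = max(0.0, x_max - x_min) * max(0.0, y_max - y_min)
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"

def split_by_scale(boxes):
    """Group boxes so that each group could supervise its own scale-specific teacher."""
    groups = {"small": [], "medium": [], "large": []}
    for box in boxes:
        groups[scale_bucket(box)].append(box)
    return groups

# Toy annotations: one box per scale bucket.
boxes = [(0, 0, 10, 10), (0, 0, 50, 50), (0, 0, 100, 100)]
groups = split_by_scale(boxes)
```

Each group would then be used to train or query the teacher responsible for that scale; how the teachers' outputs are combined for the student is outside the scope of this sketch.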

Cited By

  • (2024) Uni-YOLO: Vision-Language Model-Guided YOLO for Robust and Fast Universal Detection in the Open World. Proceedings of the 32nd ACM International Conference on Multimedia, 1991-2000. DOI: 10.1145/3664647.3681212. Online publication date: 28-Oct-2024
  • (2024) Explore Hybrid Modeling for Moving Infrared Small Target Detection. Proceedings of the 32nd ACM International Conference on Multimedia, 6172-6181. DOI: 10.1145/3664647.3680887. Online publication date: 28-Oct-2024
  • (2024) SCINet: Spatial and Contrast Interactive Super-Resolution Assisted Infrared UAV Target Detection. IEEE Transactions on Geoscience and Remote Sensing 62, 1-22. DOI: 10.1109/TGRS.2024.3471786. Online publication date: 2024

        Published In

        MM '23: Proceedings of the 31st ACM International Conference on Multimedia
        October 2023
        9913 pages
        ISBN:9798400701085
        DOI:10.1145/3581783
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. attention mechanism
        2. contrastive learning
        3. knowledge distillation
        4. multi-scale infrared target detection
        5. unmanned aerial vehicle

        Qualifiers

        • Research-article

        Conference

        MM '23
        MM '23: The 31st ACM International Conference on Multimedia
        October 29 - November 3, 2023
        Ottawa ON, Canada

        Acceptance Rates

        Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
