ABSTRACT
Objects in aerial images have distinctive characteristics: large variations in scale, cluttered backgrounds full of distractors, and varied viewpoints. These factors pose significant challenges for common object detectors based on convolutional networks. The transformer-based DEtection TRansformer (DETR) performs impressively, but the computational cost of its attention mechanism limits its application to high-resolution aerial imagery. We analyze the design of DETR-like detectors and propose Decoupled DETR (D2ETR), which processes the multiscale feature information of aerial images efficiently to balance computational cost and accuracy. Its core is a decoupled encoder that handles long multiscale feature sequences and comprises two modules: an attention-based semantic enhancement module and a convolution-based cross-scale fusion module. Because these two modules process features through different mechanisms, we add a feature stabilization module to counter the resulting disorder in feature information. In addition, we apply a small-object-friendly loss function to the prediction layer to improve the model's ability to handle small targets in aerial images. Experiments on the VisDrone and DIOR datasets demonstrate that our approach reduces computation while maintaining the strong performance of the transformer.
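To make the cost argument concrete, the following sketch counts the tokens produced when multiscale feature maps are flattened into a single sequence, together with the number of query-key pairs that dense self-attention would score over that sequence. The input size (800 × 1333), the strides (8, 16, 32, 64, the levels used by Deformable DETR), and the helper names are assumptions for illustration only. The abstract does not state D2ETR's input size, its feature levels, or how it splits work between attention and convolution.

```python
import math


def multiscale_tokens(height, width, strides=(8, 16, 32, 64)):
    """Tokens per feature level when each level is flattened into a sequence.

    ceil() approximates the output size of padded, strided convolutions.
    The default strides are an assumption (Deformable DETR-style levels).
    """
    return [math.ceil(height / s) * math.ceil(width / s) for s in strides]


def full_attention_pairs(num_tokens):
    """Query-key pairs scored by dense self-attention: quadratic in length."""
    return num_tokens * num_tokens


# Hypothetical input: an 800 x 1333 image, a common detection training size.
levels = multiscale_tokens(800, 1333)
total = sum(levels)
print("tokens per level:", levels)
print("total sequence length:", total)
print("dense attention pairs, all levels:", full_attention_pairs(total))
print("dense attention pairs, stride-64 level only:", full_attention_pairs(levels[-1]))
```

Under these assumptions the stride-8 level contributes about three quarters of the 22,223 tokens, and dense attention over the full sequence scores roughly 4.9 × 10^8 pairs, compared with about 7.5 × 10^4 for the stride-64 level alone. This illustrates why a detector might reserve attention for part of a long multiscale sequence and use cheaper operators, such as convolution, elsewhere.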
Index Terms
- D2ETR: A Decoupled DETR for Efficient Detection in Aerial Images