D2ETR: A Decoupled DETR for Efficient Detection in Aerial Images

Authors:
Changfeng Feng, Army Engineering University, China (ORCID: 0000-0003-4413-1611)
Chunping Wang, University of Sanya, China (ORCID: 0000-0002-3841-1919)
Qiang Fu, Army Engineering University, China (ORCID: 0000-0002-3831-9856)
Renke Kou, Army Engineering University, China (ORCID: 0000-0001-5893-3127)

ICIGP '24: Proceedings of the 2024 7th International Conference on Image and Graphics Processing, January 2024, Pages 369–376. https://doi.org/10.1145/3647649.3647707

Published: 03 May 2024

ABSTRACT

Objects in aerial images possess distinctive features such as large-scale variations, intricate backgrounds full of distractions, and versatile viewpoints. These factors present significant challenges for common object detectors based on convolutional networks. The latest transformer-based detector, DEtection TRansformer (DETR), performs impressively. Nevertheless, the attention mechanism's numerous operations limit its application to high-resolution aerial imagery. We have analyzed the design of the DETR-like detector and put forward Decoupled DETR (D2ETR), which aims to process multiscale feature information of aerial images efficiently to balance computational efficiency and accuracy. Our proposal involves an effective decoupled encoder that handles long multiscale feature sequences. The encoder comprises two modules: an attention-based semantic enhancement module and a convolution-based cross-scale fusion module. We have developed a feature stabilization module to counter feature information disorder caused by different processing mechanisms. In addition, we have applied a small-object-friendly loss function to the prediction layer to improve the model's ability to adapt to small targets in aerial images. Experimental results using the VisDrone and DIOR datasets demonstrate our approach reduces computation while maintaining the transformer's excellent performance.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object detection
        Object recognition

Published in

ICIGP '24: Proceedings of the 2024 7th International Conference on Image and Graphics Processing
January 2024
480 pages
ISBN: 9798400716720
DOI: 10.1145/3647649

Copyright © 2024 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States