DOI: 10.1145/3647649.3647707
research-article

D2ETR: A Decoupled DETR for Efficient Detection in Aerial Images

Published: 03 May 2024

ABSTRACT

Objects in aerial images have distinctive characteristics, including large variations in scale, intricate backgrounds full of distractions, and varied viewpoints. These factors pose significant challenges for common object detectors based on convolutional networks. The recent transformer-based detector DEtection TRansformer (DETR) performs impressively, but the computational cost of its attention mechanism limits its application to high-resolution aerial imagery. We analyze the design of DETR-like detectors and propose Decoupled DETR (D2ETR), which processes the multiscale feature information of aerial images efficiently to balance computational cost and accuracy. Our proposal centers on an effective decoupled encoder that handles long multiscale feature sequences. The encoder comprises two modules: an attention-based semantic enhancement module and a convolution-based cross-scale fusion module. We also develop a feature stabilization module to counter the feature disorder caused by the two different processing mechanisms. In addition, we apply a small-object-friendly loss function to the prediction layer to improve the model's ability to adapt to small targets in aerial images. Experimental results on the VisDrone and DIOR datasets demonstrate that our approach reduces computation while maintaining the transformer's excellent performance.
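
To make the cost argument concrete, the following minimal sketch estimates how the length of a flattened multiscale feature sequence, and the cost of dense self-attention over it, grow with image resolution. It is an illustration only, not the authors' measurement or implementation: the strides (8, 16, 32), the embedding width of 256, the image sizes, and the function names are all assumptions chosen for the example, and only the two attention matrix products are counted.

```python
import math


def multiscale_tokens(height, width, strides=(8, 16, 32)):
    """Tokens obtained by flattening and concatenating one feature map per stride.

    The strides are an assumed, typical choice, not taken from the paper.
    """
    return sum(math.ceil(height / s) * math.ceil(width / s) for s in strides)


def self_attention_macs(num_tokens, dim=256):
    """Multiply-accumulates for the two N x N products of dense self-attention.

    Counts Q @ K^T and the attention-weighted sum over V for one layer;
    linear projections and the softmax are left out. dim=256 is an assumption.
    """
    return 2 * num_tokens * num_tokens * dim


if __name__ == "__main__":
    for h, w in [(512, 512), (1024, 1024)]:
        n = multiscale_tokens(h, w)
        print(f"{h}x{w}: {n} tokens, {self_attention_macs(n):.3e} attention MACs")
```

Under these assumptions, doubling each side of the image multiplies the token count by 4 and the dense attention cost by 16, which is the kind of growth that makes long multiscale sequences expensive for high-resolution inputs.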

Published in

ICIGP '24: Proceedings of the 2024 7th International Conference on Image and Graphics Processing
January 2024
480 pages
ISBN: 9798400716720
DOI: 10.1145/3647649

Copyright © 2024 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited