DOI: 10.1145/3573910.3573919
research-article

Multi-Scale Deformable Context Fusion Pyramid Network for Object Detection in Aerial Images

Published: 20 January 2023

Abstract

Although many methods have made good progress in object detection for aerial images in recent years, improving detection accuracy remains difficult because of the large scale variation, extreme aspect ratios, and the large number of small, densely packed objects in such images. In this paper, we propose a Multi-Scale Deformable Context Fusion Pyramid Network (Multi-Scale DCFPN) for object detection in aerial images. Within the proposed Multi-Scale DCFPN, we design a Multi-Scale Deformable Context Fusion (MDCF) module, which injects context information into the Feature Pyramid Network (FPN) in a residual manner. In particular, the MDCF module consists of a set of parallel Adaptive Deformable Context (ADC) modules with specific dilation rates and an Adaptive Context Fusion (ACF) module. The ADC modules focus on objects with extreme aspect ratios by introducing a deformable paradigm, and the ACF module adaptively combines the multi-scale context instead of simply summing it. To verify the effectiveness of the proposed method, we conduct comparative experiments on the public aerial datasets NWPU VHR-10 and DIOR, where the mAP increases by 3.5 and 1.0 points over the baseline, respectively. The experimental results also show that the proposed method outperforms state-of-the-art methods.
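The fusion idea in the abstract (parallel context branches with different dilation rates, combined adaptively rather than summed, then added back residually) can be illustrated with a minimal 1D sketch. This is not the authors' implementation. The function names, the softmax weighting, and the per-branch scalar scores are assumptions made for illustration, since the abstract does not say how the ACF weights are computed. The deformable offsets of the ADC modules are omitted entirely.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Same-length 1D convolution with zero padding and a given dilation rate."""
    k = len(kernel)
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Taps are spaced `dilation` samples apart around position i.
            idx = i + (j - k // 2) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_context(x, kernel, dilations, branch_scores):
    """Hypothetical adaptive fusion: weight each branch, then add residually.

    `branch_scores` stands in for whatever learned scores an ACF-like module
    would produce (an assumption; the abstract does not specify them).
    """
    branches = [dilated_conv1d(x, kernel, d) for d in dilations]
    weights = softmax(branch_scores)
    context = [
        sum(w * b[i] for w, b in zip(weights, branches))
        for i in range(len(x))
    ]
    # Residual injection: the fused context is added to the input feature.
    return [xi + ci for xi, ci in zip(x, context)]


if __name__ == "__main__":
    feature = [1.0, 2.0, 3.0, 4.0, 5.0]
    fused = fuse_context(feature, [1.0, 1.0, 1.0], [1, 2, 3], [0.2, 0.5, -0.1])
    print(fused)
```

Because the weights sum to one, the fused context stays on the same scale as a single branch. Plain summation would instead grow with the number of branches.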

    Published In

    ICRAI '22: Proceedings of the 8th International Conference on Robotics and Artificial Intelligence
    November 2022
    89 pages
    ISBN:9781450397544
    DOI:10.1145/3573910
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Aerial images
    2. Context fusion
    3. Deformable paradigm
    4. Object detection

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICRAI 2022
