
Towards Accurate Oriented Object Detection in Aerial Images with Adaptive Multi-level Feature Fusion

Published: 05 January 2023

Abstract

Detecting objects in aerial images is a long-standing and challenging problem because the objects vary dramatically in size and orientation. Most existing neural-network-based methods are not robust enough to deliver accurate oriented object detection in aerial images because they do not consider the correlations between features at different levels and scales. In this paper, we propose AFF-Det, a novel two-stage detector with adaptive feature fusion for highly accurate oriented object detection in aerial images. First, a multi-scale feature fusion (MSFF) module is built on the top layer of the extracted feature pyramid to mitigate the loss of semantic information in the small-scale features. We also propose a cascaded oriented bounding box regression method that transforms the horizontal proposals into oriented ones. The transformed proposals are then assigned to all feature pyramid network (FPN) levels and aggregated by the weighted RoI feature aggregation (WRFA) module. These modules use an attention mechanism to adaptively enhance the feature representations at different stages of the network. Finally, a rotated decoupled-RCNN head produces the classification and localization results. Extensive experiments on the DOTA and HRSC2016 datasets demonstrate the advantages of AFF-Det: its best results reach 80.73% mAP on DOTA and 90.48% mAP on HRSC2016, outperforming recent state-of-the-art methods.
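
As a rough illustration of the weighted RoI aggregation idea mentioned in the abstract, the NumPy sketch below fuses the features pooled for one proposal from several FPN levels using softmax attention weights. It is not the authors' WRFA implementation, whose details the abstract does not give. The function name `weighted_roi_aggregation`, the scoring vector `w_score`, and the choice to score each level from its global-average-pooled descriptor are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def weighted_roi_aggregation(roi_feats, w_score):
    """Fuse per-level RoI features with attention weights (illustrative only).

    roi_feats: array of shape (L, C, H, W), the features pooled for a single
               proposal from each of L pyramid levels.
    w_score:   array of shape (C,), a hypothetical learned vector that turns
               each level's channel descriptor into a scalar score.
    Returns the fused feature of shape (C, H, W) and the L level weights.
    """
    # Summarize each level by global average pooling over H and W -> (L, C).
    descriptors = roi_feats.mean(axis=(2, 3))
    # One scalar score per level -> (L,).
    scores = descriptors @ w_score
    # Normalize the scores into weights that sum to 1 across levels.
    weights = softmax(scores, axis=0)
    # Weighted sum over the level axis -> (C, H, W).
    fused = np.tensordot(weights, roi_feats, axes=(0, 0))
    return fused, weights


# Toy example: 4 pyramid levels, 8 channels, 7x7 pooled RoI features.
rng = np.random.default_rng(0)
roi_feats = rng.standard_normal((4, 8, 7, 7))
w_score = rng.standard_normal(8)
fused, weights = weighted_roi_aggregation(roi_feats, w_score)
```

With an all-zero `w_score`, every level receives the same weight and the fusion reduces to a plain average over levels. A learned scoring vector lets the fusion favor whichever level carries the most useful features for a given proposal.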

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 1
January 2023
505 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3572858
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 05 January 2023
Online AM: 18 February 2022
Accepted: 21 January 2022
Revised: 18 December 2021
Received: 28 June 2021


Author Tags

1. Remote sensing images
2. aerial images
3. oriented object detection
4. convolutional neural network

Qualifiers

• Research-article
• Refereed

Funding Sources

• National Key Research and Development Program of China

