Object detection in aerial remote sensing images using bidirectional enhancement FPN and attention module with data augmentation

Multimedia Tools and Applications

Abstract

Object detection in aerial remote sensing images is a fundamental task in the Earth observation community. However, several challenges remain, including the varied appearance of the targets to be detected, the complexity of image backgrounds, and the high cost of manual annotation. To tackle these problems, we propose a Faster R-CNN based framework with several dedicated designs. Our detector incorporates a bidirectional enhancement feature pyramid network, which improves multi-scale feature extraction so that objects of different sizes are handled effectively. In addition, an attention module is introduced to further suppress noisy background. Moreover, we augment the training sets with a count-guided deep descriptor transforming (CG-DDT) algorithm, which automatically generates coarse object bounding boxes for images annotated only with class labels and per-class object counts. We evaluate the proposed method on two popular aerial remote sensing benchmarks, NWPU VHR-10 and DOTA, and the experimental results show that it detects targets accurately while reducing the cost of manual annotation during training.
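
To make the augmentation idea more concrete, the following is a minimal sketch of how coarse boxes might be derived from a class label and an object count. It is not the CG-DDT algorithm itself, whose details are not given in this abstract. The sketch assumes the standard deep descriptor transforming recipe (project the convolutional descriptors of same-class images onto their first principal component and treat positive responses as object evidence), and it assumes that "count-guided" means keeping the largest connected regions up to the given count. The function names `ddt_indicator_maps` and `count_guided_boxes`, the sign-orientation heuristic, and the 4-connectivity choice are all hypothetical.

```python
from collections import deque

import numpy as np


def ddt_indicator_maps(features):
    """Project descriptors of same-class images onto their first principal component.

    features: array of shape (N, H, W, D), one D-dim descriptor per spatial cell.
    Returns an (N, H, W) array; positive values are treated as object evidence.
    """
    n, h, w, d = features.shape
    flat = features.reshape(-1, d)
    centered = flat - flat.mean(axis=0)
    cov = centered.T @ centered / centered.shape[0]
    _, eigvecs = np.linalg.eigh(cov)      # eigenvalues come back in ascending order
    proj = centered @ eigvecs[:, -1]      # direction of largest variance
    # Assumed heuristic: eigenvector sign is arbitrary, so orient it such that
    # the strongest response is positive.
    if proj[np.argmax(np.abs(proj))] < 0:
        proj = -proj
    return proj.reshape(n, h, w)


def count_guided_boxes(indicator, count):
    """Return up to `count` boxes (x0, y0, x1, y1), in feature-map cells,
    around the largest 4-connected regions where the indicator is positive."""
    mask = indicator > 0
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            # Breadth-first flood fill of one connected region.
            queue, cells = deque([(y, x)]), []
            seen[y, x] = True
            while queue:
                cy, cx = queue.popleft()
                cells.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            regions.append(cells)
    regions.sort(key=len, reverse=True)   # largest regions first
    boxes = []
    for cells in regions[:count]:
        ys = [c[0] for c in cells]
        xs = [c[1] for c in cells]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

In such a setup, the boxes would still need to be scaled from feature-map cells to image pixels before they could serve as coarse training annotations for the detector.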

Acknowledgments

The authors thank all reviewers and editors for their constructive comments on this study.

Funding

This work was supported by the National Natural Science Foundation of China (62172229), the Natural Science Foundation of Jiangsu Province (BK20211294, BK20211295), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX22_0996).

Author information

Corresponding author

Correspondence to Guowei Yang.

Ethics declarations

Conflicts of Interest

The authors declare no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, P., Yu, D. & Yang, G. Object detection in aerial remote sensing images using bidirectional enhancement FPN and attention module with data augmentation. Multimed Tools Appl 83, 38635–38656 (2024). https://doi.org/10.1007/s11042-023-16973-8

  • DOI: https://doi.org/10.1007/s11042-023-16973-8
