RAOD: refined oriented detector with augmented feature in remote sensing images object detection

Applied Intelligence

Abstract

Object detection in remote sensing is challenging because aerial images are characterized by complex backgrounds, arbitrary object orientations, and dense object distributions. To address these difficulties, this paper proposes RAOD, a two-stage refined oriented detector with augmented features. First, a novel Augmented Feature Pyramid Network (A-FPN) is built to enhance feature fusion in both the spatial and channel dimensions. It consists of three main modules: the Scale Transfer Module (STM), the Feature Aggregate Module (FAM), and the Feature Refinement Module (FRM). STM reduces information loss when fusing features in the top-down pathway, FAM aggregates features from different scales, and FRM refines the integrated features using a lightweight attention module. Second, detection proceeds in two steps: a coarse stage and a refinement stage. In the coarse stage, deformable RoI pooling is adopted to improve the network’s ability to model spatial transformations, and horizontal proposals are then transformed into oriented ones. In the refinement stage, Rotated RoI Align (RRoI Align) extracts rotation-invariant features from the rotated RoIs to further optimize localization. To make training more stable and robust, smooth Ln is used as the regression loss in place of smooth L1. Extensive experiments on several rotation detection datasets demonstrate the effectiveness of the method, which achieves 79.78%, 74.7%, and 94.82% on DOTA-v1.0, DOTA-v1.5, and HRSC2016, respectively.
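The abstract does not spell out the smooth Ln loss, so the following is only a minimal sketch for illustration. It assumes the formulation commonly attributed to the Deep Matching Prior Network paper by Liu and Jin, smooth_Ln(x) = (|x| + 1) ln(|x| + 1) − |x|, and compares it with the usual smooth L1 loss. The function names and the `beta` parameter are illustrative choices, not taken from the RAOD implementation.

```python
import math


def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Standard smooth L1: quadratic for |x| < beta, linear beyond it."""
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta


def smooth_ln(x: float) -> float:
    """Assumed smooth Ln form: (|x| + 1) * ln(|x| + 1) - |x|."""
    ax = abs(x)
    return (ax + 1.0) * math.log(ax + 1.0) - ax


def smooth_ln_grad(x: float) -> float:
    """Derivative of the assumed smooth Ln: sign(x) * ln(|x| + 1)."""
    return math.copysign(math.log(abs(x) + 1.0), x)


if __name__ == "__main__":
    # Print both losses and the smooth Ln gradient for a few regression errors.
    for err in (0.0, 0.5, 1.0, 3.0, 10.0):
        print(err, smooth_l1(err), smooth_ln(err), smooth_ln_grad(err))
```

Under this assumed form, the gradient of smooth Ln is continuous everywhere and grows only logarithmically with the error, whereas the smooth L1 gradient switches from linear to a constant magnitude at `beta`. That is one plausible reading of the stability claim in the abstract, not a statement of the authors' analysis.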

Notes

  1. https://captain-whu.github.io/DOTA/

Acknowledgements

The authors greatly appreciate the financial support of the Shanghai Association for Science and Technology under Grant 17DZ1100808.

Author information

Corresponding author

Correspondence to Yu Zhu.

Ethics declarations

Conflict of Interests

The author(s) declared no conflicts of interest with respect to the research, authorship, and publication of this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shi, Q., Zhu, Y., Fang, C. et al. RAOD: refined oriented detector with augmented feature in remote sensing images object detection. Appl Intell 52, 15278–15294 (2022). https://doi.org/10.1007/s10489-022-03393-8

  • DOI: https://doi.org/10.1007/s10489-022-03393-8
