Weakly supervised object detection from remote sensing images via self-attention distillation and instance-aware mining

Multimedia Tools and Applications

Abstract

Weakly supervised object detection (WSOD) trains object detectors using only image-level category labels, and it has attracted attention in remote sensing image processing because of its low annotation cost. Without object-level labels, however, WSOD methods tend to detect only the most discriminative parts of an object. Furthermore, remote sensing images often contain objects of widely varying scales, or dense objects in extremely close proximity, which makes detection more challenging. To address these issues, we propose a new weakly supervised detector with several dedicated designs. The detector employs a feature pyramid network with an adaptive pooling block to handle objects of different scales. In addition, a layer-wise self-attention distillation (SAD) module improves the feature representation and prevents the detector from focusing only on the most discriminative object parts. Moreover, an instance-aware mining (IAM) algorithm generates more precise pseudo-labels, thereby alleviating the problem that adjacent small targets may be detected as a single object. Because SAD performs feature-level knowledge distillation and iterative refinement with IAM is a form of instance-level knowledge distillation, the two are complementary. We evaluate the proposed method on popular benchmarks, including NWPU VHR-10 and DIOR, and the experimental results show that it localizes objects in remote sensing images with more accurate bounding boxes.
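The abstract does not give the SAD formulation, so the following is only a minimal sketch of what a layer-wise attention distillation loss can look like, written in NumPy. It assumes a common activation-based form: each feature map is collapsed into a spatial attention map (channel-wise sum of squares followed by a spatial softmax), and each layer's map is pulled toward the map of the next deeper layer. The function names (`attention_map`, `resize_nearest`, `layerwise_distillation_loss`), the choice of the deeper layer as the target, and the squared-error distance are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def attention_map(features):
    """Collapse a (C, H, W) feature map into an (H, W) spatial attention map.

    Assumption: attention = spatial softmax over the channel-wise sum of squares.
    """
    energy = np.sum(features ** 2, axis=0)   # (H, W) activation energy
    flat = energy.reshape(-1)
    flat = flat - flat.max()                 # subtract max for numerical stability
    probs = np.exp(flat) / np.exp(flat).sum()
    return probs.reshape(energy.shape)


def resize_nearest(att, out_hw):
    """Nearest-neighbour resize of a 2-D map to out_hw = (H, W)."""
    h, w = att.shape
    rows = np.arange(out_hw[0]) * h // out_hw[0]
    cols = np.arange(out_hw[1]) * w // out_hw[1]
    return att[np.ix_(rows, cols)]


def layerwise_distillation_loss(feature_maps):
    """Sum of squared differences between each layer's attention map and the
    attention map of the next deeper layer (treated as a fixed target).

    feature_maps: list of (C_i, H_i, W_i) arrays ordered shallow to deep.
    """
    loss = 0.0
    for shallow, deep in zip(feature_maps[:-1], feature_maps[1:]):
        student = attention_map(shallow)
        target = resize_nearest(attention_map(deep), student.shape)
        target = target / target.sum()       # renormalise after resizing
        loss += float(np.sum((student - target) ** 2))
    return loss


# Hypothetical pyramid features: three levels with random activations.
rng = np.random.default_rng(0)
pyramid = [rng.normal(size=(16, 32, 32)),
           rng.normal(size=(32, 16, 16)),
           rng.normal(size=(64, 8, 8))]
sad_loss = layerwise_distillation_loss(pyramid)
```

In a training framework the target map would normally be excluded from gradient computation, so that only the shallower layer is updated toward it; NumPy has no gradients, so that detail is only noted here.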

Data availability

The datasets analysed during the current study are available at the following links:

1. http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#devkit

2. http://www.escience.cn/people/gongcheng/DIOR.html

Acknowledgments

We thank the reviewers and editors for their constructive comments on this study.

Funding

This work was supported by the National Natural Science Foundation of China (62172229), the Natural Science Foundation of Jiangsu Province (BK20211294, BK20211295), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. SJCX22_0996).

Author information

Corresponding author

Correspondence to Guowei Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, P., Zhou, S., Wang, L. et al. Weakly supervised object detection from remote sensing images via self-attention distillation and instance-aware mining. Multimed Tools Appl 83, 39073–39095 (2024). https://doi.org/10.1007/s11042-023-17237-1

  • DOI: https://doi.org/10.1007/s11042-023-17237-1
