Abstract
Weakly supervised object detection (WSOD) trains object detectors using only image-level category labels, and has attracted attention in remote sensing image processing because of its low annotation cost. Without object-level labels, however, WSOD methods tend to detect only the most discriminative object parts. Furthermore, remote sensing images often contain objects of widely varying scales, or dense objects in extremely close proximity, which makes detection more challenging. To address these issues, we propose a new weakly supervised detector with several dedicated designs. Our detector employs a feature pyramid network with an adaptive pooling block, which helps handle objects of different scales. In addition, a layer-wise self-attention distillation (SAD) module is incorporated into our framework to improve feature representation and prevent the detector from focusing only on the most discriminative object parts. Moreover, an instance-aware mining (IAM) algorithm is proposed to generate more precise pseudo-labels, thereby alleviating the problem of adjacent small targets being detected as a single object. Because SAD performs feature-level knowledge distillation and iterative refinement with IAM is a form of instance-level knowledge distillation, the two are complementary. We evaluate the proposed method on popular benchmarks, including NWPU VHR-10 and DIOR, and the experimental results show that it locates objects in remote sensing images with more accurate bounding boxes.
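The abstract does not spell out how pseudo-labels are mined, so the following is only an illustrative sketch of the general idea: instead of keeping a single top-scoring proposal per class, additional high-scoring proposals that barely overlap the already selected ones are kept as separate instances. The function name `mine_pseudo_labels` and the thresholds `score_ratio` and `max_iou` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mine_pseudo_labels(
    boxes: Sequence[Box],
    scores: Sequence[float],
    score_ratio: float = 0.5,  # hypothetical: keep scores >= ratio * best score
    max_iou: float = 0.3,      # hypothetical: max overlap with selected boxes
) -> List[int]:
    """Return indices of proposals kept as pseudo ground truth for one class.

    Greedy rule (an assumption, not the paper's algorithm): visit proposals
    in descending score order and keep each one that scores high enough and
    overlaps every previously kept proposal by at most `max_iou`.
    """
    if not boxes:
        return []
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    top_score = scores[order[0]]
    selected = [order[0]]
    for i in order[1:]:
        if scores[i] < score_ratio * top_score:
            break  # remaining proposals score even lower
        if all(iou(boxes[i], boxes[j]) <= max_iou for j in selected):
            selected.append(i)
    return selected


# Two adjacent small objects, one near-duplicate box, one low-score box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (12, 0, 22, 10), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7, 0.1]
print(mine_pseudo_labels(boxes, scores))  # -> [0, 2]
```

In this toy example the near-duplicate box is rejected because of its high overlap with the top proposal, while the adjacent object is kept as its own instance rather than being absorbed into a single detection.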
Data availability
The datasets analysed during the current study are available at the following link:
1. http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#devkit
Acknowledgments
We would like to thank all reviewers and editors for their constructive comments on this study.
Funding
This work was supported by the National Natural Science Foundation of China (62172229), the Natural Science Foundation of Jiangsu Province (BK20211294, BK20211295), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. SJCX22_0996).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Yang, P., Zhou, S., Wang, L. et al. Weakly supervised object detection from remote sensing images via self-attention distillation and instance-aware mining. Multimed Tools Appl 83, 39073–39095 (2024). https://doi.org/10.1007/s11042-023-17237-1
DOI: https://doi.org/10.1007/s11042-023-17237-1