
Weakly supervised target detection in remote sensing images based on transferred deep features and negative bootstrapping

Multidimensional Systems and Signal Processing

Abstract

Target detection in remote sensing images (RSIs) is a fundamental yet challenging problem in remote sensing image analysis. Recently, weakly supervised learning, in which training sets require only binary labels indicating whether an image contains the object, has attracted considerable attention because it alleviates the tedious and time-consuming work of manual annotation. Inspired by its impressive success in the computer vision field, in this paper we propose a novel and effective framework for weakly supervised target detection in RSIs based on transferred deep features and negative bootstrapping. On the one hand, to effectively mine information from RSIs and improve target detection performance, we develop a transferred deep model to extract high-level features from RSIs. This is achieved by pre-training a convolutional neural network on a large-scale annotated dataset (e.g., ImageNet) and then transferring it to our task through domain-specific fine-tuning on RSI datasets. On the other hand, we integrate a negative bootstrapping scheme into the detector training process, which makes the detector converge faster and more stably by exploiting the most discriminative training samples. Comprehensive evaluations on three RSI datasets and comparisons with state-of-the-art weakly supervised target detection approaches demonstrate the effectiveness and superiority of the proposed method.
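The general idea of negative bootstrapping can be illustrated with a minimal, hypothetical Python sketch. It assumes that feature vectors (for example, transferred CNN features) have already been extracted both for positive training samples and for a large pool of background windows taken from negative images. A small logistic-regression learner stands in for whatever classifier the paper actually uses, and all function names and parameters (`negative_bootstrap`, `sample_size`, `num_hard`, `iterations`) are illustrative assumptions, not the authors' implementation. In each round the sketch draws a random subset of negatives, keeps those the current ensemble scores as most target-like, trains a new classifier on them together with the positives, and adds it to an averaged ensemble.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decision(model, x):
    """Linear decision value w.x + b (higher means more target-like)."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_logreg(samples, labels, epochs=200, lr=0.1):
    """Fit a linear logistic-regression model by batch gradient descent.

    Stand-in classifier (an assumption of this sketch); labels are
    1 (target) or 0 (background). Returns a (weights, bias) pair.
    """
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(decision((w, b), x)) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def ensemble_score(models, x):
    """Average decision value over all classifiers learned so far."""
    return sum(decision(m, x) for m in models) / len(models)


def select_hard_negatives(models, candidates, num_hard):
    """Keep the num_hard candidates the current ensemble scores highest.

    Before any model exists there is nothing to rank by, so the first
    num_hard (randomly sampled) candidates are used as-is.
    """
    if not models:
        return candidates[:num_hard]
    ranked = sorted(candidates, key=lambda x: ensemble_score(models, x),
                    reverse=True)
    return ranked[:num_hard]


def negative_bootstrap(positives, negative_pool, iterations=5,
                       sample_size=50, num_hard=10, seed=0):
    """Hypothetical negative-bootstrapping loop over precomputed features."""
    rng = random.Random(seed)
    models = []
    for _ in range(iterations):
        # Score only a random subset of the (possibly huge) negative pool.
        k = min(sample_size, len(negative_pool))
        candidates = rng.sample(negative_pool, k)
        hard = select_hard_negatives(models, candidates, num_hard)
        samples = positives + hard
        labels = [1] * len(positives) + [0] * len(hard)
        models.append(train_logreg(samples, labels))
    return models


# Toy usage: 2-D "features" with targets in the positive quadrant.
positives = [[3.0 + 0.1 * i, 2.5 + 0.2 * i] for i in range(6)]
negative_pool = [[-0.1 * i, -0.05 * (i % 7)] for i in range(80)]
models = negative_bootstrap(positives, negative_pool, iterations=4,
                            sample_size=30, num_hard=8, seed=1)
print(ensemble_score(models, [3.0, 3.0]) > ensemble_score(models, [-1.0, -0.2]))
# prints True
```

Drawing a fresh random subset in each round bounds the per-round scoring cost by `sample_size` rather than by the size of the whole negative pool, which is one reason this style of hard-negative mining is attractive when negative images yield a very large number of candidate windows.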


Figs. 1–8 (images not reproduced in this text)


Notes

  1. http://pan.baidu.com/s/1hqwzXeG.


Acknowledgments

This work was partially supported by the National Science Foundation of China under Grants 61401357 and U1261111HZ, the China Postdoctoral Science Foundation under Grants 2014M552491 and 2015T81050, and the Aerospace Science Foundation of China under Grant 20140153003.

Author information

Correspondence to Gong Cheng.


About this article


Cite this article

Zhou, P., Cheng, G., Liu, Z. et al. Weakly supervised target detection in remote sensing images based on transferred deep features and negative bootstrapping. Multidim Syst Sign Process 27, 925–944 (2016). https://doi.org/10.1007/s11045-015-0370-3


  • DOI: https://doi.org/10.1007/s11045-015-0370-3
