Spatial attention model based target detection for aerial robotic systems

  • Regular Paper
International Journal of Intelligent Robotics and Applications

Abstract

Detecting targets of interest from aerial robotic systems is challenging. Because air-to-ground observation involves long viewing distances, targets appear small and numerous in the scene. In addition, each target occupies only part of the image, and a complex background can easily overwhelm the target's feature information. This paper presents a novel target detection method based on a spatial attention model. Unlike existing methods, which enhance the features of target areas by enhancing global semantic information, the proposed method learns feature weights for different spatial locations in feature space. This lets it focus attention on the target regions of interest in an image and suppress interfering background features, which strengthens the feature information of the target regions and addresses the class imbalance problem in detection. Experimental results show that the algorithm improves detection accuracy for small air-to-ground targets and performs well in dense target areas. Compared with RefineDet, a state-of-the-art small target detector, our method achieves better performance at a lower cost.
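
The abstract does not specify the network architecture, so the following is only a minimal sketch of the general idea of spatial attention, not the authors' model. It assumes a feature map of shape (C, H, W), a hypothetical 1x1 convolution (the vector `w` and bias `b`, chosen by hand here rather than learned) that scores each spatial location, and a sigmoid that turns the scores into per-location weights in (0, 1). Multiplying the features by these weights keeps responses at target-like locations and damps the background.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(features, w, b=0.0):
    """Reweight a (C, H, W) feature map by a per-location attention mask.

    w and b are hypothetical 1x1-convolution parameters; in a real detector
    they would be learned, here they are fixed for illustration.
    """
    # Weighted sum over channels gives one score per spatial location: (H, W).
    scores = np.tensordot(w, features, axes=([0], [0])) + b
    # Squash scores into (0, 1) so they act as soft spatial weights.
    mask = sigmoid(scores)
    # Broadcast the mask over channels to suppress low-weight locations.
    return features * mask[None, :, :], mask

# Synthetic example: background noise plus one small, strong "target" patch.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16, 16))
features[:, 4:6, 4:6] += 3.0

w = np.full(8, 0.5)  # assumed weights, not taken from the paper
attended, mask = spatial_attention(features, w, b=-4.0)
```

In this toy setting the mask is close to 1 on the small patch and close to 0 elsewhere, which illustrates how location-wise weighting can emphasize small targets that occupy only a few cells of the feature map.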

References

  • Bay, H., Tuytelaars, T., Van Gool, L.: Surf: speeded up robust features. In: Computer vision—ECCV 2006. Springer, Berlin, Heidelberg, pp. 404–417 (2006)

  • Cao, Y., Chen, K., et al.: Prime sample attention in object detection (2019). arXiv preprint arXiv:1904.04821

  • Chen, L.C., Yang, Y., Wang, J., et al.: Attention to scale: scale-aware semantic image segmentation (2015). arXiv preprint arXiv:1511.03339

  • Chu, W., Cai, D.: Deep feature based contextual model for object detection. Neurocomputing 275, 1035–1042 (2016)

  • Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. In: NIPS, pp. 379–387 (2016)

  • Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. IEEE, pp. 886–893 (2005)

  • Girshick, R.: Fast R-CNN. In: IEEE International Conference on Computer Vision. IEEE Computer Society, pp. 1440–1448 (2015)

  • Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)

  • He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, pp. 770–778 (2016)

  • Howard, A.G., Zhu, M., Chen, B., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. CoRR:abs/1704.0486 (2017)

  • Huang, G., Liu, Z., van der Maaten, L., et al.: Densely connected convolutional networks (2016). arXiv preprint arXiv:1608.06993v5

  • He, K., Gkioxari, G., Dollár, P., et al.: Mask R-CNN. In: ICCV (2017)

  • Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: International Conference on Neural Information Processing Systems. Curran Associates Inc., pp. 1097–1105 (2012)

  • Li, W., Liu, G.: A single-shot object detector with feature aggregation and enhancement (2019). arXiv preprint arXiv:1902.02923

  • Li, J., Liang, X., Li, J., et al.: Multi-stage object detection with group recursive learning (2016). arXiv preprint arXiv:1608.05159

  • Li, J., Wei, Y., Liang, X., et al.: Attentive contexts for object detection. IEEE Trans. Multimedia 19(5), 944–954 (2017)

  • Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV (2017)

  • Lindeberg, T.: Scale invariant feature transform. Scholarpedia. pp. 2012–2021 (2012)

  • Liu, W., Anguelov, D., Erhan, D., et al.: SSD: single shot MultiBox detector. In: Computer Vision—ECCV 2016. Springer International Publishing, pp. 21–37 (2016)

  • Liu, S., Huang, D., Wang, Y.: Receptive field block net for accurate and fast object detection (2017). arXiv preprint arXiv:1711.07767

  • Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)

  • Redmon, J., Divvala, S., Girshick, R., et al.: You only look once: unified, real-time object detection. In: Computer Vision and Pattern Recognition. IEEE, pp. 779–788 (2016)

  • Ren, S., He, K., Girshick, R., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. In: International conference on neural information processing systems, MIT Press, pp. 91–99 (2015)

  • Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR (2014). arXiv:1409.1556

  • Wang, X., Cai, Z., et al.: Towards universal object detection by domain attention (2019). arXiv preprint arXiv:1904.04402

  • Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., Tang, X.: Residual attention network for image classification. In: CVPR (2017)

  • Xiang, W., Zhang, D.Q., Yu, H., et al.: Context-aware single-shot detector (2017). arXiv preprint arXiv:1707.08682

  • Zeng, X., Ouyang, W., Yan, J., et al.: Crafting GBD-Net for object detection. IEEE Trans. Pattern Anal. Mach. Intell. PP(99), 2109–2123 (2016)

  • Zhang, S., Wen, L., Bian, X., et al.: Single-shot refinement neural network for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2018 (2018a)

  • Zhang, X., Wang, T., Lu, H., Wang, G.: Progressive attention guided recurrent network for salient object detection. In: CVPR, pp. 714–722 (2018b)

  • Zhao, Q., Sheng, T., Wang, Y., et al.: CFENet: an accurate and efficient single-shot object detector for autonomous driving (2018). arXiv preprint arXiv:1806.09790

  • Zheng, L., Fu, C., Zhao, Y.: Extend the shallow part of Single Shot MultiBox detector via convolutional neural network (2018). arXiv preprint arXiv:1801.05918

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant nos. 61673017, 61403398) and the Natural Science Foundation of Shanxi Province (Grant nos. 2017JM6077, 2018ZDXM-GY-039).

Author information

Corresponding author

Correspondence to Dongfang Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, M., Wang, S., Yang, D. et al. Spatial attention model based target detection for aerial robotic systems. Int J Intell Robot Appl 3, 471–479 (2019). https://doi.org/10.1007/s41315-019-00108-0

  • DOI: https://doi.org/10.1007/s41315-019-00108-0
