Abstract
Detecting targets of interest from aerial robotic systems is a challenging task. Because air-to-ground observation involves long viewing distances, targets appear small and numerous in the scene. In addition, a target occupies only part of the image, and a complex background can easily obscure the target's feature information. In this paper, we design a novel target detection method based on a spatial attention model. Unlike existing methods, which enhance the features of target regions by enhancing global semantic information, the proposed method learns feature weights for different spatial locations in the feature space. This lets it focus attention on the target regions of interest in an image and suppress interfering background features, which strengthens the feature information of the target regions and addresses the class imbalance problem in detection. Experimental results show that the algorithm improves detection accuracy for small air-to-ground targets and performs well in dense target areas. Compared with RefineDet, a state-of-the-art small-target detector, our method achieves better performance at a lower cost.
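To make the idea of per-location feature weights concrete, the following is a minimal, dependency-free sketch of one common form of spatial attention; it is not the network described in the paper. It assumes a feature map stored as nested lists of shape C x H x W, and the names `spatial_attention`, `channel_weights`, and `bias` are hypothetical stand-ins for a learned 1x1 scoring layer. Each location is scored from its channel vector, the score is squashed by a sigmoid into a weight in (0, 1), and that weight multiplies every channel at the location, so low-scoring (background-like) locations are attenuated.

```python
import math


def spatial_attention(features, channel_weights, bias=0.0):
    """Reweight a C x H x W feature map by one attention weight per location.

    features: list of C channels, each an H x W list of floats.
    channel_weights, bias: hypothetical parameters of a 1x1 scoring layer
    (in a real detector these would be learned during training).
    Returns (reweighted_features, attention_map).
    """
    height, width = len(features[0]), len(features[0][0])
    attention = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Score one location from its channel vector (a 1x1 convolution).
            score = bias + sum(
                w * channel[y][x] for w, channel in zip(channel_weights, features)
            )
            # The sigmoid maps the score to a weight in (0, 1).
            attention[y][x] = 1.0 / (1.0 + math.exp(-score))
    # The same spatial weight multiplies every channel at that location.
    reweighted = [
        [[channel[y][x] * attention[y][x] for x in range(width)]
         for y in range(height)]
        for channel in features
    ]
    return reweighted, attention


# Toy 2-channel, 2 x 2 feature map: a strong response at (0, 0),
# a strongly negative response at (0, 1), and zeros elsewhere.
toy_features = [
    [[2.0, -2.0], [0.0, 0.0]],
    [[2.0, -2.0], [0.0, 0.0]],
]
toy_output, toy_attention = spatial_attention(toy_features, [1.0, 1.0])
```

In this toy input the location with the strong positive response receives a weight close to 1, the strongly negative location a weight close to 0, and the all-zero locations a weight of exactly 0.5, which shows how the reweighting emphasizes some regions and suppresses others.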





References
Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: Computer Vision—ECCV 2006. Springer, Berlin, Heidelberg, pp. 404–417 (2006)
Cao, Y., Chen, K., et al.: Prime sample attention in object detection (2019). arXiv preprint arXiv:1904.04821
Chen, L.C., Yang, Y., Wang, J., et al.: Attention to scale: scale-aware semantic image segmentation (2015). arXiv preprint arXiv:1511.03339
Chu, W., Cai, D.: Deep feature based contextual model for object detection. Neurocomputing 275, 1035–1042 (2016)
Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. In: NIPS, pp. 379–387 (2016)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. IEEE, pp. 886–893 (2005)
Girshick, R.: Fast R-CNN. In: IEEE International Conference on Computer Vision. IEEE Computer Society, pp. 1440–1448 (2015)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)
He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, pp. 770–778 (2016)
Howard, A.G., Zhu, M., Chen, B., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. CoRR:abs/1704.0486 (2017)
Huang, G., Liu, Z., van der Maaten, L., et al.: Densely connected convolutional networks (2016). arXiv preprint arXiv:1608.06993v5
He, K., Gkioxari, G., Dollár, P., et al.: Mask R-CNN. In: ICCV (2017)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: International Conference on Neural Information Processing Systems. Curran Associates Inc., pp. 1097–1105 (2012)
Li, W., Liu, G.: A single-shot object detector with feature aggregation and enhancement (2019). arXiv preprint arXiv:1902.02923
Li, J., Liang, X., Li, J., et al.: Multi-stage object detection with group recursive learning (2016). arXiv preprint arXiv:1608.05159
Li, J., Wei, Y., Liang, X., et al.: Attentive contexts for object detection. IEEE Trans. Multimedia 19(5), 944–954 (2017)
Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV (2017)
Lindeberg, T.: Scale invariant feature transform. Scholarpedia. pp. 2012–2021 (2012)
Liu, W., Anguelov, D., Erhan, D., et al.: SSD: single shot MultiBox detector. In: Computer Vision—ECCV 2016. Springer International Publishing, pp. 21–37 (2016)
Liu, S., Huang, D., Wang, Y.: Receptive field block net for accurate and fast object detection (2017). arXiv preprint arXiv:1711.07767
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
Redmon, J., Divvala, S., Girshick, R., et al.: You only look once: unified, real-time object detection. In: Computer Vision and Pattern Recognition. IEEE, pp. 779–788 (2016)
Ren, S., He, K., Girshick, R., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. In: International conference on neural information processing systems, MIT Press, pp. 91–99 (2015)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR (2014). arXiv:1409.1556
Wang, X., Cai, Z., et al.: Towards universal object detection by domain attention (2019). arXiv preprint arXiv:1904.04402
Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., Tang, X.: Residual attention network for image classification. In: CVPR (2017)
Xiang, W., Zhang, D.Q., Yu, H., et al.: Context-aware single-shot detector (2017). arXiv preprint arXiv:1707.08682
Zeng, X., Ouyang, W., Yan, J., et al.: Crafting GBD-Net for object detection. IEEE Trans. Pattern Anal. Mach. Intell. PP(99), 2109–2123 (2016)
Zhang, S., Wen, L., Bian, X., et al.: Single-shot refinement neural network for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2018a)
Zhang, X., Wang, T., Lu, H., Wang, G.: Progressive attention guided recurrent network for salient object detection. In: CVPR, pp. 714–722 (2018b)
Zhao, Q., Sheng, T., Wang, Y., et al.: CFENet: an accurate and efficient single-shot object detector for autonomous driving (2018). arXiv preprint arXiv:1806.09790
Zheng, L., Fu, C., Zhao, Y.: Extend the shallow part of Single Shot MultiBox detector via convolutional neural network (2018). arXiv preprint arXiv:1801.05918
Acknowledgements
This paper is supported by the National Natural Science Foundation of China (Grant nos. 61673017, 61403398), and the Natural Science Foundation of Shanxi Province (Grant nos. 2017JM6077, 2018ZDXM-GY-039).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhang, M., Wang, S., Yang, D. et al. Spatial attention model based target detection for aerial robotic systems. Int J Intell Robot Appl 3, 471–479 (2019). https://doi.org/10.1007/s41315-019-00108-0
DOI: https://doi.org/10.1007/s41315-019-00108-0