SCA-YOLO: a new small object detection model for UAV images

  • Original article
  • Published in The Visual Computer

Abstract

Object detection in UAV (unmanned aerial vehicle) images is a crucial and challenging task in computer vision. The task is difficult because the objects are small and densely distributed, occupy only a few pixels, and their features are hard to extract. In this paper, we propose a multilayer feature fusion algorithm with hybrid attention mechanisms for small object detection, named SCA-YOLO (spatial and coordinate attention enhancement YOLO). It uses the single-stage detection algorithm YOLOv5 as the base framework. First, a hybrid attention module incorporating coordinate attention is designed to enhance the feature extraction of small objects. Second, to address the problem that small objects are easily disturbed by complex background information in UAV images, an improved SEB (simple and efficient bottleneck) module is designed to further distinguish foreground from background features. Third, a multilayer feature fusion structure is built that concatenates shallow and deep feature maps along the channel dimension and enriches the semantic information of shallow features through lateral skip connections. Finally, experiments are conducted on the VisDrone2020 dataset, which contains a large number of small objects captured by drones; extended experiments are also conducted on the DOTA and PASCAL VOC datasets. Comparative experimental results indicate that the proposed method considerably improves the accuracy of small object detection on multiple benchmark datasets.
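
To make the idea of a hybrid spatial and coordinate attention block concrete, the following is a minimal PyTorch sketch, not the authors' exact SCA module: it combines coordinate attention (separate pooling along the height and width axes, in the spirit of Hou et al., CVPR 2021) with a simple spatial attention branch applied to channel-pooled maps. The module name, parameter choices, and branch ordering are illustrative assumptions.

```python
# Illustrative sketch only: a hybrid attention block that applies
# coordinate attention followed by a spatial attention branch.
# Names and hyperparameters are hypothetical, not taken from the paper.
import torch
import torch.nn as nn


class HybridSpatialCoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        # Coordinate attention branch: encode position along H and W separately.
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.SiLU()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)
        # Spatial attention branch: 7x7 conv over channel-pooled maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Coordinate attention: pool along each axis, share a 1x1 bottleneck,
        # then produce per-row and per-column attention weights.
        x_h = self.pool_h(x)                          # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)      # (B, C, W, 1)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        out = x * a_h * a_w
        # Spatial attention: reweight locations using average- and max-pooled
        # channel statistics, which helps separate objects from background.
        avg_map = out.mean(dim=1, keepdim=True)
        max_map, _ = out.max(dim=1, keepdim=True)
        a_s = torch.sigmoid(self.spatial(torch.cat([avg_map, max_map], dim=1)))
        return out * a_s


if __name__ == "__main__":
    feat = torch.randn(1, 256, 80, 80)  # e.g. a shallow detection feature map
    print(HybridSpatialCoordinateAttention(256)(feat).shape)  # (1, 256, 80, 80)
```

A block of this kind can be dropped after a backbone or neck stage of a YOLOv5-style network, since it preserves the input tensor shape.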

Data availability

The datasets used in our study are the VisDrone2020 dataset (http://aiskyeye.com/visdrone-2020/), the DOTA dataset (https://captain-whu.github.io/DOTA/index.html), and the PASCAL VOC dataset (http://host.robots.ox.ac.uk/pascal/VOC/).

Acknowledgements

The authors thank the editors and reviewers for their work on this manuscript. This work was financially supported by the Hebei Province Important Research Project (22370301D) and the Postgraduate Research and Innovation Project of Hebei Province (HBU2023SS009), and was supported by the High-Performance Computing Center of Hebei University.

Author information

Corresponding author

Correspondence to Wenzhu Yang.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zeng, S., Yang, W., Jiao, Y. et al. SCA-YOLO: a new small object detection model for UAV images. Vis Comput 40, 1787–1803 (2024). https://doi.org/10.1007/s00371-023-02886-y
