SCA-YOLO: a new small object detection model for UAV images

  • Original article
  • Published in The Visual Computer

Abstract

Object detection in UAV (unmanned aerial vehicle) images is a crucial and challenging task in computer vision. The task is difficult because the objects are small and densely distributed, occupy only a few pixels, and their features are hard to extract. In this paper, we propose a multilayer feature fusion algorithm with hybrid attention mechanisms for small object detection, named SCA-YOLO (spatial and coordinate attention enhancement YOLO). It uses the single-stage detection algorithm YOLOv5 as the base framework. First, a hybrid attention module incorporating coordinate attention is designed to enhance the feature extraction of small objects. Second, to address the problem that small objects are easily disturbed by complex background information in UAV images, an improved SEB (simple and efficient bottleneck) module is designed to further distinguish foreground from background features. Third, a multilayer feature fusion structure is built that concatenates shallow and deep feature maps along the channel dimension and enriches the semantic information of shallow features through lateral skip connections. Finally, experiments are conducted on the VisDrone2020 dataset, which contains a large number of small objects captured by drones; extended experiments are also conducted on the DOTA and PASCAL VOC datasets. Comparative experimental results indicate that the proposed method considerably improves the accuracy of small object detection on multiple benchmark datasets.
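
To make the idea of a hybrid spatial and coordinate attention block concrete, the following is a minimal PyTorch sketch, not the authors' exact SCA module: it combines coordinate attention (separate pooling along the height and width axes, in the spirit of Hou et al., CVPR 2021) with a simple spatial attention branch applied to channel-pooled maps. The module name, parameter choices, and branch ordering are illustrative assumptions.

```python
# Illustrative sketch only: a hybrid attention block that applies
# coordinate attention followed by a spatial attention branch.
# Names and hyperparameters are hypothetical, not taken from the paper.
import torch
import torch.nn as nn


class HybridSpatialCoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        # Coordinate attention branch: encode position along H and W separately.
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.SiLU()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)
        # Spatial attention branch: 7x7 conv over channel-pooled maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Coordinate attention: pool along each axis, share a 1x1 bottleneck,
        # then produce per-row and per-column attention weights.
        x_h = self.pool_h(x)                          # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)      # (B, C, W, 1)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        out = x * a_h * a_w
        # Spatial attention: reweight locations using average- and max-pooled
        # channel statistics, which helps separate objects from background.
        avg_map = out.mean(dim=1, keepdim=True)
        max_map, _ = out.max(dim=1, keepdim=True)
        a_s = torch.sigmoid(self.spatial(torch.cat([avg_map, max_map], dim=1)))
        return out * a_s


if __name__ == "__main__":
    feat = torch.randn(1, 256, 80, 80)  # e.g. a shallow detection feature map
    print(HybridSpatialCoordinateAttention(256)(feat).shape)  # (1, 256, 80, 80)
```

A block of this kind can be dropped after a backbone or neck stage of a YOLOv5-style network, since it preserves the input tensor shape.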

Data availability

The datasets used in our study are the VisDrone2020 dataset (http://aiskyeye.com/visdrone-2020/), the DOTA dataset (https://captain-whu.github.io/DOTA/index.html), and the PASCAL VOC dataset (http://host.robots.ox.ac.uk/pascal/VOC/).

Acknowledgements

The authors thank the editors and reviewers for their work on this manuscript. This work was financially supported by the Hebei Province Important Research Project (22370301D) and the Postgraduate Research and Innovation Project of Hebei Province (HBU2023SS009), and was supported by the High-Performance Computing Center of Hebei University.

Author information

Corresponding author

Correspondence to Wenzhu Yang.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zeng, S., Yang, W., Jiao, Y. et al. SCA-YOLO: a new small object detection model for UAV images. Vis Comput 40, 1787–1803 (2024). https://doi.org/10.1007/s00371-023-02886-y
