Abstract
In recent years, UAV aerial object detection has been widely applied in fields such as urban planning, environmental monitoring, agricultural management, and disaster assessment. However, low accuracy, missed detections, and occlusion remain challenges when detecting small objects from aerial perspectives. This study proposes the PSO-YOLO (Precise Small Object-You Only Look Once) algorithm, which builds on multi-scale feature fusion and extraction techniques for small object detection. First, an MSFE (Multi-Scale Feature Extraction) module is developed, leveraging Transformer-based multi-scale feature fusion and self-attention mechanisms to enhance the acquisition of fine-grained multi-scale details. Second, a small object-enhanced STE-Neck (Small Object Enhancement-Neck) network is introduced to fuse multi-scale features during downsampling, capturing detailed and local features to prevent missed detections and feature loss due to occlusion. The fused features are then fed into a small object detection head. Third, the Swin Transformer is incorporated into the YOLOv8n backbone network to improve the C2f module, enhancing its ability to capture long-range contextual information and further improving small object feature extraction. Finally, the SPPF-LSKA (Spatial Pyramid Pooling Fast-Large Separable Kernel Attention) module replaces the SPPF (Spatial Pyramid Pooling Fast) module to strengthen the algorithm's ability to extract key features. Experiments on the VisDrone2019 dataset show that the PSO-YOLO algorithm achieves an mAP@50 of 30.9% and an mAP@50–95 of 17.4%, improvements of 4.6% and 2.8%, respectively, over YOLOv8n.
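To make the two reported metrics concrete, the following is a minimal sketch, not taken from the paper, of the IoU matching criterion behind them. It assumes the usual COCO-style convention in which mAP@50 counts a prediction as correct at IoU >= 0.50, while mAP@50–95 averages over thresholds from 0.50 to 0.95 in steps of 0.05. The box coordinates and the helper names (`iou`, `shifted`, `thresholds_passed`) are hypothetical and chosen only for illustration. A full mAP computation also involves precision-recall curves and averaging over classes, which are omitted here.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted(box, dx):
    """Return the box translated horizontally by dx pixels."""
    x1, y1, x2, y2 = box
    return (x1 + dx, y1, x2 + dx, y2)


# Assumed COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(gt, pred):
    """Count how many of the ten IoU thresholds a prediction satisfies."""
    score = iou(gt, pred)
    return sum(score >= t for t in IOU_THRESHOLDS)


# Hypothetical ground-truth boxes: a 10x10 small object and a 100x100 large one.
small_gt = (0, 0, 10, 10)
large_gt = (0, 0, 100, 100)

# The same 3-pixel localisation error is applied to both.
small_iou = iou(small_gt, shifted(small_gt, 3))
large_iou = iou(large_gt, shifted(large_gt, 3))

print(f"small object IoU: {small_iou:.3f}, "
      f"thresholds passed: {thresholds_passed(small_gt, shifted(small_gt, 3))}")
print(f"large object IoU: {large_iou:.3f}, "
      f"thresholds passed: {thresholds_passed(large_gt, shifted(large_gt, 3))}")
```

Under these assumptions, the same 3-pixel offset leaves the large box matched at nine of the ten thresholds, while the small box passes only the 0.50 threshold. This illustrates why small objects tend to score much lower on mAP@50–95 than on mAP@50.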
Data availability
Datasets for this research are available from the corresponding author on request.
Funding
This research is supported by the National Natural Science Foundation of China (No. 11972236) and the Shijiazhuang Tiedao University Graduate Innovation Funding Project (Grant No. YC202552).
Author information
Contributions
Z.Z. and X.L. conceived the study and wrote the main manuscript text. X.L. developed the PSO-YOLO algorithm and conducted experiments. P.H. assisted with data collection and validation. All authors reviewed and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Communicated by Hassan Babaie
About this article
Cite this article
Zhao, Z., Liu, X. & He, P. PSO-YOLO: a contextual feature enhancement method for small object detection in UAV aerial images. Earth Sci Inform 18, 258 (2025). https://doi.org/10.1007/s12145-025-01780-6