PSO-YOLO: a contextual feature enhancement method for small object detection in UAV aerial images

  • REVIEW
Earth Science Informatics

Abstract

In recent years, object detection in UAV aerial imagery has been widely applied in fields such as urban planning, environmental monitoring, agricultural management, and disaster assessment. However, detecting small objects from aerial viewpoints remains difficult: accuracy is low, missed detections are common, and objects are often occluded. This study proposes PSO-YOLO (Precise Small Object-You Only Look Once), a small object detection algorithm built on multi-scale feature fusion and extraction. First, an MSFE (Multi-Scale Feature Extraction) module is developed that uses Transformer-based multi-scale feature fusion and self-attention to capture fine-grained multi-scale details. Second, a small-object-enhanced STE-Neck (Small Object Enhancement-Neck) network fuses multi-scale features during downsampling, capturing detailed and local features to reduce missed detections and feature loss caused by occlusion; the fused features are then passed to a small object detection head. Third, the Swin Transformer is incorporated into the C2f module of the YOLOv8n backbone, strengthening its ability to capture long-range contextual information and further improving small object feature extraction. In addition, an SPPF-LSKA (Spatial Pyramid Pooling Fast-Large Separable Kernel Attention) module replaces the original SPPF (Spatial Pyramid Pooling Fast) module to enhance the extraction of key features. Experiments on the VisDrone2019 dataset show that PSO-YOLO achieves an mAP@50 of 30.9% and an mAP@50–95 of 17.4%, improvements of 4.6% and 2.8%, respectively, over YOLOv8n.
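The gap between the two reported metrics (mAP@50 and mAP@50–95) reflects how strictly localization is scored, which matters most for small objects. The sketch below is a minimal illustration assuming the standard COCO-style convention (IoU thresholds from 0.50 to 0.95 in steps of 0.05); it is not the authors' evaluation code. The box sizes, the 2-pixel offset, and the helper names `iou`, `thresholds_passed`, and `shifted` are hypothetical choices made for illustration.

```python
# Sketch: why a fixed localization error hurts small objects more under strict IoU thresholds.
# Boxes are (x1, y1, x2, y2) in pixels; all example boxes are hypothetical.

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# COCO-style thresholds: mAP@50 uses 0.50 only; mAP@50-95 averages AP over all ten.
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

def thresholds_passed(gt, pred):
    """Count the IoU thresholds at which pred would be matched to gt as a true positive."""
    score = iou(gt, pred)
    return sum(score >= t for t in THRESHOLDS)

def shifted(box, dx):
    """Return the box translated horizontally by dx pixels."""
    return (box[0] + dx, box[1], box[2] + dx, box[3])

if __name__ == "__main__":
    small = (0, 0, 10, 10)     # 10x10 px object, as is common in UAV imagery
    large = (0, 0, 100, 100)   # 100x100 px object for comparison
    for name, gt in [("small", small), ("large", large)]:
        pred = shifted(gt, 2)  # identical 2-pixel localization error for both
        print(f"{name}: IoU={iou(gt, pred):.3f}, "
              f"passes {thresholds_passed(gt, pred)}/10 thresholds")
```

With the same 2-pixel shift, the hypothetical 10×10 box drops to an IoU of 0.667, so it still counts at mAP@50 but fails six of the ten stricter thresholds, while the 100×100 box keeps an IoU of 0.961 and passes all ten. This is one reason mAP@50–95 scores on small-object benchmarks sit well below mAP@50.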

Figs. 1–13

Data availability

Datasets for this research are available from the corresponding author on request.

Funding

This research is supported by the National Natural Science Foundation of China (No. 11972236) and the Shijiazhuang Tiedao University Graduate Innovation Funding Project (Grant No. YC202552).

Author information

Contributions

Z.Z. and X.L. conceived the study and wrote the main manuscript text. X.L. developed the PSO-YOLO algorithm and conducted experiments. P.H. assisted with data collection and validation. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Zhihong Zhao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Communicated by Hassan Babaie

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, Z., Liu, X. & He, P. PSO-YOLO: a contextual feature enhancement method for small object detection in UAV aerial images. Earth Sci Inform 18, 258 (2025). https://doi.org/10.1007/s12145-025-01780-6

  • DOI: https://doi.org/10.1007/s12145-025-01780-6

Keywords