Abstract
Unmanned aerial vehicles (UAVs) have high mobility and a wide field of view, so aerial images pose several challenges: a high proportion of small objects, large variation in object size, object aggregation, and complex backgrounds. Existing object detection methods often overlook the texture information in high-level features, which is crucial for detecting small objects in complex backgrounds. To improve small object detection in complex scenes, we propose an efficient feature aggregation network (EFA-Net) based on YOLOv7. The backbone integrates a lightweight hybrid feature extraction module (LHFE), which replaces traditional convolutions with depthwise convolutions and employs a hybrid channel attention mechanism to capture local and global information concurrently. This design reduces the parameter count without sacrificing detection accuracy and enhances the network’s representational capacity. In the neck, we design an adaptive multi-scale feature fusion module (AMSFM) that improves the model’s adaptability to small objects and complex backgrounds by fusing multi-scale features with high-level semantic information and capturing the texture information in high-level features. Additionally, we incorporate a residual spatial pyramid pooling (RSPP) module to strengthen information fusion across receptive fields and reduce the interference of complex backgrounds on small object detection. To further improve the model’s robustness and generalization ability, we propose an enhanced complete intersection over union (ECIoU) loss function that balances the influence of large and small objects during training. Experimental results demonstrate the effectiveness of the proposed method, which achieves \(mAP_{50}\) scores of 51.6% and 48.5% and mAP scores of 29.6% and 29.5% on the VisDrone 2019 and UAVDT datasets, respectively.
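The parameter saving that the abstract attributes to depthwise convolutions can be illustrated with simple arithmetic. The sketch below is not the LHFE module itself, whose layer layout is not given here. It only compares weight counts for a standard convolution, a depthwise convolution, and the common depthwise-plus-pointwise (separable) pairing. The function names, the 3x3 kernel, and the 256-channel width are illustrative assumptions, and bias terms are ignored.

```python
# Illustrative weight counts only; names and layer sizes are assumptions,
# not values taken from the paper. Bias terms are ignored throughout.


def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution: every output channel
    has one k x k filter per input channel."""
    return kernel * kernel * c_in * c_out


def depthwise_conv_params(kernel: int, channels: int) -> int:
    """Weights in a depthwise k x k convolution: one k x k filter per
    channel, with no mixing across channels."""
    return kernel * kernel * channels


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k convolution followed by a 1 x 1 pointwise
    convolution that mixes channels."""
    return depthwise_conv_params(kernel, c_in) + c_in * c_out


if __name__ == "__main__":
    k, c_in, c_out = 3, 256, 256  # hypothetical layer shape
    standard = standard_conv_params(k, c_in, c_out)
    depthwise = depthwise_conv_params(k, c_in)
    separable = depthwise_separable_params(k, c_in, c_out)
    print(f"standard:  {standard}")
    print(f"depthwise: {depthwise}")
    print(f"separable: {separable}")
    print(f"standard / separable: {standard / separable:.2f}")
```

For a k x k kernel, the ratio of standard to separable weights is k²·C_out / (k² + C_out), so the saving approaches a factor of k² as the output width grows.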
Author information
Contributions
X.L. wrote the main manuscript text. G.Z. suggested revisions to the manuscript, provided funding, and supported the experimental equipment. B.Z. suggested the structure of the manuscript and helped with its typesetting. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Code availability
Not applicable.
About this article
Cite this article
Liu, X., Zhang, G. & Zhou, B. An efficient feature aggregation network for small object detection in UAV aerial images. J Supercomput 81, 548 (2025). https://doi.org/10.1007/s11227-025-06987-4
DOI: https://doi.org/10.1007/s11227-025-06987-4