MLSA-YOLO: a multi-level feature fusion and scale-adaptive framework for small object detection

The Journal of Supercomputing

Abstract

Small objects occupy only a limited area of an image, so feature extraction paradigms that are poorly suited to them further erode the little information they carry. In addition, inconsistencies between features at different levels of the FPN can lead to suboptimal feature fusion, which hinders the accurate representation of multi-scale features. As a result, even high-performance detectors struggle to recognize small objects effectively. To resolve these issues, we propose MLSA-YOLO, a small object detection algorithm based on multi-level feature fusion and scale adaptation. First, we restructure the network architecture using SPD-Conv together with the proposed Convolutional Space-to-Depth (CSPD) module to improve the network's capacity for capturing local spatial details and to ensure that information is preserved during downsampling. Second, to address the challenges in feature fusion, we employ a three-layer PAFPN structure at the neck and combine it with the proposed Multi-Level Feature Fusion and Scale-Adaptive (MLSA) feature pyramid network. This method enhances the complementarity of multi-level information while effectively filtering the conflicting information generated during the fusion phase. Third, to improve the quality of feature extraction, we incorporate the designed DCN_C2f module into the neck network. This module can accurately capture foreground object features while enhancing the network's adaptability to geometric deformations of objects. Experimental results show that our approach outperforms other state-of-the-art detection algorithms on the VisDrone2019, DOTA, and FocusTiny datasets. Compared with YOLOv8s, mAP50 improves by 9.5%, 3.4%, and 5.1% on these datasets, respectively.
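
The abstract's claim that information is preserved during downsampling rests on the space-to-depth rearrangement used by SPD-Conv [34]. The following is a minimal sketch of that rearrangement only, written in plain Python on nested lists. It is not the authors' CSPD module, which is not described on this page, and the function name `space_to_depth`, the `scale` parameter, and the channel ordering are assumptions made for illustration. In SPD-Conv the rearranged tensor is then passed to a non-strided convolution, which is omitted here.

```python
def space_to_depth(fmap, scale=2):
    """Rearrange a C x H x W feature map into (C*scale^2) x (H/scale) x (W/scale).

    Every input value is kept: spatial resolution is traded for channels
    instead of being discarded by a strided convolution or pooling layer.
    The channel ordering below is an illustrative assumption.
    """
    height, width = len(fmap[0]), len(fmap[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    out = []
    # One sub-map per (row offset, column offset) pair, for each input channel.
    for dy in range(scale):
        for dx in range(scale):
            for channel in fmap:
                out.append([row[dx::scale] for row in channel[dy::scale]])
    return out


# A single 4 x 4 channel holding the distinct values 0..15.
x = [[[r * 4 + c for c in range(4)] for r in range(4)]]
y = space_to_depth(x)
print(len(y), len(y[0]), len(y[0][0]))  # 4 2 2
print(y[0])  # [[0, 2], [8, 10]]
```

The output has half the height and width but four times the channels, and the 16 input values all survive, which is the property that a stride-2 convolution or pooling step does not guarantee.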

Data availability

The VisDrone2019 [38] and DOTA [39] datasets used in this study are publicly available. The authors do not have permission to share the FocusTiny dataset used.

Code availability

The code for this study is available from the corresponding author upon reasonable request.

References

  1. Feng Q, Xu X, Wang Z (2023) Deep learning-based small object detection: a survey. Math Biosci Eng 20(4):6551–6590

  2. Cheng G, Yuan X, Yao X, Yan K, Zeng Q, Xie X, Han J (2023) Towards large-scale small object detection: survey and benchmarks. IEEE Trans Pattern Anal Mach Intell 45:13467–13488

  3. Rekavandi AM, Rashidi S, Boussaid F, Hoefs S, Akbas E et al (2023) Transformers in small object detection: a benchmark and survey of state-of-the-art. arXiv preprint arXiv:2309.04902

  4. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 580–587

  5. Girshick R (2015) Fast R-CNN. arXiv preprint arXiv:1504.08083

  6. Ren S (2015) Faster R-CNN: towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497

  7. Redmon J (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  8. Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision And Pattern Recognition, pp 7263–7271

  9. Redmon J (2018) Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767

  10. Bochkovskiy A, Wang C-Y, Liao H-YM (2020) Yolov4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934

  11. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) SSD: single shot multibox detector. In: Computer vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer, Berlin, pp 21–37

  12. Lin T (2017) Focal loss for dense object detection. arXiv preprint arXiv:1708.02002

  13. Wang M, Yang W, Wang L, Chen D, Wei F, KeZiErBieKe H, Liao Y (2023) Fe-yolov5: feature enhancement network based on yolov5 for small object detection. J Vis Commun Image Represent 90:103752

  14. Xue C, Xia Y, Wu M, Chen Z, Cheng F, Yun L (2024) EL-YOLO: an efficient and lightweight low-altitude aerial objects detector for onboard applications. Expert Syst Appl 256:124848

  15. Ghiasi G, Lin T-Y, Le QV (2019) NAS-FPN: learning scalable feature pyramid architecture for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7036–7045

  16. Fu Y, Ran T, Xiao W, Yuan L, Zhao J, He L, Mei J (2024) GD-YOLO: an improved convolutional neural network architecture for real-time detection of smoking and phone use behaviors. Digit Signal Process 151:104554

  17. Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2117–2125

  18. Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 8759–8768

  19. Tan M, Pang R, Le QV (2020) EfficientDet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 10781–10790

  20. Liu S, Huang D, Wang Y (2019) Learning spatial fusion for single-shot object detection. arXiv preprint arXiv:1911.09516

  21. Yang G, Lei J, Zhu Z, Cheng S, Feng Z, Liang R (2023) AFPN: asymptotic feature pyramid network for object detection. In: 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, pp 2184–2189

  22. Pang Y, Zhao X, Zhang L, Lu H (2020) Multi-scale interactive network for salient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 9413–9422

  23. Zhu X, Lyu S, Wang X, Zhao Q (2021) Tph-yolov5: improved yolov5 based on transformer prediction head for object detection on drone-captured scenarios. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 2778–2788

  24. Vaswani A (2017) Attention is all you need. Adv Neural Inf Process Syst

  25. Jocher G, Stoken A, Chaurasia A et al (2020) Ultralytics YOLOv5. https://doi.org/10.5281/zenodo.3908559. https://github.com/ultralytics/yolov5

  26. Woo S, Park J, Lee J-Y, Kweon IS (2018) CBAM: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 3–19

  27. Yang C, Huang Z, Wang N (2022) QueryDet: cascaded sparse query for accelerating high-resolution small object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 13668–13677

  28. Zhang Z (2023) Drone-YOLO: an efficient neural network method for target detection in drone images. Drones 7(8):526

  29. Jocher G, Chaurasia A, Qiu J. Ultralytics YOLOv8. https://github.com/ultralytics/ultralytics

  30. Li Y, Fan Q, Huang H, Han Z, Gu Q (2023) A modified yolov8 detection network for UAV aerial image recognition. Drones 7(5):304

  31. Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C (2020) GhostNet: more features from cheap operations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 1580–1589

  32. Wang Y, Zou H, Yin M, Zhang X (2023) SMFF-YOLO: a scale-adaptive yolo algorithm with multi-level feature fusion for object detection in UAV scenes. Remote Sens 15(18):4580

  33. Shi Y, Jia Y, Zhang X (2024) FocusDet: an efficient object detector for small object. Sci Rep 14(1):10697

  34. Sunkara R, Luo T (2022) No more strided convolutions or pooling: a new CNN building block for low-resolution images and small objects. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Berlin, pp 443–459

  35. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

  36. Peng Y, Sonka M, Chen DZ (2023) U-net v2: rethinking the skip connections of u-net for medical image segmentation. arXiv preprint arXiv:2311.17791

  37. Wang W, Dai J, Chen Z, Huang Z, Li Z, Zhu X, Hu X, Lu T, Lu L, Li H (2023) Internimage: exploring large-scale vision foundation models with deformable convolutions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 14408–14419

  38. Zhu P, Wen L, Du D, Bian X, Fan H, Hu Q, Ling H (2021) Detection and tracking meet drones challenge. IEEE Trans Pattern Anal Mach Intell 44(11):7380–7399

  39. Xia G-S, Bai X, Ding J, Zhu Z, Belongie S, Luo J, Datcu M, Pelillo M, Zhang L (2018) Dota: a large-scale dataset for object detection in aerial images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3974–3983

  40. Cai Z, Vasconcelos N (2018) Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 6154–6162

  41. Wang A, Chen H, Liu L, Chen K, Lin Z, Han J, Ding G (2024) Yolov10: real-time end-to-end object detection. arXiv preprint arXiv:2405.14458

  42. Jocher G et al. Ultralytics YOLO11. https://github.com/ultralytics/ultralytics

  43. Zhou X, Wang D, Krähenbühl P (2019) Objects as points. arXiv preprint arXiv:1904.07850

  44. Mao G, Deng T, Yu N (2022) Object detection in UAV images based on multi-scale split attention. Acta Aeronaut Astronaut Sin 43(12):326738

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 52275003 and in part by the National Key Research and Development Program of China under Grant 2023YFB4704000.

Author information

Contributions

JP was involved in conceptualization, data curation, methodology, software, validation, visualization, writing—original draft, and writing—review and editing. KL was involved in supervision, validation, resources, funding acquisition, and writing—review and editing. GW was involved in visualization and software. WX was involved in supervision. TR was involved in visualization and software. LY was involved in supervision, resources, funding acquisition, and writing—review and editing.

Corresponding author

Correspondence to Liang Yuan.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Peng, J., Lv, K., Wang, G. et al. MLSA-YOLO: a multi-level feature fusion and scale-adaptive framework for small object detection. J Supercomput 81, 528 (2025). https://doi.org/10.1007/s11227-025-06961-0

  • DOI: https://doi.org/10.1007/s11227-025-06961-0