MLSA-YOLO: a multi-level feature fusion and scale-adaptive framework for small object detection

Peng, Jiayu; Lv, Kai; Wang, Guoliang; Xiao, Wendong; Ran, Teng; Yuan, Liang

doi:10.1007/s11227-025-06961-0


Published: 21 February 2025

Volume 81, article number 528, (2025)

The Journal of Supercomputing

Jiayu Peng¹, Kai Lv², Guoliang Wang², Wendong Xiao², Teng Ran² & Liang Yuan²


Abstract

Small objects occupy only a limited area of an image, so feature extraction paradigms that are poorly suited to them further erode the little information they carry. In addition, inconsistencies between features at different levels of the feature pyramid network (FPN) can lead to suboptimal feature fusion and hinder the accurate representation of multi-scale features. As a result, even high-performance detectors struggle to recognize small objects. To address these issues, we propose MLSA-YOLO, a small object detection algorithm based on multi-level feature fusion and scale adaptation. First, we restructure the network architecture using SPD-Conv together with the proposed Convolutional Space-to-Depth (CSPD) module, which improves the network's capacity to capture local spatial details and preserves information during downsampling. Second, to address the challenges in feature fusion, we employ a three-layer PAFPN structure in the neck and combine it with the proposed Multi-Level Feature Fusion and Scale-Adaptive (MLSA) feature pyramid network. This design enhances the complementarity of multi-level information while filtering out the conflicting information generated during fusion. Third, to improve the quality of feature extraction, we incorporate the designed DCN_C2f module into the neck network. This module captures foreground object features accurately and enhances the network's adaptability to geometric deformations of objects. Experimental results show that our approach outperforms other state-of-the-art detection algorithms on the VisDrone2019, DOTA, and FocusTiny datasets. Compared with YOLOv8s, mAP50 improves by 9.5%, 3.4%, and 5.1% on these datasets, respectively.
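The space-to-depth rearrangement that SPD-Conv builds on can be illustrated with a minimal sketch. This is not the authors' CSPD module, whose internals are not described on this page. It only shows the generic idea, assuming a single-channel feature map stored as plain Python lists and a scale factor of 2. The names `space_to_depth` and `strided_subsample` are hypothetical helpers introduced for illustration, and the subsampling function is a simplified stand-in for strided downsampling.

```python
def space_to_depth(feature_map, scale=2):
    """Rearrange one H x W channel into scale*scale channels of size (H/scale) x (W/scale)."""
    height = len(feature_map)
    width = len(feature_map[0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    # Sub-map (dy, dx) keeps rows dy, dy+scale, ... and columns dx, dx+scale, ...
    # Every input pixel lands in exactly one sub-map, so nothing is discarded.
    return [
        [row[dx::scale] for row in feature_map[dy::scale]]
        for dy in range(scale)
        for dx in range(scale)
    ]


def strided_subsample(feature_map, stride=2):
    """Simplified stand-in for strided downsampling: keep every stride-th pixel."""
    return [row[::stride] for row in feature_map[::stride]]


# A 4 x 4 single-channel map; the lone 9 stands in for a one-pixel small object.
x = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

sub_maps = space_to_depth(x)            # four 2 x 2 channels
kept = sum(v for m in sub_maps for r in m for v in r)
subsampled = strided_subsample(x)       # one 2 x 2 channel that skips pixel (1, 1)

print(len(sub_maps), kept)              # 4 9
print(subsampled)                       # [[0, 0], [0, 0]]
```

In this toy case the bright pixel survives in the fourth sub-map, whereas plain subsampling drops it. In an SPD-style layer a non-strided convolution would then mix the stacked channels; that step is omitted here.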


Data availability

The VisDrone2019 [38] and DOTA [39] datasets used in this study are publicly available. The authors do not have permission to share the FocusTiny dataset used.

Code availability

The code for this study is available from the corresponding author upon reasonable request.

References

[1] Feng Q, Xu X, Wang Z (2023) Deep learning-based small object detection: a survey. Math Biosci Eng 20(4):6551–6590
[2] Cheng G, Yuan X, Yao X, Yan K, Zeng Q, Xie X, Han J (2023) Towards large-scale small object detection: survey and benchmarks. IEEE Trans Pattern Anal Mach Intell 45:13467–13488
[3] Rekavandi AM, Rashidi S, Boussaid F, Hoefs S, Akbas E et al (2023) Transformers in small object detection: a benchmark and survey of state-of-the-art. arXiv preprint arXiv:2309.04902
[4] Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 580–587
[5] Girshick R (2015) Fast R-CNN. arXiv preprint arXiv:1504.08083
[6] Ren S (2015) Faster R-CNN: towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497
[7] Redmon J (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
[8] Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7263–7271
[9] Redmon J (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767
[10] Bochkovskiy A, Wang C-Y, Liao H-YM (2020) YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934
[11] Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) SSD: single shot multibox detector. In: Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer, Berlin, pp 21–37
[12] Lin T (2017) Focal loss for dense object detection. arXiv preprint arXiv:1708.02002
[13] Wang M, Yang W, Wang L, Chen D, Wei F, KeZiErBieKe H, Liao Y (2023) FE-YOLOv5: feature enhancement network based on YOLOv5 for small object detection. J Vis Commun Image Represent 90:103752
[14] Xue C, Xia Y, Wu M, Chen Z, Cheng F, Yun L (2024) EL-YOLO: an efficient and lightweight low-altitude aerial objects detector for onboard applications. Expert Syst Appl 256:124848
[15] Ghiasi G, Lin T-Y, Le QV (2019) NAS-FPN: learning scalable feature pyramid architecture for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7036–7045
[16] Fu Y, Ran T, Xiao W, Yuan L, Zhao J, He L, Mei J (2024) GD-YOLO: an improved convolutional neural network architecture for real-time detection of smoking and phone use behaviors. Digit Signal Process 151:104554
[17] Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2117–2125
[18] Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 8759–8768
[19] Tan M, Pang R, Le QV (2020) EfficientDet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 10781–10790
[20] Liu S, Huang D, Wang Y (2019) Learning spatial fusion for single-shot object detection. arXiv preprint arXiv:1911.09516
[21] Yang G, Lei J, Zhu Z, Cheng S, Feng Z, Liang R (2023) AFPN: asymptotic feature pyramid network for object detection. In: 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, pp 2184–2189
[22] Pang Y, Zhao X, Zhang L, Lu H (2020) Multi-scale interactive network for salient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 9413–9422
[23] Zhu X, Lyu S, Wang X, Zhao Q (2021) TPH-YOLOv5: improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 2778–2788
[24] Vaswani A (2017) Attention is all you need. Adv Neural Inf Process Syst
[25] Jocher G, Stoken A, Chaurasia A et al (2020) Ultralytics YOLOv5. https://doi.org/10.5281/zenodo.3908559. https://github.com/ultralytics/yolov5
[26] Woo S, Park J, Lee J-Y, Kweon IS (2018) CBAM: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 3–19
[27] Yang C, Huang Z, Wang N (2022) QueryDet: cascaded sparse query for accelerating high-resolution small object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 13668–13677
[28] Zhang Z (2023) Drone-YOLO: an efficient neural network method for target detection in drone images. Drones 7(8):526
[29] Jocher G, Chaurasia A, Qiu J. Ultralytics YOLOv8. https://github.com/ultralytics/ultralytics
[30] Li Y, Fan Q, Huang H, Han Z, Gu Q (2023) A modified YOLOv8 detection network for UAV aerial image recognition. Drones 7(5):304
[31] Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C (2020) GhostNet: more features from cheap operations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 1580–1589
[32] Wang Y, Zou H, Yin M, Zhang X (2023) SMFF-YOLO: a scale-adaptive YOLO algorithm with multi-level feature fusion for object detection in UAV scenes. Remote Sens 15(18):4580
[33] Shi Y, Jia Y, Zhang X (2024) FocusDet: an efficient object detector for small object. Sci Rep 14(1):10697
[34] Sunkara R, Luo T (2022) No more strided convolutions or pooling: a new CNN building block for low-resolution images and small objects. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Berlin, pp 443–459
[35] Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
[36] Peng Y, Sonka M, Chen DZ (2023) U-Net v2: rethinking the skip connections of U-Net for medical image segmentation. arXiv preprint arXiv:2311.17791
[37] Wang W, Dai J, Chen Z, Huang Z, Li Z, Zhu X, Hu X, Lu T, Lu L, Li H (2023) InternImage: exploring large-scale vision foundation models with deformable convolutions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 14408–14419
[38] Zhu P, Wen L, Du D, Bian X, Fan H, Hu Q, Ling H (2021) Detection and tracking meet drones challenge. IEEE Trans Pattern Anal Mach Intell 44(11):7380–7399
[39] Xia G-S, Bai X, Ding J, Zhu Z, Belongie S, Luo J, Datcu M, Pelillo M, Zhang L (2018) DOTA: a large-scale dataset for object detection in aerial images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3974–3983
[40] Cai Z, Vasconcelos N (2018) Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 6154–6162
[41] Wang A, Chen H, Liu L, Chen K, Lin Z, Han J, Ding G (2024) YOLOv10: real-time end-to-end object detection. arXiv preprint arXiv:2405.14458
[42] Jocher G et al. Ultralytics YOLO11. https://github.com/ultralytics/ultralytics
[43] Zhou X, Wang D, Krähenbühl P (2019) Objects as points. arXiv preprint arXiv:1904.07850
[44] Mao G, Deng T, Yu N (2022) Object detection in UAV images based on multi-scale split attention. Acta Aeronaut Astronaut Sin 43(12):326738


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 52275003 and in part by the National Key Research and Development Program of China under Grant 2023YFB4704000.

Author information

Authors and Affiliations

School of Software, Xinjiang University, 830091, Urumqi, China
Jiayu Peng
School of Mechanical Engineering, Xinjiang University, 830017, Urumqi, China
Kai Lv, Guoliang Wang, Wendong Xiao, Teng Ran & Liang Yuan


Contributions

JP was involved in conceptualization, data curation, methodology, software, validation, visualization, writing – original draft, and writing – review and editing. KL was involved in supervision, validation, resources, funding acquisition, and writing – review and editing. GW was involved in visualization and software. WX was involved in supervision. TR was involved in visualization and software. LY was involved in supervision, resources, funding acquisition, and writing – review and editing.

Corresponding author

Correspondence to Liang Yuan.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Peng, J., Lv, K., Wang, G. et al. MLSA-YOLO: a multi-level feature fusion and scale-adaptive framework for small object detection. J Supercomput 81, 528 (2025). https://doi.org/10.1007/s11227-025-06961-0


Accepted: 16 January 2025
Published: 21 February 2025
DOI: https://doi.org/10.1007/s11227-025-06961-0
