Abstract
In recent years, the detection of small and dim infrared targets has played a crucial role in both military and civilian security applications, and deep learning-based methods have made remarkable progress in this area. However, detection performance is still limited by small target size, low signal-to-noise ratio, and complex backgrounds. This paper therefore proposes IRE-YOLO, an improved model based on You Only Look Once (YOLO), to enhance the detection accuracy of small targets. First, to improve the model's feature extraction ability, a receptive field enhancement module based on dilated convolution and shared weights is proposed; by expanding the receptive field of the feature map, it extracts the detailed features and local information of multi-scale targets. Second, to address small target size and low image resolution, a space-to-depth convolution is added to the backbone network; by converting the spatial dimension into the depth dimension, it effectively captures the context information of small targets. Third, to improve the model's accuracy on true detection boxes, this paper proposes an SNS algorithm, which effectively removes redundant detection boxes. IRE-YOLO is evaluated against other models on two public datasets, IDTA and SIRST. The experimental results show that, compared with the baseline YOLOv5s, IRE-YOLO increases the mean average precision (mAP) by 2% and 2.1% on the two datasets, respectively, significantly improving the detection accuracy of small and dim infrared targets.
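The two structural ideas named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the scale factor of 2, the 3x3 kernel, and the dilation rates (1, 3, 5) are all assumptions chosen for illustration. The first function shows the generic space-to-depth rearrangement, which moves spatial resolution into the channel dimension without discarding any pixel; the second computes how far a dilated kernel reaches along one axis, which is why dilation enlarges the receptive field without adding weights.

```python
import numpy as np


def space_to_depth(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """Rearrange a (C, H, W) map into (C * scale**2, H // scale, W // scale).

    Illustrative helper (hypothetical name); no pixel is dropped, values are
    only moved from the spatial axes into the channel axis.
    """
    c, h, w = x.shape
    if h % scale or w % scale:
        raise ValueError("H and W must be divisible by scale")
    # Each sub-map keeps every scale-th pixel, starting at a different offset.
    slices = [x[:, i::scale, j::scale] for i in range(scale) for j in range(scale)]
    return np.concatenate(slices, axis=0)


def effective_kernel_size(kernel: int, dilation: int) -> int:
    """Span along one axis covered by a dilated convolution kernel."""
    return kernel + (kernel - 1) * (dilation - 1)


# Toy feature map: 2 channels, 4x4 spatial resolution (assumed sizes).
feature_map = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
packed = space_to_depth(feature_map, scale=2)
print(packed.shape)  # (8, 2, 2)

# Assumed dilation rates; with shared weights, every branch would reuse the
# same 3x3 kernel while covering a different span.
spans = [effective_kernel_size(3, d) for d in (1, 3, 5)]
print(spans)  # [3, 7, 11]
```

In a detector, a rearrangement like this would typically be followed by a non-strided convolution, so that downsampling does not throw away the few pixels a small target occupies.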












Data Availability
No datasets were generated or analysed during the current study.
Change history
27 March 2025
The original online version of this article was revised: the Funding information section was missing from this article and should have read "1. National Natural Science Foundation of China (Grant No. 62261004) 2. Guangxi Distinguished Young Scholars Research Fund (Grant No. 2025GXNSFFA069014) 3. Guangxi Distinguished Young Scholars Research Fund (Grant No. 2023GXNSFFA026002)".
01 April 2025
A Correction to this paper has been published: https://doi.org/10.1007/s11227-025-07223-9
Funding
National Natural Science Foundation of China (Grant No. 62261004), Guangxi Distinguished Young Scholars Research Fund (Grant No. 2025GXNSFFA069014), Guangxi Distinguished Young Scholars Research Fund (Grant No. 2023GXNSFFA026002).
Author information
Contributions
Ma: Wrote the first draft of the paper, integrated the research results from the various sections into a clear and logically coherent presentation, and organized the structure of the paper (abstract, introduction, methods, results, discussion, and conclusion) to ensure that it is complete and well organized. Fan: Led the planning of the study, formulated the research questions and general ideas, defined the research directions and objectives based on the current state of the field and its potential gaps, and set up the framework for the subsequent work. Chen: After the first draft was completed, checked and polished the language, logical clarity, and formatting of the paper, improved its readability in terms of grammar, vocabulary, and sentence structure, and made many constructive suggestions for revision that made the paper more accurate and concise. Li: Mainly responsible for drawing the tables and figures throughout the paper.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ma, Q., Fan, X., Chen, H. et al. IRE-YOLO: Infrared weak target detection algorithm based on the fusion of multi-scale receptive fields and efficient convolution. J Supercomput 81, 558 (2025). https://doi.org/10.1007/s11227-025-07057-5
DOI: https://doi.org/10.1007/s11227-025-07057-5