Abstract
Infrared imaging forms images by detecting the electromagnetic waves that objects emit as spontaneous thermal radiation, which allows it to overcome the adverse effects of complex lighting conditions on the detection of pedestrians and vehicles on the road. To address the low accuracy and missed detections of visual detection under complex traffic conditions, such as rain, snow, or nighttime, an infrared pedestrian and vehicle detection model, BCC-YOLOv8n, is proposed; it improves the neck network and incorporates an attention mechanism. First, a multi-scale feature fusion small-object detection layer is added to the model's neck, enhancing the capture of detailed information about small infrared objects and reducing missed detections. Second, a novel bi-level routing attention mechanism is designed, allowing the model to focus on the most relevant feature regions and improving the detection accuracy of small infrared objects. Next, the CARAFE upsampling method is used for adaptive upsampling and context-information fusion, which strengthens the model's ability to reassemble features and capture details. Finally, a lightweight CSPPC module built from partial convolutions replaces the C2f module in the neck network, raising the model's frame rate. Experimental results show that, compared with the baseline model, BCC-YOLOv8n improves precision, recall, mAP@0.5, and mAP@0.5:0.95 by 1.4%, 4.8%, 5.3%, and 4.5%, respectively, while reducing the number of parameters by approximately 7%. It also achieves a frame rate of 70.8 FPS, satisfying the requirements of real-time detection.
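
For readers unfamiliar with the partial convolutions from which the CSPPC module is built, the minimal PyTorch sketch below illustrates the basic idea: convolving only a fraction of the input channels and passing the rest through unchanged, which is what reduces parameters and computation. The class name `PartialConv`, the split ratio, and the wiring are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a partial-convolution (PConv) block of the kind
# underlying the paper's lightweight CSPPC module. Names and the split ratio
# are assumptions for demonstration, not the authors' code.
import torch
import torch.nn as nn


class PartialConv(nn.Module):
    """Applies a 3x3 convolution to only a fraction of the input channels,
    passing the remaining channels through untouched."""

    def __init__(self, channels: int, part_ratio: float = 0.25):
        super().__init__()
        self.conv_channels = max(1, int(channels * part_ratio))
        self.conv = nn.Conv2d(self.conv_channels, self.conv_channels,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split channels: convolve the first part, keep the rest as identity.
        x1, x2 = torch.split(
            x, [self.conv_channels, x.size(1) - self.conv_channels], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)


if __name__ == "__main__":
    feat = torch.randn(1, 64, 80, 80)   # e.g., a neck feature map
    print(PartialConv(64)(feat).shape)  # torch.Size([1, 64, 80, 80])
```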















Data availability
No datasets were generated or analysed during the current study.
Acknowledgements
This work was supported by the Zhejiang Provincial Key Research and Development Program under Grant 2024C01071, the 2022 Zhejiang Provincial Department of Transportation Science and Technology Project under Grant 202206, and the Zhejiang University of Science and Technology 2023 Postgraduate Research Innovation Fund under Grant 2023yjskc05.
Author information
Authors and Affiliations
Contributions
X.X. (Xinjian Xiang) is the corresponding author and supervised the project. X.X., G.Z. (Guolong Zhang), and L.H. (Li Huang) conceived the study and designed the experiments. Y.Z. (Yongping Zheng) and Z.X. (Zongyi Xie) conducted the experiments and collected the data. S.S. (Siqi Sun) and T.Y. (Tianshun Yuan) performed data analysis and interpretation. X.C. (Xizhao Chen) assisted with manuscript preparation. X.X. and G.Z. wrote the main manuscript text, and all authors reviewed and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xiang, X., Zhang, G., Huang, L. et al. Research on infrared small target pedestrian and vehicle detection algorithm based on multi-scale feature fusion. J Real-Time Image Proc 22, 31 (2025). https://doi.org/10.1007/s11554-024-01607-5
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11554-024-01607-5