Abstract
Infrared imaging forms images by detecting the electromagnetic waves that objects emit as spontaneous thermal radiation, which allows it to overcome the adverse effects of complex lighting conditions on the detection of pedestrians and vehicles on the road. To address the low accuracy and missed detections of visual detection under complex traffic conditions, such as rain, snow, or nighttime, an infrared pedestrian and vehicle detection model, BCC-YOLOv8n, is proposed; it improves the neck network and incorporates an attention mechanism. First, a multi-scale feature fusion small-object detection layer is added to the model's neck, enhancing the capture of detailed information about small infrared objects and reducing missed detections. Second, a novel bi-level routing attention mechanism is designed, allowing the model to focus on the most relevant feature regions and improving the detection accuracy of small infrared objects. Next, the CARAFE upsampling method is used for adaptive upsampling and context-information fusion, which strengthens the model's ability to reassemble features and capture details. Finally, a lightweight CSPPC module built from partial convolutions replaces the C2f module in the neck network, raising the model's frame rate. Experimental results show that, compared with the baseline model, BCC-YOLOv8n improves precision, recall, mAP@0.5, and mAP@0.5:0.95 by 1.4%, 4.8%, 5.3%, and 4.5%, respectively, while reducing the number of parameters by approximately 7%. It also achieves a frame rate of 70.8 FPS, satisfying the requirements of real-time detection.
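
For readers unfamiliar with the partial convolutions from which the CSPPC module is built, the minimal PyTorch sketch below illustrates the basic idea: convolving only a fraction of the input channels and passing the rest through unchanged, which is what reduces parameters and computation. The class name `PartialConv`, the split ratio, and the wiring are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a partial-convolution (PConv) block of the kind
# underlying the paper's lightweight CSPPC module. Names and the split ratio
# are assumptions for demonstration, not the authors' code.
import torch
import torch.nn as nn


class PartialConv(nn.Module):
    """Applies a 3x3 convolution to only a fraction of the input channels,
    passing the remaining channels through untouched."""

    def __init__(self, channels: int, part_ratio: float = 0.25):
        super().__init__()
        self.conv_channels = max(1, int(channels * part_ratio))
        self.conv = nn.Conv2d(self.conv_channels, self.conv_channels,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split channels: convolve the first part, keep the rest as identity.
        x1, x2 = torch.split(
            x, [self.conv_channels, x.size(1) - self.conv_channels], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)


if __name__ == "__main__":
    feat = torch.randn(1, 64, 80, 80)   # e.g., a neck feature map
    print(PartialConv(64)(feat).shape)  # torch.Size([1, 64, 80, 80])
```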















Data availability
No datasets were generated or analysed during the current study.
Acknowledgements
This work was supported by the Zhejiang Provincial Key Research and Development Program under Grant 2024C01071, the 2022 Zhejiang Provincial Department of Transportation Science and Technology Project under Grant 202206, and the Zhejiang University of Science and Technology 2023 Postgraduate Research Innovation Fund under Grant 2023yjskc05.
Author information
Authors and Affiliations
Contributions
X.X. (Xinjian Xiang) is the corresponding author and supervised the project. X.X., G.Z. (Guolong Zhang), and L.H. (Li Huang) conceived the study and designed the experiments. Y.Z. (Yongping Zheng) and Z.X. (Zongyi Xie) conducted the experiments and collected the data. S.S. (Siqi Sun) and T.Y. (Tianshun Yuan) performed data analysis and interpretation. X.C. (Xizhao Chen) assisted with manuscript preparation. X.X. and G.Z. wrote the main manuscript text, and all authors reviewed and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xiang, X., Zhang, G., Huang, L. et al. Research on infrared small target pedestrian and vehicle detection algorithm based on multi-scale feature fusion. J Real-Time Image Proc 22, 31 (2025). https://doi.org/10.1007/s11554-024-01607-5
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11554-024-01607-5