Research on infrared small target pedestrian and vehicle detection algorithm based on multi-scale feature fusion

  • Research
  • Journal of Real-Time Image Processing

Abstract

Infrared imaging forms an image from the electromagnetic radiation that objects emit thermally, so it can overcome the adverse effects of complex lighting conditions on the detection of pedestrians and vehicles on the road. To address the low accuracy and missed detections of visual detection under complex traffic conditions, such as rain, snow, or night, this paper proposes a pedestrian and vehicle detection model for infrared images, BCC-YOLOv8n, which improves the neck network and incorporates an attention mechanism. First, a multi-scale feature fusion small-object detection layer is added to the model's neck, enhancing the capture of detailed information about small infrared objects and reducing missed detections. Second, a novel dual-layer routing attention mechanism is designed, allowing the model to focus on the most relevant feature regions and improving the detection accuracy for small infrared objects. Third, the CARAFE upsampling method is used for adaptive upsampling and context information fusion, which enhances the model's ability to reassemble features and capture details. Finally, a lightweight CSPPC module built from partial convolutions replaces the C2f module in the neck network, which improves the model's frame rate. Experimental results show that, compared to the baseline model, BCC-YOLOv8n improves precision, recall, mAP@0.5, and mAP@0.5:0.95 by 1.4%, 4.8%, 5.3%, and 4.5%, respectively, while reducing the number of parameters by approximately 7%. It also achieves a frame rate of 70.8 FPS, satisfying the requirements for real-time detection.
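
To illustrate why building a module from partial convolutions can reduce the parameter count, the following minimal sketch compares the weight count of a standard k x k convolution with that of a partial convolution, which convolves only a fraction of the channels and passes the rest through unchanged. The function names, the 64-channel width, and the channel ratio of 1/4 are illustrative assumptions and are not taken from the abstract. The sketch counts a single convolution in isolation, not the full CSPPC module, whose structure the abstract does not describe.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def partial_conv_params(channels: int, k: int, ratio: float = 0.25) -> int:
    """Weights in a partial convolution.

    Only the first `ratio` share of the channels is convolved; the remaining
    channels are passed through untouched and contribute no weights.
    The default ratio of 1/4 is an assumption made for this example.
    """
    c_p = int(channels * ratio)  # number of channels that are convolved
    return conv_params(c_p, c_p, k)


if __name__ == "__main__":
    channels, k = 64, 3  # hypothetical layer width and kernel size
    full = conv_params(channels, channels, k)
    partial = partial_conv_params(channels, k)
    print(f"standard conv weights: {full}")
    print(f"partial conv weights:  {partial}")
    print(f"reduction factor:      {full // partial}x")
```

With a ratio of 1/4, the convolved slice has a quarter of the input channels and a quarter of the output channels, so the weight count falls by a factor of 16 for that layer. The overall saving in a complete network is much smaller, because only some layers are replaced and the surrounding pointwise convolutions remain.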



Data availability

No datasets were generated or analysed during the current study.


Acknowledgements

This work was supported by the Zhejiang Provincial Key Research and Development Program under Grant (2024C01071), the 2022 Zhejiang Provincial Department of Transportation Science and Technology Project under Grant (202206), and the Zhejiang University of Science and Technology 2023 Postgraduate Research Innovation Fund Projects under Grant (2023yjskc05).

Author information

Contributions

X.X. (Xinjian Xiang) is the corresponding author and supervised the project. X.X., G.Z. (Guolong Zhang), and L.H. (Li Huang) conceived the study and designed the experiments. Y.Z. (Yongping Zheng) and Z.X. (Zongyi Xie) conducted the experiments and collected the data. S.S. (Siqi Sun) and T.Y. (Tianshun Yuan) performed data analysis and interpretation. X.C. (Xizhao Chen) assisted with manuscript preparation. X.X. and G.Z. wrote the main manuscript text, and all authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Xinjian Xiang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xiang, X., Zhang, G., Huang, L. et al. Research on infrared small target pedestrian and vehicle detection algorithm based on multi-scale feature fusion. J Real-Time Image Proc 22, 31 (2025). https://doi.org/10.1007/s11554-024-01607-5


  • DOI: https://doi.org/10.1007/s11554-024-01607-5
