Abstract
To meet the lightweight and real-time requirements of infrared pedestrian detection on edge devices, this paper proposes an improved infrared pedestrian recognition model based on You Only Look Once v5 nano (YOLOv5n). For the lightweight design, the model introduces the Ghost Convolution and SlimNeck modules, which reduce the number of parameters. To enhance real-time performance, the Performance-aware Approximation of Global Channel Pruning (PAGCP) strategy further reduces the number of parameters and improves inference speed. Experimental analysis is used to find a balance between speed and accuracy. The results show that, compared to the original model, the proposed model has 1.52 M fewer parameters, runs 21% faster, and loses only 0.4% in mAP@0.50. On a Jetson Nano, inference takes 0.12 s per image. The proposed model therefore maintains high detection accuracy while meeting lightweight and real-time requirements, providing technical support for deployment on edge devices in scenarios such as autonomous driving.
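The parameter saving attributed to Ghost Convolution can be illustrated with a simple weight count. The sketch below follows the general Ghost module idea of Han et al. (GhostNet): a regular convolution produces a fraction of the output channels, and cheap depthwise convolutions generate the rest. The function names, the layer sizes (64 input and 128 output channels), the ratio of 2, and the 5×5 depthwise kernel are illustrative assumptions, not the configuration used in the paper. Biases and normalization layers are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a bias-free standard k x k convolution."""
    return c_in * c_out * k * k


def ghost_conv_params(c_in: int, c_out: int, k: int,
                      ratio: int = 2, cheap_k: int = 5) -> int:
    """Weights in a bias-free Ghost convolution (hypothetical helper).

    A primary k x k convolution produces c_out / ratio "intrinsic" channels.
    The remaining channels come from (ratio - 1) depthwise cheap_k x cheap_k
    convolutions applied to those intrinsic channels.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio")
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k                     # regular convolution
    cheap = (ratio - 1) * intrinsic * cheap_k * cheap_k    # depthwise: one filter per channel
    return primary + cheap


# Assumed layer sizes, chosen only for illustration.
std = standard_conv_params(64, 128, 3)
ghost = ghost_conv_params(64, 128, 3)
print(f"standard: {std} weights, ghost: {ghost} weights, "
      f"reduction: {std / ghost:.2f}x")
```

With a ratio of 2, the count for such a layer is close to half that of the standard convolution, because the depthwise branch adds only a small number of weights.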








Data Availability
The dataset used in the experiments is available from the original paper cited as reference [8] (Jia et al., LLVIP).
References
Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: ACM-SIAM Symposium on Discrete Algorithms (2007). https://doi.org/10.1145/1283383.1283494
Bin, Z., Chunping, W., Qiang, F., Yichao, C.: Multi-scale infrared pedestrian detection based on deep attention mechanism. Acta Opt. Sin. 40(05), 47–58 (2020). https://doi.org/10.3788/AOS202040.0504001
Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Xu, J., Zhang, Z., Cheng, D., Zhu, C., Cheng, T., Zhao, Q., Li, B., Lu, X., Zhu, R., Wu, Y., Dai, J., Wang, J., Shi, J., Ouyang, W., Loy, C.C., Lin, D.: Mmdetection: open mmlab detection toolbox and benchmark. CoRR arXiv:abs/1906.07155 (2019). https://doi.org/10.48550/arXiv.1906.07155
Girshick, R.: Fast r-cnn. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448 (2015). https://doi.org/10.1109/ICCV.2015.169
Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., Xu, C.: Ghostnet: more features from cheap operations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1580–1589 (2020). https://doi.org/10.1109/CVPR42600.2020.00165
Hao, S., Gao, S., Ma, X., An, B., He, T.: Anchor-free infrared pedestrian detection based on cross-scale feature fusion and hierarchical attention mechanism. Infrared Phys. Technol. 131, 104660 (2023). https://doi.org/10.1016/j.infrared.2023.104660
He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015). https://doi.org/10.1109/TPAMI.2015.2389824
Jia, X., Zhu, C., Li, M., Tang, W., Zhou, W.: Llvip: a visible-infrared paired dataset for low-light vision. In: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp. 3489–3497 (2021). https://doi.org/10.1109/ICCVW54120.2021.00389
Li, H., Li, J., Wei, H., Liu, Z., Zhan, Z., Ren, Q.: Slim neck by gsconv: a lightweight-design for real-time detector architectures. J. Real-Time Image Proc. 21(3), 62 (2024). https://doi.org/10.1007/s11554-024-01436-6
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: Ssd: single shot multibox detector. In: Computer Vision-ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer, pp. 21–37 (2016)
Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation (2018). https://doi.org/10.48550/arXiv.1803.01534
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016). https://doi.org/10.1109/cvpr.2016.91
Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017). https://doi.org/10.1109/TPAMI.2016.2577031
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-cam: visual explanations from deep networks via gradient-based localization. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 618–626 (2017). https://doi.org/10.1109/ICCV.2017.74
Tian, Z., Chu, X., Wang, X., Wei, X., Shen, C.: Fully convolutional one-stage 3d object detection on lidar range images. Adv Neural Inf Process Syst 35, 34899–34911 (2022). https://doi.org/10.48550/arXiv.2205.13764
Wang, C.Y., Bochkovskiy, A., Liao, H.Y.M.: Yolov7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 7464–7475 (2023). https://doi.org/10.48550/arXiv.2207.02696
Wang, A., Chen, H., Liu, L., Chen, K., Lin, Z., Han, J., Ding, G.: Yolov10: real-time end-to-end object detection (2024). https://doi.org/10.48550/arXiv.2405.14458
Wang, C.Y., Yeh, I.H., Liao, H.Y.M.: Yolov9: learning what you want to learn using programmable gradient information (2024). https://doi.org/10.48550/arXiv.2402.13616
Ye, H., Zhang, B., Chen, T., Fan, J., Wang, B.: Performance-aware approximation of global channel pruning for multitask cnns (2023). https://doi.org/10.48550/arXiv.2303.11923
Yinhui, Z., Kai, J., Zifen, H., Guangchen, C.: Attention guided multi-scale infrared real-time detection of pedestrian and vehicle. Infrared Laser Eng. 53(05), 237–247 (2024)
Zhuang, M., Yong, Z., Ruimin, C., Weihua, L.: Method for fast detection of infrared targets based on key points. Acta Opt. Sin. 40(23), 136–144 (2020). https://doi.org/10.3788/AOS202040.2312006
Zifen, H., Guangchen, C., Junsong, C., Yinhui, Z.: Multiscale feature fusion lightweight real-time infrared pedestrian detection at night. Chin. J. Lasers 49(17), 130–139 (2022). https://doi.org/10.3788/CL202249.1709002
Acknowledgements
This work was supported by the Hebei Province Graduate Student Innovation Ability Training Funding Project (Grant: CXZZSS2024163) and the Key Research and Development Projects in Hebei Province (Grant: 20310103D). The authors thank Ms. Zhang for her help in borrowing the lab equipment.
Author information
Contributions
Li Liu: Writing - Review & Editing, Conceptualization, Project Administration, Funding Acquisition, Resources, Supervision.
Kaiye Huang: Writing - Original Draft, Writing - Review & Editing, Methodology, Conceptualization.
Yujian Li: Software, Validation, Visualization, Data Curation, Methodology, Writing - Original Draft.
Chuxia Zhang: Data Curation, Writing - Review & Editing.
Shuo Zhang: Data Curation, Writing - Review & Editing.
Zhengyu Hu: Software.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Liu, L., Huang, K., Li, Y. et al. Real-time pedestrian recognition model on edge device using infrared vision system. J Real-Time Image Proc 22, 27 (2025). https://doi.org/10.1007/s11554-024-01608-4
DOI: https://doi.org/10.1007/s11554-024-01608-4