Infrared Image Object Detection of Vehicle and Person Based on Improved YOLOv5

Wang, Jintao; Song, Qingzeng; Hou, Maorui; Jin, Guanghao

doi:10.1007/978-981-99-1354-1_16

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1784)

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data

Abstract

Existing object detection algorithms are hard to run on embedded devices, where energy efficiency and power consumption are constrained, because of their complex network structures and large computation and parameter counts. Object detection in infrared images additionally suffers from low recognition rates and high false alarm rates due to long imaging distances, weak signal energy, and low resolution. To enable infrared vehicle and pedestrian detection at the mobile edge, this paper applies a series of optimizations to the YOLOv5 algorithm and proposes a lightweight YOLO-mini network structure. Specifically, the MobileNetV2 network replaces CSPDarknet as the backbone feature extraction network, and a coordinate attention mechanism is added. To make the model lighter still, the weights are converted to int8 through quantization-aware training, which makes detection on the infrared vehicle and pedestrian dataset feasible on embedded devices. Experiments on the FLIR dataset, run on an NVIDIA Xavier NX, show that the approach greatly reduces the number of model parameters and improves FPS with only a small loss of accuracy: YOLO-MobileNetV2 reaches 86.75% mAP with 2.76M parameters at 45 FPS, and YOLO-mini achieves 84.63% mAP with 0.69M parameters at 63 FPS.
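The int8 conversion mentioned in the abstract can be illustrated with a minimal sketch of the quantize-then-dequantize ("fake quantization") step that quantization-aware training typically simulates in the forward pass. The abstract does not describe the authors' exact scheme, so the symmetric, per-tensor scaling used here is an assumption, and the function name `fake_quantize` and the sample weights are hypothetical.

```python
def fake_quantize(values, num_bits=8):
    """Round floats onto a signed integer grid and map them back.

    Assumption: symmetric, per-tensor scaling. This is a common choice,
    not necessarily the scheme used in the paper.
    """
    qmax = 2 ** (num_bits - 1) - 1               # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Integer codes, clamped to the representable range [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    # Dequantized values: what the network "sees" during training.
    dequantized = [c * scale for c in codes]
    return codes, dequantized, scale


# Hypothetical weights, chosen only for illustration.
weights = [0.50, -1.27, 0.031, 0.9]
codes, dequantized, scale = fake_quantize(weights)
print(codes)  # [50, -127, 3, 90]
```

Each weight is stored as a one-byte integer code plus a shared scale, and the rounding error per weight is at most half a quantization step (`scale / 2`). Training with this rounding in the loop lets the network adapt to the error before the weights are actually stored as int8.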

Author information

Authors and Affiliations

School of Software, Tiangong University, Tianjin, 300387, China
Jintao Wang
School of Computer Science and Technology, Tiangong University, Tianjin, 300387, China
Qingzeng Song & Maorui Hou
Beijing Polytechnic, Beijing, 100176, China
Guanghao Jin

Corresponding author

Correspondence to Guanghao Jin.

Editor information

Editors and Affiliations

Guangzhou University, Guangzhou, China
Shiyu Yang
Griffith University, Gold Coast, QLD, Australia
Saiful Islam

About this paper

Cite this paper

Wang, J., Song, Q., Hou, M., Jin, G. (2023). Infrared Image Object Detection of Vehicle and Person Based on Improved YOLOv5. In: Yang, S., Islam, S. (eds) Web and Big Data. APWeb-WAIM 2022 International Workshops. APWeb-WAIM 2022. Communications in Computer and Information Science, vol 1784. Springer, Singapore. https://doi.org/10.1007/978-981-99-1354-1_16

DOI: https://doi.org/10.1007/978-981-99-1354-1_16
Published: 30 March 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1353-4
Online ISBN: 978-981-99-1354-1
eBook Packages: Computer Science, Computer Science (R0)
