
M-YOLO: an object detector based on global context information for infrared images

  • Original Research Paper
  • Published in: Journal of Real-Time Image Processing

Abstract

Object detection is an important task in computer vision. While visible (VS) images are adequate for detecting objects in most scenarios, infrared (IR) images can extend object detection to night-time scenes or occluded objects. For IR images, we propose an infrared object detector based on global context information. It uses the lightweight MobileNetV2 network to extract features, so the detector is named M-YOLO. To enhance the model's ability to perceive global information, this paper proposes a global contextual information aggregation module. To preserve multi-scale information and enhance the expressiveness of features, a top-down and bottom-up parallel feature fusion method is proposed. Only two detection heads are used, keeping the model lightweight while improving detection accuracy and speed. We use a self-built IR dataset (GIR) and the public IR dataset (FLIR) to verify the superiority of the model. Compared with YOLOv4 (78.1%), the average accuracy of M-YOLO (83.4%) is improved by 5.3% on the FLIR dataset, and the detection time (4.33 ms) is shorter, with a detection speed of 30.6 FPS. On the GIR dataset, the detection accuracy (76.1%) is 6.4% higher than that of YOLOv4 (69.7%), and the detection time (6.84 ms) is lower. Our method improves the performance of IR object detection: it can detect IR ground targets in complex environments, and its detection speed meets real-time requirements.
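The top-down and bottom-up parallel feature fusion described in the abstract can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the layer names (c3/c4/c5), feature-map shapes, resampling operators, and merge rules are all our assumptions.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def downsample2x(x):
    """Stride-2 subsampling of an (H, W, C) feature map."""
    return x[::2, ::2, :]

def parallel_fusion(c3, c4, c5):
    """Fuse three backbone scales along top-down and bottom-up paths
    run in parallel, then merge them at the two detection-head scales."""
    # top-down path: coarse semantic information flows into finer maps
    td4 = c4 + upsample2x(c5)
    td3 = c3 + upsample2x(td4)
    # bottom-up path (computed in parallel): fine localisation detail
    # flows into the coarser map
    bu4 = c4 + downsample2x(c3)
    # merge the two paths at the scales used by the two detection heads
    head_fine = td3 + upsample2x(bu4)   # finer head, e.g. small objects
    head_coarse = td4 + bu4             # coarser head, larger objects
    return head_fine, head_coarse

# toy feature maps standing in for backbone outputs (shapes assumed)
c3 = np.ones((32, 32, 8))
c4 = np.ones((16, 16, 8))
c5 = np.ones((8, 8, 8))
fine, coarse = parallel_fusion(c3, c4, c5)
print(fine.shape, coarse.shape)  # (32, 32, 8) (16, 16, 8)
```

The point of the sketch is the structure, not the operators: both pathways read from the same backbone features and are merged afterwards, rather than running the bottom-up pass on top of the top-down outputs as in a sequential FPN+PAN stack.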



Acknowledgements

This work was supported by the National Natural Science Foundation of China (62072370).

Author information

Corresponding author: Ying Sun.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest affecting the work reported in this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Hou, Z., Sun, Y., Guo, H. et al. M-YOLO: an object detector based on global context information for infrared images. J Real-Time Image Proc 19, 1009–1022 (2022). https://doi.org/10.1007/s11554-022-01242-y

