
M-YOLO: an object detector based on global context information for infrared images

  • Original Research Paper
  • Published in: Journal of Real-Time Image Processing

Abstract

Object detection is an important task in computer vision. While visible (VS) images are adequate for detecting objects in most scenarios, infrared (IR) images can extend object detection to night-time scenes or occluded objects. For IR images, we propose an infrared object detector based on global context information. It uses the lightweight MobileNetV2 network to extract features, so the detector is named M-YOLO. To enhance the model's ability to perceive global information, this paper proposes a global contextual information aggregation module. To preserve multi-scale information and enhance the expressiveness of features, a top-down and bottom-up parallel feature fusion method is proposed. Only two detection heads are used, keeping the model lightweight while improving detection accuracy and speed. We use a self-built IR dataset (GIR) and the public IR dataset (FLIR) to verify the superiority of the model. Compared with YOLOv4 (78.1%), the average accuracy of M-YOLO (83.4%) is improved by 5.3% on the FLIR dataset, and the detection time (4.33 ms) is shorter, with a detection speed of 30.6 FPS. On the GIR dataset, the detection accuracy (76.1%) is 6.4% higher than that of YOLOv4 (69.7%), and the detection time (6.84 ms) is lower. Our method improves the performance of IR object detection: it can detect IR ground targets in complex environments, and its detection speed meets real-time requirements.
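The top-down and bottom-up parallel feature fusion described in the abstract can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the layer names (c3/c4/c5), feature-map shapes, resampling operators, and merge rules are all our assumptions.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def downsample2x(x):
    """Stride-2 subsampling of an (H, W, C) feature map."""
    return x[::2, ::2, :]

def parallel_fusion(c3, c4, c5):
    """Fuse three backbone scales along top-down and bottom-up paths
    run in parallel, then merge them at the two detection-head scales."""
    # top-down path: coarse semantic information flows into finer maps
    td4 = c4 + upsample2x(c5)
    td3 = c3 + upsample2x(td4)
    # bottom-up path (computed in parallel): fine localisation detail
    # flows into the coarser map
    bu4 = c4 + downsample2x(c3)
    # merge the two paths at the scales used by the two detection heads
    head_fine = td3 + upsample2x(bu4)   # finer head, e.g. small objects
    head_coarse = td4 + bu4             # coarser head, larger objects
    return head_fine, head_coarse

# toy feature maps standing in for backbone outputs (shapes assumed)
c3 = np.ones((32, 32, 8))
c4 = np.ones((16, 16, 8))
c5 = np.ones((8, 8, 8))
fine, coarse = parallel_fusion(c3, c4, c5)
print(fine.shape, coarse.shape)  # (32, 32, 8) (16, 16, 8)
```

The point of the sketch is the structure, not the operators: both pathways read from the same backbone features and are merged afterwards, rather than running the bottom-up pass on top of the top-down outputs as in a sequential FPN+PAN stack.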



Acknowledgements

This work was supported by the National Natural Science Foundation of China (62072370).

Author information

Corresponding author: Ying Sun.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest affecting the work reported in this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Hou, Z., Sun, Y., Guo, H. et al. M-YOLO: an object detector based on global context information for infrared images. J Real-Time Image Proc 19, 1009–1022 (2022). https://doi.org/10.1007/s11554-022-01242-y

