Thermal infrared image semantic segmentation for night-time driving scenes based on deep learning

Multimedia Tools and Applications

Abstract

Semantic segmentation of thermal infrared (ThIR) images is challenging because the images are highly complex: image regions are difficult to discriminate, and traditional techniques fail to recover the crucial semantic information completely. To overcome this issue, this paper introduces a novel network model for ThIR image semantic segmentation that facilitates effective image-to-image translation and reduces semantic encoding ambiguity. The proposed model is named the top-down attention and gradient alignment-based graph neural network (AGAGNN). It uses a top-down guided attention module (GAM) to deal with semantic encoding ambiguity, an elaborate attention loss to ensure hierarchical coding of features, and an organized gradient alignment loss to reduce the edge distortion caused by image translation. The model is implemented in Python and evaluated on the KAIST dataset using pixel-level annotations. It achieves 98.3% accuracy, and a comparative analysis indicates that it preserves semantic information more effectively than existing models.
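The abstract does not define the gradient alignment loss or the accuracy metric, so the following is only a minimal sketch of what such quantities could look like, not the authors' formulation. It assumes that "accuracy" means per-pixel accuracy against the annotations and that gradient alignment compares Sobel edge responses of the ThIR input and the translated image with an L1 penalty. The function names `pixel_accuracy`, `sobel_gradients`, and `gradient_alignment_loss` are hypothetical.

```python
import numpy as np


def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted class equals the annotated class."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    return float(np.mean(pred == target))


def sobel_gradients(img):
    """Horizontal and vertical Sobel responses of a 2-D image (edge-padded)."""
    img = np.asarray(img, dtype=np.float64)
    p = np.pad(img, 1, mode="edge")
    # Right column minus left column of each 3x3 neighbourhood.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (
        p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2]
    )
    # Bottom row minus top row of each 3x3 neighbourhood.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (
        p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:]
    )
    return gx, gy


def gradient_alignment_loss(source, translated):
    """Assumed form: mean L1 distance between the Sobel gradient maps."""
    sgx, sgy = sobel_gradients(source)
    tgx, tgy = sobel_gradients(translated)
    return float(np.mean(np.abs(sgx - tgx) + np.abs(sgy - tgy)))


if __name__ == "__main__":
    pred = [[0, 1], [1, 1]]
    target = [[0, 1], [0, 1]]
    print(pixel_accuracy(pred, target))  # 3 of 4 pixels agree: 0.75

    thermal = np.zeros((4, 4))
    thermal[:, 2:] = 1.0                  # a vertical edge
    blurred = np.full((4, 4), 0.5)        # the edge is lost after "translation"
    print(gradient_alignment_loss(thermal, thermal + 5.0))  # 0.0, edges preserved
    print(gradient_alignment_loss(thermal, blurred) > 0.0)  # True, edges distorted
```

Under these assumptions the loss is zero when the translated image keeps the same edges (even if its overall intensity shifts) and grows when edges are smeared or displaced, which is the edge distortion the abstract says the loss is meant to reduce.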

Data availability

Data sharing is not applicable to this article.

Author information

Contributions

All authors contributed equally to this work.

Corresponding author

Correspondence to B. Maheswari.

Ethics declarations

Conflict of interest

The authors have no conflict of interest to declare.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to participate

All the authors involved have agreed to participate in this submitted article.

Consent to publish

All the authors involved in this manuscript fully consent to publish this submitted article.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Maheswari, B., Reeja, S.R. Thermal infrared image semantic segmentation for night-time driving scenes based on deep learning. Multimed Tools Appl 82, 44885–44910 (2023). https://doi.org/10.1007/s11042-023-15882-0

  • DOI: https://doi.org/10.1007/s11042-023-15882-0
