Abstract
Semantic segmentation of thermal infrared (ThIR) images is challenging because these images are highly complex: image regions are difficult to discriminate, and traditional techniques fail to fully recover the crucial semantic information. To overcome this issue, this paper introduces a novel network model for ThIR image semantic segmentation that facilitates effective image-to-image translation and reduces semantic encoding ambiguity. The proposed model is named the top-down attention and gradient alignment-based graph neural network (AGAGNN). It uses a top-down guided attention module (GAM) to deal with semantic encoding ambiguity. In addition, an elaborate attention loss is introduced to ensure hierarchical coding of features, and an organized gradient alignment loss reduces the edge distortion caused by image translation. The model is implemented in Python and evaluated against pixel-level annotations on the KAIST dataset. It achieves 98.3% accuracy, and the comparative analysis shows that it is more effective than existing models at preserving semantic information.
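The gradient alignment idea mentioned in the abstract can be illustrated with a short sketch. The abstract does not give the loss formula, so the version below is an assumption: it takes the mean absolute difference between the forward-difference gradients of the input image and the translated image. The function names `image_gradients` and `gradient_alignment_loss` are hypothetical and are not taken from the paper.

```python
def image_gradients(img):
    """Forward-difference gradients of a 2-D image given as a list of rows."""
    h, w = len(img), len(img[0])
    # Horizontal gradient: difference between each pixel and its right neighbour.
    gx = [[img[y][x + 1] - img[y][x] for x in range(w - 1)] for y in range(h)]
    # Vertical gradient: difference between each pixel and the pixel below it.
    gy = [[img[y + 1][x] - img[y][x] for x in range(w)] for y in range(h - 1)]
    return gx, gy


def gradient_alignment_loss(source, translated):
    """Mean absolute difference between the gradients of two same-sized images.

    Assumed formulation for illustration only; the paper's actual loss may differ.
    """
    if len(source) != len(translated) or len(source[0]) != len(translated[0]):
        raise ValueError("images must have the same shape")
    sgx, sgy = image_gradients(source)
    tgx, tgy = image_gradients(translated)
    total, count = 0.0, 0
    for s_grad, t_grad in ((sgx, tgx), (sgy, tgy)):
        for s_row, t_row in zip(s_grad, t_grad):
            for s, t in zip(s_row, t_row):
                total += abs(s - t)
                count += 1
    return total / count if count else 0.0


# A sharp vertical edge, the same edge with a brightness offset, and a blurred edge.
thermal = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
shifted = [[v + 5.0 for v in row] for row in thermal]
blurred = [[0.0, 0.5, 0.5, 1.0], [0.0, 0.5, 0.5, 1.0]]

loss_shifted = gradient_alignment_loss(thermal, shifted)
loss_blurred = gradient_alignment_loss(thermal, blurred)
```

Under this assumed formulation, a uniform change in intensity leaves the loss at zero because the edges are unchanged, while a translation that smears the edge is penalised. That matches the role the abstract assigns to the loss, which is limiting edge distortion during image translation.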
Data availability
Data sharing is not applicable to this article.
Author information
Contributions
All authors contributed equally to this work.
Ethics declarations
Conflict of interest
The authors have no conflict of interest to declare.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to participate
All authors have agreed to participate in this submitted article.
Consent to publish
All authors of this manuscript fully consent to the publication of this submitted article.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Maheswari, B., Reeja, S.R. Thermal infrared image semantic segmentation for night-time driving scenes based on deep learning. Multimed Tools Appl 82, 44885–44910 (2023). https://doi.org/10.1007/s11042-023-15882-0
DOI: https://doi.org/10.1007/s11042-023-15882-0