DHFNet: dual-decoding hierarchical fusion network for RGB-thermal semantic segmentation

Cai, Yuqi; Zhou, Wujie; Zhang, Liting; Yu, Lu; Luo, Ting

doi:10.1007/s00371-023-02773-6

Original article
Published: 26 January 2023

The Visual Computer, Volume 40, pages 169–179 (2024)

Yuqi Cai¹, Wujie Zhou¹,², Liting Zhang¹, Lu Yu² & Ting Luo³

Abstract

Recently, combined red-green-blue (RGB) and thermal (RGB-T) data have attracted considerable interest for semantic segmentation because they provide robust imaging under the complex lighting conditions of urban roads. Most existing RGB-T semantic segmentation methods adopt an encoder-decoder structure, in which repeated upsampling causes semantic information loss during decoding. Moreover, simple cross-modality fusion neither fully mines the complementary information in the two modalities nor removes noise from the extracted features. To address these problems, we developed a dual-decoding hierarchical fusion network (DHFNet) that extracts RGB and thermal information for RGB-T semantic segmentation. DHFNet uses a novel two-layer decoder and implements boundary refinement and boundary-guided foreground/background enhancement modules. These modules process features from different levels to achieve global guidance and local refinement of the segmentation prediction. In addition, an adaptive attention-filtering fusion module filters and extracts complementary information from the RGB and thermal modalities. Further, we introduce a graph convolutional network and an atrous spatial pyramid pooling module to obtain multiscale features and deepen the extracted semantic information. Experimental results on two benchmark datasets showed that the proposed DHFNet performed well relative to state-of-the-art semantic segmentation methods in terms of different evaluation metrics.
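The abstract does not specify the internals of the adaptive attention-filtering fusion module, so the following is only a minimal sketch of the general idea of attention-gated cross-modality fusion, not the authors' design. It assumes a squeeze-and-excitation-style gate with no learned layers: each channel's global average is passed through a sigmoid, and the gated RGB and thermal features are summed. The function names `channel_gate` and `gated_fusion` and the toy inputs are hypothetical.

```python
import math


def channel_gate(feat):
    """Return one sigmoid weight per channel, computed from the channel's
    global average (assumption: SE-style gate without learned layers)."""
    gates = []
    for channel in feat:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        gates.append(1.0 / (1.0 + math.exp(-mean)))
    return gates


def gated_fusion(rgb, thermal):
    """Fuse two feature maps of shape [C][H][W]: weight every channel of each
    modality by its own gate, then add the two modalities element-wise."""
    if len(rgb) != len(thermal):
        raise ValueError("rgb and thermal must have the same number of channels")
    g_rgb, g_th = channel_gate(rgb), channel_gate(thermal)
    fused = []
    for c in range(len(rgb)):
        fused.append([
            [g_rgb[c] * r + g_th[c] * t for r, t in zip(row_r, row_t)]
            for row_r, row_t in zip(rgb[c], thermal[c])
        ])
    return fused


# Toy input (hypothetical): 2 channels of 2x2 features per modality.
rgb = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
thermal = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
fused = gated_fusion(rgb, thermal)
```

In this toy case channel 0 is driven by the RGB features and channel 1 by the thermal features, which illustrates how a gate can favour whichever modality responds more strongly in a given channel. A trained module would normally learn the gating weights rather than derive them directly from channel means.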

Data Availability

The data that support the findings of this study are openly available at https://www.mi.t.u-tokyo.ac.jp/static/projects/mil_multispectral/ and https://github.com/ShreyasSkandanS/pst900_thermal_rgb.

Funding

This work was supported by the National Natural Science Foundation of China (61502429, 61672337, 61972357), the Zhejiang Provincial Natural Science Foundation of China (LY18F020012, LY17F020011), and the Zhejiang Key R&D Program (2019C03135).

Author information

Authors and Affiliations

School of Information and Electronic Engineering, Zhejiang University of Science & Technology, Hangzhou, 310023, Zhejiang, China
Yuqi Cai, Wujie Zhou & Liting Zhang
College of Information and Electronic Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China
Wujie Zhou & Lu Yu
College of Science and Technology, Ningbo University, Ningbo, 315211, Zhejiang, China
Ting Luo

Corresponding author

Correspondence to Wujie Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cai, Y., Zhou, W., Zhang, L. et al. DHFNet: dual-decoding hierarchical fusion network for RGB-thermal semantic segmentation. Vis Comput 40, 169–179 (2024). https://doi.org/10.1007/s00371-023-02773-6

Accepted: 06 January 2023
Published: 26 January 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s00371-023-02773-6
