Abstract
The field of RGB-D semantic segmentation has attracted considerable interest in recent years. The central challenge is to effectively combine RGB images, which capture colour variations, with depth images, which provide information about object geometry that is robust to lighting conditions. Treating both modalities identically with the same convolution operator ignores their inherent differences. In this paper, we therefore propose a novel approach that combines a geometry-aware convolution (GAConv) module and a multiscale fusion module (MFM) to enhance RGB-D image segmentation. The GAConv module captures fine-grained geometric details from depth images, while the MFM enables efficient integration of multi-scale features, allowing the network to exploit both spatial and semantic information. Extensive experiments on the NYUv2 and SUN RGB-D datasets show that our model consistently outperforms existing state-of-the-art methods in pixel accuracy and mean intersection over union (mIoU).
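As an illustrative sketch only (not the paper's GAConv implementation), the intuition behind a geometry-aware convolution can be expressed as a standard convolution whose kernel weights are gated by depth similarity between the centre pixel and each neighbour, so that responses respect geometric boundaries. The function name `geometry_aware_conv` and the sharpness parameter `alpha` below are hypothetical names chosen for this sketch:

```python
import math

def geometry_aware_conv(feat, depth, kernel, alpha=1.0):
    """Toy depth-gated 3x3 convolution on 2D lists (illustrative sketch).

    feat, depth : H x W lists of floats (feature map and depth map)
    kernel      : 3 x 3 list of weights
    alpha       : controls how sharply depth differences suppress neighbours
    """
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        # Depth-similarity gate: similar depths give a gate
                        # near 1; a large depth jump shrinks the contribution.
                        gate = math.exp(-alpha * abs(depth[i][j] - depth[ni][nj]))
                        acc += kernel[di + 1][dj + 1] * gate * feat[ni][nj]
            out[i][j] = acc
    return out
```

With uniform depth the gate is 1 everywhere and the operation reduces to a plain convolution; across a depth discontinuity, neighbours on the far side contribute less. This is the general intuition behind geometry-aware operators, though the actual GAConv module is defined in the full paper.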







Availability of Data and Materials
Data sharing not applicable.
Code Availability
Not applicable.
Author information
Authors and Affiliations
Contributions
All authors contributed to the conception, design, and data analysis of this study. Author 1 contributed to the development and interpretation of the method and drafted the manuscript. Author 2 contributed to the design of the network, performed the experimental analysis, and revised the manuscript. Author 3 contributed to data collection and interpretation and provided critical feedback on the manuscript. Author 4 contributed to the study design and data analysis and revised the manuscript. Author 5 contributed to the interpretation of data and provided revisions to the manuscript. All authors approved the final version of the manuscript and agreed to be accountable for all aspects of the work, ensuring that any questions related to the accuracy or integrity of the study are appropriately addressed and resolved.
Corresponding author
Ethics declarations
Conflicts of Interest
The authors have no competing interests to declare that are relevant to the content of this article.
Ethics Approval
Not applicable.
Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tian, C., Xu, W., Bai, L. et al. GANet: geometry-aware network for RGB-D semantic segmentation. Appl Intell 55, 454 (2025). https://doi.org/10.1007/s10489-025-06337-0
Accepted:
Published:
DOI: https://doi.org/10.1007/s10489-025-06337-0