Skip to main content
Log in

Overview of RGBD semantic segmentation based on deep learning

  • Original Research
  • Published:
Journal of Ambient Intelligence and Humanized Computing Aims and scope Submit manuscript

Abstract

Semantic segmentation is one of the basic tasks in computer vision. Its purpose is to achieve pixel-level scene segmentation. With the popularity of depth sensors, combining depth data with RGB images for semantic segmentation can improve the accuracy of semantic segmentation. First, this paper mainly summarizes the fusion of RGB information and depth information and then describes the RGBD semantic segmentation method, evaluation metrics, data set, and comparison of the results on the two mainstream data sets, and then make a prospect of possible future research directions, and finally, a conclusion is made. This part of the work has a certain guiding significance for future research on RGBD semantic segmentation and lays a foundation for later research.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Similar content being viewed by others

References

  • Armeni I, Sax S. Zamir AR, Savarese S (2017) Joint 2d-3d-semantic data for indoor scene understanding. https://doi.org/10.48550/arXiv.1702.01105

  • Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495

    Article  Google Scholar 

  • Chang A, Dai A, Funkhouser T, Halber M, Niessner M, Savva M, Zhang Y (2017) Matterport3d: learning from rgb-d data in indoor environments. arXiv preprint arXiv:1709.06158

  • Chen LZ, Lin Z, Wang Z, Yang YL, Cheng MM (2021a) Spatial information guided convolution for real-time RGBD semantic segmentation. IEEE Trans Image Process 30:2313–2324

    Article  Google Scholar 

  • Chen X, Lin K Y, Wang J, Wu W, Qian C, Li H, Zeng G (2020, August) Bi-directional cross-modality feature propagation with separation-and-aggregation gate for RGB-D semantic segmentation. In: European conference on computer vision. Springer, Cham, pp 561–577

  • Chen S, Zhu X, Liu W, He X, Liu J (2021b) Global-local propagation network for RGB-D semantic segmentation. arXiv preprint arXiv:2101.10801

  • Cheng Y, Cai R, Li Z, Zhao X, Huang, K (2017) Locality-sensitive deconvolution networks with gated fusion for rgb-d indoor semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3029–3037

  • Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3213–3223

  • Couprie C, Farabet C, Najman L, LeCun, Y (2013) Indoor semantic segmentation using depth information. arXiv preprint arXiv:1301.3572

  • Dai A, Chang AX, Savva M, Halber M, Funkhouser T, Nießner M (2017) Scannet: richly-annotated 3d reconstructions of indoor scenes. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5828–5839

  • Deng L, Yang M, Li T, He Y, Wang C (2019) RFBNet: deep multimodal networks with residual fusion blocks for RGB-D semantic segmentation. arXiv preprint arXiv:1907.00135

  • Gao X, Yu J, Li J (2019, July) RGBD semantic segmentation based on global convolutional network. In: Proceedings of the 2019 4th international conference on robotics, control and automation, pp 192–197

  • Giannone G, Chidlovskii B (2019) Learning common representation from RGB and depth images. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 0–0

  • Gupta S, Arbeláez P, Girshick R, Malik J (2015) Indoor scene understanding with rgb-d images: bottom-up segmentation, object detection and semantic segmentation. Int J Comput Vision 112(2):133–149

    Article  MathSciNet  Google Scholar 

  • Gupta S, Girshick R, Arbeláez P, Malik, J (2014) Learning rich features from RGB-D images for object detection and segmentation. In: European conference on computer vision Springer, Cham, pp 345–360

  • Hazirbas C, Ma L, Domokos C, Cremers D (2016) Fusenet: incorporating depth into semantic segmentation via fusion-based cnn architecture. In: Asian conference on computer vision. Springer, Cham, pp 213–228

  • He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916

    Article  Google Scholar 

  • He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  • He Y, Chiu WC, Keuper M, Fritz M (2017) Std2p: Rgbd semantic segmentation using spatio-temporal data-driven pooling. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4837–4846

  • Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2(7)

  • Hu X, Yang K, Fei L, Wang K (2019) Acnet: attention based network to exploit complementary features for rgbd semantic segmentation. In: 2019 IEEE international conference on image processing (ICIP). IEEE, pp 1440–1444

  • Janoch A, Karayev S, Jia Y, Barron JT, Fritz M, Saenko K, Darrell T (2013) A category-level 3d object dataset: putting the kinect to work. In: Consumer depth cameras for computer vision. Springer, London, pp 141–165

  • Jia F, Liu J, Tai XC (2021) A regularized convolutional neural network for semantic image segmentation. Anal Appl 19(01):147–165

    Article  MathSciNet  MATH  Google Scholar 

  • Jiang J, Zhang Z, Huang Y, Zheng L (2017) Incorporating depth into both cnn and crf for indoor semantic segmentation. In: 2017 8th IEEE international conference on software engineering and service science (ICSESS). IEEE, pp 525–530

  • Jiang J, Zheng L, Luo F, Zhang Z (2018) Rednet: residual encoder-decoder network for indoor rgb-d semantic segmentation. arXiv preprint arXiv:1806.01054

  • Jiao J, Wei Y, Jie Z, Shi H, Lau RW, Huang TS (2019) Geometry-aware distillation for indoor semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2869–2878

  • Kosiorek A (2017) 神经网络中的注意力机制. 机器人产业, 6

  • Lambert J, Liu Z, Sener O, Hays J, Koltun V (2020) MSeg: a composite dataset for multi-domain semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, p 2879–2888

  • Li Z, Gan Y, Liang X, Yu Y, Cheng H, Lin L (2016) Lstm-cf: unifying context modeling and fusion with lstms for rgb-d scene labeling. In: European conference on computer vision. Springer, Cham, p 541–557

  • Li Y, Zhang J, Cheng Y, Huang K, Tan T (2017) Semantics-guided multi-level RGB-D feature fusion for indoor semantic segmentation. In: 2017 IEEE international conference on image processing (ICIP), pp 1262–1266. IEEE.

  • Lin D, Huang H (2019) Zig-zag network for semantic segmentation of RGB-D images. IEEE Trans Pattern Anal Mach Intell 42(10):2642–2655

    Article  Google Scholar 

  • Lin X, Sánchez-Escobedo D, Casas JR, Pardàs M (2019) Depth estimation and semantic segmentation from a single RGB image using a hybrid convolutional neural network. Sensors 19(8):1795

    Article  Google Scholar 

  • Lin D, Chen G, Cohen-Or D, Heng PA, Huang H (2017a) Cascaded feature network for semantic segmentation of RGB-D images. In: Proceedings of the IEEE international conference on computer vision, pp 1311–1319

  • Lin G, Milan A, Shen C, Reid I (2017b) Refinenet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1925–1934

  • Lin D, Ji Y, Lischinski D, Cohen-Or D, Huang H (2018) Multi-scale context intertwining for semantic segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 603–619

  • Liu H, Wu W, Wang X, Qian Y (2018a) RGB-D joint modelling with scene geometric information for indoor semantic segmentation. Multimed Tools Appl 77(17):22475–22488

    Article  Google Scholar 

  • Liu J, Wang Y, Li Y, Fu J, Li J, Lu H (2018b) Collaborative deconvolutional neural networks for joint depth estimation and semantic segmentation. IEEE Trans Neural Netw Learning Syst 29(11):5655–5666

    Article  MathSciNet  Google Scholar 

  • Liu Y, Chen K, Liu C, Qin Z, Luo Z, Wang J (2019) Structured knowledge distillation for semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 2604–2613

  • Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440

  • McCormac J, Handa A, Leutenegger S, Davison AJ (2016) Scenenet rgb-d: 5m photorealistic images of synthetic indoor trajectories with ground truth. arXiv preprint arXiv:1612.05079.

  • Nakajima Y, Kang B, Saito H, Kitani K (2019) Incremental class discovery for semantic segmentation with RGBD sensing. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 972–981

  • Park SJ, Hong KS, Lee S (2017) Rdfnet: Rgb-d multi-level residual feature fusion for indoor semantic segmentation. In: Proceedings of the IEEE international conference on computer vision, pp 4980–4989

  • Qi X, Liao R, Jia J, Fidler S, Urtasun R (2017) 3d graph neural networks for rgbd semantic segmentation. In: Proceedings of the IEEE international conference on computer vision, pp 5199–5208

  • Schneider L, Jasch M, Fröhlich B, Weber T, Franke U, Pollefeys M, Rätsch M (2017) Multimodal neural networks: Rgb-d for semantic segmentation and object detection. In: Scandinavian conference on image analysis Springer, Cham, pp 98–109

  • Seichter D, Köhler M, Lewandowski B, Wengefeld T, Gross HM (2021) Efficient rgb-d semantic segmentation for indoor scene analysis. In: 2021 IEEE international conference on robotics and automation (ICRA). IEEE, pp 13525–13531

  • Shi W, Zhu D, Zhang G, Chen L, Wang L, Li J, Zhang X (2019) Multilevel Cross-Aware RGBD Semantic Segmentation of Indoor Environments. In: 2019 IEEE international conference on cyborg and bionic systems (CBS). IEEE, pp 346–351

  • Silberman N, Hoiem D, Kohli P, Fergus R (2012) Indoor segmentation and support inference from rgbd images. In: European conference on computer vision, Springer, Berlin, Heidelberg, pp 746–760

  • Song S, Lichtenberg SP, Xiao J (2015) Sun rgb-d: a rgb-d scene understanding benchmark suite. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 567–576

  • Su W, Wang Z (2016) Regularized fully convolutional networks for RGB-D semantic segmentation. In: 2016 visual communications and image processing (VCIP). IEEE, pp. 1–4

  • Su Y, Yuan Y, Jiang Z (2021) Deep feature selection-and-fusion for RGB-D semantic segmentation. In: 2021 IEEE international conference on multimedia and expo (ICME) IEEE, pp 1–6

  • Sun L, Yang K, Hu X, Hu W, Wang K (2020) Real-time fusion network for RGB-D semantic segmentation incorporating unexpected obstacle detection for road-driving images. IEEE Robot Autom Lett 5(4):5558–5565

    Article  Google Scholar 

  • Uhrig J, Schneider N, Schneider L, Franke U, Brox T, Geiger A (2017, October) Sparsity invariant cnns. In: 2017 international conference on 3D Vision (3DV) IEEE, pp 11–20

  • Wang Y, Chen Q, Chen S, Wu J (2020b) Multi-scale convolutional features network for semantic segmentation in indoor scenes. IEEE Access 8:89575–89583

    Article  Google Scholar 

  • Wang W, Neumann U (2018) Depth-aware cnn for rgb-d segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 135–150

  • Wang J, Wang Z, Tao D, See S, Wang G (2016) Learning common and specific features for RGB-D semantic segmentation with deconvolutional networks. In: European conference on computer vision. Springer, Cham, pp 664–679

  • Wang G, Wang Z, Chen Y, Wang G, Chen J (2020) Indoor scene semantic segmentation based on RGB-D image and convolution neural network. J Phys Conf Ser 1637(1):012138

    Article  Google Scholar 

  • Xiao J, Owens A, Torralba A (2013) Sun3d: a database of big spaces reconstructed using sfm and object labels. In: Proceedings of the IEEE international conference on computer vision, pp 1625–1632

  • Xing Y, Wang J, Chen X, Zeng G (2019a) 2.5 D convolution for RGB-D semantic segmentation. In: 2019a IEEE international conference on image processing (ICIP). IEEE, pp 1410–1414

  • Xing Y, Wang J, Chen X, Zeng G (2019b) Coupling two-stream RGB-D semantic segmentation network by idempotent mappings. In: 2019b IEEE international conference on image processing (ICIP). IEEE, pp 1850–1854

  • Yue Y, Zhou W, Lei J, Yu L (2021) Two-stage cascaded decoder for semantic segmentation of RGB-D images. IEEE Signal Process Lett 28:1115–1119

    Article  Google Scholar 

  • Zhang G, Xue JH, Xie P, Yang S, Wang G (2021) Non-local aggregation for RGB-D semantic segmentation. IEEE Signal Process Lett 28:658–662

    Article  Google Scholar 

  • Zhang Z, Cui Z, Xu C, Jie Z, Li X, Yang J (2018) Joint task-recursive learning for semantic segmentation and depth estimation. In: Proceedings of the European conference on computer vision (ECCV), pp 235–251

  • Zhang Z, Cui Z, Xu C, Yan Y, Sebe N, Yang J (2019) Pattern-affinitive propagation across depth, surface normal and semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4106–4115

  • Zhen M, Wang J, Zhou L, Fang T, Quan L (2019) Learning fully dense neural networks for image semantic segmentation. Proc AAAI Conf Artif Intell 33(1):9283–9290

    Google Scholar 

  • Zheng Z, Xie D, Chen C, Zhu Z (2020) Multi-resolution cascaded network with depth-similar residual module for real-time semantic segmentation on RGB-D images. In: 2020 IEEE international conference on networking, sensing and control (ICNSC). IEEE, pp 1–6

  • Zhou L, Xu C, Cui Z, Yang J (2019) KIL: knowledge interactiveness learning for joint depth estimation and semantic segmentation. In: Asian conference on pattern recognition, Springer, Cham, pp 835–848

  • Zhou H, Qi L, Wan Z, Huang H, Yang X (2020a) RGB-D Co-attention network for semantic segmentation. In: Proceedings of the Asian conference on computer vision

  • Zhou W, Yuan J, Lei J, Luo T (2020b) TSNet: Three-stream self-attention network for RGB-D indoor semantic segmentation. IEEE Intell Syst 36(4):73–78

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Victor S. Sheng or Xuefeng Xi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhang, H., Sheng, V.S., Xi, X. et al. Overview of RGBD semantic segmentation based on deep learning. J Ambient Intell Human Comput 14, 13627–13645 (2023). https://doi.org/10.1007/s12652-022-03829-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s12652-022-03829-6

Keywords

Navigation