Learning depth-aware features for indoor scene understanding

  • Special issue 1221: Deep Learning for Image/Video Compression and Visual Quality Assessment
  • Published in: Multimedia Tools and Applications

Abstract

Many methods have shown that jointly learning RGB image features and 3D geometric information from the RGB-D domain benefits indoor scene semantic segmentation. However, most of these methods require a precise depth map as input, which seriously limits their applicability. This paper presents a convolutional neural network framework that jointly learns semantic and depth features in order to remove this strong constraint. Additionally, the proposed model effectively combines the learned depth features and multi-scale contextual information with the semantic features to generate more representative features. Experimental results show that, taking only an RGB image as input, the proposed model achieves higher accuracy than state-of-the-art approaches on the NYU-Dv2 and SUN RGB-D datasets.
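The idea sketched in the abstract can be illustrated in code. The following is a minimal, hypothetical PyTorch sketch, not the authors' actual architecture: all layer names, channel widths, and the shallow encoder are illustrative assumptions. It shows the three ingredients the abstract describes: a depth branch that learns depth features from RGB alone (so no depth map is needed at input), multi-scale context gathered by pyramid pooling, and a fusion of depth, context, and semantic features for segmentation.

```python
# Illustrative sketch only: architecture details are assumptions,
# not the model proposed in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthAwareSegNet(nn.Module):
    def __init__(self, num_classes=40, feat=32):
        super().__init__()
        # Shared RGB encoder (stand-in for a deeper backbone such as ResNet)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Depth branch: learns depth features, so no depth map input is required
        self.depth_feat = nn.Conv2d(feat, feat, 3, padding=1)
        self.depth_head = nn.Conv2d(feat, 1, 1)  # auxiliary depth prediction
        # Multi-scale context via average pooling at several grid sizes
        self.pool_sizes = (1, 2, 4)
        self.context_proj = nn.ModuleList(
            nn.Conv2d(feat, feat // len(self.pool_sizes), 1)
            for _ in self.pool_sizes
        )
        # Fuse semantic + depth + multi-scale context features
        fused = feat + feat + (feat // len(self.pool_sizes)) * len(self.pool_sizes)
        self.seg_head = nn.Conv2d(fused, num_classes, 1)

    def forward(self, rgb):
        sem = self.encoder(rgb)                # semantic features
        dfeat = F.relu(self.depth_feat(sem))   # learned depth features
        depth = self.depth_head(dfeat)         # predicted depth map
        h, w = sem.shape[2:]
        ctx = [F.interpolate(proj(F.adaptive_avg_pool2d(sem, s)),
                             size=(h, w), mode="bilinear", align_corners=False)
               for s, proj in zip(self.pool_sizes, self.context_proj)]
        fused = torch.cat([sem, dfeat] + ctx, dim=1)  # combine all cues
        logits = F.interpolate(self.seg_head(fused), scale_factor=4,
                               mode="bilinear", align_corners=False)
        return logits, depth

net = DepthAwareSegNet(num_classes=40)
x = torch.randn(1, 3, 64, 64)  # a single RGB image, no depth input
logits, depth = net(x)
print(logits.shape, depth.shape)
# logits: (1, 40, 64, 64) per-class scores; depth: (1, 1, 16, 16) auxiliary output
```

In a setup like this the depth head would be trained with an auxiliary depth-regression loss alongside the segmentation loss, so the depth features are learned jointly rather than supplied as an input.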



Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61906097). The authors would like to thank all reviewers and editors for their constructive comments on this study.

Author information

Corresponding author

Correspondence to Suting Chen.


Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Chen, S., Shao, D., Zhang, L. et al. Learning depth-aware features for indoor scene understanding. Multimed Tools Appl 81, 42573–42590 (2022). https://doi.org/10.1007/s11042-021-11453-3
