A Survey of 3D Indoor Scene Synthesis

  • Survey
  • Journal of Computer Science and Technology

Abstract

Indoor scene synthesis has become a popular topic in recent years. Synthesizing functional and plausible indoor scenes is inherently difficult, since it requires considerable knowledge both to choose reasonable object categories and to arrange the objects appropriately. In this survey, we propose four criteria that group a wide range of 3D (three-dimensional) indoor scene synthesis techniques according to various aspects (specifically, four groups of categories). We also compare the techniques comprehensively to show their effectiveness and drawbacks, and we discuss potential remaining problems.

Author information

Corresponding author

Correspondence to Song-Hai Zhang.

Electronic supplementary material

ESM 1

(PDF 220 kb)

About this article

Cite this article

Zhang, SH., Zhang, SK., Liang, Y. et al. A Survey of 3D Indoor Scene Synthesis. J. Comput. Sci. Technol. 34, 594–608 (2019). https://doi.org/10.1007/s11390-019-1929-5

  • DOI: https://doi.org/10.1007/s11390-019-1929-5
