Abstract
Sparse RGBD scene completion is a challenging task, especially when consistent textures and geometries must be maintained throughout the entire scene. Unlike existing solutions that rely on human-designed text prompts or predefined camera trajectories, we propose GenRC, an automated, training-free pipeline that completes a room-scale 3D mesh with high-fidelity textures. To achieve this, we first project the sparse RGBD images to a highly incomplete 3D mesh. Instead of iteratively generating novel views to fill in the void, we use our proposed E-Diffusion to generate a view-consistent panoramic RGBD image, which ensures global geometry and appearance consistency. Furthermore, we maintain input-output scene stylistic consistency through textual inversion, replacing human-designed text prompts. To bridge the domain gap among datasets, E-Diffusion leverages models trained on large-scale datasets to generate diverse appearances. GenRC outperforms state-of-the-art methods under most appearance and geometric metrics on the ScanNet and ARKitScenes datasets, even though GenRC is not trained on these datasets and does not rely on predefined camera trajectories. Project page: https://minfenli.github.io/GenRC/
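To make the first step of the pipeline concrete (lifting the sparse RGBD inputs into partial 3D scene geometry), the sketch below back-projects a single RGBD frame into a colored, world-space point cloud under standard pinhole-camera assumptions. This is a minimal NumPy illustration only; the function and variable names are hypothetical and the authors' implementation builds a textured mesh rather than a raw point cloud.

```python
import numpy as np

def backproject_rgbd(rgb, depth, K, cam_to_world):
    """Lift one RGBD frame to a colored 3D point cloud in world coordinates.

    rgb:          (H, W, 3) uint8 image
    depth:        (H, W) depth map in meters (0 where invalid)
    K:            (3, 3) pinhole intrinsic matrix
    cam_to_world: (4, 4) camera-to-world pose
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0

    # Pixel coordinates -> 3D points in the camera frame via the pinhole model.
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1)  # (N, 4) homogeneous

    # Transform into world coordinates and attach per-point colors.
    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
    colors = rgb[valid].astype(np.float32) / 255.0
    return pts_world, colors

# Fusing several such sparse frames yields the highly incomplete scene geometry
# that the later stages (panoramic RGBD generation and inpainting) complete.
```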
Acknowledgements
This project is supported by the National Science and Technology Council (NSTC) and Taiwan Computing Cloud (TWCC) under projects NSTC 112-2634-F-002-006 and 113-2221-E-007-104.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, MF. et al. (2025). GenRC: Generative 3D Room Completion from Sparse Image Collections. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15095. Springer, Cham. https://doi.org/10.1007/978-3-031-72913-3_9
DOI: https://doi.org/10.1007/978-3-031-72913-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72912-6
Online ISBN: 978-3-031-72913-3
eBook Packages: Computer Science, Computer Science (R0)