Taming Latent Diffusion Model for Neural Radiance Field Inpainting

  • Conference paper
  • Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15061)

Abstract

Neural Radiance Field (NeRF) is a representation for 3D reconstruction from multi-view images. Although recent work has shown preliminary success in editing a reconstructed NeRF with a diffusion prior, these methods still struggle to synthesize reasonable geometry in completely uncovered regions. One major reason is the high diversity of synthetic content from the diffusion model, which hinders the radiance field from converging to a crisp and deterministic geometry. Moreover, applying latent diffusion models to real data often yields a textural shift incoherent with the image condition due to auto-encoding errors. These two problems are further reinforced by the use of pixel-distance losses. To address these issues, we propose tempering the diffusion model’s stochasticity with per-scene customization and mitigating the textural shift with masked adversarial training. In our analysis, we also found that the commonly used pixel and perceptual losses are harmful to the NeRF inpainting task. Through rigorous experiments, our framework yields state-of-the-art NeRF inpainting results on various real-world scenes.
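
As a concrete illustration of the second idea above, the sketch below shows what a masked adversarial objective could look like in PyTorch: a patch discriminator scores NeRF renders against diffusion-inpainted reference views, and both the discriminator and the NeRF-side generator losses are accumulated only inside the inpainting mask, so the adversarial signal targets the synthesized region without disturbing observed pixels. This is a minimal sketch under our own assumptions; the names (PatchDiscriminator, masked_adv_losses) and the hinge-loss formulation are illustrative, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchDiscriminator(nn.Module):
    """Small convolutional net producing per-patch real/fake logits (PatchGAN-style)."""

    def __init__(self, in_ch=3, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width, width * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width * 2, 1, 4, stride=1, padding=1),
        )

    def forward(self, x):
        return self.net(x)  # (B, 1, H', W') patch logits


def masked_adv_losses(disc, rendered, inpainted_ref, mask):
    """Hinge GAN losses restricted to the inpainted region.

    rendered:      NeRF renders,                        (B, 3, H, W)
    inpainted_ref: diffusion-inpainted reference views, (B, 3, H, W)
    mask:          1 inside the region to inpaint,      (B, 1, H, W)
    """
    real_logits = disc(inpainted_ref)
    fake_logits = disc(rendered.detach())

    # Resample the mask to the patch-logit resolution and use it to
    # average the losses over masked patches only.
    m = F.interpolate(mask, size=real_logits.shape[-2:], mode="nearest")
    denom = m.sum().clamp(min=1.0)

    # Discriminator: hinge loss on real vs. rendered patches inside the mask.
    d_loss = ((m * F.relu(1.0 - real_logits)).sum()
              + (m * F.relu(1.0 + fake_logits)).sum()) / denom

    # Generator (the radiance field): fool the discriminator inside the mask,
    # leaving pixel-space supervision to the observed, unmasked regions.
    g_loss = -(m * disc(rendered)).sum() / denom
    return d_loss, g_loss
```

Per-scene customization would, analogously, fine-tune the inpainting diffusion model on the scene's own views (for example with low-rank adapters) before generating the reference images consumed above.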

Author information

Correspondence to Chieh Hubert Lin.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3913 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Lin, C.H. et al. (2025). Taming Latent Diffusion Model for Neural Radiance Field Inpainting. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15061. Springer, Cham. https://doi.org/10.1007/978-3-031-72646-0_9

  • DOI: https://doi.org/10.1007/978-3-031-72646-0_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72645-3

  • Online ISBN: 978-3-031-72646-0

  • eBook Packages: Computer Science, Computer Science (R0)
