Abstract
The Neural Radiance Field (NeRF) is a representation for 3D reconstruction from multi-view images. Although recent work has shown preliminary success in editing a reconstructed NeRF with a diffusion prior, these methods still struggle to synthesize reasonable geometry in completely uncovered regions. One major reason is the high diversity of the synthetic content produced by the diffusion model, which hinders the radiance field from converging to crisp, deterministic geometry. Moreover, applying latent diffusion models to real data often yields a textural shift that is incoherent with the image condition due to auto-encoding errors. Both problems are further reinforced by the use of pixel-distance losses. To address these issues, we propose tempering the diffusion model’s stochasticity with per-scene customization and mitigating the textural shift with masked adversarial training. During our analyses, we also found that the commonly used pixel and perceptual losses are harmful to the NeRF inpainting task. Through rigorous experiments, our framework yields state-of-the-art NeRF inpainting results on various real-world scenes.
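To make the masked adversarial training idea concrete, below is a minimal PyTorch sketch of one plausible realization: a hinge-loss patch discriminator whose real/fake losses are weighted by the inpainting mask, so that supervision is confined to the hole and the auto-encoder's textural shift in the known background does not leak gradients. All names here (`PatchDiscriminator`, `masked_hinge_d_loss`, `masked_g_loss`) are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of a masked adversarial objective for NeRF inpainting (assumed design,
# not the paper's official code). `real` are diffusion-inpainted references,
# `fake` are NeRF renders, `mask` is 1 inside the inpainted region.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchDiscriminator(nn.Module):
    """Small convolutional discriminator producing per-patch logits."""

    def __init__(self, in_ch: int = 3, width: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width, width * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width * 2, 1, 4, stride=1, padding=1),  # logit map
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def masked_hinge_d_loss(disc, real, fake, mask):
    """Hinge discriminator loss restricted to the inpainted region."""
    logits_real = disc(real)
    logits_fake = disc(fake.detach())
    # Downsample the (B, 1, H, W) mask to the logit-map resolution.
    m = F.interpolate(mask, size=logits_real.shape[-2:], mode="nearest")
    denom = m.sum().clamp(min=1.0)
    loss_real = (F.relu(1.0 - logits_real) * m).sum() / denom
    loss_fake = (F.relu(1.0 + logits_fake) * m).sum() / denom
    return loss_real + loss_fake


def masked_g_loss(disc, fake, mask):
    """Generator (NeRF) adversarial term, again confined to the hole."""
    logits_fake = disc(fake)
    m = F.interpolate(mask, size=logits_fake.shape[-2:], mode="nearest")
    return -(logits_fake * m).sum() / m.sum().clamp(min=1.0)
```

In each training iteration, the discriminator would be updated with `masked_hinge_d_loss` on pairs of diffusion-inpainted references and NeRF renders, while `masked_g_loss` would be added to the radiance field's reconstruction objective; the masking is what keeps the adversarial signal focused on the uncovered region rather than on the textural shift in already-observed pixels.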
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lin, C.H. et al. (2025). Taming Latent Diffusion Model for Neural Radiance Field Inpainting. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15061. Springer, Cham. https://doi.org/10.1007/978-3-031-72646-0_9
DOI: https://doi.org/10.1007/978-3-031-72646-0_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72645-3
Online ISBN: 978-3-031-72646-0
eBook Packages: Computer Science; Computer Science (R0)