Abstract
The Neural Radiance Field (NeRF) is a representation for 3D reconstruction from multi-view images. Although recent work has shown preliminary success in editing a reconstructed NeRF with a diffusion prior, these methods still struggle to synthesize reasonable geometry in completely uncovered regions. One major reason is the high diversity of the synthetic content produced by the diffusion model, which hinders the radiance field from converging to crisp, deterministic geometry. Moreover, applying latent diffusion models to real data often yields a textural shift that is incoherent with the image condition due to auto-encoding errors. Both problems are further reinforced by the use of pixel-distance losses. To address these issues, we propose tempering the diffusion model’s stochasticity with per-scene customization and mitigating the textural shift with masked adversarial training. During our analyses, we also found that the commonly used pixel and perceptual losses are harmful to the NeRF inpainting task. Through rigorous experiments, our framework yields state-of-the-art NeRF inpainting results on various real-world scenes.
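To make the masked adversarial training idea concrete, below is a minimal PyTorch sketch of one plausible realization: a hinge-loss patch discriminator whose real/fake losses are weighted by the inpainting mask, so that supervision is confined to the hole and the auto-encoder's textural shift in the known background does not leak gradients. All names here (`PatchDiscriminator`, `masked_hinge_d_loss`, `masked_g_loss`) are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of a masked adversarial objective for NeRF inpainting (assumed design,
# not the paper's official code). `real` are diffusion-inpainted references,
# `fake` are NeRF renders, `mask` is 1 inside the inpainted region.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchDiscriminator(nn.Module):
    """Small convolutional discriminator producing per-patch logits."""

    def __init__(self, in_ch: int = 3, width: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width, width * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width * 2, 1, 4, stride=1, padding=1),  # logit map
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def masked_hinge_d_loss(disc, real, fake, mask):
    """Hinge discriminator loss restricted to the inpainted region."""
    logits_real = disc(real)
    logits_fake = disc(fake.detach())
    # Downsample the (B, 1, H, W) mask to the logit-map resolution.
    m = F.interpolate(mask, size=logits_real.shape[-2:], mode="nearest")
    denom = m.sum().clamp(min=1.0)
    loss_real = (F.relu(1.0 - logits_real) * m).sum() / denom
    loss_fake = (F.relu(1.0 + logits_fake) * m).sum() / denom
    return loss_real + loss_fake


def masked_g_loss(disc, fake, mask):
    """Generator (NeRF) adversarial term, again confined to the hole."""
    logits_fake = disc(fake)
    m = F.interpolate(mask, size=logits_fake.shape[-2:], mode="nearest")
    return -(logits_fake * m).sum() / m.sum().clamp(min=1.0)
```

In each training iteration, the discriminator would be updated with `masked_hinge_d_loss` on pairs of diffusion-inpainted references and NeRF renders, while `masked_g_loss` would be added to the radiance field's reconstruction objective; the masking is what keeps the adversarial signal focused on the uncovered region rather than on the textural shift in already-observed pixels.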
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lin, C.H. et al. (2025). Taming Latent Diffusion Model for Neural Radiance Field Inpainting. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15061. Springer, Cham. https://doi.org/10.1007/978-3-031-72646-0_9
DOI: https://doi.org/10.1007/978-3-031-72646-0_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72645-3
Online ISBN: 978-3-031-72646-0
eBook Packages: Computer Science; Computer Science (R0)