DATENeRF: Depth-Aware Text-Based Editing of NeRFs

Rojas, Sara; Philip, Julien; Zhang, Kai; Bi, Sai; Luan, Fujun; Ghanem, Bernard; Sunkavalli, Kalyan

doi:10.1007/978-3-031-73247-8_16

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15069)

Included in the following conference series: European Conference on Computer Vision

Abstract

Recent diffusion models have demonstrated impressive capabilities for text-based 2D image editing. Applying similar ideas to edit a NeRF scene [31] remains challenging because editing 2D frames individually does not produce multiview-consistent results. We make the key observation that the geometry of a NeRF scene provides a way to unify these 2D edits. We leverage this geometry through a depth-conditioned ControlNet [57] to improve the consistency of individual 2D image edits. Furthermore, we propose an inpainting scheme that uses the NeRF scene depth to propagate 2D edits across images while staying robust to errors and resampling issues. We demonstrate that this leads to more consistent, realistic, and detailed editing results compared to previous state-of-the-art text-based NeRF editing methods.
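
The idea of using scene depth to carry a 2D edit from one image to another can be illustrated with generic depth-based reprojection. The sketch below is not the method of the paper, whose details are not given in the abstract. It assumes a pinhole camera, z-depth maps, a known 4x4 source-to-target transform, nearest-pixel splatting, and a relative depth tolerance `rel_tol`; the names `reproject_pixel` and `propagate_edit` are hypothetical. Target pixels that receive no edit are flagged in a mask, on the assumption that a later inpainting step would fill them.

```python
from typing import Tuple

Intrinsics = Tuple[float, float, float, float]  # fx, fy, cx, cy (assumed pinhole model)


def reproject_pixel(u, v, depth, src_to_tgt, intr):
    """Map source pixel (u, v) with z-depth `depth` into the target view.

    Returns (u_tgt, v_tgt, z_tgt), or None if the point is behind the target camera.
    """
    fx, fy, cx, cy = intr
    # Unproject to a homogeneous 3D point in source-camera coordinates.
    p = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth, 1.0)
    # Rigid transform into target-camera coordinates (4x4, row-major).
    x, y, z = (sum(src_to_tgt[r][c] * p[c] for c in range(4)) for r in range(3))
    if z <= 0.0:
        return None
    return fx * x / z + cx, fy * y / z + cy, z


def propagate_edit(src_rgb, src_depth, tgt_depth, src_to_tgt, intr, rel_tol=0.05):
    """Splat edited source pixels into the target view.

    Returns (target image, filled mask). Unfilled pixels stay None and are the
    ones an inpainting step would have to complete (an assumption of this sketch).
    """
    h, w = len(tgt_depth), len(tgt_depth[0])
    out = [[None] * w for _ in range(h)]
    filled = [[False] * w for _ in range(h)]
    for v, row in enumerate(src_rgb):
        for u, colour in enumerate(row):
            hit = reproject_pixel(u, v, src_depth[v][u], src_to_tgt, intr)
            if hit is None:
                continue
            ut, vt = round(hit[0]), round(hit[1])  # nearest-pixel splat
            if not (0 <= ut < w and 0 <= vt < h):
                continue  # lands outside the target frame
            # Depth agreement test: skip points occluded in the target view.
            if abs(hit[2] - tgt_depth[vt][ut]) > rel_tol * tgt_depth[vt][ut]:
                continue
            out[vt][ut] = colour
            filled[vt][ut] = True
    return out, filled


# Toy usage: a 1x3 image, unit depth, target camera shifted so pixels move right by one.
intr = (1.0, 1.0, 0.0, 0.0)
shift_x = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
image, mask = propagate_edit([["a", "b", "c"]], [[1.0] * 3], [[1.0] * 3], shift_x, intr)
```

In the toy usage, the leftmost target pixel receives no source pixel and stays unfilled, which is the kind of hole that resampling leaves behind when edits are moved between views.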

S. Rojas: Work done during an internship at Adobe Research.

Notes

1. We use default diffusion parameters for Instruct-NeRF2NeRF, diverging from the original paper where the weights of classifier-free guidance were manually tuned.

References

Avrahami, O., Lischinski, D., Fried, O.: Blended diffusion for text-driven editing of natural images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18208–18218 (2022)
Bao, C., et al.: Sine: semantic-driven image-based nerf editing with prior-guided editing field. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20919–20929 (2023)
Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: MIP-nerf 360: unbounded anti-aliased neural radiance fields. In: CVPR (2022)
Bi, S., et al.: Neural reflectance fields for appearance acquisition (2020)
Brooks, T., Holynski, A., Efros, A.A.: Instructpix2pix: learning to follow image editing instructions. In: CVPR (2023)
Caron, M., et al.: Emerging properties in self-supervised vision transformers. In: Proceedings of the International Conference on Computer Vision (ICCV) (2021)
Chen, A., Xu, Z., Geiger, A., Yu, J., Su, H.: Tensorf: tensorial radiance fields. In: European Conference on Computer Vision (ECCV) (2022)
Chen, D.Z., Siddiqui, Y., Lee, H.Y., Tulyakov, S., Nießner, M.: Text2tex: text-driven texture synthesis via diffusion models. In: ICCV (2023)
Chen, R., Chen, Y., Jiao, N., Jia, K.: Fantasia3d: disentangling geometry and appearance for high-quality text-to-3D content creation. arXiv preprint arXiv:2303.13873 (2023)
Chiang, P.Z., Tsai, M.S., Tseng, H.Y., Lai, W.S., Chiu, W.C.: Stylizing 3D scene via implicit representation and hypernetwork. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1475–1484 (2022)
Dong, J., Wang, Y.X.: VICA-nerf: view-consistency-aware 3D editing of neural radiance fields. In: Advances in Neural Information Processing Systems, vol. 36 (2024)
Gordon, O., Avrahami, O., Lischinski, D.: Blended-nerf: zero-shot object generation and blending in existing neural radiance fields. arXiv preprint arXiv:2306.12760 (2023)
Haque, A., Tancik, M., Efros, A.A., Holynski, A., Kanazawa, A.: Instruct-nerf2nerf: editing 3D scenes with instructions. arXiv preprint arXiv:2303.12789 (2023)
He, K., Sun, J., Tang, X.: Guided image filtering. IEEE Trans. Pattern Anal. Mach. Intell. 35(6), 1397–1409 (2012)
Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. arXiv preprint arXiv:2006.11239 (2020)
Huang, H.P., Tseng, H.Y., Saini, S., Singh, M., Yang, M.H.: Learning to stylize novel views. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13869–13878 (2021)
Huang, Y.H., He, Y., Yuan, Y.J., Lai, Y.K., Gao, L.: Stylizednerf: consistent 3D scene stylization as stylized nerf via 2D-3D mutual learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18342–18352 (2022)
Jaganathan, V., Huang, H.H., Irshad, M.Z., Jampani, V., Raj, A., Kira, Z.: ICE-G: image conditional editing of 3D gaussian splats (2024)
Jain, A., Mildenhall, B., Barron, J.T., Abbeel, P., Poole, B.: Zero-shot text-guided object generation with dream fields (2022)
Jambon, C., Kerbl, B., Kopanas, G., Diolatzis, S., Leimkühler, T., Drettakis, G.: Nerfshop: interactive editing of neural radiance fields. Proc. ACM Comput. Graph. Interact. Tech. 6(1) (2023). https://repo-sam.inria.fr/fungraph/nerfshop/
Kerr, J., Kim, C.M., Goldberg, K., Kanazawa, A., Tancik, M.: LERF: language embedded radiance fields. In: International Conference on Computer Vision (ICCV) (2023)
Kirillov, A., et al.: Segment anything. arXiv:2304.02643 (2023)
Kobayashi, S., Matsumoto, E., Sitzmann, V.: Decomposing nerf for editing via feature field distillation. In: Advances in Neural Information Processing Systems, vol. 35, pp. 23311–23330 (2022)
Kuang, Z., Luan, F., Bi, S., Shu, Z., Wetzstein, G., Sunkavalli, K.: Palettenerf: palette-based appearance editing of neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20691–20700 (2023)
Kuang, Z., Olszewski, K., Chai, M., Huang, Z., Achlioptas, P., Tulyakov, S.: Neroic: neural rendering of objects from online image collections. ACM Trans. Graph. 41(4) (2022)
Lin, C.H., et al.: Magic3d: high-resolution text-to-3D content creation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
Liu, S., et al.: Grounding dino: marrying dino with grounded pre-training for open-set object detection. arXiv preprint arXiv:2303.05499 (2023)
Liu, S., Zhang, X., Zhang, Z., Zhang, R., Zhu, J.Y., Russell, B.: Editing conditional radiance fields. In: Proceedings of the International Conference on Computer Vision (ICCV) (2021)
Lugmayr, A., Danelljan, M., Romero, A., Yu, F., Timofte, R., Van Gool, L.: Repaint: inpainting using denoising diffusion probabilistic models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022)
Mikaeili, A., Perel, O., Safaee, M., Cohen-Or, D., Mahdavi-Amiri, A.: Sked: sketch-guided text-based 3D editing. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14607–14619 (2023)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: representing scenes as neural radiance fields for view synthesis. Commun. ACM 65(1), 99–106 (2021)
Mirzaei, A., et al.: SPIn-NeRF: multiview segmentation and perceptual inpainting with neural radiance fields. In: CVPR (2023)
Müller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph. 41(4), 102:1–102:15 (2022)
Nguyen-Phuoc, T., Liu, F., Xiao, L.: Snerf: stylized neural implicit representations for 3D scenes. arXiv preprint arXiv:2207.02363 (2022)
Peng, Y., et al.: Cagenerf: cage-based neural radiance field for generalized 3D deformation and animation. In: Advances in Neural Information Processing Systems, vol. 35, pp. 31402–31415 (2022)
Poole, B., Jain, A., Barron, J.T., Mildenhall, B.: Dreamfusion: text-to-3D using 2D diffusion. In: ICLR (2023)
Radford, A., et al.: Learning transferable visual models from natural language supervision (2021)
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., Chen, M.: Hierarchical text-conditional image generation with clip latents (2022)
Ranftl, R., Lasinger, K., Hafner, D., Schindler, K., Koltun, V.: Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer. IEEE Trans. Pattern Anal. Mach. Intell. 44(3), 1623–1637 (2020)
Richardson, E., Metzer, G., Alaluf, Y., Giryes, R., Cohen-Or, D.: Texture: text-guided texturing of 3D shapes (2023)
Rojas, S., et al.: Re-rend: real-time rendering of nerfs across devices. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3632–3641 (2023)
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022)
Saharia, C., et al.: Photorealistic text-to-image diffusion models with deep language understanding. In: Advances in Neural Information Processing Systems (2022)
Sella, E., Fiebelman, G., Hedman, P., Averbuch-Elor, H.: Vox-e: text-guided voxel editing of 3D objects. In: Proceedings of the International Conference on Computer Vision (ICCV) (2023)
Tancik, M., et al.: Nerfstudio: a modular framework for neural radiance field development. In: ACM SIGGRAPH 2023 Conference Proceedings. SIGGRAPH 2023 (2023)
Wang, C., Chai, M., He, M., Chen, D., Liao, J.: Clip-nerf: text-and-image driven manipulation of neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3835–3844 (2022)
Wang, C., Jiang, R., Chai, M., He, M., Chen, D., Liao, J.: Nerf-art: text-driven neural radiance fields stylization. IEEE Trans. Vis. Comput. Graph. (2023)
Wang, D., Zhang, T., Abboud, A., Süsstrunk, S.: Inpaintnerf360: text-guided 3D inpainting on unbounded neural radiance fields. arXiv preprint arXiv:2305.15094 (2023)
Wang, H., Du, X., Li, J., Yeh, R.A., Shakhnarovich, G.: Score jacobian chaining: lifting pretrained 2D diffusion models for 3D generation. In: CVPR (2023)
Wang, Z., et al.: Prolificdreamer: high-fidelity and diverse text-to-3D generation with variational score distillation. In: Advances in Neural Information Processing Systems (NeurIPS) (2023)
Wu, Q., et al.: Object-compositional neural implicit surfaces. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13687, pp. 197–213. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19812-0_12
Wu, Q., Wang, K., Li, K., Zheng, J., Cai, J.: Objectsdf++: improved object-compositional neural implicit surfaces. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 21764–21774 (2023)
Wu, Q., Tan, J., Xu, K.: Palettenerf: palette-based color editing for nerfs. arXiv preprint arXiv:2212.12871 (2022)
Yu, L., Xiang, W., Han, K.: Edit-diffnerf: editing 3D neural radiance fields using 2D diffusion model (2023)
Yuan, Y.J., Sun, Y.T., Lai, Y.K., Ma, Y., Jia, R., Gao, L.: Nerf-editing: geometry editing of neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18353–18364 (2022)
Zhang, K., et al.: ARF: artistic radiance fields. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13691, pp. 717–733. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19821-2_41
Zhang, L., Rao, A., Agrawala, M.: Adding conditional control to text-to-image diffusion models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3836–3847 (2023)
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR (2018)
Zhang, X., Srinivasan, P.P., Deng, B., Debevec, P., Freeman, W.T., Barron, J.T.: Nerfactor: neural factorization of shape and reflectance under an unknown illumination. ACM Trans. Graph. 40(6) (2021)
Zhuang, J., Wang, C., Liu, L., Lin, L., Li, G.: Dreameditor: text-driven 3D scene editing with neural fields. arXiv preprint arXiv:2306.13455 (2023)

Acknowledgements

We thank Duygu Ceylan for advice during the project. We thank anonymous ECCV reviewer 2 for their support and feedback on the paper. The research reported in this publication was partially supported by funding from KAUST Center of Excellence on GenAI, under award number 5940.

Author information

Authors and Affiliations

KAUST, Thuwal, Saudi Arabia
Sara Rojas & Bernard Ghanem
Adobe Research, San Francisco, USA
Julien Philip, Kai Zhang, Sai Bi, Fujun Luan & Kalyan Sunkavalli

Corresponding author

Correspondence to Sara Rojas.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 7712 KB)

Cite this paper

Rojas, S. et al. (2025). DATENeRF: Depth-Aware Text-Based Editing of NeRFs. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15069. Springer, Cham. https://doi.org/10.1007/978-3-031-73247-8_16

DOI: https://doi.org/10.1007/978-3-031-73247-8_16
Published: 01 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73246-1
Online ISBN: 978-3-031-73247-8
eBook Packages: Computer Science, Computer Science (R0)
