
DATENeRF: Depth-Aware Text-Based Editing of NeRFs

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15069)


Abstract

Recent diffusion models have demonstrated impressive capabilities for text-based 2D image editing. Applying similar ideas to edit a NeRF scene [31] remains challenging as editing 2D frames individually does not produce multiview-consistent results. We make the key observation that the geometry of a NeRF scene provides a way to unify these 2D edits. We leverage this geometry in depth-conditioned ControlNet [57] to improve the consistency of individual 2D image edits. Furthermore, we propose an inpainting scheme that uses the NeRF scene depth to propagate 2D edits across images while staying robust to errors and resampling issues. We demonstrate that this leads to more consistent, realistic and detailed editing results compared to previous state-of-the-art text-based NeRF editing methods.
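To make the two ideas in the abstract concrete, below is a minimal, hypothetical sketch of one editing round in Python with the Hugging Face diffusers library. It is not the paper's implementation: the model checkpoints, the per-view RGB/depth inputs (assumed to be rendered from the NeRF), the OpenCV-style pinhole camera conventions, and all parameter values are assumptions for illustration. The first function applies a depth-conditioned ControlNet inpainting edit to a single view; the second forward-warps an edited view into a new view using the NeRF depth, leaving occluded or depth-inconsistent pixels as the inpainting mask for the next pass.

```python
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline

# Depth-conditioned ControlNet on top of a Stable Diffusion inpainting model.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")


def edit_view(rgb, depth, mask, prompt):
    """Inpaint the masked region of one rendered view, conditioned on NeRF depth."""
    # The depth ControlNet expects a MiDaS-style inverse-depth map (near = bright),
    # so convert the NeRF's metric depth to normalized disparity.
    disp = 1.0 / np.maximum(depth, 1e-6)
    disp = (disp - disp.min()) / (disp.max() - disp.min() + 1e-8)
    control = Image.fromarray((np.stack([disp] * 3, -1) * 255).astype(np.uint8))
    return np.array(
        pipe(
            prompt=prompt,
            image=Image.fromarray(rgb),
            mask_image=Image.fromarray(mask),  # white pixels get inpainted
            control_image=control,
            num_inference_steps=30,
        ).images[0]
    )


def propagate_edit(edited_src, depth_src, depth_dst, K, c2w_src, w2c_dst, tol=0.05):
    """Forward-warp an edited view into a target view via the shared NeRF geometry.

    Returns the warped image and a mask (255 = hole) marking pixels that were
    occluded or failed the depth check; only those are re-inpainted next round.
    """
    h, w = depth_src.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project every source pixel to a 3D world point using its depth.
    rays = np.linalg.inv(K) @ np.stack([u, v, np.ones_like(u)]).reshape(3, -1)
    pts_cam = rays * depth_src.reshape(1, -1)
    pts_world = c2w_src[:3, :3] @ pts_cam + c2w_src[:3, 3:4]
    # Project the world points into the target camera.
    pts_dst = w2c_dst[:3, :3] @ pts_world + w2c_dst[:3, 3:4]
    uvw = K @ pts_dst
    z = uvw[2]
    x = np.round(uvw[0] / np.maximum(z, 1e-6)).astype(int)
    y = np.round(uvw[1] / np.maximum(z, 1e-6)).astype(int)
    idx = np.where((z > 0) & (x >= 0) & (x < w) & (y >= 0) & (y < h))[0]
    # Occlusion/robustness test: the reprojected depth must agree with the
    # target view's own NeRF depth within a relative tolerance.
    keep = np.abs(z[idx] - depth_dst[y[idx], x[idx]]) < tol * depth_dst[y[idx], x[idx]]
    idx = idx[keep]
    warped = np.zeros((h, w, 3), np.uint8)
    hole = np.full((h, w), 255, np.uint8)
    warped[y[idx], x[idx]] = edited_src.reshape(-1, 3)[idx]
    hole[y[idx], x[idx]] = 0
    return warped, hole
```

The one-pixel splat above is only a sketch; a real implementation would handle resampling more carefully (e.g., z-buffering or bilinear splatting). The depth-agreement test is the part that, per the abstract, keeps the propagation robust to depth errors and resampling issues: anything it rejects simply falls back to depth-conditioned inpainting.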

S. Rojas—Work done during an internship at Adobe Research.


Notes

  1. We use default diffusion parameters for Instruct-NeRF2NeRF, diverging from the original paper, where the weights of classifier-free guidance were manually tuned.
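For concreteness, the snippet below shows what "default diffusion parameters" corresponds to in the Hugging Face diffusers implementation of InstructPix2Pix [5], one common way to run the model (an assumption about tooling; the input path and prompt are placeholders): the text and image classifier-free guidance weights are left at their library defaults rather than tuned per scene.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("input.png").convert("RGB")  # placeholder input view

# Default classifier-free guidance weights in diffusers: guidance_scale=7.5
# (text) and image_guidance_scale=1.5 (input image). The Instruct-NeRF2NeRF
# paper tunes these per scene; this comparison keeps the defaults.
edited = pipe(
    "turn him into a clown",
    image=image,
    guidance_scale=7.5,
    image_guidance_scale=1.5,
    num_inference_steps=20,
).images[0]
```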

References

  1. Avrahami, O., Lischinski, D., Fried, O.: Blended diffusion for text-driven editing of natural images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18208–18218 (2022)

  2. Bao, C., et al.: SINE: semantic-driven image-based NeRF editing with prior-guided editing field. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20919–20929 (2023)

  3. Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Mip-NeRF 360: unbounded anti-aliased neural radiance fields. In: CVPR (2022)

  4. Bi, S., et al.: Neural reflectance fields for appearance acquisition (2020)

  5. Brooks, T., Holynski, A., Efros, A.A.: InstructPix2Pix: learning to follow image editing instructions. In: CVPR (2023)

  6. Caron, M., et al.: Emerging properties in self-supervised vision transformers. In: Proceedings of the International Conference on Computer Vision (ICCV) (2021)

  7. Chen, A., Xu, Z., Geiger, A., Yu, J., Su, H.: TensoRF: tensorial radiance fields. In: European Conference on Computer Vision (ECCV) (2022)

  8. Chen, D.Z., Siddiqui, Y., Lee, H.Y., Tulyakov, S., Nießner, M.: Text2Tex: text-driven texture synthesis via diffusion models. In: ICCV (2023)

  9. Chen, R., Chen, Y., Jiao, N., Jia, K.: Fantasia3D: disentangling geometry and appearance for high-quality text-to-3D content creation. arXiv preprint arXiv:2303.13873 (2023)

  10. Chiang, P.Z., Tsai, M.S., Tseng, H.Y., Lai, W.S., Chiu, W.C.: Stylizing 3D scene via implicit representation and hypernetwork. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1475–1484 (2022)

  11. Dong, J., Wang, Y.X.: ViCA-NeRF: view-consistency-aware 3D editing of neural radiance fields. In: Advances in Neural Information Processing Systems, vol. 36 (2024)

  12. Gordon, O., Avrahami, O., Lischinski, D.: Blended-NeRF: zero-shot object generation and blending in existing neural radiance fields. arXiv preprint arXiv:2306.12760 (2023)

  13. Haque, A., Tancik, M., Efros, A.A., Holynski, A., Kanazawa, A.: Instruct-NeRF2NeRF: editing 3D scenes with instructions. arXiv preprint arXiv:2303.12789 (2023)

  14. He, K., Sun, J., Tang, X.: Guided image filtering. IEEE Trans. Pattern Anal. Mach. Intell. 35(6), 1397–1409 (2012)

  15. Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. arXiv preprint arXiv:2006.11239 (2020)

  16. Huang, H.P., Tseng, H.Y., Saini, S., Singh, M., Yang, M.H.: Learning to stylize novel views. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13869–13878 (2021)

  17. Huang, Y.H., He, Y., Yuan, Y.J., Lai, Y.K., Gao, L.: StylizedNeRF: consistent 3D scene stylization as stylized NeRF via 2D-3D mutual learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18342–18352 (2022)

  18. Jaganathan, V., Huang, H.H., Irshad, M.Z., Jampani, V., Raj, A., Kira, Z.: ICE-G: image conditional editing of 3D Gaussian splats (2024)

  19. Jain, A., Mildenhall, B., Barron, J.T., Abbeel, P., Poole, B.: Zero-shot text-guided object generation with Dream Fields (2022)

  20. Jambon, C., Kerbl, B., Kopanas, G., Diolatzis, S., Leimkühler, T., Drettakis, G.: NeRFshop: interactive editing of neural radiance fields. Proc. ACM Comput. Graph. Interact. Tech. 6(1) (2023). https://repo-sam.inria.fr/fungraph/nerfshop/

  21. Kerr, J., Kim, C.M., Goldberg, K., Kanazawa, A., Tancik, M.: LERF: language embedded radiance fields. In: International Conference on Computer Vision (ICCV) (2023)

  22. Kirillov, A., et al.: Segment Anything. arXiv preprint arXiv:2304.02643 (2023)

  23. Kobayashi, S., Matsumoto, E., Sitzmann, V.: Decomposing NeRF for editing via feature field distillation. In: Advances in Neural Information Processing Systems, vol. 35, pp. 23311–23330 (2022)

  24. Kuang, Z., Luan, F., Bi, S., Shu, Z., Wetzstein, G., Sunkavalli, K.: PaletteNeRF: palette-based appearance editing of neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20691–20700 (2023)

  25. Kuang, Z., Olszewski, K., Chai, M., Huang, Z., Achlioptas, P., Tulyakov, S.: NeROIC: neural rendering of objects from online image collections. ACM Trans. Graph. 41(4) (2022)

  26. Lin, C.H., et al.: Magic3D: high-resolution text-to-3D content creation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2023)

  27. Liu, S., et al.: Grounding DINO: marrying DINO with grounded pre-training for open-set object detection. arXiv preprint arXiv:2303.05499 (2023)

  28. Liu, S., Zhang, X., Zhang, Z., Zhang, R., Zhu, J.Y., Russell, B.: Editing conditional radiance fields. In: Proceedings of the International Conference on Computer Vision (ICCV) (2021)

  29. Lugmayr, A., Danelljan, M., Romero, A., Yu, F., Timofte, R., Van Gool, L.: RePaint: inpainting using denoising diffusion probabilistic models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022)

  30. Mikaeili, A., Perel, O., Safaee, M., Cohen-Or, D., Mahdavi-Amiri, A.: SKED: sketch-guided text-based 3D editing. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14607–14619 (2023)

  31. Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. Commun. ACM 65(1), 99–106 (2021)

  32. Mirzaei, A., et al.: SPIn-NeRF: multiview segmentation and perceptual inpainting with neural radiance fields. In: CVPR (2023)

  33. Müller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph. 41(4), 102:1–102:15 (2022)

  34. Nguyen-Phuoc, T., Liu, F., Xiao, L.: SNeRF: stylized neural implicit representations for 3D scenes. arXiv preprint arXiv:2207.02363 (2022)

  35. Peng, Y., et al.: CageNeRF: cage-based neural radiance field for generalized 3D deformation and animation. In: Advances in Neural Information Processing Systems, vol. 35, pp. 31402–31415 (2022)

  36. Poole, B., Jain, A., Barron, J.T., Mildenhall, B.: DreamFusion: text-to-3D using 2D diffusion. In: ICLR (2023)

  37. Radford, A., et al.: Learning transferable visual models from natural language supervision (2021)

  38. Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., Chen, M.: Hierarchical text-conditional image generation with CLIP latents (2022)

  39. Ranftl, R., Lasinger, K., Hafner, D., Schindler, K., Koltun, V.: Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer. IEEE Trans. Pattern Anal. Mach. Intell. 44(3), 1623–1637 (2020)

  40. Richardson, E., Metzer, G., Alaluf, Y., Giryes, R., Cohen-Or, D.: TEXTure: text-guided texturing of 3D shapes (2023)

  41. Rojas, S., et al.: Re-ReND: real-time rendering of NeRFs across devices. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3632–3641 (2023)

  42. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022)

  43. Saharia, C., et al.: Photorealistic text-to-image diffusion models with deep language understanding. In: Advances in Neural Information Processing Systems (2022)

  44. Sella, E., Fiebelman, G., Hedman, P., Averbuch-Elor, H.: Vox-E: text-guided voxel editing of 3D objects. In: Proceedings of the International Conference on Computer Vision (ICCV) (2023)

  45. Tancik, M., et al.: Nerfstudio: a modular framework for neural radiance field development. In: ACM SIGGRAPH 2023 Conference Proceedings (2023)

  46. Wang, C., Chai, M., He, M., Chen, D., Liao, J.: CLIP-NeRF: text-and-image driven manipulation of neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3835–3844 (2022)

  47. Wang, C., Jiang, R., Chai, M., He, M., Chen, D., Liao, J.: NeRF-Art: text-driven neural radiance fields stylization. IEEE Trans. Vis. Comput. Graph. (2023)

  48. Wang, D., Zhang, T., Abboud, A., Süsstrunk, S.: InpaintNeRF360: text-guided 3D inpainting on unbounded neural radiance fields. arXiv preprint arXiv:2305.15094 (2023)

  49. Wang, H., Du, X., Li, J., Yeh, R.A., Shakhnarovich, G.: Score Jacobian chaining: lifting pretrained 2D diffusion models for 3D generation. In: CVPR (2023)

  50. Wang, Z., et al.: ProlificDreamer: high-fidelity and diverse text-to-3D generation with variational score distillation. In: Advances in Neural Information Processing Systems (NeurIPS) (2023)

  51. Wu, Q., et al.: Object-compositional neural implicit surfaces. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13687, pp. 197–213. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19812-0_12

  52. Wu, Q., Wang, K., Li, K., Zheng, J., Cai, J.: ObjectSDF++: improved object-compositional neural implicit surfaces. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 21764–21774 (2023)

  53. Wu, Q., Tan, J., Xu, K.: PaletteNeRF: palette-based color editing for NeRFs. arXiv preprint arXiv:2212.12871 (2022)

  54. Yu, L., Xiang, W., Han, K.: Edit-DiffNeRF: editing 3D neural radiance fields using 2D diffusion model (2023)

  55. Yuan, Y.J., Sun, Y.T., Lai, Y.K., Ma, Y., Jia, R., Gao, L.: NeRF-Editing: geometry editing of neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18353–18364 (2022)

  56. Zhang, K., et al.: ARF: artistic radiance fields. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13691, pp. 717–733. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19821-2_41

  57. Zhang, L., Rao, A., Agrawala, M.: Adding conditional control to text-to-image diffusion models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3836–3847 (2023)

  58. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR (2018)

  59. Zhang, X., Srinivasan, P.P., Deng, B., Debevec, P., Freeman, W.T., Barron, J.T.: NeRFactor: neural factorization of shape and reflectance under an unknown illumination. ACM Trans. Graph. 40(6) (2021)

  60. Zhuang, J., Wang, C., Liu, L., Lin, L., Li, G.: DreamEditor: text-driven 3D scene editing with neural fields. arXiv preprint arXiv:2306.13455 (2023)


Acknowledgements

We thank Duygu Ceylan for advice during the project. We thank anonymous ECCV reviewer 2 for their support and feedback on the paper. The research reported in this publication was partially supported by funding from KAUST Center of Excellence on GenAI, under award number 5940.

Author information

Corresponding author

Correspondence to Sara Rojas.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 7712 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Rojas, S. et al. (2025). DATENeRF: Depth-Aware Text-Based Editing of NeRFs. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15069. Springer, Cham. https://doi.org/10.1007/978-3-031-73247-8_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-73247-8_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73246-1

  • Online ISBN: 978-3-031-73247-8

  • eBook Packages: Computer Science, Computer Science (R0)
