Abstract
We introduce MaskEditor, an object-level 3D neural field editing method driven by text instructions. Unlike whole-scene manipulation, local editing requires accurate localization of the target and proper field fusion to produce a realistic object-level replacement. We learn a 3D mask grid that precisely localizes the target object by leveraging 2D segmentation masks from the Segment Anything Model (SAM), and use the learned mask to divide the scene into an object field and a background field. We then apply Variational Score Distillation (VSD) to optimize the object field while leaving the background field unaltered, producing edits aligned with the text instruction. Furthermore, we adopt composited rendering and a coarse-to-fine editing strategy to enhance editing quality and the consistency of the edited object with the original scene. Qualitative and quantitative evaluations confirm that MaskEditor achieves more precise local editing than existing baselines.
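To make the pipeline concrete, here is a minimal PyTorch sketch of the mask-grid idea described above: 2D SAM masks supervise a learnable 3D logit grid through the radiance field's volume-rendering weights, and the rendered 2D mask then composites the edited object field over the unaltered background. All names (MaskGrid, render_mask, composite), shapes, and resolutions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


class MaskGrid(torch.nn.Module):
    """Dense logit grid over the scene bounding box; sigmoid gives a soft 3D mask."""

    def __init__(self, resolution: int = 128):
        super().__init__()
        # One scalar logit per voxel, stored as (N, C, D, H, W) for grid_sample.
        self.logits = torch.nn.Parameter(
            torch.zeros(1, 1, resolution, resolution, resolution)
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) query points normalized to [-1, 1]^3.
        coords = xyz.view(1, -1, 1, 1, 3)
        logits = F.grid_sample(self.logits, coords, align_corners=True)
        return torch.sigmoid(logits.view(-1))  # per-point mask probability


def render_mask(mask_grid: MaskGrid, pts: torch.Tensor,
                weights: torch.Tensor) -> torch.Tensor:
    """Volume-render per-point mask values into a soft per-ray 2D mask.

    pts:     (R, S, 3) sample points along R rays, S samples each
    weights: (R, S) rendering weights from the reconstructed radiance field
    """
    m = mask_grid(pts.reshape(-1, 3)).view(pts.shape[:2])
    return (weights * m).sum(dim=-1)  # (R,) soft 2D mask


def mask_loss(rendered: torch.Tensor, sam_mask: torch.Tensor) -> torch.Tensor:
    """Supervise the rendered mask with SAM's 2D segmentation of the target."""
    return F.binary_cross_entropy(rendered.clamp(1e-5, 1 - 1e-5), sam_mask)


def composite(rgb_obj: torch.Tensor, rgb_bg: torch.Tensor,
              mask_2d: torch.Tensor) -> torch.Tensor:
    """Composited rendering: blend the edited object over the frozen background."""
    return mask_2d[..., None] * rgb_obj + (1.0 - mask_2d[..., None]) * rgb_bg
```

Because the mask is rendered with the scene's own sample weights, gradients from the 2D SAM supervision flow only into the voxel logits, so multi-view segmentations can plausibly be fused into one view-consistent 3D mask without touching the reconstructed field.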
References
Bao, C., et al.: SINE: semantic-driven image-based NeRF editing with prior-guided editing field. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20919–20929 (2023)
Brooks, T., Holynski, A., Efros, A.A.: InstructPix2Pix: learning to follow image editing instructions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18392–18402 (2023)
Cen, J., et al.: Segment anything in 3D with NeRFs. In: Advances in Neural Information Processing Systems (NeurIPS) (2023)
Chen, J.K., Lyu, J., Wang, Y.X.: NeuralEditor: editing neural radiance fields via manipulating point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12439–12448 (2023)
Chen, Y., et al.: GaussianEditor: swift and controllable 3D editing with Gaussian splatting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21476–21485 (2024)
Gordon, O., Avrahami, O., Lischinski, D.: Blended-NeRF: zero-shot object generation and blending in existing neural radiance fields (2023). arXiv:2306.12760
Haque, A., Tancik, M., Efros, A., Holynski, A., Kanazawa, A.: Instruct-NeRF2NeRF: editing 3D scenes with instructions. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2023)
Kirillov, A., et al.: Segment anything (2023). arXiv:2304.02643
Lin, C.H., et al.: Magic3D: high-resolution text-to-3D content creation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 300–309 (2023)
Liu, K., et al.: StyleRF: zero-shot 3D style transfer of neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8338–8348 (2023)
Liu, S., Zhang, X., Zhang, Z., Zhang, R., Zhu, J.Y., Russell, B.: Editing conditional radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5773–5783 (2021)
Mikaeili, A., Perel, O., Safaee, M., Cohen-Or, D., Mahdavi-Amiri, A.: SKED: sketch-guided text-based 3D editing. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14607–14619 (2023)
Mildenhall, B., Srinivasan, P.P., et al.: Local light field fusion: practical view synthesis with prescriptive sampling guidelines. ACM Trans. Graph. (TOG) 38(4), 1–14 (2019)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. In: European Conference on Computer Vision, pp. 405–421 (2020)
Poole, B., Jain, A., Barron, J.T., Mildenhall, B.: DreamFusion: text-to-3D using 2D diffusion. In: International Conference on Learning Representations (ICLR) (2023)
Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748–8763 (2021)
Ren, T., et al.: Grounded SAM: assembling open-world models for diverse visual tasks (2024). arXiv:2401.14159
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684–10695 (2022)
Schönberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4104–4113 (2016)
Sella, E., Fiebelman, G., Hedman, P., Averbuch-Elor, H.: Vox-E: text-guided voxel editing of 3D objects. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 430–440 (2023)
Sun, C., Sun, M., Chen, H.T.: Direct voxel grid optimization: super-fast convergence for radiance fields reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5459–5469 (2022)
Wang, C., Chai, M., He, M., Chen, D., Liao, J.: CLIP-NeRF: text-and-image driven manipulation of neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3835–3844 (2022)
Wang, J., Fang, J., Zhang, X., Xie, L., Tian, Q.: GaussianEditor: editing 3D Gaussians delicately with text instructions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20902–20911 (2024)
Wang, Q., et al.: IBRNet: learning multi-view image-based rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4690–4699 (2021)
Wang, Z., et al.: ProlificDreamer: high-fidelity and diverse text-to-3D generation with variational score distillation. In: Advances in Neural Information Processing Systems (NeurIPS) (2023)
Yu, L., Xiang, W., Han, K.: Edit-DiffNeRF: editing 3D neural radiance fields using 2D diffusion model (2023). arXiv:2306.09551
Zhang, K., et al.: ARF: artistic radiance fields. In: European Conference on Computer Vision, pp. 717–733 (2022)
Zhi, S., Laidlow, T., Leutenegger, S., Davison, A.J.: In-place scene labelling and understanding with implicit scene representation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 15838–15847 (2021)
Acknowledgements
This work was supported in part by the NSFC (62372457, 62132021, 62325211), the Young Elite Scientists Sponsorship Program by CAST (2023QNRC001), and the Natural Science Foundation of Hunan Province of China (2021RC3071, 2022RC1104).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Liu, X., Xu, K., Huang, Y., Yi, R., Zhu, C. (2025). MaskEditor: Instruct 3D Object Editing with Learned Masks. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15036. Springer, Singapore. https://doi.org/10.1007/978-981-97-8508-7_20
DOI: https://doi.org/10.1007/978-981-97-8508-7_20
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-8507-0
Online ISBN: 978-981-97-8508-7