Text-Guided Zero-Shot 3D Style Transfer of Neural Radiance Fields

Li, Wendong; Zheng, Wei-Shi

doi:10.1007/978-3-031-78186-5_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15308)

Included in the following conference series:

International Conference on Pattern Recognition

Abstract

3D style transfer aims to generate stylized novel views while maintaining multi-view consistency. However, current approaches primarily stylize entire 3D scenes uniformly, which limits the versatility of 3D style transfer. To address this limitation, we propose Text-Guided Zero-Shot 3D Style Transfer of Neural Radiance Fields (TGStyleRF), which incorporates a language radiance field into NeRF-based 3D style transfer, enabling flexible stylization guided by text queries. By modeling language in the 3D neural radiance field, spatial positions are bound to dense semantics, so that the 3D scene can be stylized selectively under text guidance. Furthermore, our method leverages both low-level texture and high-level semantics to enhance localization quality. Experimental results demonstrate that, with the integration of the language model and Cross-Feature-Localization (CFL), TGStyleRF achieves greater flexibility and precision in stylization.
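
The idea of text-guided selective stylization can be illustrated with a minimal sketch. It is not the authors' implementation and does not model CFL. It assumes that each rendered point already carries a language feature vector and that the text query has been embedded in the same space. The function names, the toy 2-D features, and the hard threshold of 0.5 are all hypothetical choices made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def relevancy_mask(point_features, text_embedding, threshold=0.5):
    """Mark points whose language feature matches the text query.

    The hard threshold is an assumption; a soft mask would work the same way.
    """
    return [1.0 if cosine(f, text_embedding) >= threshold else 0.0
            for f in point_features]

def blend(original_rgb, stylized_rgb, mask):
    """Keep the original colour where mask is 0 and the stylized colour where it is 1."""
    return [tuple(m * s + (1.0 - m) * o for o, s in zip(orig, sty))
            for orig, sty, m in zip(original_rgb, stylized_rgb, mask)]

# Toy data: three points with 2-D "language features" (hypothetical values).
features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query = [1.0, 0.0]                      # stand-in for an embedded text query
original = [(0.2, 0.2, 0.2)] * 3        # unstylized colours
stylized = [(0.8, 0.1, 0.1)] * 3        # fully stylized colours

mask = relevancy_mask(features, query)
blended = blend(original, stylized, mask)
```

In this toy setup only the first and third points match the query, so only they receive the stylized colour while the second point keeps its original colour.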

Acknowledgements

This work was partially supported by the Guangdong NSF Project (No. 2023B1515040025).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
Wendong Li & Wei-Shi Zheng

Corresponding author

Correspondence to Wei-Shi Zheng.

Editor information

Editors and Affiliations

University of Salford, Salford, Lancashire, UK
Apostolos Antonacopoulos
Indian Institute of Technology Bombay, Mumbai, Maharashtra, India
Subhasis Chaudhuri
Johns Hopkins University, Baltimore, MD, USA
Rama Chellappa
Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu
IIT Kharagpur, Kharagpur, West Bengal, India
Saumik Bhattacharya
Indian Statistical Institute Kolkata, Kolkata, West Bengal, India
Umapada Pal

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 15139 KB)

Supplementary material 2 (mp4 14371 KB)

About this paper

Cite this paper

Li, W., Zheng, WS. (2025). Text-Guided Zero-Shot 3D Style Transfer of Neural Radiance Fields. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15308. Springer, Cham. https://doi.org/10.1007/978-3-031-78186-5_9

DOI: https://doi.org/10.1007/978-3-031-78186-5_9
Published: 30 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78185-8
Online ISBN: 978-3-031-78186-5
eBook Packages: Computer Science, Computer Science (R0)
