Abstract
This work presents 3DPE, a practical method that efficiently edits a face image in a 3D-aware manner, following given prompts such as reference images or text descriptions. To this end, a lightweight module is distilled from a 3D portrait generator and a text-to-image model, which provide prior knowledge of face geometry and superior editing capability, respectively. This design brings two compelling advantages over existing approaches. First, our method achieves real-time editing with a feedforward network (~0.04 s per image), over 100× faster than the next-fastest competitor. Second, thanks to the powerful priors, our module can focus on learning editing-related variations, so it handles various types of editing simultaneously during training and further supports fast adaptation to user-specified, customized types of editing at inference time (e.g., with ~5 min of fine-tuning per style). The project page can be found here.
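To make the distillation idea in the abstract concrete, below is a minimal, hypothetical PyTorch sketch: a lightweight feedforward student is trained to reproduce prompt-conditioned edits produced by the heavy teacher models (the 3D portrait generator and the text-to-image editor), which are represented here only by pre-computed target images. The module names, architecture, and plain MSE objective are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the distillation setup described in the abstract.
# The teachers (3D generator + text-to-image editor) are assumed to have
# produced edited target images offline; only the student is shown here.
import torch
import torch.nn as nn

class LightweightEditor(nn.Module):
    """Student network: maps an input portrait and a prompt embedding
    to an edited image in a single forward pass (real-time at inference)."""
    def __init__(self, img_channels=3, prompt_dim=512, hidden=64):
        super().__init__()
        self.prompt_proj = nn.Linear(prompt_dim, hidden)
        self.encoder = nn.Sequential(
            nn.Conv2d(img_channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Conv2d(hidden, img_channels, 3, padding=1)

    def forward(self, image, prompt_emb):
        feat = self.encoder(image)
        # Inject the editing prompt as a per-channel modulation (illustrative choice).
        scale = self.prompt_proj(prompt_emb).unsqueeze(-1).unsqueeze(-1)
        return self.decoder(feat * (1 + scale))

def distillation_step(student, optimizer, image, prompt_emb, teacher_edit):
    """One training step: regress the teachers' pre-computed edited image."""
    pred = student(image, prompt_emb)
    loss = nn.functional.mse_loss(pred, teacher_edit)  # perceptual/ID losses would be added in practice
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    student = LightweightEditor()
    opt = torch.optim.Adam(student.parameters(), lr=1e-4)
    # Dummy batch standing in for (source image, prompt embedding, teacher-edited target).
    img = torch.randn(2, 3, 128, 128)
    prompt = torch.randn(2, 512)
    target = torch.randn(2, 3, 128, 128)
    print(distillation_step(student, opt, img, prompt, target))
```

At inference, only the student's forward pass runs, which is consistent with the single feedforward, real-time editing claimed in the abstract; adapting to a new user-specified style would amount to briefly fine-tuning this small module on examples of that style.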
Acknowledgements
This project was supported by the National Key R&D Program of China under grant number 2022ZD0161501.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bai, Q. et al. (2025). Real-Time 3D-Aware Portrait Editing from a Single Image. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15109. Springer, Cham. https://doi.org/10.1007/978-3-031-72983-6_20
DOI: https://doi.org/10.1007/978-3-031-72983-6_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72982-9
Online ISBN: 978-3-031-72983-6
eBook Packages: Computer Science, Computer Science (R0)