Abstract
This work presents 3DPE, a practical method that efficiently edits a face image in a 3D-aware manner, following given prompts such as reference images or text descriptions. To this end, a lightweight module is distilled from a 3D portrait generator and a text-to-image model, which provide prior knowledge of face geometry and superior editing capability, respectively. This design brings two compelling advantages over existing approaches. First, our method achieves real-time editing with a feedforward network (~0.04 s per image), over 100× faster than the next-fastest competitor. Second, thanks to the powerful priors, our module can focus on learning editing-related variations, so it handles various types of editing simultaneously during training and further supports fast adaptation to user-specified, customized types of editing at inference time (e.g., with ~5 min of fine-tuning per style). The project page can be found here.
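To make the distillation idea in the abstract concrete, below is a minimal, hypothetical PyTorch sketch: a lightweight feedforward student is trained to reproduce prompt-conditioned edits produced by the heavy teacher models (the 3D portrait generator and the text-to-image editor), which are represented here only by pre-computed target images. The module names, architecture, and plain MSE objective are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the distillation setup described in the abstract.
# The teachers (3D generator + text-to-image editor) are assumed to have
# produced edited target images offline; only the student is shown here.
import torch
import torch.nn as nn

class LightweightEditor(nn.Module):
    """Student network: maps an input portrait and a prompt embedding
    to an edited image in a single forward pass (real-time at inference)."""
    def __init__(self, img_channels=3, prompt_dim=512, hidden=64):
        super().__init__()
        self.prompt_proj = nn.Linear(prompt_dim, hidden)
        self.encoder = nn.Sequential(
            nn.Conv2d(img_channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Conv2d(hidden, img_channels, 3, padding=1)

    def forward(self, image, prompt_emb):
        feat = self.encoder(image)
        # Inject the editing prompt as a per-channel modulation (illustrative choice).
        scale = self.prompt_proj(prompt_emb).unsqueeze(-1).unsqueeze(-1)
        return self.decoder(feat * (1 + scale))

def distillation_step(student, optimizer, image, prompt_emb, teacher_edit):
    """One training step: regress the teachers' pre-computed edited image."""
    pred = student(image, prompt_emb)
    loss = nn.functional.mse_loss(pred, teacher_edit)  # perceptual/ID losses would be added in practice
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    student = LightweightEditor()
    opt = torch.optim.Adam(student.parameters(), lr=1e-4)
    # Dummy batch standing in for (source image, prompt embedding, teacher-edited target).
    img = torch.randn(2, 3, 128, 128)
    prompt = torch.randn(2, 512)
    target = torch.randn(2, 3, 128, 128)
    print(distillation_step(student, opt, img, prompt, target))
```

At inference, only the student's forward pass runs, which is consistent with the single feedforward, real-time editing claimed in the abstract; adapting to a new user-specified style would amount to briefly fine-tuning this small module on examples of that style.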
Acknowledgements
This project was supported by the National Key R&D Program of China under grant number 2022ZD0161501.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bai, Q. et al. (2025). Real-Time 3D-Aware Portrait Editing from a Single Image. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15109. Springer, Cham. https://doi.org/10.1007/978-3-031-72983-6_20
DOI: https://doi.org/10.1007/978-3-031-72983-6_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72982-9
Online ISBN: 978-3-031-72983-6
eBook Packages: Computer Science, Computer Science (R0)