Real-Time 3D-Aware Portrait Editing from a Single Image

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15109)


Abstract

This work presents 3DPE, a practical method that can efficiently edit a face image following given prompts, such as reference images or text descriptions, in a 3D-aware manner. To this end, a lightweight module is distilled from a 3D portrait generator and a text-to-image model, which provide prior knowledge of face geometry and superior editing capability, respectively. This design brings two compelling advantages over existing approaches. First, our method achieves real-time editing with a feedforward network (~0.04 s per image), over 100× faster than the second-fastest competitor. Second, thanks to the powerful priors, our module can focus on learning editing-related variations, so that it handles various types of editing simultaneously during training and further supports fast adaptation to user-specified, customized types of editing at inference time (e.g., with ~5 min of fine-tuning per style). Project page can be found here.
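To make the pipeline concrete, the following is a minimal PyTorch-style sketch of the feedforward editing pass the abstract describes. It is an illustrative assumption rather than the authors' released code: every name (LightweightEditModule, edit_portrait, and the encoder/generator callables) is hypothetical, and the distillation training procedure is omitted.

    # Hypothetical sketch of the feedforward, prompt-conditioned editing
    # pass described in the abstract; names and shapes are illustrative
    # assumptions, not the authors' released implementation.
    import torch
    import torch.nn as nn


    class LightweightEditModule(nn.Module):
        """Small network distilled from a 3D portrait generator (geometry
        prior) and a text-to-image model (editing prior)."""

        def __init__(self, feat_dim: int = 512):
            super().__init__()
            # Fuse image features with the prompt embedding and predict
            # an edited latent code for the 3D-aware generator.
            self.fuse = nn.Sequential(
                nn.Linear(feat_dim * 2, feat_dim),
                nn.ReLU(),
                nn.Linear(feat_dim, feat_dim),
            )

        def forward(self, img_feat: torch.Tensor,
                    prompt_emb: torch.Tensor) -> torch.Tensor:
            return self.fuse(torch.cat([img_feat, prompt_emb], dim=-1))


    @torch.no_grad()
    def edit_portrait(encoder, edit_module, generator, image, prompt_emb, camera):
        """One feedforward pass (the abstract reports ~0.04 s per image):
        encode the input face, apply the prompt-conditioned edit in latent
        space, then render with the 3D-aware generator at a given camera."""
        img_feat = encoder(image)                          # invert image to features
        edited_latent = edit_module(img_feat, prompt_emb)  # apply the edit
        return generator(edited_latent, camera)            # 3D-consistent render

Because the edit is a single feedforward latent update rather than an iterative optimization or diffusion loop, real-time speed follows directly; under this reading, adapting to a new, user-specified style amounts to briefly fine-tuning only this small module.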



Acknowledgements

This project was supported by the National Key R&D Program of China under grant number 2022ZD0161501.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Bai, Q. et al. (2025). Real-Time 3D-Aware Portrait Editing from a Single Image. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15109. Springer, Cham. https://doi.org/10.1007/978-3-031-72983-6_20


  • DOI: https://doi.org/10.1007/978-3-031-72983-6_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72982-9

  • Online ISBN: 978-3-031-72983-6

  • eBook Packages: Computer Science, Computer Science (R0)
