
High-fidelity facial expression transfer using part-based local–global conditional GANs

  • Original article
  • Published in The Visual Computer

Abstract

We propose a GAN-based facial expression transfer method that transfers the facial expression of a reference subject to a source subject while preserving the source identity attributes, such as shape, appearance, and illumination. Our method consists of two GAN-based modules: Parts Generation Networks (PGNs) and a Parts Fusion Network (PFN). Instead of training a single model on the entire image globally, our key idea is to train separate PGNs for different local facial parts independently and then fuse the generated parts with the PFN. To encode the facial expression faithfully, we use a pre-trained parametric 3D head model (photometric FLAME) to reconstruct realistic head models from both the source and reference images, and we extract 3D facial feature points from the reference image to handle extreme poses and occlusions. Based on this contextual information, the PGNs generate the different parts of the head independently, and the PFN fuses the generated parts into the final image. Experiments show that the proposed model outperforms state-of-the-art approaches in faithfully transferring facial expressions, especially when the reference image has a different head pose from the source image. Ablation studies demonstrate the effectiveness of the PGNs.
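To make the part-based "generate locally, fuse globally" pipeline concrete, the following is a minimal PyTorch sketch. The crop boxes, the PartGenerator and FusionNetwork placeholder networks, and the conditioning layout (a rendered FLAME head plus landmark maps stacked as input channels) are all illustrative assumptions, not the authors' released architecture: the paper's PGNs and PFN are full conditional GANs trained adversarially per facial part.

```python
# Minimal PyTorch sketch of the "generate parts locally, fuse globally" idea.
# PartGenerator / FusionNetwork are placeholder conv nets and the fixed crop
# boxes are assumptions for illustration; the paper's PGNs and PFN are full
# conditional GANs trained per facial part.
import torch
import torch.nn as nn

# Assumed (top, left, height, width) crops in a 256x256 conditioning image.
PARTS = {
    "eyes":  (64, 32, 64, 192),
    "mouth": (160, 64, 64, 128),
    "rest":  (0, 0, 256, 256),
}

class PartGenerator(nn.Module):
    """Stand-in for one PGN: maps conditioning channels (e.g. a rendered
    photometric-FLAME head plus landmark heatmaps) to an RGB part image."""
    def __init__(self, cond_ch: int = 6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(cond_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, cond: torch.Tensor) -> torch.Tensor:
        return self.net(cond)

class FusionNetwork(nn.Module):
    """Stand-in for the PFN: sees every generated part pasted back onto a
    full-size canvas and blends them into one final image."""
    def __init__(self, n_parts: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * n_parts, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, canvases: list) -> torch.Tensor:
        return self.net(torch.cat(canvases, dim=1))

def transfer(cond_full: torch.Tensor, pgns: dict, pfn: FusionNetwork) -> torch.Tensor:
    """cond_full: (B, C, 256, 256) conditioning tensor. Each PGN sees only
    its own crop; only the PFN reasons about the whole image."""
    canvases = []
    for name, (t, l, h, w) in PARTS.items():
        crop = cond_full[:, :, t:t + h, l:l + w]   # local conditioning
        part = pgns[name](crop)                    # locally generated part
        canvas = torch.zeros(cond_full.size(0), 3, 256, 256,
                             device=cond_full.device)
        canvas[:, :, t:t + h, l:l + w] = part      # paste back in place
        canvases.append(canvas)
    return pfn(canvases)

pgns = {name: PartGenerator() for name in PARTS}
pfn = FusionNetwork(n_parts=len(PARTS))
out = transfer(torch.randn(1, 6, 256, 256), pgns, pfn)
print(out.shape)  # torch.Size([1, 3, 256, 256])
```

The point the sketch preserves is the division of labour: part-specific detail is learned independently by local generators that never see the full image, while the fusion stage is the only module that reasons about global consistency.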

Acknowledgements

This research was partially funded by NSFC (61972160, 62072191) and the NSF of Guangdong Province (2021A1515012301).

Author information

Corresponding author

Correspondence to Guiqing Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below are the links to the electronic supplementary material.

Supplementary file 1 (MP4 372 KB)

Supplementary file 2 (PDF 5432 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Rashid, M.M., Wu, S., Nie, Y. et al. High-fidelity facial expression transfer using part-based local–global conditional GANs. Vis Comput 39, 3635–3646 (2023). https://doi.org/10.1007/s00371-023-03035-1
