Abstract
We propose a GAN-based facial expression transfer method that transfers the facial expression of a reference subject to a source subject while preserving the source identity attributes, such as shape, appearance, and illumination. Our method consists of two GAN-based modules: Parts Generation Networks (PGNs) and a Parts Fusion Network (PFN). Instead of training the model globally on the entire image, our key idea is to train separate PGNs for different local facial parts independently and then fuse the generated parts with the PFN. To encode the facial expression faithfully, we use a pre-trained parametric 3D head model (photometric FLAME) to reconstruct realistic head models from both the source and reference images. We also extract 3D facial feature points from the reference image to handle extreme poses and occlusions. Based on this contextual information, the PGNs generate the different parts of the head independently, and the PFN fuses all generated parts into the final image. Experiments show that the proposed model outperforms state-of-the-art approaches in faithfully transferring facial expressions, especially when the reference image has a head pose different from that of the source image. Ablation studies demonstrate the benefit of using PGNs.
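The part-based pipeline described above (independent PGNs per facial part, followed by a PFN that fuses the generated parts) can be sketched at a high level as follows. This is a minimal illustrative sketch only: the part split, the mask-weighted blending, and all function names are assumptions for exposition, not the authors' actual network implementation.

```python
import numpy as np

H, W = 64, 64
PARTS = ["eyes", "nose", "mouth"]  # illustrative part split

def pgn_generate(part, context, rng):
    """Stand-in for one Parts Generation Network (PGN): returns an
    RGB patch and a soft mask for a single facial part.  In the
    paper, `context` would carry FLAME parameters and 3D landmarks."""
    img = rng.random((H, W, 3))  # placeholder for a generated part image
    mask = np.zeros((H, W))
    # crude fixed region per part, purely for illustration
    band = {"eyes": (10, 25), "nose": (25, 40), "mouth": (40, 55)}[part]
    mask[band[0]:band[1], :] = 1.0
    return img, mask

def pfn_fuse(parts):
    """Stand-in for the Parts Fusion Network (PFN): blends the
    generated parts into one image, weighting by their masks."""
    out = np.zeros((H, W, 3))
    total = np.zeros((H, W, 1))
    for img, mask in parts:
        m = mask[..., None]
        out += img * m
        total += m
    # normalize where masks overlap; leave uncovered pixels at zero
    return np.where(total > 0, out / np.maximum(total, 1e-8), out)

rng = np.random.default_rng(0)
context = None  # would hold the reconstructed 3D head information
fused = pfn_fuse([pgn_generate(p, context, rng) for p in PARTS])
print(fused.shape)  # (64, 64, 3)
```

The sketch only captures the data flow (per-part generation, then fusion); in the actual method both stages are learned adversarial networks rather than fixed blending.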
Acknowledgements
This research is partially funded by NSFC (61972160, 62072191), and NSF of Guangdong Province (2021A1515012301).
Ethics declarations
Conflict of interest
The authors state that there are no conflicts of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 1 (mp4 372 KB)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Rashid, M.M., Wu, S., Nie, Y. et al. High-fidelity facial expression transfer using part-based local–global conditional gans. Vis Comput 39, 3635–3646 (2023). https://doi.org/10.1007/s00371-023-03035-1