One-Shot Decoupled Face Reenactment with Vision Transformer

Hu, Chen; Xie, Xianghua

doi:10.1007/978-3-031-09282-4_21

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13364)

Included in the following conference series:

International Conference on Pattern Recognition and Artificial Intelligence

Abstract

The recent face reenactment paradigm involves estimating an optical flow to warp the source image or its feature maps so that pixel values can be sampled to generate the reenacted image. We propose a one-shot framework in which the reenactment of the overall face and of individual landmarks is decoupled. We show that a shallow Vision Transformer can effectively estimate optical flow without many parameters or much training data. When reenacting different identities, our method remedies the inability of previous conditional-generator-based methods to preserve identities in reenacted images. To address the identity-preservation problem in face reenactment, we model landmark coordinate transformation as a style transfer problem, which further improves preservation of the source image's identity in the reenacted image. Our method achieves lower head pose error on the CelebV dataset while obtaining competitive results in identity preservation and expression accuracy.
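
The flow-based warping step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `warp_with_flow`, the pixel-offset flow convention, and the border clamping are assumptions made for illustration. In the paper the flow is estimated by a Vision Transformer; here it is simply taken as an input.

```python
import numpy as np

def warp_with_flow(image, flow):
    """Backward-warp `image` (H, W, C) with a dense flow field (H, W, 2).

    Assumed convention (hypothetical, not from the paper):
    flow[y, x] = (dx, dy) is the pixel offset from output location (x, y)
    to the source location whose value should be sampled.
    """
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Source coordinates for every output pixel, clamped to the image border.
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners surrounding each (possibly fractional) source location.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Bilinear interpolation weights, broadcast over the channel axis.
    wx = (src_x - x0)[..., None]
    wy = (src_y - y0)[..., None]

    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom
```

With an all-zero flow the function returns the input unchanged; a constant flow of `(1, 0)` makes every output pixel take the value of its right-hand neighbour, with the last column clamped to the border. Bilinear sampling is used because it keeps the operation differentiable with respect to the flow, which is what allows a flow estimator to be trained end to end in approaches of this kind.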

Author information

Authors and Affiliations

Department of Computer Science, Swansea University, Swansea, UK
Chen Hu & Xianghua Xie

Corresponding author

Correspondence to Xianghua Xie.

Editor information

Editors and Affiliations

Télécom SudParis, Palaiseau, France
Mounîm El Yacoubi
École de Technologie Supérieure, Montreal, QC, Canada
Eric Granger
Hong Kong Baptist University, Kowloon, Hong Kong
Pong Chi Yuen
Indian Statistical Institute, Kolkata, India
Umapada Pal
Université Paris Cité, Paris, France
Nicole Vincent

Cite this paper

Hu, C., Xie, X. (2022). One-Shot Decoupled Face Reenactment with Vision Transformer. In: El Yacoubi, M., Granger, E., Yuen, P.C., Pal, U., Vincent, N. (eds) Pattern Recognition and Artificial Intelligence. ICPRAI 2022. Lecture Notes in Computer Science, vol 13364. Springer, Cham. https://doi.org/10.1007/978-3-031-09282-4_21

DOI: https://doi.org/10.1007/978-3-031-09282-4_21
Published: 29 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09281-7
Online ISBN: 978-3-031-09282-4
eBook Packages: Computer Science, Computer Science (R0)
