research-article

Warp-guided GANs for single-photo facial animation

Authors:

Kun Zhou

ACM Transactions on Graphics (TOG), Volume 37, Issue 6

Article No.: 231, Pages 1 - 12

https://doi.org/10.1145/3272127.3275043

Published: 04 December 2018

Abstract

This paper introduces a novel method for real-time portrait animation from a single photo. Our method requires only a single portrait photo and a set of facial landmarks derived from a driving source (e.g., a photo or a video sequence), and generates an animated image with rich facial details. The core of our method is a warp-guided generative model that instantly fuses the fine facial details (e.g., creases and wrinkles) needed for a high-fidelity facial expression onto a pre-warped image. Our method factors out the nonlinear geometric transformations exhibited in facial expressions with lightweight 2D warps and leaves the synthesis of appearance detail to conditional generative neural networks. We show that this factorization into geometric transformation and appearance synthesis helps the network learn the highly nonlinear facial expression functions and also simplifies the design of the network architecture. Through extensive experiments on various portrait photos from the Internet, we show the significant efficacy of our method compared with prior art.
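
The geometric half of the factorization described above can be illustrated with a small sketch. The abstract does not specify how the 2D warp is computed, so the code below substitutes a simple inverse-distance-weighted displacement field with nearest-neighbor sampling; the function name `warp_by_landmarks` and its parameters are hypothetical and are not taken from the paper. The appearance stage (the conditional generative network that adds wrinkles and creases to the pre-warped image) is not shown.

```python
import numpy as np


def warp_by_landmarks(image, src_pts, dst_pts, power=2.0, eps=1e-8):
    """Pre-warp `image` so content at src_pts moves toward dst_pts.

    Illustrative stand-in only: an inverse-distance-weighted backward
    warp, not the warp used in the paper. Points are (x, y) pairs.
    """
    image = np.asarray(image)
    src = np.asarray(src_pts, dtype=np.float64)
    dst = np.asarray(dst_pts, dtype=np.float64)
    h, w = image.shape[:2]

    # Pixel grid of the output image as (x, y) coordinates, shape (h, w, 2).
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([xs, ys], axis=-1).astype(np.float64)

    # Distance from every output pixel to every target landmark, shape (h, w, n).
    dist = np.linalg.norm(grid[:, :, None, :] - dst[None, None, :, :], axis=-1)

    # Normalized inverse-distance weights; eps avoids division by zero
    # for pixels that coincide with a landmark.
    weights = 1.0 / (dist ** power + eps)
    weights /= weights.sum(axis=-1, keepdims=True)

    # Backward displacement: where in the source each output pixel reads from.
    backward = (weights[..., None] * (src - dst)[None, None]).sum(axis=2)
    sample = grid + backward

    # Nearest-neighbor sampling, clamped to the image bounds.
    sx = np.clip(np.rint(sample[..., 0]), 0, w - 1).astype(int)
    sy = np.clip(np.rint(sample[..., 1]), 0, h - 1).astype(int)
    return image[sy, sx]


# Toy example: one landmark moves one pixel to the right.
img = np.zeros((5, 5))
img[2, 2] = 1.0
warped = warp_by_landmarks(img, [[2, 2]], [[3, 2]])
```

In this toy case the single bright pixel follows its landmark from column 2 to column 3. In a pipeline of the kind the abstract describes, an image pre-warped in this spirit would then be passed to the generative network, which is responsible only for the appearance details that a pure warp cannot produce.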

Supplementary Material

ZIP File (a231-geng.zip)

Supplemental files.


MP4 File (a231-geng.mp4)




Index Terms

Warp-guided GANs for single-photo facial animation
1. Computing methodologies
  1. Computer graphics
    1. Animation
    2. Image manipulation
      1. Image processing
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks


Published In


ACM Transactions on Graphics Volume 37, Issue 6

December 2018

1401 pages

ISSN:0730-0301

EISSN:1557-7368

DOI:10.1145/3272127

Editor:
Takeo Igarashi
The University of Tokyo, Japan


Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States



