Warp-guided GANs for single-photo facial animation

Published: 04 December 2018

Abstract

This paper introduces a novel method for real-time portrait animation from a single photo. Our method requires only a single portrait photo and a set of facial landmarks derived from a driving source (e.g., a photo or a video sequence), and generates an animated image with rich facial details. The core of our method is a warp-guided generative model that instantly fuses the fine facial details (e.g., creases and wrinkles) necessary for a high-fidelity facial expression onto a pre-warped image. Our method factors out the nonlinear geometric transformations exhibited in facial expressions with lightweight 2D warps, leaving the appearance detail synthesis to conditional generative neural networks. We show that this factorization of geometric transformation and appearance synthesis helps the network learn the highly nonlinear facial expression functions and simplifies the design of the network architecture. Through extensive experiments on a variety of portrait photos from the Internet, we demonstrate the significant efficacy of our method compared with prior art.
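
To make the factorization concrete, below is a minimal Python sketch of one animation frame. It is an illustration under stated assumptions, not the authors' implementation: the landmark-driven warp is approximated with thin-plate-spline RBF interpolation, and refine_net is a hypothetical stand-in for a trained conditional generator.

    # Illustrative sketch only: the warp step uses thin-plate-spline RBF
    # interpolation (an assumption), and `refine_net` is a hypothetical
    # trained conditional generator, not the paper's actual network.
    import numpy as np
    from scipy.interpolate import Rbf
    from scipy.ndimage import map_coordinates

    def landmark_warp(image, src_pts, dst_pts):
        """Pre-warp an (H, W, C) float image so that facial landmarks move
        from src_pts to dst_pts (both (N, 2) arrays of x, y positions)."""
        h, w = image.shape[:2]
        # Backward mapping: for each output pixel, interpolate where to
        # sample the input image from the sparse landmark displacements.
        dx = Rbf(dst_pts[:, 0], dst_pts[:, 1],
                 src_pts[:, 0] - dst_pts[:, 0], function='thin_plate')
        dy = Rbf(dst_pts[:, 0], dst_pts[:, 1],
                 src_pts[:, 1] - dst_pts[:, 1], function='thin_plate')
        ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
        sample_x = xs + dx(xs, ys)
        sample_y = ys + dy(xs, ys)
        # Bilinear resampling of each channel along the dense warp field.
        return np.stack([map_coordinates(image[..., c],
                                         [sample_y, sample_x],
                                         order=1, mode='nearest')
                         for c in range(image.shape[2])], axis=-1)

    def animate_frame(portrait, portrait_landmarks, driving_landmarks,
                      refine_net):
        """One frame: a cheap 2D warp handles the coarse geometry, then the
        conditional generator adds fine appearance details (creases,
        wrinkles) on top of the pre-warped image."""
        coarse = landmark_warp(portrait, portrait_landmarks,
                               driving_landmarks)
        return refine_net(coarse, driving_landmarks)

Because the coarse geometric deformation is already handled by the warp, the generator only has to model residual appearance changes rather than the full expression mapping, which is the intuition behind the factorization described above.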


Supplemental Material

a231-geng.mp4 (MP4, 40.5 MB)

Published in

ACM Transactions on Graphics, Volume 37, Issue 6
December 2018, 1401 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3272127

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 4 December 2018
• Published in TOG Volume 37, Issue 6

          Qualifiers

          • research-article
