Abstract
We present a novel approach for the automatic creation of a personalized, high-quality 3D face rig of an actor from monocular video data alone (e.g., vintage movies). Our rig is based on three distinct layers that allow us to model the actor's facial shape as well as capture his person-specific expression characteristics at high fidelity, ranging from coarse-scale geometry to fine-scale static and transient detail on the scale of folds and wrinkles. At the heart of our approach is a parametric shape prior that encodes the plausible subspace of facial identity and expression variations. Based on this prior, a coarse-scale reconstruction is obtained by means of a novel variational fitting approach. We represent person-specific idiosyncrasies, which cannot be represented in the restricted shape and expression space, by learning a set of medium-scale corrective shapes. Fine-scale skin detail, such as wrinkles, is captured from video via shading-based refinement, and a generative detail formation model is learned. Both the medium- and fine-scale detail layers are coupled with the parametric prior by means of a novel sparse linear regression formulation. Once reconstructed, all layers of the face rig can be conveniently controlled by a small number of blendshape expression parameters, as widely used by animation artists. We show captured face rigs and their motions for several actors filmed in different monocular video formats, including legacy footage from YouTube, and demonstrate how they can be used for 3D animation and 2D video editing. Finally, we evaluate our approach qualitatively and quantitatively and compare it to related state-of-the-art methods.
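The layered rig described above can be sketched in miniature: a coarse blendshape combination, plus a medium-scale corrective layer whose coefficients are driven from the same expression weights through a sparse linear map. This is only an illustrative toy, not the paper's implementation; all names and dimensions (`NUM_VERTS`, `B`, `C`, `R`, and the random shapes) are assumptions made up for the example.

```python
# Toy sketch (NOT the paper's implementation) of a layered blendshape rig:
# coarse blendshapes plus a corrective displacement layer coupled to the same
# expression weights via a sparse linear map. All names/sizes are illustrative.
import numpy as np

NUM_VERTS = 5          # toy mesh with 5 vertices
NUM_BLENDSHAPES = 3    # number of expression blendshapes
NUM_CORRECTIVES = 2    # number of learned medium-scale corrective shapes

rng = np.random.default_rng(0)
neutral = rng.standard_normal((NUM_VERTS, 3))              # neutral geometry
B = rng.standard_normal((NUM_BLENDSHAPES, NUM_VERTS, 3))   # blendshape deltas
C = rng.standard_normal((NUM_CORRECTIVES, NUM_VERTS, 3))   # corrective shapes

# Sparse linear map from blendshape weights to corrective coefficients,
# standing in for the learned sparse regression; most entries are zero.
R = np.array([[0.8, 0.0,  0.0],
              [0.0, 0.0, -0.5]])

def evaluate_rig(w):
    """Evaluate the layered rig for expression weights w."""
    w = np.asarray(w, dtype=float)
    coarse = neutral + np.tensordot(w, B, axes=1)   # coarse blendshape layer
    alpha = R @ w                                   # corrective coefficients
    medium = np.tensordot(alpha, C, axes=1)         # medium-scale correctives
    return coarse + medium

mesh = evaluate_rig([1.0, 0.0, 0.0])
print(mesh.shape)  # (5, 3)
```

Because the corrective coefficients are a (sparse) linear function of the blendshape weights, animators only ever touch the familiar expression parameters; the detail layers follow automatically, which mirrors the coupling described in the abstract.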
Supplemental Material
Supplemental movie, appendix, image, and software files for Reconstruction of Personalized 3D Face Rigs from Monocular Video.