Abstract
We present a novel approach for the automatic creation of a personalized, high-quality 3D face rig of an actor from monocular video data alone (e.g., vintage movies). Our rig is based on three distinct layers that allow us to model the actor's facial shape as well as capture his person-specific expression characteristics at high fidelity, ranging from coarse-scale geometry to fine-scale static and transient detail on the scale of folds and wrinkles. At the heart of our approach is a parametric shape prior that encodes the plausible subspace of facial identity and expression variations. Based on this prior, a coarse-scale reconstruction is obtained by means of a novel variational fitting approach. We represent person-specific idiosyncrasies, which cannot be represented in the restricted shape and expression space, by learning a set of medium-scale corrective shapes. Fine-scale skin detail, such as wrinkles, is captured from video via shading-based refinement, and a generative detail formation model is learned. Both the medium- and fine-scale detail layers are coupled with the parametric prior by means of a novel sparse linear regression formulation. Once reconstructed, all layers of the face rig can be conveniently controlled by a small number of blendshape expression parameters, as widely used by animation artists. We show captured face rigs and their motions for several actors filmed in different monocular video formats, including legacy footage from YouTube, and demonstrate how they can be used for 3D animation and 2D video editing. Finally, we evaluate our approach qualitatively and quantitatively and compare it to related state-of-the-art methods.
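The layered rig described above can be sketched in miniature: a coarse blendshape combination, plus a medium-scale corrective layer whose coefficients are driven from the same expression weights through a sparse linear map. This is only an illustrative toy, not the paper's implementation; all names and dimensions (`NUM_VERTS`, `B`, `C`, `R`, and the random shapes) are assumptions made up for the example.

```python
# Toy sketch (NOT the paper's implementation) of a layered blendshape rig:
# coarse blendshapes plus a corrective displacement layer coupled to the same
# expression weights via a sparse linear map. All names/sizes are illustrative.
import numpy as np

NUM_VERTS = 5          # toy mesh with 5 vertices
NUM_BLENDSHAPES = 3    # number of expression blendshapes
NUM_CORRECTIVES = 2    # number of learned medium-scale corrective shapes

rng = np.random.default_rng(0)
neutral = rng.standard_normal((NUM_VERTS, 3))              # neutral geometry
B = rng.standard_normal((NUM_BLENDSHAPES, NUM_VERTS, 3))   # blendshape deltas
C = rng.standard_normal((NUM_CORRECTIVES, NUM_VERTS, 3))   # corrective shapes

# Sparse linear map from blendshape weights to corrective coefficients,
# standing in for the learned sparse regression; most entries are zero.
R = np.array([[0.8, 0.0,  0.0],
              [0.0, 0.0, -0.5]])

def evaluate_rig(w):
    """Evaluate the layered rig for expression weights w."""
    w = np.asarray(w, dtype=float)
    coarse = neutral + np.tensordot(w, B, axes=1)   # coarse blendshape layer
    alpha = R @ w                                   # corrective coefficients
    medium = np.tensordot(alpha, C, axes=1)         # medium-scale correctives
    return coarse + medium

mesh = evaluate_rig([1.0, 0.0, 0.0])
print(mesh.shape)  # (5, 3)
```

Because the corrective coefficients are a (sparse) linear function of the blendshape weights, animators only ever touch the familiar expression parameters; the detail layers follow automatically, which mirrors the coupling described in the abstract.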
Supplemental Material
Supplemental movie, appendix, image, and software files for Reconstruction of Personalized 3D Face Rigs from Monocular Video.