
Reconstruction of Personalized 3D Face Rigs from Monocular Video

Published: 18 May 2016

Abstract

We present a novel approach for the automatic creation of a personalized high-quality 3D face rig of an actor from just monocular video data (e.g., vintage movies). Our rig is based on three distinct layers that allow us to model the actor’s facial shape as well as capture his person-specific expression characteristics at high fidelity, ranging from coarse-scale geometry to fine-scale static and transient detail on the scale of folds and wrinkles. At the heart of our approach is a parametric shape prior that encodes the plausible subspace of facial identity and expression variations. Based on this prior, a coarse-scale reconstruction is obtained by means of a novel variational fitting approach. We represent person-specific idiosyncrasies, which cannot be captured in the restricted shape and expression space, by learning a set of medium-scale corrective shapes. Fine-scale skin detail, such as wrinkles, is captured from video via shading-based refinement, and a generative detail formation model is learned. Both the medium- and fine-scale detail layers are coupled with the parametric prior by means of a novel sparse linear regression formulation. Once reconstructed, all layers of the face rig can be conveniently controlled by a small number of blendshape expression parameters, as widely used by animation artists. We show captured face rigs and their motions for several actors filmed in different monocular video formats, including legacy footage from YouTube, and demonstrate how they can be used for 3D animation and 2D video editing. Finally, we evaluate our approach qualitatively and quantitatively and compare it to related state-of-the-art methods.
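To make the three-layer structure concrete, the sketch below evaluates such a layered rig for a given set of blendshape expression parameters: the coarse layer comes from the parametric blendshape prior, while the medium-scale corrective and fine-scale detail layers are coupled to those same parameters through learned linear regressors, as the abstract describes. This is a minimal illustration under assumed names and dimensions, not the authors' implementation; in particular, the regression matrices W_corr and W_detail and all shape arrays are hypothetical stand-ins for the quantities learned per actor.

```python
import numpy as np

# Hypothetical sketch of a three-layer face rig evaluation; all names,
# shapes, and the dense regressors are illustrative assumptions.

def evaluate_rig(alpha, neutral, expr_shapes, corr_shapes, W_corr,
                 detail_shapes, W_detail):
    """Return per-vertex positions (N, 3) for blendshape weights alpha (K,)."""
    # Layer 1: coarse geometry from the parametric blendshape prior.
    coarse = neutral + np.tensordot(alpha, expr_shapes, axes=1)   # (N, 3)

    # Layer 2: medium-scale person-specific correctives, whose coefficients
    # are predicted from alpha by a linear regressor learned for this actor.
    beta = W_corr @ alpha                                         # (M,)
    medium = coarse + np.tensordot(beta, corr_shapes, axes=1)     # (N, 3)

    # Layer 3: fine-scale wrinkle detail from the generative detail model,
    # again driven by alpha through a learned linear regressor.
    gamma = W_detail @ alpha                                      # (D,)
    return medium + np.tensordot(gamma, detail_shapes, axes=1)    # (N, 3)

# Toy usage with random data: 4 blendshapes, 100 vertices.
rng = np.random.default_rng(0)
verts = evaluate_rig(
    alpha=rng.uniform(0, 1, 4),
    neutral=rng.standard_normal((100, 3)),
    expr_shapes=rng.standard_normal((4, 100, 3)),
    corr_shapes=rng.standard_normal((6, 100, 3)),
    W_corr=rng.standard_normal((6, 4)),
    detail_shapes=rng.standard_normal((8, 100, 3)),
    W_detail=rng.standard_normal((8, 4)),
)
print(verts.shape)  # (100, 3)
```

Note that the paper's coupling is a sparse linear regression; dense matrices are used here only to keep the sketch short, and the detail layer is applied as a plain displacement rather than the paper's shading-derived wrinkle model.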




• Published in

  ACM Transactions on Graphics, Volume 35, Issue 3 (June 2016), 128 pages
  ISSN: 0730-0301
  EISSN: 1557-7368
  DOI: 10.1145/2903775

      Copyright © 2016 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

      Publication History

      • Published: 18 May 2016
      • Accepted: 1 January 2016
      • Revised: 1 December 2015
      • Received: 1 September 2015


      Qualifiers

      • research-article
      • Research
      • Refereed
