skip to main content
research-article

Corrective 3D reconstruction of lips from monocular video

Published: 05 December 2016 Publication History

Abstract

In facial animation, the accurate shape and motion of the lips of virtual humans is of paramount importance, since subtle nuances in mouth expression strongly influence the interpretation of speech and the conveyed emotion. Unfortunately, passive photometric reconstruction of expressive lip motions, such as a kiss or rolling lips, is fundamentally hard even with multi-view methods in controlled studios. To alleviate this problem, we present a novel approach for fully automatic reconstruction of detailed and expressive lip shapes along with the dense geometry of the entire face, from just monocular RGB video. To this end, we learn the difference between inaccurate lip shapes found by a state-of-the-art monocular facial performance capture approach, and the true 3D lip shapes reconstructed using a high-quality multi-view system in combination with applied lip tattoos that are easy to track. A robust gradient domain regressor is trained to infer accurate lip shapes from coarse monocular reconstructions, with the additional help of automatically extracted inner and outer 2D lip contours. We quantitatively and qualitatively show that our monocular approach reconstructs higher quality lip shapes, even for complex shapes like a kiss or lip rolling, than previous monocular approaches. Furthermore, we compare the performance of person-specific and multi-person generic regression strategies and show that our approach generalizes to new individuals and general scenes, enabling high-fidelity reconstruction even from commodity video footage.

Supplementary Material

ZIP File (a219-garrido.zip)
Supplemental file.

References

[1]
Alexa, M. 2002. Linear combination of transformations. ACM TOG 21, 3, 380--387.
[2]
Alexander, O., Rogers, M., Lambeth, W., Chiang, J., Ma, W., Wang, C., and Debevec, P. E. 2010. The digital emily project: Achieving a photorealistic digital actor. IEEE CGAA 30, 4, 20--31.
[3]
Alexander, O., Fyffe, G., Busch, J., Yu, X., Ichikari, R., Jones, A., Debevec, P., Jimenez, J., Danvoye, E., Antionazzi, B., Eheler, M., Kysela, Z., and von der Pahlen, J. 2013. Digital Ira: Creating a real-time photoreal digital actor. In ACM Siggrah Posters.
[4]
Anderson, R., Stenger, B., and Cipolla, R. 2013. Lip tracking for 3D face registration. In Proc. MVA, 145--148.
[5]
Barnard, M., Holden, E. J., and Owens, R. 2002. Lip tracking using pattern matching snakes. In Proc. ACCV, 1--6.
[6]
Beeler, T., Bickel, B., Beardsley, P., Sumner, B., and Gross, M. 2010. High-quality single-shot capture of facial geometry. ACM TOG 29, 4, 40:1--40:9.
[7]
Beeler, T., Hahn, F., Bradley, D., Bickel, B., Beardsley, P., Gotsman, C., Sumner, R. W., and Gross, M. 2011. High-quality passive facial performance capture using anchor frames. ACM TOG 30, 4, 75:1--75:10.
[8]
Beeler, T., Bickel, B., Noris, G., Marschner, S., Beardsley, P., Sumner, R. W., and Gross, M. 2012. Coupled 3D reconstruction of sparse facial hair and skin. ACM TOG 31, 4, 117:1--117:10.
[9]
Bérard, P., Bradley, D., Nitti, M., Beeler, T., and Gross, M. 2014. High-quality capture of eyes. ACM TOG 33, 6, 223:1--223:12.
[10]
Bermano, A., Beeler, T., Kozlov, Y., Bradley, D., Bickel, B., and Gross, M. 2015. Detailed spatio-temporal reconstruction of eyelids. ACM TOG 34, 4, 44:1--44:11.
[11]
Bhat, K. S., Goldenthal, R., Ye, Y., Mallet, R., and Koperwas, M. 2013. High fidelity facial animation capture and retargeting with contours. In Proc. ACM SCA, 7--14.
[12]
Bickel, B., Botsch, M., Angst, R., Matusik, W., Otaduy, M. A., Pfister, H., and Gross, M. H. 2007. Multi-scale capture of facial geometry and motion. ACM TOG 26, 3, 33:1--33:10.
[13]
Bishop, C. M. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA.
[14]
Blanz, V., and Vetter, T. 1999. A morphable model for the synthesis of 3D faces. In Proc. ACM Siggraph, 187--194.
[15]
Borshukov, G., Piponi, D., Larsen, O., Lewis, J. P., and Tempelaar-Lietz, C. 2003. Universal capture: Image-based facial animation for "The Matrix Reloaded". In ACM SIGGRAPH 2003 Sketches & Applications.
[16]
Bouaziz, S., Wang, Y., and Pauly, M. 2013. Online modeling for realtime facial animation. ACM TOG 32, 4, 40:1--40:10.
[17]
Bradley, D., Heidrich, W., Popa, T., and Sheffer, A. 2010. High resolution passive facial performance capture. ACM TOG 29, 4, 41:1--41:10.
[18]
Cao, C., Hou, Q., and Zhou, K. 2014. Displaced dynamic expression regression for real-time facial tracking and animation. ACM TOG 33, 4, 43:1--43:10.
[19]
Cao, C., Bradley, D., Zhou, K., and Beeler, T. 2015. Real-time high-fidelity facial performance capture. ACM TOG 34, 4, 46:1--46:9.
[20]
Chen, Y.-L., Wu, H.-T., Shi, F., Tong, X., and Chai, J. 2013. Accurate and robust 3D facial capture using a single RGBD camera. In Proc. ICCV, 3615--3622.
[21]
Cootes, T. F., Edwards, G. J., and Taylor, C. J. 2001. Active appearance models. IEEE Trans. Pattern Anal. Machine Intell. 23, 6, 681--685.
[22]
Dale, K., Sunkavalli, K., Johnson, M. K., Vlasic, D., Matusik, W., and Pfister, H. 2011. Video face replacement. ACM TOG 30, 6, 130:1--130:10.
[23]
Dollár, P., Tu, Z., and Belongie, S. 2006. Supervised learning of edges and object boundaries. In Proc. CVPR, 1964--1971.
[24]
Echevarria, J. I., Bradley, D., Gutierrez, D., and Beeler, T. 2014. Capturing and stylizing hair for 3D fabrication. ACM TOG 33, 4, 125:1--125:11.
[25]
Eveno, N., Caplier, A., and Coulon, P. Y. 2004. Accurate and quasi-automatic lip tracking. IEEE Trans. Circuit and Systems for Video Tech. 14, 5, 706--715.
[26]
Fyffe, G., Jones, A., Alexander, O., Ichikari, R., and Debevec, P. 2014. Driving high-resolution facial scans with video performance capture. ACM TOG 34, 1, 8:1--8:14.
[27]
Garrido, P., Valgaerts, L., Wu, C., and Theobalt, C. 2013. Reconstructing detailed dynamic face geometry from monocular video. ACM TOG 32, 6, 158:1--158:10.
[28]
Garrido, P., Valgaerts, L., Sarmadi, H., Steiner, I., Varanasi, K., Perez, P., and Theobalt, C. 2015. VDub: Modifying face video of actors for plausible visual alignment to a dubbed audio track. CGF 34, 2, 193--204.
[29]
Garrido, P., Zollhöfer, M., Casas, D., Valgaerts, L., Varanasi, K., Pérez, P., and Theobalt, C. 2016. Reconstruction of personalized 3D face rigs from monocular video. ACM TOG 35, 3, 28:1--28:15.
[30]
Ghosh, A., Fyffe, G., Tunwattanapong, B., Busch, J., Yu, X., and Debevec, P. 2011. Multiview face capture using polarized spherical gradient illumination. ACM TOG 30, 6, 129:1--129:10.
[31]
Graham, P., Tunwattanapong, B., Busch, J., Yu, X., Jones, A., Debevec, P. E., and Ghosh, A. 2013. Measurement-based synthesis of facial microgeometry. CGF 32, 2, 335--344.
[32]
Guenter, B., Grimm, C., Wood, D., Malvar, H., and Pighin, F. 1998. Making faces. In Proc. ACM Siggraph, 55--66.
[33]
Higham, N. J. 1986. Computing the polar decomposition with applications. SIAM J. Sci. Stat. Comput. 7, 4, 1160--1174.
[34]
Hoerl, A. E., and Kennard, R. W. 2000. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 42, 1, 80--86.
[35]
Hsieh, P.-L., Ma, C., Yu, J., and Li, H. 2015. Unconstrained realtime facial performance capture. In Proc. CVPR, 1675--1683.
[36]
Hu, L., Ma, C., Luo, L., and Li, H. 2015. Single-view hair modeling using a hairstyle database. ACM TOG 34, 4, 125:1--125:9.
[37]
Huang, H., Chai, J., Tong, X., and Wu, H.-T. 2011. Lever-aging motion capture and 3D scanning for high-fidelity facial performance acquisition. ACM TOG 30, 4, 74:1--74:10.
[38]
Ichim, A. E., Bouaziz, S., and Pauly, M. 2015. Dynamic 3D avatar creation from hand-held video input. ACM TOG 34, 4, 45:1--45:14.
[39]
Kaucic, R., and Blake, A. 1998. Accurate, real-time, unadorned lip tracking. In Proc. ICCV, 370--375.
[40]
Kawai, M., Iwao, T., Maejima, A., and Morishima, S. 2014. Automatic photorealistic 3D inner mouth restoration from frontal images. In Proc. ISVC, 51--62.
[41]
Kemelmacher-Shlizerman, I., Sankar, A., Shechtman, E., and Seitz, S. M. 2010. Being John Malkovich. In Proc. ECCV, 341--353.
[42]
Klaudiny, M., and Hilton, A. 2012. High-detail 3D capture and non-sequential alignment of facial performance. In Proc. 3DIMPVT, 17--24.
[43]
Lewis, J., and Anjyo, K.-i. 2010. Direct manipulation blend-shapes. IEEE Comp. Graphics and Applications 30, 4, 42--50.
[44]
Li, H., Yu, J., Ye, Y., and Bregler, C. 2013. Realtime facial animation with on-the-fly correctives. ACM TOG 32, 4, 42:1--42:10.
[45]
Liu, Y., Xu, F., Chai, J., Tong, X., Wang, L., and Huo, Q. 2015. Video-audio driven real-time facial animation. ACM Trans. Graph. 34, 6, 182:1--182:10.
[46]
Luo, L., Li, H., Paris, S., Weise, T., Pauly, M., and Rusinkiewicz, S. 2012. Multi-view hair capture using orientation fields. In Proc. CVPR, 1490--1497.
[47]
Nagano, K., Fyffe, G., Alexander, O., Barbič, J., Li, H., Ghosh, A., and Debevec, P. 2015. Skin microstructure deformation with displacement map convolution. ACM TOG 34, 4, 109:1--109:10.
[48]
Nath, A. R., and Beauchamp, M. S. 2012. A neural basis for interindividual differences in the McGurk effect, a multisensory speech illusion. NeuroImage 59, 1, 781--787.
[49]
Nguyen, Q. D., and Milgram, M. 2009. Semi adaptive appearance models for lip tracking. In Proc. ICIP, 2437--2440.
[50]
Pighin, F., and Lewis, J. 2006. Performance-driven facial animation. In ACM Siggraph Courses.
[51]
Saragih, J. M., Lucey, S., and Cohn, J. F. 2009. Face alignment through subspace constrained mean-shifts. In Proc. ICCV, 1034--1041.
[52]
Saragih, J. M., Lucey, S., and Cohn, J. F. 2011. Deformable model fitting by regularized landmark mean-shift. Int. J. Computer Vision 91, 2, 200--215.
[53]
Shi, F., Wu, H.-T., Tong, X., and Chai, J. 2014. Automatic acquisition of high-fidelity facial performances using monocular videos. ACM TOG 33, 6, 222:1--222:13.
[54]
Sifakis, E., Neverov, I., and Fedkiw, R. 2005. Automatic determination of facial muscle activations from sparse motion capture marker data. ACM TOG 24, 3, 417--425.
[55]
Sumner, R. W., and Popovic, J. 2004. Deformation transfer for triangle meshes. ACM TOG 23, 3, 399--405.
[56]
Suwajanakorn, S., Kemelmacher-Shlizerman, I., and Seitz, S. M. 2014. Total moving face reconstruction. In Proc. ECCV, 796--812.
[57]
Suwajanakorn, S., Seitz, S. M., and Kemelmacher-Shlizerman, I. 2015. What makes Tom Hanks look like Tom Hanks. In Proc. ICCV, 3952--3960.
[58]
Thies, J., Zollhöfer, M., Niessner, M., Valgaerts, L., Stamminger, M., and Theobalt, C. 2015. Real-time expression transfer for facial reenactment. ACM TOG 34, 6, 183:1--183:14.
[59]
Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., and Niessner, M. 2016. Face2Face: Real-time face capture and reenactment of RGB videos. In Proc. CVPR.
[60]
Tian, Y.-L., Kanade, T., and Cohn, J. F. 2000. Robust lip tracking by combining shape, color and motion. In Proc. ACCV, 1--6.
[61]
Valgaerts, L., Wu, C., Bruhn, A., Seidel, H.-P., and Theobalt, C. 2012. Lightweight binocular facial performance capture under uncontrolled lighting. ACM TOG 31, 6, 187:1--187:11.
[62]
Vlasic, D., Brand, M., Pfister, H., and Popovic, J. 2005. Face transfer with multilinear models. ACM TOG 24, 3, 426--433.
[63]
Wang, S. L., Lau, W. H., and Leung, S. H. 2004. Automatic lip contour extraction from color images. Pattern Recogn. 37, 12, 2375--2387.
[64]
Wang, Y., Huang, X., Su Lee, C., Zhang, S., Li, Z., Samaras, D., Metaxas, D., Elgammal, A., and Huang, P. 2004. High resolution acquisition, learning and transfer of dynamic 3D facial expressions. CGF 23, 3, 677--686.
[65]
Weise, T., Li, H., Gool, L. J. V., and Pauly, M. 2009. Face/Off: Live facial puppetry. In Proc. ACM SCA, 7--16.
[66]
Weise, T., Bouaziz, S., Li, H., and Pauly, M. 2011. Realtime performance-based facial animation. ACM TOG 30, 77:1--77:10.
[67]
Wenger, A., Gardner, A., Tchou, C., Unger, J., Hawkins, T., and Debevec, P. 2005. Performance relighting and reflectance transformation with time-multiplexed illumination. ACM TOG 24, 3, 756--764.
[68]
Weyrich, T., Matusik, W., Pfister, H., Bickel, B., Donner, C., Tu, C., McAndless, J., Lee, J., Ngan, A., Jensen, H. W., and Gross, M. 2006. Analysis of human faces using a measurement-based skin reflectance model. ACM TOG 25, 3, 1013--1024.
[69]
Williams, L. 1990. Performance-driven facial animation. In Proc. ACM Siggraph, 235--242.

Cited By

View all
  • (2024)3D Facial Expressions through Analysis-by-Neural-Synthesis2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52733.2024.00241(2490-2501)Online publication date: 16-Jun-2024
  • (2024)3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52733.2024.00123(1227-1237)Online publication date: 16-Jun-2024
  • (2024)Relightable Gaussian Codec Avatars2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52733.2024.00021(130-141)Online publication date: 16-Jun-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Graphics
ACM Transactions on Graphics  Volume 35, Issue 6
November 2016
1045 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/2980179
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 December 2016
Published in TOG Volume 35, Issue 6

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. face modeling
  2. facial performance capture
  3. lip shape reconstruction
  4. radial basis function networks

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)6
  • Downloads (Last 6 weeks)1
Reflects downloads up to 14 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)3D Facial Expressions through Analysis-by-Neural-Synthesis2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52733.2024.00241(2490-2501)Online publication date: 16-Jun-2024
  • (2024)3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52733.2024.00123(1227-1237)Online publication date: 16-Jun-2024
  • (2024)Relightable Gaussian Codec Avatars2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52733.2024.00021(130-141)Online publication date: 16-Jun-2024
  • (2024)3D Animation Simulation Based on Computer Virtual Simulation TechnologyMultidimensional Signals, Augmented Reality and Information Technologies10.1007/978-981-99-7011-7_18(227-236)Online publication date: 2-Jan-2024
  • (2023)HACK: Learning a Parametric Head and Neck Model for High-fidelity AnimationACM Transactions on Graphics10.1145/359209342:4(1-20)Online publication date: 26-Jul-2023
  • (2023)Preface: A Data-driven Volumetric Prior for Few-shot Ultra High-resolution Face Synthesis2023 IEEE/CVF International Conference on Computer Vision (ICCV)10.1109/ICCV51070.2023.00315(3379-3390)Online publication date: 1-Oct-2023
  • (2023)SPECTRE: Visual Speech-Informed Perceptual 3D Facial Expression Reconstruction from Videos2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)10.1109/CVPRW59228.2023.00609(5745-5755)Online publication date: Jun-2023
  • (2022)The accuracy of a three-dimensional face model reconstructing method based on conventional clinical two-dimensional photosBMC Oral Health10.1186/s12903-022-02439-022:1Online publication date: 19-Sep-2022
  • (2022)SCULPTORACM Transactions on Graphics10.1145/3550454.355546241:6(1-17)Online publication date: 30-Nov-2022
  • (2022)3D Face Reconstruction in Deep Learning Era: A SurveyArchives of Computational Methods in Engineering10.1007/s11831-021-09705-429:5(3475-3507)Online publication date: 10-Jan-2022
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media