research-article

Corrective 3D reconstruction of lips from monocular video

Authors:

Michael Zollhöfer,

Patrick Pérez,

Christian Theobalt

ACM Transactions on Graphics (TOG), Volume 35, Issue 6

Article No.: 219, Pages 1 - 11

https://doi.org/10.1145/2980179.2982419

Published: 05 December 2016

Abstract

In facial animation, the accurate shape and motion of the lips of virtual humans is of paramount importance, since subtle nuances in mouth expression strongly influence the interpretation of speech and the conveyed emotion. Unfortunately, passive photometric reconstruction of expressive lip motions, such as a kiss or rolling lips, is fundamentally hard even with multi-view methods in controlled studios. To alleviate this problem, we present a novel approach for fully automatic reconstruction of detailed and expressive lip shapes along with the dense geometry of the entire face, from just monocular RGB video. To this end, we learn the difference between inaccurate lip shapes found by a state-of-the-art monocular facial performance capture approach, and the true 3D lip shapes reconstructed using a high-quality multi-view system in combination with applied lip tattoos that are easy to track. A robust gradient domain regressor is trained to infer accurate lip shapes from coarse monocular reconstructions, with the additional help of automatically extracted inner and outer 2D lip contours. We quantitatively and qualitatively show that our monocular approach reconstructs higher quality lip shapes, even for complex shapes like a kiss or lip rolling, than previous monocular approaches. Furthermore, we compare the performance of person-specific and multi-person generic regression strategies and show that our approach generalizes to new individuals and general scenes, enabling high-fidelity reconstruction even from commodity video footage.
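
The core idea in the abstract, learning the difference between coarse monocular lip shapes and accurate multi-view lip shapes, can be illustrated with a small sketch. The abstract does not specify the form of the regressor, so the code below assumes, purely for illustration, a linear ridge regressor on plain feature vectors; it does not operate in the gradient domain and does not use lip contours. The names fit_ridge and correct, the feature dimensions, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_ridge(X, Y, lam=1e-3):
    """Closed-form ridge regression: argmin_W ||X W - Y||^2 + lam * ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)


def correct(coarse_features, W):
    """Predict the corrective term for one or more coarse reconstructions."""
    return coarse_features @ W


# Synthetic stand-in data (hypothetical; not taken from the paper):
# each row of X is a feature vector from a coarse monocular reconstruction,
# each row of Y is the difference between the accurate and the coarse lip shape.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
W_true = rng.normal(size=(6, 4))
Y = X @ W_true + 0.01 * rng.normal(size=(200, 4))

W = fit_ridge(X, Y)
residual = np.abs(correct(X, W) - Y).max()
print(f"max training residual: {residual:.4f}")
```

At test time, the predicted corrective would be added to the coarse reconstruction of a new frame. The regularization weight lam trades fit accuracy against robustness to noisy training pairs.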

Supplementary Material

ZIP file (a219-garrido.zip), 231.91 MB. Supplemental file.

Index Terms

Computing methodologies

Published In

ACM Transactions on Graphics Volume 35, Issue 6

November 2016

1045 pages

ISSN:0730-0301

EISSN:1557-7368

DOI:10.1145/2980179

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States
