3D facial feature and expression computing from Internet image or video

Wang, Shan; Shen, Xukun; Zhang, Yan

doi:10.1007/s11042-018-5895-7

Published: 21 March 2018

Multimedia Tools and Applications, volume 77, pages 22231–22246 (2018)

Shan Wang^1,2, Xukun Shen^1,2 & Yan Zhang^1

Abstract

Large-scale multimedia datasets such as Internet image and video collections provide new opportunities to understand and analyze human actions, among which facial performance is one of the most interesting. In this paper, we present a system that automatically reconstructs detailed facial performances. Many existing facial performance reconstruction systems rely on data captured in controlled environments with densely spaced cameras and lights. In contrast, our system reconstructs detailed facial geometry from a single image or a monocular video sequence with unknown lighting. To achieve this, we first track sparse 2D and 3D features simultaneously, then reconstruct the low-frequency facial geometry by performing a 2D-3D feature trajectory fusion optimization, which we formulate as a linear problem that can be solved efficiently. Finally, we use a per-pixel shape-from-shading algorithm to estimate fine-scale geometric details such as wrinkles, further improving reconstruction fidelity. We demonstrate the accuracy of our system with reconstruction results on both single images and monocular video sequences.
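
The abstract states that the 2D-3D feature fusion is formulated as a linear problem but does not give the formulation. As a rough illustration only, the sketch below shows how fusing 2D and 3D sparse feature observations can reduce to linear least squares. It assumes a linear shape model (a neutral shape plus weighted basis offsets) and a known scaled-orthographic camera. The function name `fuse_2d_3d`, its parameters, and the weighting term `lam` are all hypothetical and are not taken from the paper.

```python
import numpy as np

def fuse_2d_3d(neutral, deltas, proj, trans, pts2d, pts3d, lam=1.0):
    """Hypothetical sketch: solve for model weights w that minimize
        ||X(w) @ proj.T + trans - pts2d||^2 + lam * ||X(w) - pts3d||^2,
    where X(w) = neutral + sum_k w[k] * deltas[k].

    neutral: (n, 3) feature positions of the neutral shape (assumed model)
    deltas:  (K, n, 3) basis offsets of the assumed linear shape model
    proj:    (2, 3) scaled-orthographic projection, assumed known
    trans:   (2,) image-space translation, assumed known
    pts2d:   (n, 2) tracked 2D features
    pts3d:   (n, 3) tracked 3D features
    """
    K = deltas.shape[0]
    # 2D term: each column is one projected basis offset, flattened to 2n values.
    A2 = np.stack([(d @ proj.T).ravel() for d in deltas], axis=1)
    b2 = (pts2d - (neutral @ proj.T + trans)).ravel()
    # 3D term: each column is one basis offset, flattened to 3n values.
    A3 = np.sqrt(lam) * deltas.reshape(K, -1).T
    b3 = np.sqrt(lam) * (pts3d - neutral).ravel()
    # Stack both terms into one overdetermined linear system and solve it.
    A = np.vstack([A2, A3])
    b = np.concatenate([b2, b3])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w
```

Because both residuals are linear in `w`, a single least-squares solve suffices. A real system would also have to estimate the camera and handle noise and temporal consistency, which this sketch leaves out.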

Acknowledgements

This work is supported by National Key R&D Program of China (2017YFB1002702).

Author information

Authors and Affiliations

State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
Shan Wang, Xukun Shen & Yan Zhang
School of New Media Art and Design, Beihang University, Beijing, China
Shan Wang & Xukun Shen

Authors

Shan Wang
Xukun Shen
Yan Zhang

Corresponding author

Correspondence to Shan Wang.

About this article

Cite this article

Wang, S., Shen, X. & Zhang, Y. 3D facial feature and expression computing from Internet image or video. Multimed Tools Appl 77, 22231–22246 (2018). https://doi.org/10.1007/s11042-018-5895-7

Received: 28 September 2017
Revised: 10 February 2018
Accepted: 13 March 2018
Published: 21 March 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11042-018-5895-7
