3D facial feature and expression computing from Internet image or video

Multimedia Tools and Applications

Abstract

Large-scale multimedia datasets such as Internet image and video collections provide new opportunities to understand and analyze human actions, among which facial performance is one of the most interesting types. In this paper, we present an automatic system for reconstructing detailed facial performances. Many existing facial performance reconstruction systems rely on data captured in controlled environments with densely spaced cameras and lights. In contrast, our system reconstructs detailed facial geometry from a single image or a monocular video sequence with unknown lighting. To achieve this, we first track sparse 2D and 3D features simultaneously, then reconstruct the low-frequency facial geometry through a 2D-3D feature trajectory fusion optimization, which we formulate as a linear problem that can be solved efficiently. Finally, we use a per-pixel shape-from-shading algorithm to estimate fine-scale geometric details such as wrinkles, further improving reconstruction fidelity. We demonstrate the accuracy of our system with reconstruction results on both single images and monocular video sequences.
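To illustrate why a 2D-3D fusion step can reduce to a linear problem, the following is a minimal sketch and not the formulation used in the paper, whose energy terms are not given in the abstract. It assumes a hypothetical linear shape model (a mean shape plus a weighted sum of basis deformations), a known 2x3 camera projection, and sparse 2D and 3D feature observations for a single frame. The function name `fuse_2d_3d` and the parameters `w3d` and `reg` are illustrative assumptions.

```python
import numpy as np

def fuse_2d_3d(basis, mean, P, obs2d, obs3d, w3d=1.0, reg=1e-3):
    """Estimate linear shape coefficients c from sparse 2D and 3D features.

    Hypothetical setup (not taken from the paper):
      basis : (K, N, 3) deformation basis for N sparse feature points
      mean  : (N, 3)    mean positions of the feature points
      P     : (2, 3)    known orthographic / weak-perspective projection
      obs2d : (N, 2)    tracked 2D features
      obs3d : (N, 3)    tracked 3D features

    Minimizes, over c,
      ||P(mean + sum_k c_k B_k) - obs2d||^2
      + w3d * ||mean + sum_k c_k B_k - obs3d||^2
      + reg * ||c||^2
    which is linear least squares in c.
    """
    K = basis.shape[0]

    # 2D term: project every basis shape, then flatten to a (2N, K) matrix.
    A2 = np.einsum("ij,knj->kni", P, basis).reshape(K, -1).T
    b2 = (obs2d - mean @ P.T).reshape(-1)

    # 3D term: the basis itself, flattened to a (3N, K) matrix.
    A3 = basis.reshape(K, -1).T
    b3 = (obs3d - mean).reshape(-1)

    # Stack both terms and a Tikhonov regularizer into one linear system.
    A = np.vstack([A2, np.sqrt(w3d) * A3, np.sqrt(reg) * np.eye(K)])
    b = np.concatenate([b2, np.sqrt(w3d) * b3, np.zeros(K)])

    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    return c

def reconstruct(basis, mean, c):
    """Return the (N, 3) feature positions implied by coefficients c."""
    return mean + np.tensordot(c, basis, axes=1)
```

Because both residuals are linear in the coefficients, stacking them gives a single least-squares system that a standard solver handles in one call. A video version of this sketch would add temporal smoothness rows to the same system, which keeps it linear.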



Acknowledgements

This work is supported by the National Key R&D Program of China (2017YFB1002702).

Author information


Corresponding author

Correspondence to Shan Wang.


About this article


Cite this article

Wang, S., Shen, X. & Zhang, Y. 3D facial feature and expression computing from Internet image or video. Multimed Tools Appl 77, 22231–22246 (2018). https://doi.org/10.1007/s11042-018-5895-7


  • DOI: https://doi.org/10.1007/s11042-018-5895-7
