Vision Based Speech Animation Transferring with Underlying Anatomical Structure

  • Conference paper
Computer Vision – ACCV 2006 (ACCV 2006)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3851)

Abstract

We present a novel method for transferring speech animation recorded in low-resolution videos onto realistic 3D facial models. Unsupervised learning is applied to a speech video corpus to find the underlying manifold of facial configurations, and K-means clustering is performed in this low-dimensional space to find key speech-related facial shapes. Using a small set of laser-scanned 3D models corresponding to the cluster centroids, the facial animation in the 2D videos is transferred onto 3D shapes. In particular, by means of a weak perspective projection model, the underlying mandible rotation is recovered from the videos and used to drive 3D skull movements. The adaptation of a generic skull to the facial models is guided by a 2D image, the Tissue Map. With parsimonious data requirements, our system transfers the animation and achieves realistic rendering that incorporates the underlying anatomical structure.
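The key-shape selection and jaw-recovery steps described in the abstract can be illustrated with a short sketch. This is not the authors' implementation. The abstract does not name its unsupervised learning method, so PCA is swapped in for the manifold step; k-means uses plain Lloyd iterations with farthest-point seeding; and the mandible model assumes a single rotation about the lateral (x) axis, a weak perspective camera looking along z, and known 3D jaw landmark positions relative to the pivot. All function names (`pca_embed`, `kmeans`, `key_frames`, `fit_jaw_pitch`) are hypothetical.

```python
import numpy as np


def _sq_dists(points: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    # Squared Euclidean distance from every point to every centroid.
    return ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)


def pca_embed(frames: np.ndarray, dim: int) -> np.ndarray:
    """Project per-frame feature vectors (n_frames x n_features) onto the top
    `dim` principal components. ASSUMPTION: PCA stands in for the paper's
    unspecified unsupervised manifold learning step."""
    centered = frames - frames.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T


def kmeans(points: np.ndarray, k: int, iters: int = 100, seed: int = 0):
    """Lloyd's k-means with farthest-point seeding; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    seeds = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        # Next seed: the point farthest from all seeds chosen so far.
        nearest = _sq_dists(points, np.array(seeds)).min(axis=1)
        seeds.append(points[nearest.argmax()])
    centroids = np.array(seeds, dtype=float)
    for _ in range(iters):
        labels = _sq_dists(points, centroids).argmin(axis=1)
        new = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignment so labels always match the returned centroids.
    labels = _sq_dists(points, centroids).argmin(axis=1)
    return centroids, labels


def key_frames(points: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the video frame nearest each centroid: candidate key facial
    shapes for which 3D scans would be captured."""
    return _sq_dists(points, centroids).argmin(axis=0)


def fit_jaw_pitch(jaw_pts_3d: np.ndarray, v_obs: np.ndarray):
    """ASSUMPTION: jaw rotates by theta about the x axis; weak perspective
    camera along z gives  v = s * (cos(theta) * y - sin(theta) * z) + t_v.
    With a = s*cos(theta), b = s*sin(theta) this is linear in (a, b, t_v)."""
    y, z = jaw_pts_3d[:, 1], jaw_pts_3d[:, 2]
    design = np.column_stack([y, -z, np.ones_like(y)])
    (a, b, t_v), *_ = np.linalg.lstsq(design, v_obs, rcond=None)
    return np.arctan2(b, a), np.hypot(a, b), t_v
```

Under weak perspective the vertical image coordinate is linear in s·cos(theta) and s·sin(theta), so one least-squares solve recovers both the scale and the jaw angle; a full perspective camera would instead require a nonlinear fit.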

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pei, Y., Zha, H. (2006). Vision Based Speech Animation Transferring with Underlying Anatomical Structure. In: Narayanan, P.J., Nayar, S.K., Shum, HY. (eds) Computer Vision – ACCV 2006. ACCV 2006. Lecture Notes in Computer Science, vol 3851. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11612032_60

  • DOI: https://doi.org/10.1007/11612032_60

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31219-2

  • Online ISBN: 978-3-540-32433-1

  • eBook Packages: Computer Science, Computer Science (R0)