
Analysis of Facial Motion Capture Data for Visual Speech Synthesis

  • Conference paper, Speech and Computer (SPECOM 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9319)

Abstract

The paper deals with the interpretation of facial motion capture data for visual speech synthesis. For the purpose of analysis, visual speech composed of 170 artificially created words was recorded from one speaker using a state-of-the-art face motion capture method. A new nonlinear method is proposed to approximate the motion capture data using an intentionally defined set of articulatory parameters. The comparison shows that the proposed method outperforms a baseline method with the same number of parameters. The precision of the approximation is evaluated on parameter values extracted from an unseen dataset and further verified with a 3D animated model of a human head that reproduces the visual speech in an artificial manner.
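The abstract does not spell out the baseline or the nonlinear mapping; a linear low-dimensional model such as PCA is a natural baseline for this kind of comparison. Purely as an illustrative sketch of the evaluation setup described above, the Python fragment below fits a PCA model with a fixed number of parameters to training frames of stacked 3D marker coordinates and measures reconstruction precision on unseen frames. The data shapes, marker count, parameter count, and error metric are all assumptions for illustration, not taken from the paper.

    # Illustrative sketch only: approximate facial motion capture frames
    # (flattened 3D marker coordinates) with a small, fixed number of
    # parameters using a linear PCA baseline, then evaluate reconstruction
    # precision on held-out frames. The paper's actual nonlinear method
    # and its articulatory parameters are not specified in the abstract.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)

    # Hypothetical data: 2000 training frames and 500 unseen frames, each
    # a flattened vector of 40 markers x 3 coordinates (placeholder random
    # data standing in for real motion capture recordings).
    train_frames = rng.normal(size=(2000, 120))
    unseen_frames = rng.normal(size=(500, 120))

    n_params = 6  # assumed number of control parameters
    baseline = PCA(n_components=n_params).fit(train_frames)

    # Extract parameter trajectories for the unseen data and map them back
    # to marker space, mimicking the "parameter values extracted from an
    # unseen dataset" evaluation mentioned in the abstract.
    params = baseline.transform(unseen_frames)
    reconstructed = baseline.inverse_transform(params)

    # RMS reconstruction error as a simple precision measure; a nonlinear
    # method would replace transform/inverse_transform with learned
    # nonlinear mappings and be compared at the same parameter count.
    rmse = np.sqrt(np.mean((unseen_frames - reconstructed) ** 2))
    print(f"{n_params}-parameter PCA baseline, unseen-frame RMSE: {rmse:.4f}")

Holding the number of parameters fixed while comparing reconstruction error on unseen data is what makes the linear-versus-nonlinear comparison fair.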

Notes

  1. www.faceshift.com.

  2. Available at https://charactergenerator.autodesk.com/.

Acknowledgements

This research was supported by the Technology Agency of the Czech Republic, project No. TA01011264.

Author information

Correspondence to Miloš Železný.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Železný, M., Krňoul, Z., Jedlička, P. (2015). Analysis of Facial Motion Capture Data for Visual Speech Synthesis. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science (LNAI), vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_10

  • DOI: https://doi.org/10.1007/978-3-319-23132-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23131-0

  • Online ISBN: 978-3-319-23132-7

  • eBook Packages: Computer Science, Computer Science (R0)
