Abstract
This paper deals with the interpretation of facial motion capture data for visual speech synthesis. For the purposes of analysis, visual speech comprising 170 artificially created words was recorded from a single speaker using a state-of-the-art facial motion capture method. A new nonlinear method is proposed to approximate the motion capture data with an intentionally defined set of articulatory parameters. A comparison shows that the proposed method outperforms a baseline method using the same number of parameters. The precision of the approximation is evaluated on parameter values extracted from an unseen dataset and is further verified with a 3D animated model of a human head that reproduces the visual speech artificially.
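The abstract does not specify the baseline, but a common linear baseline for approximating motion capture frames with a small parameter set is principal component analysis. The sketch below is a hypothetical illustration of that baseline only, not the paper's nonlinear method; the marker counts, parameter count, and data are invented for demonstration.

```python
import numpy as np

# Hypothetical PCA baseline: approximate motion-capture frames with a small
# set of per-frame parameters. All shapes and counts here are assumptions,
# not values from the paper.

rng = np.random.default_rng(0)
n_frames, n_coords = 500, 3 * 30  # e.g. 30 facial markers x (x, y, z)

# Synthetic low-rank "mocap" data plus measurement noise
X = rng.normal(size=(n_frames, 5)) @ rng.normal(size=(5, n_coords))
X += 0.01 * rng.normal(size=X.shape)

mean = X.mean(axis=0)
Xc = X - mean
# Principal directions from the SVD of the centred data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 5                               # number of articulatory-like parameters
params = Xc @ Vt[:k].T              # per-frame parameter values
X_hat = params @ Vt[:k] + mean      # reconstructed marker positions

rmse = np.sqrt(np.mean((X - X_hat) ** 2))
```

With the synthetic rank-5 signal above, the residual after keeping `k = 5` components is on the order of the injected noise; a nonlinear model such as a GPLVM (cf. the Lawrence reference) would be evaluated against exactly this kind of reconstruction error.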
Notes
1.
2. Available at https://charactergenerator.autodesk.com/.
Acknowledgements
This research was supported by the Technology Agency of the Czech Republic, project No. TA01011264.
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Železný, M., Krňoul, Z., Jedlička, P. (2015). Analysis of Facial Motion Capture Data for Visual Speech Synthesis. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7