Abstract
This paper deals with the interpretation of facial motion capture data for visual speech synthesis. For the purposes of analysis, visual speech comprising 170 artificially created words was recorded from a single speaker using a state-of-the-art facial motion capture method. A new nonlinear method is proposed to approximate the motion capture data with an intentionally defined set of articulatory parameters. A comparison shows that the proposed method outperforms a baseline method using the same number of parameters. The precision of the approximation is evaluated on parameter values extracted from an unseen dataset and is further verified with a 3D animated model of a human head that reproduces the visual speech artificially.
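The abstract does not specify the baseline, but a common linear baseline for approximating motion capture frames with a small parameter set is principal component analysis. The sketch below is a hypothetical illustration of that baseline only, not the paper's nonlinear method; the marker counts, parameter count, and data are invented for demonstration.

```python
import numpy as np

# Hypothetical PCA baseline: approximate motion-capture frames with a small
# set of per-frame parameters. All shapes and counts here are assumptions,
# not values from the paper.

rng = np.random.default_rng(0)
n_frames, n_coords = 500, 3 * 30  # e.g. 30 facial markers x (x, y, z)

# Synthetic low-rank "mocap" data plus measurement noise
X = rng.normal(size=(n_frames, 5)) @ rng.normal(size=(5, n_coords))
X += 0.01 * rng.normal(size=X.shape)

mean = X.mean(axis=0)
Xc = X - mean
# Principal directions from the SVD of the centred data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 5                               # number of articulatory-like parameters
params = Xc @ Vt[:k].T              # per-frame parameter values
X_hat = params @ Vt[:k] + mean      # reconstructed marker positions

rmse = np.sqrt(np.mean((X - X_hat) ** 2))
```

With the synthetic rank-5 signal above, the residual after keeping `k = 5` components is on the order of the injected noise; a nonlinear model such as a GPLVM (cf. the Lawrence reference) would be evaluated against exactly this kind of reconstruction error.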
Notes
1.
2. Available at https://charactergenerator.autodesk.com/.
Acknowledgements
This research was supported by the Technology Agency of the Czech Republic, project No. TA01011264.
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Železný, M., Krňoul, Z., Jedlička, P. (2015). Analysis of Facial Motion Capture Data for Visual Speech Synthesis. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7