
Expressive speech-driven facial animation

Published: 01 October 2005

Abstract

Speech-driven facial motion synthesis is a well-explored research topic. However, little has been done to model expressive visual behavior during speech. We address this issue with a machine learning approach that relies on a database of high-fidelity, speech-related facial motions. From this training set, we derive a generative model of expressive facial motion that incorporates emotion control while maintaining accurate lip-synching. The emotional content of the input speech can be manually specified by the user or automatically extracted from the audio signal using a Support Vector Machine classifier.
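As a concrete illustration of the classification step, the sketch below shows how an utterance-level emotion label might be extracted from the audio signal with a Support Vector Machine. It is a minimal example under stated assumptions, not the authors' pipeline: the MFCC summary features, the four-emotion label set, and the librosa/scikit-learn tooling are all illustrative choices.

```python
# Hypothetical sketch of SVM-based emotion classification from speech audio.
# Feature choice (mean/std of 13 MFCCs per utterance), the emotion labels,
# and the libraries used are assumptions for illustration only.
import numpy as np
import librosa                                   # assumed audio-feature library
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

EMOTIONS = ["neutral", "happy", "angry", "sad"]  # hypothetical label set

def utterance_features(wav_path):
    """Summarize one utterance as the mean and std of its MFCC frames."""
    signal, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)  # (13, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train_emotion_svm(wav_paths, labels):
    """Fit a multi-class SVM (RBF kernel, one-vs-one) on utterance features."""
    X = np.stack([utterance_features(p) for p in wav_paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X, labels)
    return clf

# Usage: the predicted label would drive the emotion control of the synthesis
# model; alternatively, the user specifies the label manually.
# clf = train_emotion_svm(training_wavs, training_labels)
# emotion = clf.predict(utterance_features("input.wav").reshape(1, -1))[0]
```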




Published In

ACM Transactions on Graphics, Volume 24, Issue 4
October 2005
244 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/1095878

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 October 2005
Published in TOG Volume 24, Issue 4


Author Tags

  1. Facial animation
  2. expression synthesis
  3. independent component analysis
  4. lip synching

Qualifiers

  • Article


Cited By

  • DEITalk: Speech-Driven 3D Facial Animation with Dynamic Emotional Intensity Modeling. Proceedings of the 32nd ACM International Conference on Multimedia (Oct. 2024), 10506–10514. DOI: 10.1145/3664647.3681359
  • Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance. ACM SIGGRAPH 2024 Conference Papers (Jul. 2024), 1–13. DOI: 10.1145/3641519.3657413
  • LaughTalk: Expressive 3D Talking Head Generation with Laughter. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (Jan. 2024), 6392–6401. DOI: 10.1109/WACV57701.2024.00628
  • Personalized Audio-Driven 3D Facial Animation via Style-Content Disentanglement. IEEE Transactions on Visualization and Computer Graphics 30, 3 (Mar. 2024), 1803–1820. DOI: 10.1109/TVCG.2022.3230541
  • Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Jun. 2024), 27284–27293. DOI: 10.1109/CVPR52733.2024.02577
  • DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-Driven Holistic 3D Expression and Gesture Generation. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Jun. 2024), 7352–7361. DOI: 10.1109/CVPR52733.2024.00702
  • A review of motion retargeting techniques for 3D character facial animation. Computers and Graphics 123 (Nov. 2024). DOI: 10.1016/j.cag.2024.104037
  • 3D head-talk: speech synthesis 3D head movement face animation. Soft Computing 28, 1 (Jan. 2024), 363–379. DOI: 10.1007/s00500-023-09292-5
  • KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding. Computer Vision – ECCV 2024 (Sep. 2024), 236–253. DOI: 10.1007/978-3-031-72992-8_14
  • DIM: Dyadic Interaction Modeling for Social Behavior Generation. Computer Vision – ECCV 2024 (Sep. 2024), 484–503. DOI: 10.1007/978-3-031-72913-3_27
