
Expressive speech-driven facial animation

Published: 01 October 2005

Abstract

Speech-driven facial motion synthesis is a well-explored research topic. However, little has been done to model expressive visual behavior during speech. We address this issue with a machine learning approach that relies on a database of high-fidelity, speech-related facial motions. From this training set, we derive a generative model of expressive facial motion that incorporates emotion control while maintaining accurate lip-synching. The emotional content of the input speech can be manually specified by the user or automatically extracted from the audio signal using a Support Vector Machine classifier.
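As a concrete illustration of the classification step, the sketch below shows how an utterance-level emotion label might be extracted from the audio signal with a Support Vector Machine. It is a minimal example under stated assumptions, not the authors' pipeline: the MFCC summary features, the four-emotion label set, and the librosa/scikit-learn tooling are all illustrative choices.

```python
# Hypothetical sketch of SVM-based emotion classification from speech audio.
# Feature choice (mean/std of 13 MFCCs per utterance), the emotion labels,
# and the libraries used are assumptions for illustration only.
import numpy as np
import librosa                                   # assumed audio-feature library
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

EMOTIONS = ["neutral", "happy", "angry", "sad"]  # hypothetical label set

def utterance_features(wav_path):
    """Summarize one utterance as the mean and std of its MFCC frames."""
    signal, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)  # (13, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train_emotion_svm(wav_paths, labels):
    """Fit a multi-class SVM (RBF kernel, one-vs-one) on utterance features."""
    X = np.stack([utterance_features(p) for p in wav_paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X, labels)
    return clf

# Usage: the predicted label would drive the emotion control of the synthesis
# model; alternatively, the user specifies the label manually.
# clf = train_emotion_svm(training_wavs, training_labels)
# emotion = clf.predict(utterance_features("input.wav").reshape(1, -1))[0]
```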




Published In

ACM Transactions on Graphics, Volume 24, Issue 4
October 2005
244 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/1095878

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 October 2005
Published in TOG Volume 24, Issue 4


Author Tags

  1. Facial animation
  2. expression synthesis
  3. independent component analysis
  4. lip synching

Qualifiers

  • Article


Cited By

  • DEITalk: Speech-Driven 3D Facial Animation with Dynamic Emotional Intensity Modeling. Proceedings of the 32nd ACM International Conference on Multimedia (Oct. 2024), 10506–10514. DOI: 10.1145/3664647.3681359
  • Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance. ACM SIGGRAPH 2024 Conference Papers (Jul. 2024), 1–13. DOI: 10.1145/3641519.3657413
  • LaughTalk: Expressive 3D Talking Head Generation with Laughter. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (Jan. 2024), 6392–6401. DOI: 10.1109/WACV57701.2024.00628
  • Personalized Audio-Driven 3D Facial Animation via Style-Content Disentanglement. IEEE Transactions on Visualization and Computer Graphics 30, 3 (Mar. 2024), 1803–1820. DOI: 10.1109/TVCG.2022.3230541
  • Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Jun. 2024), 27284–27293. DOI: 10.1109/CVPR52733.2024.02577
  • DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-Driven Holistic 3D Expression and Gesture Generation. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Jun. 2024), 7352–7361. DOI: 10.1109/CVPR52733.2024.00702
  • A review of motion retargeting techniques for 3D character facial animation. Computers and Graphics 123 (Nov. 2024). DOI: 10.1016/j.cag.2024.104037
  • 3D head-talk: speech synthesis 3D head movement face animation. Soft Computing 28, 1 (Jan. 2024), 363–379. DOI: 10.1007/s00500-023-09292-5
  • KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding. Computer Vision – ECCV 2024 (Sep. 2024), 236–253. DOI: 10.1007/978-3-031-72992-8_14
  • DIM: Dyadic Interaction Modeling for Social Behavior Generation. Computer Vision – ECCV 2024 (Sep. 2024), 484–503. DOI: 10.1007/978-3-031-72913-3_27
