Evaluation of an image-based talking head with realistic facial expression and head motion

Liu, Kang; Ostermann, Joern

doi:10.1007/s12193-011-0070-8

Original Paper
Published: 29 October 2011

Journal on Multimodal User Interfaces, Volume 5, pages 37–44 (2012)

Abstract

We present an image-based talking head system that synthesizes flexible head motion and realistic facial expressions accompanying speech, given arbitrary text input and control tags. The goal of facial animation synthesis is to generate lip-synchronized, natural animations. The talking head is evaluated both objectively and subjectively.

The objective measurement assesses lip synchronization by matching the lip closures in the synthesized sequences against those in the real ones. Human viewers are very sensitive to closures, so getting the closures at the right time may be the most important objective criterion for giving the impression that lips and sound are synchronized.
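The abstract does not spell out how closures are detected or matched, so the following is only a minimal sketch of one way such a comparison could be set up. It assumes a per-frame lip-opening measurement is available for both sequences; the function names `closure_frames` and `match_closures`, the closure `threshold`, and the frame `tolerance` are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def closure_frames(lip_opening: List[float], threshold: float) -> List[int]:
    """Return the first frame index of every run of closed-lip frames.

    A frame counts as closed when its lip opening is at or below
    `threshold` (an assumed, illustrative criterion).
    """
    frames = []
    closed_before = False
    for i, opening in enumerate(lip_opening):
        closed = opening <= threshold
        if closed and not closed_before:
            frames.append(i)
        closed_before = closed
    return frames


def match_closures(real: List[int], synth: List[int], tolerance: int) -> Tuple[int, int]:
    """Greedily pair each real closure with the nearest unused synthesized one.

    A pair counts as matched when the two closures are at most `tolerance`
    frames apart. Returns (matched closures, total real closures).
    """
    unused = list(synth)
    matched = 0
    for r in real:
        best = min(unused, key=lambda s: abs(s - r), default=None)
        if best is not None and abs(best - r) <= tolerance:
            matched += 1
            unused.remove(best)
    return matched, len(real)


# Illustrative, made-up lip-opening values (arbitrary units), one per frame.
real_opening = [5, 4, 0, 0, 3, 6, 1, 0, 4, 5]
synth_opening = [5, 3, 4, 0, 2, 6, 5, 0, 0, 4]

real = closure_frames(real_opening, threshold=1.0)
synth = closure_frames(synth_opening, threshold=1.0)
print(match_closures(real, synth, tolerance=1))
```

Greedy nearest-neighbor pairing is chosen here only for brevity; a real evaluation might prefer an optimal one-to-one assignment and a tolerance expressed in milliseconds rather than frames.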

In the subjective tests, facial expression is evaluated by scoring real and synthesized videos, and head movement is evaluated by scoring animations with flexible head motion against animations with repeated head motion. Experimental results show that the proposed objective measurement of lip closure is one of the most significant criteria for the subjective evaluation of animations. The animated facial expressions are subjectively indistinguishable from real ones. Furthermore, talking heads with flexible head motion are more realistic and lifelike than those with repeated head motion.
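As a rough illustration of how scores from such a subjective test could be compared, the sketch below averages viewer ratings for two conditions and reports the gap between them. The 5-point scale, the function names `mean_opinion_score` and `score_gap`, and the ratings themselves are assumptions made for this example; they are not the paper's protocol or data.

```python
from statistics import mean
from typing import List


def mean_opinion_score(scores: List[int]) -> float:
    """Average the ratings given by all viewers for one condition."""
    return mean(scores)


def score_gap(condition_a: List[int], condition_b: List[int]) -> float:
    """Difference between the mean ratings of two conditions (a minus b)."""
    return mean_opinion_score(condition_a) - mean_opinion_score(condition_b)


# Made-up ratings on an assumed 5-point scale (5 = most realistic).
real_video_scores = [4, 5, 4, 4, 5]
synthesized_video_scores = [4, 4, 5, 4, 4]

gap = score_gap(real_video_scores, synthesized_video_scores)
print(f"mean gap between real and synthesized: {gap:.2f}")
```

A small gap alone does not establish that two conditions are indistinguishable; a significance test over a sufficient number of viewers would be needed to support that kind of claim.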

Author information

Authors and Affiliations

Institut für Informationsverarbeitung, Leibniz Universität Hannover, Appelstr. 9A, 30167, Hannover, Germany
Kang Liu & Joern Ostermann

Corresponding author

Correspondence to Kang Liu.

About this article

Cite this article

Liu, K., Ostermann, J. Evaluation of an image-based talking head with realistic facial expression and head motion. J Multimodal User Interfaces 5, 37–44 (2012). https://doi.org/10.1007/s12193-011-0070-8

Received: 16 April 2011
Accepted: 12 October 2011
Published: 29 October 2011
Issue Date: March 2012
DOI: https://doi.org/10.1007/s12193-011-0070-8
