
Evaluation of an image-based talking head with realistic facial expression and head motion

  • Original Paper
  • Published:
Journal on Multimodal User Interfaces

Abstract

In this paper, we present an image-based talking head system that is able to synthesize flexible head motion and realistic facial expression accompanying speech, given arbitrary text input and control tags. The goal of facial animation synthesis is to generate lip-synchronized, natural animations. The talking head is evaluated both objectively and subjectively.

The objective measurement assesses lip synchronization by matching lip closures between the synthesized sequences and the real ones. Human viewers are very sensitive to closures, and getting the closures at the right time may be the most important objective criterion for creating the impression that lips and sound are synchronized.
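The closure-matching idea above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes lip aperture is available as a per-frame mouth-opening value, treats a closure as a run of frames below a small threshold, and counts a synthesized closure as matched when its midpoint falls within a few frames of a real one. The threshold and tolerance values are hypothetical.

```python
def find_closures(aperture, threshold=2.0):
    """Return (start, end) frame-index pairs where the mouth is closed,
    i.e. maximal runs of frames with aperture below the threshold."""
    closures, start = [], None
    for i, a in enumerate(aperture):
        if a < threshold and start is None:
            start = i
        elif a >= threshold and start is not None:
            closures.append((start, i - 1))
            start = None
    if start is not None:
        closures.append((start, len(aperture) - 1))
    return closures

def closure_match_rate(real, synth, tolerance=2):
    """Fraction of real closures for which some synthesized closure's
    midpoint lies within `tolerance` frames of the real midpoint."""
    real_closures = find_closures(real)
    synth_mids = [(s + e) / 2 for s, e in find_closures(synth)]
    matched = sum(
        1 for s, e in real_closures
        if any(abs((s + e) / 2 - m) <= tolerance for m in synth_mids)
    )
    return matched / len(real_closures) if real_closures else 1.0
```

With aperture traces sampled at the video frame rate, `closure_match_rate(real, synth)` yields 1.0 when every real closure has a well-timed synthesized counterpart, dropping toward 0.0 as closures are missed or mistimed.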

In subjective tests, facial expression is evaluated by scoring the real and synthesized videos. Head movement is evaluated by scoring the animation with flexible head motion against the animation with repeated head motion. Experimental results show that the proposed objective measurement of lip closure is one of the most significant criteria for subjective evaluation of animations. The animated facial expressions are subjectively indistinguishable from real ones. Furthermore, talking heads with flexible head motion are more realistic and lifelike than those with repeated head motion.
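The scoring procedure in such subjective tests is typically summarized as a mean opinion score per condition (in the style of ITU-R BT.500, cited by the paper), with a confidence interval to judge whether two conditions differ. A minimal sketch, with illustrative ratings that are not the paper's data:

```python
from statistics import mean, stdev

def mos(ratings):
    """Mean opinion score on a 5-point scale, with a 95% confidence
    half-width under a normal approximation."""
    m = mean(ratings)
    half_width = 1.96 * stdev(ratings) / len(ratings) ** 0.5
    return m, half_width
```

Two conditions (e.g. flexible vs. repeated head motion) can then be compared by checking whether their MOS confidence intervals overlap; non-overlapping intervals suggest a perceptually meaningful difference.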



Author information

Correspondence to Kang Liu.


About this article

Cite this article

Liu, K., Ostermann, J. Evaluation of an image-based talking head with realistic facial expression and head motion. J Multimodal User Interfaces 5, 37–44 (2012). https://doi.org/10.1007/s12193-011-0070-8
