
Intelligent Content Production for a Virtual Speaker

  • Conference paper
Intelligent Media Technology for Communicative Intelligence (IMTCI 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3490)


Abstract

We present a graphically embodied animated agent (a virtual speaker) capable of reading plain English text and rendering it as speech accompanied by appropriate facial gestures. Our system uses lexical analysis of the English text and statistical models of facial gestures to automatically generate gestures related to the spoken text. It is intended for the automatic creation of realistically animated virtual speakers, such as newscasters and storytellers, and incorporates the characteristics of such speakers captured from training video clips. Our system is based on a visual text-to-speech system that generates lip movement synchronised with the generated speech. This is extended to include eye blinks, head and eyebrow motion, and a simple gaze-following behaviour. The result is a full facial animation produced automatically from plain English text.
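The pipeline the abstract describes — lexical triggers in the text firing facial gestures with probabilities learned from training video — can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the trigger classes, gesture names, and probabilities are all invented placeholders standing in for a model estimated from annotated newscaster clips.

```python
import random

# Hypothetical P(gesture | lexical trigger). In the paper's setting these
# probabilities would be estimated from training video clips of real
# speakers; the values below are purely illustrative.
GESTURE_MODEL = {
    "emphasis":    [("eyebrow_raise", 0.4), ("head_nod", 0.2)],
    "punctuation": [("blink", 0.5), ("head_nod", 0.3)],
    "word":        [("blink", 0.05)],
}

def classify_token(token):
    """Crude stand-in for the lexical analysis: tag a token with a trigger class."""
    if token[-1] in ".,;:!?":
        return "punctuation"       # phrase boundary
    if token[0].isupper():
        return "emphasis"          # e.g. a proper noun or emphasised word
    return "word"

def generate_gestures(text, seed=0):
    """Produce a (token_index, gesture) timeline for the given text.

    A seeded RNG keeps the sampled timeline reproducible; a real system
    would drive the facial animation player from this timeline alongside
    the synchronised lip movement from the visual TTS.
    """
    rng = random.Random(seed)
    timeline = []
    for i, token in enumerate(text.split()):
        trigger = classify_token(token)
        for gesture, prob in GESTURE_MODEL[trigger]:
            if rng.random() < prob:
                timeline.append((i, gesture))
    return timeline

timeline = generate_gestures("Good evening. Here is the news from London.")
```

In the actual system the sampled gestures would be merged with the lip-sync track and rendered on the embodied agent; the sketch only shows how lexical cues and a statistical model combine into a gesture timeline.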




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Smid, K., Pandzic, I.S., Radman, V. (2005). Intelligent Content Production for a Virtual Speaker. In: Bolc, L., Michalewicz, Z., Nishida, T. (eds) Intelligent Media Technology for Communicative Intelligence. IMTCI 2004. Lecture Notes in Computer Science, vol 3490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558637_17

  • DOI: https://doi.org/10.1007/11558637_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29035-3

  • Online ISBN: 978-3-540-31738-8
