Abstract
We present a graphically embodied animated agent (a virtual speaker) capable of reading plain English text and rendering it as speech accompanied by appropriate facial gestures. Our system uses lexical analysis of the English text together with statistical models of facial gestures to automatically generate the gestures related to the spoken text. It is intended for the automatic creation of realistically animated virtual speakers, such as newscasters and storytellers, and incorporates the characteristics of such speakers captured from training video clips. The system is built on a visual text-to-speech system that generates lip movement synchronised with the generated speech; this is extended with eye blinks, head and eyebrow motion, and a simple gaze-following behaviour. The result is a full facial animation produced automatically from plain English text.
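The pipeline the abstract describes — lexical analysis of plain text feeding a statistical gesture model that triggers facial gestures alongside the synthesised speech — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the word classes, gesture names, and trigger probabilities below are all assumptions made for the example.

```python
import random

# Hypothetical gesture model: per-word-class probabilities of triggering
# a facial gesture. These values are illustrative assumptions only.
GESTURE_PROBS = {
    "emphasised": {"eyebrow_raise": 0.4, "head_nod": 0.3},
    "punctuation": {"blink": 0.6, "head_nod": 0.2},
    "other": {"blink": 0.1},
}

def classify(token):
    """Toy lexical analysis: tag each token with a word class."""
    if token in {".", ",", "?", "!"}:
        return "punctuation"
    if token.istitle():  # crude stand-in for an emphasised/new word
        return "emphasised"
    return "other"

def annotate(text, rng=random.Random(0)):
    """Return (token, [gestures]) pairs that would drive the face animation."""
    annotated = []
    for token in text.replace(".", " .").split():
        probs = GESTURE_PROBS[classify(token)]
        gestures = [g for g, p in probs.items() if rng.random() < p]
        annotated.append((token, gestures))
    return annotated

if __name__ == "__main__":
    for token, gestures in annotate("Good evening. Here is the News."):
        print(token, gestures)
```

In the real system these gesture annotations would be time-aligned with the phoneme timings produced by the text-to-speech engine, so that eyebrow raises, nods, and blinks land on the corresponding words.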
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Smid, K., Pandzic, I.S., Radman, V. (2005). Intelligent Content Production for a Virtual Speaker. In: Bolc, L., Michalewicz, Z., Nishida, T. (eds.) Intelligent Media Technology for Communicative Intelligence. IMTCI 2004. Lecture Notes in Computer Science, vol. 3490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558637_17
DOI: https://doi.org/10.1007/11558637_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29035-3
Online ISBN: 978-3-540-31738-8