
Intelligent Content Production for a Virtual Speaker

  • Conference paper
Intelligent Media Technology for Communicative Intelligence (IMTCI 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3490)


Abstract

We present a graphically embodied animated agent (a virtual speaker) capable of reading plain English text and rendering it as speech accompanied by appropriate facial gestures. Our system uses lexical analysis of the English text and statistical models of facial gestures to automatically generate gestures related to the spoken text. It is intended for the automatic creation of realistically animated virtual speakers, such as newscasters and storytellers, and incorporates the characteristics of such speakers captured from training video clips. Our system is based on a visual text-to-speech system that generates lip movement synchronised with the generated speech. This is extended to include eye blinks, head and eyebrow motion, and a simple gaze-following behaviour. The result is a full facial animation produced automatically from plain English text.
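The pipeline the abstract describes — lexical triggers in the text firing facial gestures with probabilities learned from training video — can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the trigger classes, gesture names, and probabilities are all invented placeholders standing in for a model estimated from annotated newscaster clips.

```python
import random

# Hypothetical P(gesture | lexical trigger). In the paper's setting these
# probabilities would be estimated from training video clips of real
# speakers; the values below are purely illustrative.
GESTURE_MODEL = {
    "emphasis":    [("eyebrow_raise", 0.4), ("head_nod", 0.2)],
    "punctuation": [("blink", 0.5), ("head_nod", 0.3)],
    "word":        [("blink", 0.05)],
}

def classify_token(token):
    """Crude stand-in for the lexical analysis: tag a token with a trigger class."""
    if token[-1] in ".,;:!?":
        return "punctuation"       # phrase boundary
    if token[0].isupper():
        return "emphasis"          # e.g. a proper noun or emphasised word
    return "word"

def generate_gestures(text, seed=0):
    """Produce a (token_index, gesture) timeline for the given text.

    A seeded RNG keeps the sampled timeline reproducible; a real system
    would drive the facial animation player from this timeline alongside
    the synchronised lip movement from the visual TTS.
    """
    rng = random.Random(seed)
    timeline = []
    for i, token in enumerate(text.split()):
        trigger = classify_token(token)
        for gesture, prob in GESTURE_MODEL[trigger]:
            if rng.random() < prob:
                timeline.append((i, gesture))
    return timeline

timeline = generate_gestures("Good evening. Here is the news from London.")
```

In the actual system the sampled gestures would be merged with the lip-sync track and rendered on the embodied agent; the sketch only shows how lexical cues and a statistical model combine into a gesture timeline.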




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Smid, K., Pandzic, I.S., Radman, V. (2005). Intelligent Content Production for a Virtual Speaker. In: Bolc, L., Michalewicz, Z., Nishida, T. (eds) Intelligent Media Technology for Communicative Intelligence. IMTCI 2004. Lecture Notes in Computer Science, vol 3490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558637_17

  • DOI: https://doi.org/10.1007/11558637_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29035-3

  • Online ISBN: 978-3-540-31738-8
