Abstract
Embodied Conversational Agents (ECAs) are an application of virtual characters that is the subject of considerable ongoing research. An essential prerequisite for creating believable ECAs is the ability to describe and visually realize multimodal conversational behaviors. The recently developed Behavior Markup Language (BML) addresses this requirement by providing a means to specify the physical realization of multimodal behaviors through human-readable scripts. In this paper we present an approach to implementing a behavior realizer compatible with the BML specification. The system's architecture is based on hierarchical controllers which apply preprocessed behaviors to body modalities. The animation database is easily extensible and contains behavior examples constructed from existing gesture lexicons and the theory of gestures. Furthermore, we describe a novel solution to the problem of synchronizing gestures with synthesized speech using neural networks, and we propose improvements to the BML specification.
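To make the notion of a human-readable BML script concrete, the sketch below parses a minimal BML fragment and lists each behavior's cross-modal synchronization constraints. Both the fragment and the helper function are illustrative assumptions, not the system described in this chapter: the element names (`speech`, `gesture`, `head`), the `<sync>` marker, and the `stroke="s1:tm1"` cross-reference follow the sync-point convention of the BML specification, but the exact script is invented for exposition.

```python
# Illustrative sketch (not the authors' implementation): parse a minimal
# BML script and collect each behavior's synchronization constraints.
import xml.etree.ElementTree as ET

BML_SCRIPT = """
<bml id="bml1">
  <speech id="s1">
    <text>This is my <sync id="tm1"/> house.</text>
  </speech>
  <gesture id="g1" type="POINT" stroke="s1:tm1"/>
  <head id="h1" type="NOD" start="s1:tm1"/>
</bml>
"""

# Sync-point names assumed per the BML specification drafts.
SYNC_POINTS = ("start", "ready", "strokeStart", "stroke",
               "strokeEnd", "relax", "end")

def sync_constraints(script):
    """Return {behavior id: {sync point: referenced point}}."""
    root = ET.fromstring(script)
    constraints = {}
    for elem in root:
        refs = {sp: elem.get(sp) for sp in SYNC_POINTS if elem.get(sp)}
        constraints[elem.get("id")] = refs
    return constraints

print(sync_constraints(BML_SCRIPT))
# The gesture g1 must align its stroke, and the head nod h1 its start,
# with sync point tm1 inside the speech behavior s1.
```

A behavior realizer resolves such constraints against actual timings, e.g. the word boundaries reported by the speech synthesizer, before scheduling the animation.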
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this chapter
Čereković, A., Pejša, T., Pandžić, I.S. (2010). A Controller-Based Animation System for Synchronizing and Realizing Human-Like Conversational Behaviors. In: Esposito, A., Campbell, N., Vogel, C., Hussain, A., Nijholt, A. (eds) Development of Multimodal Interfaces: Active Listening and Synchrony. Lecture Notes in Computer Science, vol 5967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12397-9_6
Print ISBN: 978-3-642-12396-2
Online ISBN: 978-3-642-12397-9