Abstract
We present a dialogue architecture that addresses perception, planning and execution of multimodal dialogue behavior. Motivated by realtime human performance and modular architectural principles, the architecture is full-duplex (“open-mic”); prosody is continuously analyzed and used for mixed-control turntaking behaviors (reactive and deliberative) and incremental utterance production. The architecture is fine-grained and highly expandable; we are currently applying it to more complex multimodal interaction and dynamic task environments. We describe the theoretical underpinnings of the architecture, compare it to prior efforts, discuss the methodology and give a brief overview of its current runtime characteristics.
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Thórisson, K.R., Jonsdottir, G.R. (2008). A Granular Architecture for Dynamic Realtime Dialogue. In: Prendinger, H., Lester, J., Ishizuka, M. (eds) Intelligent Virtual Agents. IVA 2008. Lecture Notes in Computer Science, vol 5208. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85483-8_13
DOI: https://doi.org/10.1007/978-3-540-85483-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85482-1
Online ISBN: 978-3-540-85483-8
eBook Packages: Computer Science, Computer Science (R0)