Abstract
Tailoring the linguistic content of automatically generated descriptions to the preferences of a target user is well established as an effective way to produce higher-quality output that may even have a greater impact on user behaviour. It is also known that the non-verbal behaviour of an embodied agent can have a significant effect on users’ responses to content presented by that agent. However, to date no one has examined the contribution of non-verbal behaviour to the effectiveness of user tailoring in automatically generated embodied output. We describe a series of experiments designed to address this question. We begin by introducing a multimodal dialogue system designed to generate descriptions and comparisons tailored to user preferences, and demonstrate that the user-preference tailoring is detectable to an overhearer when the output is presented as synthesised speech. We then present a multimodal corpus consisting of the annotated facial expressions used by a speaker to accompany the generated tailored descriptions, and verify that the most characteristic positive and negative expressions used by that speaker are identifiable when resynthesised on an artificial talking head. Finally, we combine the corpus-derived facial displays with the tailored descriptions to test whether the addition of the non-verbal channel improves users’ ability to detect the intended tailoring, comparing two strategies for selecting the displays: one based on a simple corpus-derived rule, and one making direct use of the full corpus data. The performance of the subjects who saw displays selected by the rule-based strategy did not differ significantly from that of the subjects who received only the linguistic content, while the subjects who saw the data-driven displays were significantly worse at detecting the correctly tailored output.
We propose a possible explanation for this result, and also make recommendations for developers of future systems that may make use of an embodied agent to present user-tailored content.
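To make the contrast between the two selection strategies concrete, the following is a minimal sketch of how they might differ in practice. All names and counts here are hypothetical illustrations, not values from the article's corpus: the rule-based strategy deterministically picks the display most characteristic of each evaluation polarity, while the data-driven strategy samples displays in proportion to their corpus frequency, reproducing the speaker's full (and noisier) distribution.

```python
import random

# Hypothetical corpus counts: how often a recorded speaker paired each
# facial display with positive vs. negative evaluative content.
# These figures are purely illustrative.
CORPUS_COUNTS = {
    "positive": {"raised_brows_smile": 18, "neutral": 9, "frown_headshake": 3},
    "negative": {"frown_headshake": 14, "neutral": 11, "raised_brows_smile": 5},
}

def rule_based_display(polarity: str) -> str:
    """Simple corpus-derived rule: always choose the display most
    characteristic of the given evaluation polarity."""
    counts = CORPUS_COUNTS[polarity]
    return max(counts, key=counts.get)

def data_driven_display(polarity: str, rng: random.Random) -> str:
    """Sample a display in proportion to its corpus frequency, so the
    agent reproduces the speaker's full, variable behaviour."""
    counts = CORPUS_COUNTS[polarity]
    displays = list(counts)
    weights = [counts[d] for d in displays]
    return rng.choices(displays, weights=weights, k=1)[0]

rng = random.Random(0)
print(rule_based_display("positive"))       # always the dominant display
print(data_driven_display("positive", rng))  # varies run to run
```

Under this sketch, the rule-based strategy always sends an unambiguous signal, whereas the data-driven strategy will sometimes pair a positive evaluation with a neutral or even negative display, which is one candidate explanation for why it could hinder rather than help overhearers.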
Cite this article
Foster, M.E., Oberlander, J. User preferences can drive facial expressions: evaluating an embodied conversational agent in a recommender dialogue system. User Model User-Adap Inter 20, 341–381 (2010). https://doi.org/10.1007/s11257-010-9080-6