Abstract
Tailoring the linguistic content of automatically generated descriptions to the preferences of a target user is well established as an effective way to produce higher-quality output that may even have a greater impact on user behaviour. It is also known that the non-verbal behaviour of an embodied agent can have a significant effect on users’ responses to content presented by that agent. However, to date no one has examined the contribution of non-verbal behaviour to the effectiveness of user tailoring in automatically generated embodied output. We describe a series of experiments designed to address this question. We begin by introducing a multimodal dialogue system designed to generate descriptions and comparisons tailored to user preferences, and demonstrate that the user-preference tailoring is detectable to an overhearer when the output is presented as synthesised speech. We then present a multimodal corpus consisting of the annotated facial expressions used by a speaker to accompany the generated tailored descriptions, and verify that the most characteristic positive and negative expressions used by that speaker are identifiable when resynthesised on an artificial talking head. Finally, we combine the corpus-derived facial displays with the tailored descriptions to test whether the addition of the non-verbal channel improves users’ ability to detect the intended tailoring, comparing two strategies for selecting the displays: one based on a simple corpus-derived rule, and one making direct use of the full corpus data. The performance of the subjects who saw displays selected by the rule-based strategy did not differ significantly from that of the subjects who received only the linguistic content, while the subjects who saw the data-driven displays were significantly worse at detecting the correctly tailored output.
We propose a possible explanation for this result, and also make recommendations for developers of future systems that may make use of an embodied agent to present user-tailored content.
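To make the contrast between the two selection strategies concrete, the following is a minimal sketch of how they might differ in practice. All names and counts here are hypothetical illustrations, not values from the article's corpus: the rule-based strategy deterministically picks the display most characteristic of each evaluation polarity, while the data-driven strategy samples displays in proportion to their corpus frequency, reproducing the speaker's full (and noisier) distribution.

```python
import random

# Hypothetical corpus counts: how often a recorded speaker paired each
# facial display with positive vs. negative evaluative content.
# These figures are purely illustrative.
CORPUS_COUNTS = {
    "positive": {"raised_brows_smile": 18, "neutral": 9, "frown_headshake": 3},
    "negative": {"frown_headshake": 14, "neutral": 11, "raised_brows_smile": 5},
}

def rule_based_display(polarity: str) -> str:
    """Simple corpus-derived rule: always choose the display most
    characteristic of the given evaluation polarity."""
    counts = CORPUS_COUNTS[polarity]
    return max(counts, key=counts.get)

def data_driven_display(polarity: str, rng: random.Random) -> str:
    """Sample a display in proportion to its corpus frequency, so the
    agent reproduces the speaker's full, variable behaviour."""
    counts = CORPUS_COUNTS[polarity]
    displays = list(counts)
    weights = [counts[d] for d in displays]
    return rng.choices(displays, weights=weights, k=1)[0]

rng = random.Random(0)
print(rule_based_display("positive"))       # always the dominant display
print(data_driven_display("positive", rng))  # varies run to run
```

Under this sketch, the rule-based strategy always sends an unambiguous signal, whereas the data-driven strategy will sometimes pair a positive evaluation with a neutral or even negative display, which is one candidate explanation for why it could hinder rather than help overhearers.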
Cite this article
Foster, M.E., Oberlander, J. User preferences can drive facial expressions: evaluating an embodied conversational agent in a recommender dialogue system. User Model User-Adap Inter 20, 341–381 (2010). https://doi.org/10.1007/s11257-010-9080-6