User preferences can drive facial expressions: evaluating an embodied conversational agent in a recommender dialogue system

  • Original Paper
  • Published in: User Modeling and User-Adapted Interaction

Abstract

Tailoring the linguistic content of automatically generated descriptions to the preferences of a target user has been well demonstrated to be an effective way to produce higher-quality output that may even have a greater impact on user behaviour. It is also known that the non-verbal behaviour of an embodied agent can have a significant effect on users’ responses to content presented by that agent. However, to date no one has examined the contribution of non-verbal behaviour to the effectiveness of user tailoring in automatically generated embodied output. We describe a series of experiments designed to address this question. We begin by introducing a multimodal dialogue system designed to generate descriptions and comparisons tailored to user preferences, and demonstrate that the user-preference tailoring is detectable to an overhearer when the output is presented as synthesised speech. We then present a multimodal corpus consisting of the annotated facial expressions used by a speaker to accompany the generated tailored descriptions, and verify that the most characteristic positive and negative expressions used by that speaker are identifiable when resynthesised on an artificial talking head. Finally, we combine the corpus-derived facial displays with the tailored descriptions to test whether the addition of the non-verbal channel improves users’ ability to detect the intended tailoring, comparing two strategies for selecting the displays: one based on a simple corpus-derived rule, and one making direct use of the full corpus data. The performance of the subjects who saw displays selected by the rule-based strategy was not significantly different from that of the subjects who received only the linguistic content, while the subjects who saw the data-driven displays were significantly worse at detecting the correctly tailored output. We propose a possible explanation for this result, and make recommendations for developers of future systems that may use an embodied agent to present user-tailored content.
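The contrast between the two display-selection strategies can be illustrated with a minimal sketch: a rule-based selector always emits the single most characteristic display for the polarity of the evaluation, while a data-driven selector samples displays in proportion to their corpus frequency, so less characteristic and neutral displays also appear. All display names and counts below are invented for illustration; they are not taken from the paper's corpus.

```python
import random

# Hypothetical corpus counts (display -> frequency) per evaluation polarity.
# Illustrative numbers only, not drawn from the annotated corpus.
CORPUS_COUNTS = {
    "positive": {"nod": 12, "brow_raise": 7, "neutral": 30},
    "negative": {"lean_back": 9, "frown": 5, "neutral": 40},
}

def rule_based_display(polarity: str) -> str:
    """Rule-based strategy: always pick the most characteristic
    (most frequent non-neutral) display for the given polarity."""
    counts = {d: n for d, n in CORPUS_COUNTS[polarity].items() if d != "neutral"}
    return max(counts, key=counts.get)

def data_driven_display(polarity: str, rng: random.Random) -> str:
    """Data-driven strategy: sample a display weighted by its corpus
    frequency, so the output varies and neutral displays dominate."""
    counts = CORPUS_COUNTS[polarity]
    displays = list(counts)
    weights = [counts[d] for d in displays]
    return rng.choices(displays, weights=weights, k=1)[0]

rng = random.Random(0)
print(rule_based_display("positive"))   # always "nod" under these counts
print([data_driven_display("negative", rng) for _ in range(3)])
```

Under this sketch the rule-based output is deterministic and maximally characteristic, whereas the data-driven output reproduces the corpus distribution, including its many neutral displays, which is one candidate explanation for why data-driven displays could dilute the tailoring signal.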

References

  • Artstein, R., Poesio, M.: Kappa3 = alpha (or beta). Technical Report CSM-437, Department of Computer Science, University of Essex. http://www.essex.ac.uk/csee/research/publications/technicalreports/2005/csm-437.pdf (2005)

  • Baker, R., Clark, R., White, M.: Synthesizing contextually appropriate intonation in limited domains. In: Proceedings of the 5th ISCA Workshop on Speech Synthesis, Pittsburgh, PA, pp. 91–96 (2004)

  • Belz A.: That’s nice... what can you do with it? Computational Linguistics 35(1), 111–118 (2009)

  • Belz, A., Reiter, E.: Comparing automatic and human evaluation of NLG systems. In: Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006), Trento, Italy, pp. 313–320 (2006)

  • Carberry S., Chu-Carroll J., Elzer S.: Constructing and utilizing a model of user preferences in collaborative consultation dialogues. Comput. Intell. 15(3), 185–217 (1999)

  • Carenini G., Moore J.D.: Generating and evaluating evaluative arguments. Artif. Intell. 170(11), 925–952 (2006)

  • Cassell J., Sullivan J., Prevost S., Churchill E.: Embodied Conversational Agents. MIT Press, Cambridge (2000)

  • Cassell J., Bickmore T., Vilhjálmsson H., Yan H.: More than just a pretty face: Conversational protocols and the affordances of embodiment. Knowledge-Based Syst. 14(1–2), 55–64 (2001)

  • Clark, R.A.J., Richmond, K., King, S.: Festival 2—build your own general purpose unit selection speech synthesiser. In: Proceedings of the 5th ISCA Workshop on Speech Synthesis, pp. 173–178 (2004)

  • Clemen R.T.: Making Hard Decisions: An Introduction to Decision Analysis. Duxbury Press, Belmont (1996)

  • Davidson R.J., Irwin W.: The functional neuroanatomy of emotion and affective style. Trends Cogn. Sci. 3(1), 11–21 (1999)

  • Davidson R.J., Ekman P., Saron C., Senulis J., Friesen W.: Approach-withdrawal and cerebral asymmetry: Emotional expression and brain physiology I. J. Pers. Soc. Psychol. 58(2), 330–341 (1990)

  • DeCarlo D., Stone M., Revilla C., Venditti J.: Specifying and animating facial signals for discourse in embodied conversational agents. Comput. Anim. Virtual Worlds 15(1), 27–38 (2004)

  • Demberg, V., Moore, J.D.: Information presentation in spoken dialogue systems. In: Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006), pp. 65–72 (2006)

  • Edwards W., Barron F.H.: SMARTS and SMARTER: Improved simple methods for multiattribute utility management. Organ. Behav. Hum. Decis. Process. 60, 306–325 (1994)

  • Foster, M.E.: Comparing rule-based and data-driven selection of facial displays. In: Proceedings of the ACL 2007 Workshop on Embodied Language Processing, Prague, pp. 1–8 (2007a)

  • Foster, M.E.: Generating embodied descriptions tailored to user preferences. In: Proceedings of the 7th International Conference on Intelligent Virtual Agents (IVA 2007), Paris, pp. 264–271 (2007b)

  • Foster, M.E.: Automated metrics that agree with human judgements on generated output for an embodied conversational agent. In: Proceedings of The 5th International Natural Language Generation Conference (INLG 2008), Salt Fork, OH, pp. 95–103 (2008)

  • Foster, M.E., White, M.: Techniques for text planning with XSLT. In: Proceedings of the 4th Workshop on NLP and XML (NLPXML 2004), Barcelona, Spain, pp. 1–8 (2004)

  • Foster, M.E., White, M.: Assessing the impact of adaptive generation in the COMIC multimodal dialogue system. In: Proceedings of the IJCAI 2005 Workshop on Knowledge and Reasoning in Practical Dialogue Systems. Edinburgh, Scotland, pp. 24–31 (2005)

  • Gaebel W., Wölwer W.: Facial expressivity in the course of schizophrenia and depression. Eur. Arch. Psychiatry Clin. Neurosci. 254(5), 335–342 (2004)

  • Hunt, A.J., Black, A.W.: Unit selection in a concatenative speech synthesis system using a large speech database. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP 1996), vol. 1. Atlanta, GA, pp. 373–376 (1996)

  • Kay, J.: Scrutable Adaptation: Because We Can and Must. In: Adaptive Hypermedia and Adaptive Web-Based Systems. Springer, Berlin, pp. 11–19 (2006)

  • Kipp, M.: Gesture generation by imitation—from human behavior to computer character animation. Dissertation.com, Boca Raton (2004)

  • Kobsa, A., Wahlster, W. (eds): User Models in Dialog Systems. Springer, Heidelberg (1989)

  • Koller, A., Striegnitz, K., Byron, D., Cassell, J., Dale, R., Dalzel-Job, S., Moore, J., Oberlander, J.: Validating the web-based evaluation of NLG systems. In: Proceedings of the ACL-IJCNLP 2009 Conference. Singapore, pp. 301–304 (2009)

  • Krahmer E., Swerts M.: More about brows: a cross-linguistic study via analysis-by-synthesis. In: Pelachaud, C., Ruttkay, Z. (eds) From Brows to Trust: Evaluating Embodied Conversational Agents, pp. 191–216. Kluwer, New York (2004)

  • Marsi, E., van Rooden, F.: Expressing uncertainty with a talking head in a multimodal question-answering system. In: Proceedings of the Workshop on Multimodal Output Generation (MOG 2007), pp. 117–128 (2007)

  • McKeown K.R.: Text Generation: Using Discourse Strategies and Focus Constraints to Generate Natural Language Text. Cambridge University Press, Cambridge (1985)

  • Mellish C., Dale R.: Evaluation in the context of natural language generation. Comput. Speech Lang. 12(4), 349–373 (1998)

  • Moore, J., Foster, M.E., Lemon, O., White, M.: Generating tailored, comparative descriptions in spoken dialogue. In: Proceedings of the 17th International FLAIRS Conference (FLAIRS 2004), pp. 917–922 (2004)

  • Nass C., Isbister C., Lee E.: Truth is beauty: researching embodied conversational agents. In: Cassell, J., Sullivan, J., Prevost, S., Churchill, E. (eds) Embodied Conversational Agents, pp. 374–402. MIT Press, Cambridge (2000)

  • Neff M., Kipp M., Albrecht I., Seidel H.-P.: Gesture modeling and animation based on a probabilistic recreation of speaker style. ACM Trans. Graph. 27(1), 1–24 (2008)

  • Nielsen J., Levy J.: Measuring usability: preference vs. performance. Commun. ACM 37(4), 66–75 (1994)

  • Oviatt S.: Ten myths of multimodal interaction. Commun. ACM 42(11), 74–81 (1999)

  • Paris C.: Tailoring object descriptions to a user’s level of expertise. Comput. Ling. 14(3), 64–78 (1988)

  • Poggi I., Pelachaud C.: Performative facial expressions in animated faces. In: Cassell, J., Sullivan, J., Prevost, S., Churchill, E. (eds) Embodied Conversational Agents, pp. 154–188. MIT Press, Cambridge (2000)

  • Reeves B., Nass C.: The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places. Cambridge University Press, Cambridge (1996)

  • Rehm, M., André, E.: Catch me if you can—exploring lying agents in social settings. In: Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, pp. 937–944 (2005)

  • Reiter E., Dale R.: Building Natural Language Generation Systems. Cambridge University Press, Cambridge (2000)

  • Rich E.: User modeling via stereotypes. Cogn. Sci. 3, 329–354 (1979)

  • Rocha, N.F.: Evaluating automatic assignment of prosodic structure for speech synthesis. Master’s thesis, Department of Theoretical and Applied Linguistics, University of Edinburgh (2004)

  • Rudnicky A.: Multimodal Dialogue Systems. In: Minker, W., Bühler, D., Dybkjær, L. (eds) Spoken Multimodal Human–Computer Dialogue in Mobile Environments, pp. 3–11. Springer, The Netherlands (2005)

  • Schober M.F., Clark H.H.: Understanding by addressees and overhearers. Cogn. Psychol. 21(2), 211–232 (1989)

  • Stent, A., Marge, M., Singhai, M.: Evaluating evaluation methods for generation in the presence of variation. In: Computational Linguistics and Intelligent Text Processing, vol. 3406/2005 of Lecture Notes in Computer Science. Springer, Heidelberg, pp. 341–351 (2005)

  • Stone, M., DeCarlo, D., Oh, I., Rodriguez, C., Lees, A., Stere, A., Bregler, C.: Speaking with hands: creating animated conversational characters from recordings of human performance. ACM Trans. Graph. 23(3), 506–513, Special Issue: Proceedings of the 2004 SIGGRAPH Conference (2004)

  • Swerts M., Krahmer E.: Facial expression and prosodic prominence: Effects of modality and facial area. J. Phon. 36(2), 219–238 (2008)

  • Walker M., Rambow O., Rogati M.: Training a sentence planner for spoken dialogue using boosting. Comput. Speech Lang. 16(3–4), 409–433 (2002)

  • Walker M., Whittaker S., Stent A., Maloor P., Moore J., Johnston M., Vasireddy G.: Generation and evaluation of user tailored responses in multimodal dialogue. Cogn. Sci. 28(5), 811–840 (2004)

  • White, M.: CCG Chart realization from disjunctive inputs. In: Proceedings of the 4th International Conference on Natural Language Generation (INLG-06) (2006a)

  • White M.: Efficient realization of coordinate structures in Combinatory Categorial Grammar. Res. Lang. Comput. 4(1), 39–75 (2006b)

  • White, M., Dale, R. (eds.): NAACL 2007 Workshop on Shared Tasks and Comparative Evaluation in Natural Language Generation. http://www.ling.ohio-state.edu/~mwhite/nlgeval07/ (2007)

  • White, M., Foster, M.E., Oberlander, J., Brown, A.: Using facial feedback to enhance turn-taking in a multimodal dialogue system. In: Proceedings of HCI International 2005, Las Vegas, vol 8 (2005)

  • Whittaker S., Walker M.: Evaluating dialogue strategies in multimodal dialogue systems. In: Minker, W., Bühler, D., Dybkjær, L. (eds) Spoken Multimodal Human–Computer Dialogue in Mobile Environments, pp. 247–268. Springer, The Netherlands (2005)

  • Winterboer, A., Moore, J.D.: Evaluating information presentation strategies for spoken recommendations. In: Proceedings of the 2007 ACM Conference on Recommender Systems, pp. 157–160 (2007)

  • Zukerman I., Litman D.: Natural Language Processing and User Modeling: Synergies and Limitations. User Model. User-Adapted Interact. 11(1–2), 129–159 (2001)

Author information

Corresponding author

Correspondence to Mary Ellen Foster.

Additional information

This article integrates and extends the work described in Foster and White (2005) and Foster (2007a,b).

Rights and permissions

Reprints and permissions

Cite this article

Foster, M.E., Oberlander, J. User preferences can drive facial expressions: evaluating an embodied conversational agent in a recommender dialogue system. User Model User-Adap Inter 20, 341–381 (2010). https://doi.org/10.1007/s11257-010-9080-6
