Non-verbal communication strategies to improve robustness in dialogue systems: a comparative study

  • Original Paper
Journal on Multimodal User Interfaces

Abstract

This paper explores the use of embodied conversational agents (ECAs) and their visual communicative ability to improve interaction with spoken language dialogue systems (SLDSs). It does so through an experimental case study in the application context of secure access by speaker verification followed by remote home automation control. After identifying a set of typical interaction problems with SLDSs and associating each of them with a particular ECA gesture or behaviour, we conducted a comparative evaluation based on ITU recommendations for the evaluation of spoken dialogue systems. For the user tests, participants were divided into two groups, each facing a different interface setup: one with an ECA and the other with voice output only. The ECA group encountered fewer interaction problems. Users’ impressions, however, were similar in both groups, with a slight advantage for the ECA group. In particular, the ECA seems to help users better understand the flow of the dialogue and to reduce confusion. Results also suggest that rejection (based on privacy and security concerns) is a dimension in its own right that may influence subjective evaluation parameters closely related to user acceptance.

References

  1. Bell L, Gustafson J (2003) Child and adult speaker adaptation during error resolution in a publicly available spoken dialogue system. In: 8th European conference on speech communication and technology (EUROSPEECH 2003). ISCA, Geneva, Switzerland, pp 613–616

  2. Bickmore T, Cassell J (2005) Social dialogue with embodied conversational agents. In: van Kuppevelt J, Dybkjær L, Bernsen NO (eds) Advances in natural multimodal dialogue systems. Text, speech and language technology, vol 30. Springer, Berlin, pp 23–54

  3. Bickmore TW (2004) Unspoken rules of spoken interaction. Commun ACM 47(4):38–44. Special issue: Human-computer etiquette: managing expectations with intentional agents

  4. Bohus D, Rudnicky AI (2005) Sorry, I didn’t catch that! An investigation of non-understanding errors and recovery strategies. In: 6th SIGdial workshop on discourse and dialogue, ISCA

  5. Bohus D, Rudnicky AI (2008) Sorry, I didn’t catch that! An investigation of non-understanding errors and recovery strategies. In: Dybkjær L, Minker W (eds) Recent trends in discourse and dialogue. Text, speech and language technology, vol 39. Springer, Amsterdam, pp 123–154

  6. Buisine S, Abrilian S, Martin J (2004) Evaluation of multimodal behaviour of embodied agents. In: Ruttkay Z, Pelachaud C (eds) From brows to trust: evaluating embodied conversational agents. Springer, Berlin, pp 217–238

  7. Bulyko I, Kirchhoff K, Ostendorf M, Goldberg J (2005) Error-correction detection and response generation in a spoken dialogue system. Speech Commun 45(3):271–288

  8. Cassell J (2000) Embodied conversational agents. MIT Press, Cambridge

  9. Cassell J (2000) Nudge nudge wink wink: elements of face-to-face conversation for embodied conversational agents. In: Embodied conversational agents. MIT Press, Cambridge, pp 1–27

  10. Cassell J, Thórisson K (1999) The power of a nod and a glance: envelope vs. emotional feedback in animated conversational agents. Appl Artif Intell 13:519–538

  11. Cassell J, Bickmore T, Campbell L, Vilhjalmsson H, Yan H (2001) More than just a pretty face: Conversational protocols and the affordances of embodiment. Knowl-Based Syst 14(1–2):55–64

  12. Cassell J, Nakano Y, Bickmore T, Sidner C, Rich C (2001) Non-verbal cues for discourse structure. In: Proceedings of the 39th annual meeting of the Association for Computational Linguistics. Morgan Kaufmann, Toulouse, pp 114–123

  13. Catrambone R, Stasko J, Xiao J (2004) ECA as user interface paradigm. In: From brows to trust: evaluating embodied conversational agents. Springer, Berlin, pp 239–267

  14. Cerrato L, Ekeklint S (2004) Evaluating users’ reactions to human-like interfaces: prosodic and paralinguistic features as measures of user satisfaction. In: From brows to trust: evaluating embodied conversational agents. Springer, Berlin, pp 101–124

  15. Dehn DM, Van Mulken S (2000) The impact of animated interface agents: a review of empirical research. Int J Hum-Comput Stud 52(1):1–22

  16. Fagerberg P, Stahl A, Höök K (2003) Designing gestures for affective input: an analysis of shape, effort and valence. In: Ollila M, Rantzer M (eds) Proceedings of mobile ubiquitous and multimedia, MUM 2003. Linköping University Electronic Press, Norrköping

  17. Goldberg J, Ostendorf M, Kirchhoff K (2003) The impact of response wording in error correction subdialogs. In: ISCA tutorial and research workshop on error handling in spoken dialogue systems. ISCA, Citeseer, pp 101–106

  18. Hartmann B, Mancini M, Buisine S, Pelachaud C (2005) Design and evaluation of expressive gesture synthesis for embodied conversational agents. In: Proceedings of the 4th international joint conference on autonomous agents and multiagent systems. ACM, The Netherlands, pp 1095–1096

  19. Hone K (2005) Animated agents to reduce user frustration. In: The 19th British HCI group annual conference, Edinburgh

  20. Hone KS, Graham R (2001) Towards a tool for the subjective assessment of speech system interfaces (SASSI). Nat Lang Eng 6(3–4):287–303

  21. ITU-T Rec P.851 (2003) Subjective quality evaluation of telephone services based on spoken dialogue systems. International recommendation, International Telecommunication Union

  22. ITU-T Suppl 24 to P-Series Rec (2005) Parameters describing the interaction with spoken dialogue systems. International recommendation, International Telecommunication Union

  23. Kendon A (1990) Conducting interaction: patterns of behaviour in focused encounters. Cambridge University Press, Cambridge

  24. Lamel L, Bennacef S, Gauvain J, Dartigues H, Temem J (1998) User evaluation of the MASK kiosk. In: Fifth international conference on spoken language processing, Citeseer

  25. Lamel L, Rosset S, Gauvain J, Bennacef S, Garnier-Rizet M, Prouts B (2000) The LIMSI ARISE system. Speech Commun 31(4):339–354

  26. Lee J, DeVault D, Marsella S, Traum D (2008) Thoughts on FML: Behavior generation in the virtual human communication architecture. In: Why conversational agents do what they do. Functional Representations for Generating Conversational Agent Behavior. AAMAS 2008, Estoril, Portugal

  27. Lester JC, Converse SA, Kahler SE, Barlow BA, Stone ST, Bhogal RS (1997) The persona effect: affective impact of animated pedagogical agents. In: Pemberton S (ed) Proceedings of the SIGCHI conference on human factors in computing systems, 1997, Atlanta, Georgia, pp 359–366

  28. López-Mencía B, Hernández-Trapote A, Díaz-Pardo D, Fernández-Pozo R, Hernández-Gómez L, Torre Toledano D (2007) Design and validation of ECA gestures to improve dialogue system robustness. In: Proceedings of the ACL 2007 workshop on embodied language processing. Association for Computational Linguistics, Prague, Czech Republic, pp 67–74

  29. ter Maat M, Heylen D (2009) Turn management or impression management? In: Proceedings of 9th international conference on intelligent virtual agents, IVA 2009. Springer, Berlin, pp 467–473

  30. McTear M (2008) Handling miscommunication: Why bother? In: Dybkjær L, Minker W (eds) Recent trends in discourse and dialogue. Springer, Amsterdam, pp 101–122

  31. Möller S, Smeele P, Boland H, Krebber J (2007) Evaluating spoken dialogue systems according to de-facto standards: A case study. Comput Speech Lang 21(1):26–53

  32. Christoph N (2004) Empirical evaluation methodology for embodied conversational agents. In: From brows to trust: evaluating embodied conversational agents. Kluwer, Dordrecht, pp 67–99

  33. Oviatt S, Adams B (2000) Designing and evaluating conversational interfaces with animated characters. In: Cassell J, Sullivan J, Churchill EF (eds) Embodied conversational agents. MIT Press, Cambridge, pp 319–345

  34. Oviatt S, VanGent R (1996) Error resolution during multimodal human-computer interaction. In: Proceedings of the fourth international conference on spoken language processing, vol 1. Institute of Electrical & Electronics Engineers, pp 204–207

  35. Oviatt S, MacEachern M, Levow G (1998) Predicting hyperarticulate speech during human-computer error resolution. Speech Commun 24(2):87–110

  36. Pelachaud C (2003) Overview of representation languages for ECAs. Project reports, Paris VIII, IUT Montreuil

  37. San-Segundo R, Montero J, Ferreiros J, Córdoba R, Pardo J (2001) Designing confirmation mechanisms and error recovery techniques in a railway information system for Spanish. In: Proceedings of the second SIGdial workshop on discourse and dialogue, vol 16. Association for Computational Linguistics, Aalborg, pp 136–139

  38. Schaumburg H (2001) Computers as tools or as social actors?—the users’ perspective on anthropomorphic agents. Int J Coop Inf Syst 10(1–2):217–234

  39. Stevens JP (1992) Applied multivariate statistics for the social sciences. Lawrence Erlbaum, Hillsdale

  40. Walker M, Whittaker S, Stent A, Maloor P, Moore J, Johnston M, Vasireddy G (2004) Generation and evaluation of user tailored responses in multimodal dialogue. Cogn Sci 28(5):811–840

  41. Walker MA, Litman DJ, Kamm CA, Abella A (1997) PARADISE: a framework for evaluating spoken dialogue agents. In: Proceedings of the 35th annual meeting of the association for computational linguistics (ACL-97). Association for Computational Linguistics, Madrid, pp 271–280

  42. Weiss B, Kühnel C, Wechsung I, Fagel S, Möller S (2010) Quality of talking heads in different interaction and media contexts. Speech Commun 52(6):481–492

  43. White M, Foster M, Oberlander J, Brown A (2005) Using facial feedback to enhance turn-taking in a multimodal dialogue system. In: Proceedings of HCI international, vol 2. Lawrence Erlbaum Associates, Inc, Las Vegas

Author information

Correspondence to Beatriz L. Mencia.

About this article

Cite this article

Pardo, D., Mencia, B.L., Trapote, Á.H. et al. Non-verbal communication strategies to improve robustness in dialogue systems: a comparative study. J Multimodal User Interfaces 3, 285–297 (2009). https://doi.org/10.1007/s12193-010-0052-2

  • DOI: https://doi.org/10.1007/s12193-010-0052-2
