
Empirical Evaluation Methodology for Embodied Conversational Agents

On Conducting Evaluation Studies

Chapter in: From Brows to Trust

Part of the book series: Human-Computer Interaction Series (HCIS, volume 7)

Abstract

The objective of this chapter is to identify common knowledge and practice in research methodology and to apply it to the field of software evaluation, in particular the evaluation of embodied conversational agents. The relevant issues discussed are how to formulate a good research question, which research strategy to use, which data collection methods are most appropriate, and how to select the right participants. Reliability and validity of the resulting data sets are also addressed, and the chapter concludes with a list of guidelines to keep in mind when setting up and conducting empirical evaluation studies on embodied conversational agents.
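One of the reliability concerns the abstract raises is inter-rater agreement: when two observers code the same behaviour of an agent, their agreement beyond chance is commonly quantified with Cohen's kappa (Cohen, 1960). The sketch below is a minimal, self-contained illustration; the label set and the annotation data are invented for the example, not taken from the chapter.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement for two raters on nominal labels (Cohen, 1960)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters label identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical annotators coding the same ten agent utterances.
a = ["pos", "pos", "neg", "pos", "neg", "neg", "pos", "neg", "pos", "pos"]
b = ["pos", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "pos", "pos"]
print(round(cohens_kappa(a, b), 2))  # → 0.58
```

Raw percentage agreement here is 0.80, but kappa corrects it down to about 0.58 because both raters use "pos" more often than "neg", which inflates agreement expected by chance.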





Editor information

Zsófia Ruttkay and Catherine Pelachaud


Copyright information

© 2004 Kluwer Academic Publishers

About this chapter

Cite this chapter

Christoph, N. (2004). Empirical Evaluation Methodology for Embodied Conversational Agents. In: Ruttkay, Z., Pelachaud, C. (eds) From Brows to Trust. Human-Computer Interaction Series, vol 7. Springer, Dordrecht. https://doi.org/10.1007/1-4020-2730-3_3


  • DOI: https://doi.org/10.1007/1-4020-2730-3_3

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-2729-1

  • Online ISBN: 978-1-4020-2730-7

  • eBook Packages: Computer Science, Computer Science (R0)
