
Empirical Evaluation Methodology for Embodied Conversational Agents

On Conducting Evaluation Studies

Chapter in: From Brows to Trust

Part of the book series: Human-Computer Interaction Series (HCIS, volume 7)

Abstract

The objective of this chapter is to identify common knowledge and practice in research methodology and to apply it to the field of software evaluation, in particular the evaluation of embodied conversational agents. The relevant issues discussed are how to formulate a good research question, which research strategy to use, which data collection methods are most appropriate, and how to select the right participants. Reliability and validity of the resulting data sets are also addressed, and the chapter concludes with a list of guidelines to keep in mind when setting up and conducting empirical evaluation studies on embodied conversational agents.
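One of the reliability concerns the abstract raises is inter-rater agreement: when two observers code the same behaviour of an agent, their agreement beyond chance is commonly quantified with Cohen's kappa (Cohen, 1960). The sketch below is a minimal, self-contained illustration; the label set and the annotation data are invented for the example, not taken from the chapter.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement for two raters on nominal labels (Cohen, 1960)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters label identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical annotators coding the same ten agent utterances.
a = ["pos", "pos", "neg", "pos", "neg", "neg", "pos", "neg", "pos", "pos"]
b = ["pos", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "pos", "pos"]
print(round(cohens_kappa(a, b), 2))  # → 0.58
```

Raw percentage agreement here is 0.80, but kappa corrects it down to about 0.58 because both raters use "pos" more often than "neg", which inflates agreement expected by chance.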





Editor information

Zsófia Ruttkay and Catherine Pelachaud


Copyright information

© 2004 Kluwer Academic Publishers

About this chapter

Cite this chapter

Christoph, N. (2004). Empirical Evaluation Methodology for Embodied Conversational Agents. In: Ruttkay, Z., Pelachaud, C. (eds) From Brows to Trust. Human-Computer Interaction Series, vol 7. Springer, Dordrecht. https://doi.org/10.1007/1-4020-2730-3_3


  • DOI: https://doi.org/10.1007/1-4020-2730-3_3

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-2729-1

  • Online ISBN: 978-1-4020-2730-7

  • eBook Packages: Computer Science, Computer Science (R0)
