Semi-formal Evaluation of Conversational Characters

  • Chapter
Languages: From Formal to Natural

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 5533)

Abstract

Conversational dialogue systems cannot be evaluated in a fully formal manner, because dialogue is heavily dependent on context and current dialogue theory is not precise enough to specify a target output ahead of time. Instead, we evaluate dialogue systems in a semi-formal manner, using human judges to rate the coherence of a conversational character and correlating these judgments with measures extracted from within the system. We present a series of three evaluations of a single conversational character over the course of a year, demonstrating how this kind of evaluation helps bring about an improvement in overall dialogue coherence.
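As a rough illustration of correlating human coherence judgments with a system-internal measure, the sketch below averages several judges' ratings per response and computes a Spearman rank correlation against a per-response system score. The abstract does not say which statistic or which internal measure the authors used, so the choice of Spearman's rho, the name `system_scores`, the 1 to 5 rating scale, and all data values are assumptions made purely for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: one row per character response, one column per judge
# (coherence rated on an assumed 1-5 scale).
judge_ratings = [[5, 4, 5], [2, 1, 2], [4, 4, 3], [1, 2, 1], [3, 3, 4]]
# Hypothetical system-internal measure for the same five responses.
system_scores = [0.91, 0.20, 0.65, 0.12, 0.48]

mean_ratings = [mean(row) for row in judge_ratings]
rho = spearman(mean_ratings, system_scores)
print(f"Spearman rho = {rho:.2f}")
```

The toy data are deliberately monotone, so the two rankings agree perfectly; real judgments would be noisier. A rank-based statistic is used here only because rating scales are ordinal, not because the chapter is known to prescribe it.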


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Artstein, R., Gandhe, S., Gerten, J., Leuski, A., Traum, D. (2009). Semi-formal Evaluation of Conversational Characters. In: Grumberg, O., Kaminski, M., Katz, S., Wintner, S. (eds) Languages: From Formal to Natural. Lecture Notes in Computer Science, vol 5533. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01748-3_2

  • DOI: https://doi.org/10.1007/978-3-642-01748-3_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01747-6

  • Online ISBN: 978-3-642-01748-3

  • eBook Packages: Computer Science, Computer Science (R0)
