How to Evaluate a Good Conversation? An Evaluation Framework for Chat Experience in Smart Home

  • Conference paper
  • In: Human-Computer Interaction. Theory, Methods and Tools (HCII 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12762)


Abstract

With the development of artificial intelligence technology, more and more smart devices are equipped with conversational agents that can engage in chat or free conversation with humans. However, human-machine chat is still at an early stage of development, and effective methods for evaluating chat experience are lacking. In this study, we propose a framework for evaluating the chat experience with smart conversational agents in the smart home. First, we collected candidate evaluation metrics; we then applied them in a first user test, refined the metrics, and constructed an evaluation system. Finally, we carried out a second user test to validate the evaluation system using structural equation modeling (SEM). The results indicated that the evaluation system had good reliability, validity, and internal consistency, and that it can be used to evaluate the user experience of chat-oriented dialogue with smart conversational agents.
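The abstract reports that the evaluation system showed good internal consistency. A standard measure of internal consistency for questionnaire-based metrics like these is Cronbach's alpha; the sketch below computes it for a small matrix of hypothetical 5-point Likert ratings (the data and the specific items are illustrative, not from the paper):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    return (n_items / (n_items - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical ratings: 4 respondents x 3 evaluation items, 5-point scale
ratings = [[5, 4, 5],
           [3, 3, 4],
           [4, 4, 4],
           [2, 3, 2]]
print(round(cronbach_alpha(ratings), 3))  # -> 0.897
```

Values above roughly 0.7 are conventionally read as acceptable consistency; SEM-based validation, as used in the paper's second user test, additionally checks whether the items load on the intended latent factors.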



Author information

Correspondence to Xiantao Chen.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Chen, X., Ma, L., Jia, M., Han, Y., Mi, J., Xu, M. (2021). How to Evaluate a Good Conversation? An Evaluation Framework for Chat Experience in Smart Home. In: Kurosu, M. (ed.) Human-Computer Interaction. Theory, Methods and Tools. HCII 2021. Lecture Notes in Computer Science, vol. 12762. Springer, Cham. https://doi.org/10.1007/978-3-030-78462-1_27


  • DOI: https://doi.org/10.1007/978-3-030-78462-1_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78461-4

  • Online ISBN: 978-3-030-78462-1

