How to Evaluate a Good Conversation? An Evaluation Framework for Chat Experience in Smart Home

  • Conference paper
  • In: Human-Computer Interaction. Theory, Methods and Tools (HCII 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12762)


Abstract

With the development of artificial intelligence technology, more and more smart devices are equipped with conversational agents that can engage in chat or free conversation with humans. However, human-machine chat is still at an early stage of development, and effective methods for evaluating chat experience are lacking. In this study, we propose a framework for evaluating the chat experience with smart conversational agents in the smart home. First, we collected candidate evaluation metrics; we then applied them in a first user test, refined the metrics, and constructed an evaluation system. Finally, we carried out a second user test to validate the evaluation system using structural equation modeling (SEM). The results indicated that the evaluation system had good reliability, validity, and internal consistency, and that it can be used to evaluate the user experience of chat-oriented dialogue with smart conversational agents.
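The abstract reports that the evaluation system showed good internal consistency. A standard measure of internal consistency for questionnaire-based metrics like these is Cronbach's alpha; the sketch below computes it for a small matrix of hypothetical 5-point Likert ratings (the data and the specific items are illustrative, not from the paper):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    return (n_items / (n_items - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical ratings: 4 respondents x 3 evaluation items, 5-point scale
ratings = [[5, 4, 5],
           [3, 3, 4],
           [4, 4, 4],
           [2, 3, 2]]
print(round(cronbach_alpha(ratings), 3))  # -> 0.897
```

Values above roughly 0.7 are conventionally read as acceptable consistency; SEM-based validation, as used in the paper's second user test, additionally checks whether the items load on the intended latent factors.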



Author information

Correspondence to Xiantao Chen.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Chen, X., Ma, L., Jia, M., Han, Y., Mi, J., Xu, M. (2021). How to Evaluate a Good Conversation? An Evaluation Framework for Chat Experience in Smart Home. In: Kurosu, M. (ed.) Human-Computer Interaction. Theory, Methods and Tools. HCII 2021. Lecture Notes in Computer Science, vol. 12762. Springer, Cham. https://doi.org/10.1007/978-3-030-78462-1_27


  • DOI: https://doi.org/10.1007/978-3-030-78462-1_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78461-4

  • Online ISBN: 978-3-030-78462-1

