DOI: 10.1145/3539618.3591788

Evaluating Task-oriented Dialogue Systems with Users

Published: 18 July 2023

Abstract

Evaluation is one of the major concerns when developing information retrieval systems. Especially in the field of conversational AI, this topic has been heavily studied in the setting of both non-task-oriented and task-oriented conversational agents (dialogue systems) [1]. Automatic metrics proposed for the evaluation of dialogue systems, e.g., BLEU and ROUGE, have been shown to correlate poorly with human judgment and are thus ineffective for this purpose. As a consequence, a significant amount of research relies on human evaluation to estimate the effectiveness of dialogue systems [1, 4].
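The mismatch between word-overlap metrics and human judgment can be checked directly by scoring system responses against reference responses and rank-correlating those scores with human ratings. The sketch below is a minimal illustration of such a check and is not part of the original abstract: the responses, references, and ratings are hypothetical, and it assumes the nltk, rouge-score, and scipy packages.

```python
# Minimal sketch: correlate word-overlap metrics with human ratings.
# Data are hypothetical; requires nltk, rouge-score, and scipy.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from scipy.stats import spearmanr

# Hypothetical system responses, reference responses, and human ratings (1-5).
responses = ["the restaurant is in the city centre",
             "i have booked a table for two at 7 pm",
             "sorry , there are no cheap hotels in the north"]
references = ["the restaurant is located in the centre of town",
              "your table for two people at 19:00 is booked",
              "unfortunately no inexpensive hotels exist in the north part of town"]
human_ratings = [4, 5, 3]

smooth = SmoothingFunction().method1
rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

bleu_scores, rouge_scores = [], []
for hyp, ref in zip(responses, references):
    # Sentence-level BLEU with smoothing (short responses rarely match 4-grams).
    bleu_scores.append(sentence_bleu([ref.split()], hyp.split(), smoothing_function=smooth))
    # ROUGE-L F-measure against the reference response.
    rouge_scores.append(rouge.score(ref, hyp)["rougeL"].fmeasure)

# Spearman rank correlation between each metric and the human ratings.
bleu_rho, _ = spearmanr(bleu_scores, human_ratings)
rouge_rho, _ = spearmanr(rouge_scores, human_ratings)
print(f"BLEU vs. human ratings:    rho = {bleu_rho:.2f}")
print(f"ROUGE-L vs. human ratings: rho = {rouge_rho:.2f}")
```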
An emerging approach for evaluating task-oriented dialogue systems (TDS) is to estimate a user's overall satisfaction with the system from explicit and implicit user interaction signals [2, 3]. Though useful and effective, overall user satisfaction does not necessarily give insights into what aspects or dimensions a TDS is performing well on. Understanding why a user is satisfied or dissatisfied helps the TDS recover from an error and optimize towards an individual aspect to avoid total dissatisfaction during an interaction session.
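To make the signal-based formulation concrete, a minimal version of this setup, in the spirit of (but not identical to) the models in [2, 3], represents each dialogue session by a few explicit and implicit interaction signals and trains a classifier to predict overall satisfaction. The feature names and data below are hypothetical; the sketch assumes scikit-learn and NumPy.

```python
# Minimal sketch: predict overall user satisfaction from interaction signals.
# Feature names, data, and labels are hypothetical toy examples.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n_sessions = 200

# Hypothetical per-session signals: number of turns, number of user
# reformulations, task completed (0/1), mean system response delay (seconds).
X = np.column_stack([
    rng.integers(2, 15, n_sessions),    # turns
    rng.integers(0, 5, n_sessions),     # reformulations
    rng.integers(0, 2, n_sessions),     # task completed
    rng.uniform(0.2, 3.0, n_sessions),  # response delay
])
# Toy label: satisfied (1) when the task was completed with few reformulations.
y = ((X[:, 2] == 1) & (X[:, 1] <= 2)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```

The same setup extends to fine-grained evaluation by replacing the single overall label with per-aspect labels (e.g., relevance), in line with the fine-grained evaluation this research focuses on.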
Understanding a user's satisfaction with a TDS is crucial for two main reasons. First, it allows system designers to understand how different users perceive satisfaction, which in turn enables better personalization. Second, it can be used to avoid total dialogue failure by deploying adaptive conversational strategies, such as failure recovery or switching topics. Fine-grained evaluation of a TDS thus gives the system an opportunity to learn an individual user's interaction preferences and to fulfill the user's goal. Therefore, in this research, we take a first step toward understanding user satisfaction with TDS, focusing on the fine-grained evaluation of conversational systems in a task-oriented setting.

References

[1] Jan Deriu, Álvaro Rodrigo, Arantxa Otegi, Guillermo Echegoyen, Sophie Rosset, Eneko Agirre, and Mark Cieliebak. 2020. Survey on evaluation methods for dialogue systems. Artificial Intelligence Review, Vol. 54 (2020), 755--810.
[2] Seyyed Hadi Hashemi, Kyle Williams, Ahmed El Kholy, Imed Zitouni, and Paul A. Crook. 2018. Measuring User Satisfaction on Smart Speaker Intelligent Assistants Using Intent Sensitive Query Embeddings. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM '18). Association for Computing Machinery, New York, NY, USA, 1183--1192. https://doi.org/10.1145/3269206.3271802
[3] Julia Kiseleva, Kyle Williams, Jiepu Jiang, Ahmed Hassan Awadallah, Aidan C. Crook, Imed Zitouni, and Tasos Anastasakos. 2016. Understanding User Satisfaction with Intelligent Assistants. In Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval (CHIIR '16). Association for Computing Machinery, New York, NY, USA, 121--130. https://doi.org/10.1145/2854946.2854961
[4] Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, 2122--2132. https://doi.org/10.18653/v1/D16-1230
[5] Clemencia Siro, Mohammad Aliannejadi, and Maarten de Rijke. 2022. Understanding User Satisfaction with Task-Oriented Dialogue Systems. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22). Association for Computing Machinery, New York, NY, USA, 2018--2023. https://doi.org/10.1145/3477495.3531798


Published In
SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2023
3567 pages
ISBN:9781450394086
DOI:10.1145/3539618
Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. relevance
  2. task-oriented dialogue systems
  3. user experience
  4. user satisfaction

Qualifiers

  • Abstract

Funding Sources

  • Dreams Lab

Conference

SIGIR '23

Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions, 20%
