short-paper

An Exploration of Tester-based Evaluation of User Simulators for Comparing Interactive Retrieval Systems.

Authors:

Sahiti Labhishetty,

Chengxiang Zhai

SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 1598 - 1602

https://doi.org/10.1145/3404835.3463091

Published: 11 July 2021

Abstract

User simulation is needed for evaluating Interactive Information Retrieval (IIR) systems. However, for any user simulator to be useful, it must be reliable. In this paper, we propose a novel Tester-based approach to evaluating the reliability of user simulators: we construct a Tester from a set of IR systems with an expected performance pattern and apply the Tester to a user simulator to see whether the simulator generates the expected performance pattern. We construct multiple Testers and apply them to a set of representative user simulators to empirically study the feasibility and effectiveness of the proposed method. The results show that Tester-based evaluation is a feasible and effective method for evaluating user simulators and selecting reliable ones for evaluating IIR systems.
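The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `Tester`, the functions `pass_rate` and `rank_simulators`, the system names, and all scores are hypothetical, and aggregating results as a simple pass rate is an assumption made only for illustration. A Tester is modeled as a set of (better, worse) system pairs, and a simulator passes if the scores it produces respect every pair.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumption: a simulator is reduced to a function that maps a system name
# to the performance score observed when simulating sessions on that system.
Simulator = Callable[[str], float]


@dataclass(frozen=True)
class Tester:
    """A set of IR systems with an expected performance pattern."""
    name: str
    # Each pair is (system expected to perform better, system expected to perform worse).
    expected_pairs: Tuple[Tuple[str, str], ...]

    def passes(self, simulator: Simulator) -> bool:
        # The simulator passes only if it reproduces every expected pair.
        return all(simulator(better) > simulator(worse)
                   for better, worse in self.expected_pairs)


def pass_rate(simulator: Simulator, testers: List[Tester]) -> float:
    """Fraction of Testers the simulator passes (an illustrative aggregate)."""
    if not testers:
        raise ValueError("at least one Tester is required")
    return sum(t.passes(simulator) for t in testers) / len(testers)


def rank_simulators(simulators: Dict[str, Simulator],
                    testers: List[Tester]) -> List[Tuple[str, float]]:
    """Order simulators from most to least reliable by pass rate."""
    rates = {name: pass_rate(sim, testers) for name, sim in simulators.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


def make_simulator(scores: Dict[str, float]) -> Simulator:
    # Stand-in for running a real simulator: look up a precomputed toy score.
    return lambda system: scores[system]


# Toy scores, invented for this example only.
SCORES_A = {"strong": 0.62, "weak": 0.41, "feedback": 0.70, "no_feedback": 0.62}
SCORES_B = {"strong": 0.40, "weak": 0.45, "feedback": 0.55, "no_feedback": 0.50}

TESTERS = [
    Tester("strong_vs_weak", (("strong", "weak"),)),
    Tester("feedback_vs_none", (("feedback", "no_feedback"),)),
]

RANKING = rank_simulators(
    {"sim_a": make_simulator(SCORES_A), "sim_b": make_simulator(SCORES_B)},
    TESTERS,
)
print(RANKING)
```

With these toy scores, `sim_a` reproduces both expected patterns while `sim_b` reverses the first one, so `sim_a` would be preferred as the more reliable simulator under this simplified criterion.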

Index Terms

1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
    2. Users and interactive retrieval

Published In

SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2021

2998 pages

ISBN: 9781450380379

DOI: 10.1145/3404835

General Chairs: Fernando Diaz (Google), Chirag Shah (University of Washington), Torsten Suel (New York University)

Program Chairs: Pablo Castells (Universidad Autónoma de Madrid, Amazon), Rosie Jones (Spotify), Tetsuya Sakai (Waseda University)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Conference

SIGIR '21

Sponsor:

SIGIR

SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 11 - 15, 2021

Virtual Event, Canada

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Article Metrics

Total Citations: 9
Total Downloads: 195
Downloads (Last 12 months): 21
Downloads (Last 6 weeks): 2

Reflects downloads up to 02 Mar 2025

Citations

Cited By

Azzopardi L, Breuer T, Engelmann B, Kreutz C, MacAvaney S, Maxwell D, Parry A, Roegiest A, Wang X, Zerhoudi S, Sakai T, Ishita E, Ohshima H, Hasibi F, Mao J, Jose J. (2024). SimIIR 3: A Framework for the Simulation of Interactive and Conversational Information Retrieval. Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 197-202. Online publication date: 8-Dec-2024.
https://dl.acm.org/doi/10.1145/3673791.3698427
Breuer T, Fuhr N, Schaer P. (2024). Validating Synthetic Usage Data in Living Lab Environments. Journal of Data and Information Quality 16:1, 1-33. Online publication date: 6-Mar-2024.
https://dl.acm.org/doi/10.1145/3623640
Balog K, Zhai C, Chua T, Ngo C, Kumar R, Lauw H, Ka-Wei Lee R. (2024). Tutorial on User Simulation for Evaluating Information Access Systems on the Web. Companion Proceedings of the ACM Web Conference 2024, 1254-1257. Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3589335.3641243
Balog K, Zhai C. (2023). User Simulation for Evaluating Information Access Systems. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 302-305. Online publication date: 26-Nov-2023.
https://dl.acm.org/doi/10.1145/3624918.3629549
Sun W, Guo S, Zhang S, Ren P, Chen Z, de Rijke M, Ren Z. (2023). Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems. ACM Transactions on Information Systems 42:1, 1-29. Online publication date: 18-Aug-2023.
https://dl.acm.org/doi/10.1145/3596510
Balog K, Zhai C, Frommholz I, Hopfgartner F, Lee M, Oakes M, Lalmas M, Zhang M, Santos R. (2023). Tutorial on User Simulation for Evaluating Information Access Systems. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 5200-5203. Online publication date: 21-Oct-2023.
https://dl.acm.org/doi/10.1145/3583780.3615296
Labhishetty S, Zhai C, Crestani F, Pasi G, Gaussier E. (2022). PRE: A Precision-Recall-Effort Optimization Framework for Query Simulation. Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, 51-60. Online publication date: 23-Aug-2022.
https://dl.acm.org/doi/10.1145/3539813.3545136
Balog K, Maxwell D, Thomas P, Zhang S. (2022). Report on the 1st simulation for information retrieval workshop (Sim4IR 2021) at SIGIR 2021. ACM SIGIR Forum 55:2, 1-16. Online publication date: 17-Mar-2022.
https://dl.acm.org/doi/10.1145/3527546.3527559
Labhishetty S, Zhai C. (2022). RATE: A Reliability-Aware Tester-Based Evaluation Framework of User Simulators. Advances in Information Retrieval, 336-350. Online publication date: 10-Apr-2022.
https://dl.acm.org/doi/10.1007/978-3-030-99736-6_23
