DOI: 10.1145/3404835.3463091
short-paper

An Exploration of Tester-based Evaluation of User Simulators for Comparing Interactive Retrieval Systems.

Published: 11 July 2021

Abstract

User simulation is needed for evaluating Interactive Information Retrieval (IIR) systems. However, for any user simulator to be useful, it must be reliable. In this paper, we propose a novel Tester-based approach to evaluating the reliability of user simulators: we construct a Tester from a set of IR systems with an expected performance pattern, then apply the Tester to a user simulator to see whether the simulator reproduces that pattern. We construct multiple Testers and apply them to a set of representative user simulators to empirically study the feasibility and effectiveness of the proposed method. The results show that Tester-based evaluation is a feasible and effective method for evaluating user simulators and selecting reliable ones for evaluating IIR systems.
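The idea of a Tester can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Tester`, `passes`, `good_system`, `bad_system`, `top1_simulator`, and `indifferent_simulator`, the toy collection, and the choice of "one system should outscore the other on average" as the expected performance pattern are all assumptions made here for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence

# Hypothetical types; the abstract does not define an API.
System = Callable[[str], List[str]]          # topic -> ranked document ids
Simulator = Callable[[System, str], float]   # (system, topic) -> session score


@dataclass
class Tester:
    """Two IR systems plus the expectation that `better` outperforms `worse`."""
    better: System
    worse: System

    def passes(self, simulator: Simulator, topics: Sequence[str]) -> bool:
        # The simulator passes if it reproduces the expected ordering on average.
        better_score = mean(simulator(self.better, t) for t in topics)
        worse_score = mean(simulator(self.worse, t) for t in topics)
        return better_score > worse_score


# Toy collection (assumed data, for illustration only).
RELEVANT = {"t1": {"d1"}, "t2": {"d4"}}
DOCS = ["d1", "d2", "d3", "d4"]


def good_system(topic: str) -> List[str]:
    # Ranks relevant documents first.
    return sorted(DOCS, key=lambda d: d not in RELEVANT[topic])


def bad_system(topic: str) -> List[str]:
    # Ranks relevant documents last.
    return sorted(DOCS, key=lambda d: d in RELEVANT[topic])


def top1_simulator(system: System, topic: str) -> float:
    # Simulated user who inspects only the first result.
    return 1.0 if system(topic)[0] in RELEVANT[topic] else 0.0


def indifferent_simulator(system: System, topic: str) -> float:
    # Degenerate simulated user whose score ignores the ranking.
    return 0.5


tester = Tester(better=good_system, worse=bad_system)
print(tester.passes(top1_simulator, ["t1", "t2"]))
print(tester.passes(indifferent_simulator, ["t1", "t2"]))
```

In this toy setting the first simulator reproduces the expected ordering and passes, while the second cannot distinguish the two systems and fails. A real Tester would presumably rely on systems whose relative quality is established independently, and on a more careful comparison than a difference of means.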

Supplementary Material

MP4 File (1659.mp4)
Presentation video - short version

      Published In

      SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
      July 2021
      2998 pages
      ISBN: 9781450380379
      DOI: 10.1145/3404835

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. interactive IR systems
      2. user simulation
      3. user simulation evaluation

      Qualifiers

      • Short-paper

      Conference

      SIGIR '21

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%

      Cited By

      • (2024) SimIIR 3: A Framework for the Simulation of Interactive and Conversational Information Retrieval. Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 197-202. DOI: 10.1145/3673791.3698427. Online publication date: 8-Dec-2024
      • (2024) Validating Synthetic Usage Data in Living Lab Environments. Journal of Data and Information Quality 16:1, 1-33. DOI: 10.1145/3623640. Online publication date: 6-Mar-2024
      • (2024) Tutorial on User Simulation for Evaluating Information Access Systems on the Web. Companion Proceedings of the ACM Web Conference 2024, 1254-1257. DOI: 10.1145/3589335.3641243. Online publication date: 13-May-2024
      • (2023) User Simulation for Evaluating Information Access Systems. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 302-305. DOI: 10.1145/3624918.3629549. Online publication date: 26-Nov-2023
      • (2023) Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems. ACM Transactions on Information Systems 42:1, 1-29. DOI: 10.1145/3596510. Online publication date: 18-Aug-2023
      • (2023) Tutorial on User Simulation for Evaluating Information Access Systems. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 5200-5203. DOI: 10.1145/3583780.3615296. Online publication date: 21-Oct-2023
      • (2022) PRE: A Precision-Recall-Effort Optimization Framework for Query Simulation. Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, 51-60. DOI: 10.1145/3539813.3545136. Online publication date: 23-Aug-2022
      • (2022) Report on the 1st simulation for information retrieval workshop (Sim4IR 2021) at SIGIR 2021. ACM SIGIR Forum 55:2, 1-16. DOI: 10.1145/3527546.3527559. Online publication date: 17-Mar-2022
      • (2022) : A Reliability-Aware Tester-Based Evaluation Framework of User Simulators. Advances in Information Retrieval, 336-350. DOI: 10.1007/978-3-030-99736-6_23. Online publication date: 10-Apr-2022
