PRE: A Precision-Recall-Effort Optimization Framework for Query Simulation

Published: 25 August 2022

Abstract

We study how to develop an interpretable query simulation framework that can potentially explain the process a real user might have used to formulate a query. To this end, we propose PRE, a novel interpretable optimization framework that simulates query formulation and reformulation uniformly based on a user's knowledge state, with three high-level objectives: maximize the precision of the anticipated retrieval results, maximize their recall, and minimize the user's effort. We propose probabilistic models of how a user might estimate precision and recall for a candidate query, and derive multiple specific query formulation algorithms from them. Evaluation results show that the major assumptions made in the PRE framework appear to be reasonable, matching the observed empirical result patterns. PRE provides specific hypotheses about a user's query formulation process that can be further examined via user studies, enables simulation of meaningful variations of users without requiring extra training data, and serves as a roadmap for the systematic exploration and derivation of new interpretable query simulation methods.
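The precision-recall-effort tradeoff described in the abstract can be sketched in a few lines of Python. This is an illustrative toy, not the paper's method: the Boolean retrieval model, the linear scoring with weights `alpha`, `beta`, `gamma`, and the one-unit-per-term effort measure are all assumptions made here for illustration, standing in for the paper's probabilistic estimation models.

```python
# Toy sketch of the PRE idea: a simulated user scores candidate queries by
# anticipated precision and recall of the retrieval results, minus a penalty
# for formulation effort. All estimators and weights below are assumptions.

def retrieved(query, corpus):
    """Toy Boolean retrieval: a document matches if it contains every query term."""
    return [doc for doc in corpus if all(term in doc for term in query)]

def est_precision(query, relevant, corpus):
    docs = retrieved(query, corpus)
    return sum(d in relevant for d in docs) / len(docs) if docs else 0.0

def est_recall(query, relevant, corpus):
    docs = retrieved(query, corpus)
    return sum(d in relevant for d in docs) / len(relevant) if relevant else 0.0

def pre_score(query, relevant, corpus, alpha=1.0, beta=1.0, gamma=0.1):
    # Effort is approximated as one unit per query term (a toy assumption).
    return (alpha * est_precision(query, relevant, corpus)
            + beta * est_recall(query, relevant, corpus)
            - gamma * len(query))

# Toy collection: documents represented as frozensets of terms.
corpus = [frozenset({"query", "simulation", "user"}),
          frozenset({"query", "log"}),
          frozenset({"user", "study"})]
relevant = {corpus[0]}

candidates = [("query",), ("query", "simulation"), ("query", "simulation", "user")]
best = max(candidates, key=lambda q: pre_score(q, relevant, corpus))
# Adding "simulation" raises anticipated precision; adding "user" as well
# leaves precision and recall unchanged but costs extra effort.
```

Under this scoring, the simulated user stops adding terms once an extra term no longer improves anticipated precision or recall enough to justify its effort cost, which is the kind of tradeoff the framework's objectives express.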


Cited By

  • (2024) Tutorial on User Simulation for Evaluating Information Access Systems on the Web. Companion Proceedings of the ACM Web Conference 2024, 1254-1257. DOI: 10.1145/3589335.3641243. Online publication date: 13-May-2024
  • (2023) User Simulation for Evaluating Information Access Systems. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 302-305. DOI: 10.1145/3624918.3629549. Online publication date: 26-Nov-2023
  • (2023) Tutorial on User Simulation for Evaluating Information Access Systems. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 5200-5203. DOI: 10.1145/3583780.3615296. Online publication date: 21-Oct-2023

      Published In

      ICTIR '22: Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval
      August 2022
      289 pages
      ISBN:9781450394123
      DOI:10.1145/3539813
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. formal interpretable framework
      2. knowledge state
      3. query simulation

      Qualifiers

      • Research-article

      Conference

      ICTIR '22

      Acceptance Rates

      ICTIR '22 Paper Acceptance Rate 32 of 80 submissions, 40%;
      Overall Acceptance Rate 235 of 527 submissions, 45%
