PRE: A Precision-Recall-Effort Optimization Framework for Query Simulation

Published: 25 August 2022

Abstract

We study how to develop an interpretable query simulation framework that can potentially explain the process a real user might have used to formulate a query. To this end, we propose PRE, a novel interpretable optimization framework that simulates query formulation and reformulation uniformly based on a user's knowledge state, with three high-level objectives: maximize the precision of the anticipated retrieval results, maximize their recall, and minimize the user's effort. We propose probabilistic models of how a user might estimate precision and recall for a candidate query, and derive multiple specific query formulation algorithms from them. Evaluation results show that the major assumptions made in the PRE framework appear to be reasonable, matching the observed empirical result patterns. PRE provides specific hypotheses about a user's query formulation process that can be further examined via user studies, enables simulation of meaningful variations of users without requiring extra training data, and serves as a roadmap for the systematic exploration and derivation of new interpretable query simulation methods.
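The precision-recall-effort tradeoff described in the abstract can be sketched in a few lines of Python. This is an illustrative toy, not the paper's method: the Boolean retrieval model, the linear scoring with weights `alpha`, `beta`, `gamma`, and the one-unit-per-term effort measure are all assumptions made here for illustration, standing in for the paper's probabilistic estimation models.

```python
# Toy sketch of the PRE idea: a simulated user scores candidate queries by
# anticipated precision and recall of the retrieval results, minus a penalty
# for formulation effort. All estimators and weights below are assumptions.

def retrieved(query, corpus):
    """Toy Boolean retrieval: a document matches if it contains every query term."""
    return [doc for doc in corpus if all(term in doc for term in query)]

def est_precision(query, relevant, corpus):
    docs = retrieved(query, corpus)
    return sum(d in relevant for d in docs) / len(docs) if docs else 0.0

def est_recall(query, relevant, corpus):
    docs = retrieved(query, corpus)
    return sum(d in relevant for d in docs) / len(relevant) if relevant else 0.0

def pre_score(query, relevant, corpus, alpha=1.0, beta=1.0, gamma=0.1):
    # Effort is approximated as one unit per query term (a toy assumption).
    return (alpha * est_precision(query, relevant, corpus)
            + beta * est_recall(query, relevant, corpus)
            - gamma * len(query))

# Toy collection: documents represented as frozensets of terms.
corpus = [frozenset({"query", "simulation", "user"}),
          frozenset({"query", "log"}),
          frozenset({"user", "study"})]
relevant = {corpus[0]}

candidates = [("query",), ("query", "simulation"), ("query", "simulation", "user")]
best = max(candidates, key=lambda q: pre_score(q, relevant, corpus))
# Adding "simulation" raises anticipated precision; adding "user" as well
# leaves precision and recall unchanged but costs extra effort.
```

Under this scoring, the simulated user stops adding terms once an extra term no longer improves anticipated precision or recall enough to justify its effort cost, which is the kind of tradeoff the framework's objectives express.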


Cited By

  • (2024) Tutorial on User Simulation for Evaluating Information Access Systems on the Web. Companion Proceedings of the ACM Web Conference 2024, 1254-1257. DOI: 10.1145/3589335.3641243. Online publication date: 13-May-2024
  • (2023) User Simulation for Evaluating Information Access Systems. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 302-305. DOI: 10.1145/3624918.3629549. Online publication date: 26-Nov-2023
  • (2023) Tutorial on User Simulation for Evaluating Information Access Systems. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 5200-5203. DOI: 10.1145/3583780.3615296. Online publication date: 21-Oct-2023

      Published In

      ICTIR '22: Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval
      August 2022
      289 pages
      ISBN:9781450394123
      DOI:10.1145/3539813
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. formal interpretable framework
      2. knowledge state
      3. query simulation

      Qualifiers

      • Research-article

      Conference

      ICTIR '22

      Acceptance Rates

      ICTIR '22 Paper Acceptance Rate 32 of 80 submissions, 40%;
      Overall Acceptance Rate 235 of 527 submissions, 45%
