Research Article · DOI: 10.1145/3121050.3121070

Information Retrieval Evaluation as Search Simulation: A General Formal Framework for IR Evaluation

Published: 01 October 2017

Abstract

While the Cranfield evaluation methodology based on test collections has been very useful for evaluating simple IR systems that return a ranked list of documents, it has significant limitations when applied to search systems with interface features going beyond a ranked list, and to sophisticated interactive IR systems in general. In this paper, we propose a general formal framework for evaluating IR systems based on search session simulation that can be used to perform reproducible experiments for evaluating any IR system, including interactive systems and systems with sophisticated interfaces. We show that the traditional Cranfield evaluation method can be regarded as a special instantiation of the proposed framework in which the simulated search session consists of a user sequentially browsing the presented search results. By examining a number of existing evaluation metrics in the proposed framework, we reveal the exact assumptions they implicitly make about the simulated users and discuss possible ways to improve these metrics. We further show that the proposed framework enables us to evaluate a set of tag-based search interfaces, a generalization of faceted browsing interfaces, producing results consistent with real user experiments and revealing interesting findings about the effectiveness of the interfaces for different types of users.
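
The abstract's key claim, that a classical metric is just an expected outcome over simulated sessions under a particular user model, can be made concrete with a small simulation. The sketch below is an illustration of this idea rather than code from the paper: it uses the sequential-browsing user model behind rank-biased precision, where the simulated user scans the ranked list top-down and continues past each result with a fixed persistence probability p. Averaging the gain over many simulated sessions recovers the closed-form metric, showing how a Cranfield-style measure falls out of the simulation framework as a special case. All names and parameter values here are hypothetical.

```python
import random

def simulate_session(relevances, persistence=0.8, rng=random):
    """One simulated search session: the user reads results top-down
    and, after each result, continues with probability `persistence`
    (the implicit user model behind rank-biased precision)."""
    gain = 0.0
    for rel in relevances:
        gain += rel
        if rng.random() >= persistence:
            break  # the simulated user abandons the session here
    return gain

def simulated_metric(relevances, persistence=0.8, n_sessions=100_000, seed=7):
    """Evaluate a ranked list as the expected gain over many simulated
    sessions, scaled by (1 - p) to match RBP's normalisation."""
    rng = random.Random(seed)
    total = sum(simulate_session(relevances, persistence, rng)
                for _ in range(n_sessions))
    return (1 - persistence) * total / n_sessions

def rbp(relevances, persistence=0.8):
    """Closed-form rank-biased precision: (1 - p) * sum_i rel_i * p^(i-1)."""
    return (1 - persistence) * sum(
        rel * persistence ** i for i, rel in enumerate(relevances))

if __name__ == "__main__":
    ranked_list = [1, 0, 1, 1, 0]  # hypothetical binary relevance judgments
    print(f"simulated:   {simulated_metric(ranked_list):.4f}")
    print(f"closed-form: {rbp(ranked_list):.4f}")
```

Other metrics correspond to other simulated users: for instance, replacing the fixed persistence with a relevance-dependent stopping probability yields an ERR-style browsing model. Making such implicit user assumptions explicit is precisely what the proposed framework enables.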




    Published In

    ICTIR '17: Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval
    October 2017, 348 pages
    ISBN: 9781450344906
    DOI: 10.1145/3121050

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. IR evaluation
    2. interface card
    3. user simulation


    Conference

    ICTIR '17

    Acceptance Rates

    ICTIR '17 paper acceptance rate: 27 of 54 submissions (50%)
    Overall acceptance rate: 235 of 527 submissions (45%)

