Research Article · DOI: 10.1145/3121050.3121070

Information Retrieval Evaluation as Search Simulation: A General Formal Framework for IR Evaluation

Published: 01 October 2017

Abstract

While the Cranfield evaluation methodology based on test collections has been very useful for evaluating simple IR systems that return a ranked list of documents, it has significant limitations when applied to search systems with interface features going beyond a ranked list, and to sophisticated interactive IR systems in general. In this paper, we propose a general formal framework for evaluating IR systems based on search session simulation that can be used to perform reproducible experiments for evaluating any IR system, including interactive systems and systems with sophisticated interfaces. We show that the traditional Cranfield evaluation method can be regarded as a special instantiation of the proposed framework in which the simulated search session consists of a user sequentially browsing the presented search results. By examining a number of existing evaluation metrics in the proposed framework, we reveal the exact assumptions they implicitly make about the simulated users and discuss possible ways to improve these metrics. We further show that the proposed framework enables us to evaluate a set of tag-based search interfaces, a generalization of faceted browsing interfaces, producing results consistent with real user experiments and revealing interesting findings about the effectiveness of the interfaces for different types of users.
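
The abstract's key claim, that a classical metric is just an expected outcome over simulated sessions under a particular user model, can be made concrete with a small simulation. The sketch below is an illustration of this idea rather than code from the paper: it uses the sequential-browsing user model behind rank-biased precision, where the simulated user scans the ranked list top-down and continues past each result with a fixed persistence probability p. Averaging the gain over many simulated sessions recovers the closed-form metric, showing how a Cranfield-style measure falls out of the simulation framework as a special case. All names and parameter values here are hypothetical.

```python
import random

def simulate_session(relevances, persistence=0.8, rng=random):
    """One simulated search session: the user reads results top-down
    and, after each result, continues with probability `persistence`
    (the implicit user model behind rank-biased precision)."""
    gain = 0.0
    for rel in relevances:
        gain += rel
        if rng.random() >= persistence:
            break  # the simulated user abandons the session here
    return gain

def simulated_metric(relevances, persistence=0.8, n_sessions=100_000, seed=7):
    """Evaluate a ranked list as the expected gain over many simulated
    sessions, scaled by (1 - p) to match RBP's normalisation."""
    rng = random.Random(seed)
    total = sum(simulate_session(relevances, persistence, rng)
                for _ in range(n_sessions))
    return (1 - persistence) * total / n_sessions

def rbp(relevances, persistence=0.8):
    """Closed-form rank-biased precision: (1 - p) * sum_i rel_i * p^(i-1)."""
    return (1 - persistence) * sum(
        rel * persistence ** i for i, rel in enumerate(relevances))

if __name__ == "__main__":
    ranked_list = [1, 0, 1, 1, 0]  # hypothetical binary relevance judgments
    print(f"simulated:   {simulated_metric(ranked_list):.4f}")
    print(f"closed-form: {rbp(ranked_list):.4f}")
```

Other metrics correspond to other simulated users: for instance, replacing the fixed persistence with a relevance-dependent stopping probability yields an ERR-style browsing model. Making such implicit user assumptions explicit is precisely what the proposed framework enables.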




    Published In

    ICTIR '17: Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval
    October 2017, 348 pages
    ISBN: 9781450344906
    DOI: 10.1145/3121050

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. IR evaluation
    2. interface card
    3. user simulation


    Conference

    ICTIR '17

    Acceptance Rates

    ICTIR '17 paper acceptance rate: 27 of 54 submissions (50%)
    Overall acceptance rate: 235 of 527 submissions (45%)

