DOI: 10.1145/3477495.3532051
Research article

Ranking Interruptus: When Truncated Rankings Are Better and How to Measure That

Published: 07 July 2022

Abstract

Most information retrieval effectiveness metrics assume that systems appending irrelevant documents at the bottom of the ranking are as effective as (or no worse than) systems with a stopping criterion that 'truncates' the ranking at the right position, avoiding the retrieval of those trailing irrelevant documents. It can be argued, however, that such truncated rankings are more useful to the end user. It is thus important to understand how to measure retrieval effectiveness in this scenario. In this paper we provide both theoretical and experimental contributions. We first define formal properties to analyze how effectiveness metrics behave when evaluating truncated rankings. Our theoretical analysis shows that de facto standard metrics do not satisfy desirable properties for evaluating truncated rankings: only Observational Information Effectiveness (OIE) -- a metric based on Shannon's information theory -- satisfies them all. We then perform experiments to compare several metrics on nine TREC datasets. According to our experimental results, the most appropriate metrics for truncated rankings are OIE and a novel extension of Rank-Biased Precision that adds a user effort factor penalizing the retrieval of irrelevant documents.
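
As a concrete illustration of the problem the abstract describes, the sketch below (an illustration only, using standard Rank-Biased Precision plus an assumed, hypothetical per-document effort cost, not the paper's actual formulation) shows that standard RBP assigns identical scores to a truncated ranking and to the same ranking padded with irrelevant documents, whereas an effort-penalizing variant scores the padded ranking lower:

```python
# Illustrative sketch (not the paper's metric): compare standard RBP with a
# hypothetical effort-penalized variant on a truncated vs. a padded ranking.

def rbp(relevance, p=0.8):
    """Standard Rank-Biased Precision: (1 - p) * sum over ranks k of r_k * p^(k-1)."""
    return (1 - p) * sum(r * p ** i for i, r in enumerate(relevance))

def rbp_with_effort(relevance, p=0.8, effort_cost=0.01):
    """Hypothetical variant: subtract a small cost for every irrelevant document
    the system retrieves. This is only a stand-in for the idea of a 'user effort
    factor'; the extension proposed in the paper may differ."""
    penalty = effort_cost * sum(1 for r in relevance if r == 0)
    return rbp(relevance, p) - penalty

truncated = [1, 1, 0, 1]        # system stops after four documents
padded = truncated + [0] * 6    # same ranking plus six trailing irrelevant documents

print(rbp(truncated), rbp(padded))                          # identical (approx. 0.4624 each)
print(rbp_with_effort(truncated), rbp_with_effort(padded))  # approx. 0.4524 vs. 0.3924
```

Because the trailing documents contribute zero gain, standard RBP cannot distinguish the two runs; only a metric with some notion of retrieval cost can reward the system that stopped at the right point.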

    Published In

    SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2022
    3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. evaluation
    2. evaluation measures
    3. information retrieval
    4. ranking cutoff

    Qualifiers

    • Research-article

    Funding Sources

    • Spanish Ministry of Economic Affairs and Digital Transformation
    • Australian Research Council Centre of Excellence for Automated Decision-Making and Society
    • Australian Research Council

    Conference

    SIGIR '22

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Cited By

• (2024) Walert: Putting Conversational Information Seeking Knowledge into Action by Building and Evaluating a Large Language Model-Powered Chatbot. Proceedings of the 2024 Conference on Human Information Interaction and Retrieval, 401-405. https://doi.org/10.1145/3627508.3638309. Online publication date: 10-Mar-2024.
• (2024) Top-Personalized-K Recommendation. Proceedings of the ACM Web Conference 2024, 3388-3399. https://doi.org/10.1145/3589334.3645417. Online publication date: 13-May-2024.
• (2024) MileCut: A Multi-view Truncation Framework for Legal Case Retrieval. Proceedings of the ACM Web Conference 2024, 1341-1349. https://doi.org/10.1145/3589334.3645349. Online publication date: 13-May-2024.
• (2023) Joint upper & expected value normalization for evaluation of retrieval systems. Information Processing and Management 60(4). https://doi.org/10.1016/j.ipm.2023.103404. Online publication date: 1-Jul-2023.
