
Rank-biased precision for measurement of retrieval effectiveness

Published: 23 December 2008

Abstract

A range of methods for measuring the effectiveness of information retrieval systems has been proposed. These are typically intended to provide a quantitative single-value summary of a document ranking relative to a query. However, many of these measures have failings. For example, recall is not well founded as a measure of satisfaction, since the user of an actual system cannot judge recall. Average precision is derived from recall, and suffers from the same problem. In addition, average precision lacks key stability properties that are needed for robust experiments. In this article, we introduce a new effectiveness metric, rank-biased precision, that avoids these problems. Rank-biased precision is derived from a simple model of user behavior, is robust if answer rankings are extended to greater depths, and allows accurate quantification of experimental uncertainty, even when only partial relevance judgments are available.
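The abstract describes the metric without giving its form. As a rough illustration, the sketch below assumes the commonly cited formulation of rank-biased precision, RBP = (1 - p) * sum_i r_i * p^(i-1), where p is a user-persistence parameter and r_i is the relevance of the document at rank i. The function name, the example value p = 0.8, and the treatment of unjudged documents as a residual error bound are illustrative assumptions, not details taken from this page.

```python
def rank_biased_precision(relevance, p=0.8):
    """Rank-biased precision for one ranking (illustrative sketch).

    `relevance` lists, from rank 1 down, the judged relevance of each
    document: 1 (relevant), 0 (not relevant), or None (unjudged).
    Returns the score counted over judged documents together with a
    residual: the most that unjudged documents, including everything
    below the evaluated depth, could add to the score.
    """
    score = 0.0
    residual = 0.0
    weight = 1.0 - p               # rank i carries weight (1 - p) * p**(i - 1)
    for rel in relevance:
        if rel is None:
            residual += weight     # an unjudged document might be relevant
        else:
            score += weight * rel
        weight *= p
    residual += p ** len(relevance)  # total weight of all ranks below the judged depth
    return score, residual


# Example: five ranks, rank 4 unjudged; p = 0.8 models a moderately patient user.
rbp, err = rank_biased_precision([1, 0, 1, None, 0], p=0.8)
print(f"RBP = {rbp:.3f}, residual = {err:.3f}")
```

Because the rank weights sum to one, the score always lies between the value returned and that value plus the residual, which is how partial judgments translate into a quantified uncertainty interval.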


Published In

ACM Transactions on Information Systems, Volume 27, Issue 1
December 2008
208 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/1416950
Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 December 2008
Accepted: 01 April 2008
Revised: 01 September 2007
Received: 01 October 2005
Published in TOIS Volume 27, Issue 1

Author Tags

  1. Recall
  2. average precision
  3. pooling
  4. precision
  5. relevance

Qualifiers

  • Research-article
  • Research
  • Refereed


