DOI: 10.1145/1645953.1646033
research-article

Expected reciprocal rank for graded relevance

Published: 02 November 2009

Abstract

While numerous metrics for information retrieval are available in the case of binary relevance, there is only one commonly used metric for graded relevance, namely the Discounted Cumulative Gain (DCG). A drawback of DCG is its additive nature and the underlying independence assumption: a document in a given position always has the same gain and discount, independently of the documents shown above it. Inspired by the "cascade" user model, we present a new editorial metric for graded relevance which overcomes this difficulty and implicitly discounts documents which are shown below very relevant documents. More precisely, this new metric is defined as the expected reciprocal length of time that the user will take to find a relevant document. It can be seen as an extension of the classical reciprocal rank to the graded relevance case, and we call it Expected Reciprocal Rank (ERR). We conduct an extensive evaluation on the query logs of a commercial search engine and show that ERR correlates better with click metrics than other editorial metrics.
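
For concreteness, the sketch below computes ERR under the cascade user model described above: the user scans results from top to bottom and stops as soon as a document satisfies them, so each position's contribution is damped by the relevance of everything ranked above it. The grade-to-satisfaction-probability mapping R(g) = (2^g - 1) / 2^g_max and the default g_max = 4 are assumptions borrowed from common ERR implementations rather than details stated in this abstract, and the function name is purely illustrative.

```python
# Minimal sketch of ERR under the cascade user model (illustrative only, not
# the authors' reference implementation). Grades are assumed to be integers
# in [0, g_max]; the mapping R(g) = (2**g - 1) / 2**g_max from a grade to a
# "user is satisfied here" probability is the one commonly used with ERR.

def err(grades, g_max=4):
    """Expected Reciprocal Rank for a ranked list of relevance grades."""
    not_satisfied_yet = 1.0  # probability the user reaches this position
    score = 0.0
    for rank, g in enumerate(grades, start=1):
        r = (2 ** g - 1) / (2 ** g_max)  # satisfaction probability at this rank
        score += not_satisfied_yet * r / rank
        not_satisfied_yet *= 1.0 - r
    return score

# A highly relevant document at rank 1 dominates the score; the same document
# at rank 3 contributes far less, unlike DCG's position-independent gains.
print(err([4, 0, 0]))  # ~0.94 (0.9375 with g_max = 4)
print(err([0, 0, 4]))  # ~0.31 (0.3125 with g_max = 4)
```

Under these assumed values, little probability mass "survives" past a very relevant document, which is exactly the implicit discount of lower-ranked documents that the abstract contrasts with DCG's additive, independent gains.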

    Published In

    CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
    November 2009
    2162 pages
    ISBN: 9781605585123
    DOI: 10.1145/1645953

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. click logs
    2. evaluation
    3. non-binary relevance
    4. user model
    5. web search
