DOI: 10.1145/3511808.3557816

Best Practices for Top-N Recommendation Evaluation: Candidate Set Sampling and Statistical Inference Techniques

Published: 17 October 2022

ABSTRACT

Top-N recommendation evaluation experiments are complex, requiring many design decisions. These decisions are often made inconsistently, and for many of them we do not yet have clear best practices. The goal of this project is to identify, substantiate, and document best practices that improve recommender evaluations.
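To make the two techniques named in the title concrete, the following is a minimal Python sketch of sampled candidate-set evaluation paired with a simple inference procedure. This is an illustration only, not the authors' protocol or code: the catalog size, the `random_score` function, the synthetic data, and all constants are hypothetical placeholders, and the sign-flip permutation test stands in for the broader family of statistical inference techniques the paper discusses.

```python
import numpy as np

rng = np.random.default_rng(42)

N_ITEMS = 1000      # toy catalog size (hypothetical)
N_CANDIDATES = 100  # sampled non-interacted items per test user
TOP_N = 10          # cutoff for hit-rate@N

def sampled_hit_rate(score, test_pairs, train_items,
                     n_candidates=N_CANDIDATES, top_n=TOP_N):
    """Per-user hit@N over sampled candidate sets.

    For each (user, held-out item) pair, the candidate set is the held-out
    item plus n_candidates items the user has not interacted with; a "hit"
    means the held-out item ranks in the top N of that candidate set.
    """
    hits = []
    for user, target in test_pairs:
        seen = train_items.get(user, set()) | {target}
        pool = np.setdiff1d(np.arange(N_ITEMS),
                            np.fromiter(seen, dtype=int))
        negatives = rng.choice(pool, size=n_candidates, replace=False)
        candidates = np.concatenate(([target], negatives))
        scores = score(user, candidates)
        # rank of the target among candidates (0 = best)
        rank = int((scores > scores[0]).sum())
        hits.append(rank < top_n)
    return np.asarray(hits, dtype=float)

def paired_permutation_test(a, b, n_perm=10_000):
    """Two-sided sign-flip permutation test on paired per-user metrics."""
    d = a - b
    observed = d.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    perm_means = (signs * d).mean(axis=1)
    return float((np.abs(perm_means) >= abs(observed)).mean())

# --- toy usage with a random scorer on synthetic data ---
def random_score(user, items):
    return rng.random(len(items))

train_items = {u: set(rng.choice(N_ITEMS, size=20, replace=False))
               for u in range(50)}
test_pairs = [(u, int(rng.integers(N_ITEMS))) for u in range(50)]

hits_a = sampled_hit_rate(random_score, test_pairs, train_items)
hits_b = sampled_hit_rate(random_score, test_pairs, train_items)
print(f"hit-rate@{TOP_N}: A={hits_a.mean():.3f}, B={hits_b.mean():.3f}")
print(f"p-value: {paired_permutation_test(hits_a, hits_b):.3f}")
```

Note that random negative sampling of this kind is itself one of the contested design decisions the paper examines, so the sketch shows the mechanics of a common setup rather than a recommended practice.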


Published in

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
October 2022, 5274 pages
ISBN: 9781450392365
DOI: 10.1145/3511808
General Chairs: Mohammad Al Hasan, Li Xiong

      Copyright © 2022 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • short-paper

      Acceptance Rates

CIKM '22 paper acceptance rate: 621 of 2,257 submissions (28%)
Overall acceptance rate: 1,861 of 8,427 submissions (22%)
