ABSTRACT
Top-N recommendation evaluation experiments are complex and require many decisions. These decisions are often made inconsistently, and we do not yet have clear best practices for many of them. The goal of this project is to identify, substantiate, and document best practices that improve these evaluations.
Best Practices for Top-N Recommendation Evaluation: Candidate Set Sampling and Statistical Inference Techniques
Recommendations
User-Specific Feature-Based Similarity Models for Top-n Recommendation of New Items
Survey Paper, Regular Papers and Special Section on Participatory Sensing and Crowd Intelligence
Recommending new items for suitable users is an important yet challenging problem due to the lack of preference history for the new items. Noncollaborative user modeling techniques that rely on the item features can be used to recommend new items. ...
Tag-Based Collaborative Filtering Recommendation in Personal Learning Environments
The personal learning environment (PLE) concept offers a learner-centric view of learning and suggests a shift from knowledge-push to knowledge-pull approach to learning. One concern with a PLE-driven knowledge-pull approach to learning, however, is ...
Serendipitous Personalized Ranking for Top-N Recommendation
WI-IAT '12: Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Serendipitous recommendation has benefitted both e-retailers and users. It tends to suggest items which are both unexpected and useful to users. These items are not only profitable to the retailers but also surprisingly suitable to consumers' tastes. ...