ABSTRACT
Users typically rate only a small fraction of all available items. We show that the absence of ratings carries useful information for improving the top-k hit rate concerning all items, a natural accuracy measure for recommendations. To test recommender systems, we present two performance measures that can be estimated, under mild assumptions, without bias from data even when ratings are missing not at random (MNAR). To achieve optimal test results, we present appropriate surrogate objective functions for efficient training on MNAR data. Their main property is that they account for all ratings, whether observed or missing in the data. Concerning the top-k hit rate on test data, our experiments indicate dramatic improvements over even sophisticated methods that are optimized on observed ratings only.
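To make the top-k hit rate concrete, the following is a minimal sketch that assumes the common reading of the term as recall at k: the fraction of a user's relevant held-out items that land among the k highest-scored items when all items are ranked. The abstract does not give the paper's exact definition or its unbiased estimators, so the function name `top_k_hit_rate`, its parameters, and the toy scores are illustrative assumptions rather than the authors' method.

```python
import numpy as np

def top_k_hit_rate(scores, relevant, k):
    """Fraction of relevant held-out items ranked within the top k of ALL items.

    scores   : predicted score for every item (hypothetical model output)
    relevant : indices of the user's relevant held-out items
    k        : length of the recommendation list
    """
    scores = np.asarray(scores, dtype=float)
    relevant = np.asarray(list(relevant), dtype=int)
    if relevant.size == 0:
        raise ValueError("need at least one relevant item")
    # Rank all items by descending score; the stable sort breaks ties by item index.
    top_k = np.argsort(-scores, kind="stable")[:k]
    # Count how many relevant items appear in the top-k list.
    hits = np.isin(relevant, top_k).sum()
    return hits / relevant.size

# Toy data (made up for illustration): 6 items, items 1 and 4 are relevant.
scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.05]
print(top_k_hit_rate(scores, relevant=[1, 4], k=2))  # 0.5
```

The ranking here runs over every item, rated or not, which is the sense of "concerning all items" in the abstract; a measure computed only over items the user has rated would be a different quantity.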
Index Terms
- Training and testing of recommender systems on data missing not at random