DOI: 10.1145/1835804.1835895
Research article

Training and testing of recommender systems on data missing not at random

Published: 25 July 2010

ABSTRACT

Users typically rate only a small fraction of all available items. We show that the absence of ratings carries useful information for improving the top-k hit rate concerning all items, a natural accuracy measure for recommendations. To test recommender systems, we present two performance measures that can be estimated, under mild assumptions, without bias from the data even when ratings are missing not at random (MNAR). To achieve optimal test results, we present appropriate surrogate objective functions for efficient training on MNAR data. Their main property is to account for all ratings, whether observed or missing in the data. Concerning the top-k hit rate on test data, our experiments indicate dramatic improvements over even sophisticated methods that are optimized on observed ratings only.
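The two ideas in the abstract, an all-items top-k hit rate and a training objective that accounts for missing ratings, can be sketched concretely. The following is a minimal illustrative sketch, not the paper's exact formulation: it assumes a low-rank model with user factors P and item factors Q, imputes missing ratings with a pessimistic value r_m, and down-weights them by w_m relative to observed ratings. The function names, default values, and toy data are hypothetical.

```python
import numpy as np

def topk_recall(scores, relevant, k):
    """Top-k hit rate for one user: the fraction of the user's relevant
    (e.g. held-out, highly rated) items that appear among the k
    highest-scored of ALL items."""
    top = np.argsort(-scores)[:k]
    return len(set(top) & set(relevant)) / len(relevant)

def allranks_objective(R_obs, mask, P, Q, r_m=2.0, w_m=0.005, lam=0.1):
    """Weighted, regularized squared error over ALL user-item pairs:
    observed ratings enter with weight 1; missing ratings are imputed
    with r_m and enter with a small weight w_m."""
    R_hat = P @ Q.T                            # low-rank predictions
    R_all = np.where(mask == 1, R_obs, r_m)    # impute missing ratings
    W = np.where(mask == 1, 1.0, w_m)          # down-weight missing entries
    sq_err = np.sum(W * (R_all - R_hat) ** 2)
    reg = lam * (np.sum(P ** 2) + np.sum(Q ** 2))
    return sq_err + reg
```

Setting w_m = 0 recovers the usual observed-ratings-only objective; choosing w_m > 0 with a low imputed value r_m is what lets the absence of a rating act as weak evidence of irrelevance during training.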


Supplemental Material

kdd2010_steck_ttrs_01.mov (MOV, 145.5 MB)

References

  1. J. Bennett and S. Lanning. The Netflix Prize. In Workshop at SIGKDD-07, ACM Conference on Knowledge Discovery and Data Mining, 2007.
  2. MovieLens data. http://www.grouplens.org/node/73, 2006.
  3. S. Deerwester, S. Dumais, G. Furnas, R. Harshman, T. Landauer, K. Lochbaum, L. Streeter, et al. Latent semantic analysis / indexing. http://lsa.colorado.edu/.
  4. S. Funk. Netflix update: Try this at home, 2006. http://sifter.org/~simon/journal/20061211.html.
  5. D. J. Hand and R. J. Till. A simple generalization of the area under the ROC curve for multiple class classification problems. Machine Learning, 45:171--186, 2001.
  6. Y. Hu, Y. Koren, and C. Volinsky. Collaborative filtering for implicit feedback datasets. In International Conference on Data Mining (ICDM), 2008.
  7. R. Keshavan, A. Montanari, and S. Oh. Matrix completion from noisy entries. arXiv:0906.2027, 2009.
  8. Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Conference on Knowledge Discovery and Data Mining (KDD), 2008.
  9. M. Kurucz, A. Benczur, T. Kiss, I. Nagy, A. Szabo, and B. Torma. KDD Cup 2007 task 1 winner report. ACM SIGKDD Explorations Newsletter, 9:53--56, 2007.
  10. R. Little and D. B. Rubin. Statistical Analysis with Missing Data. Wiley, 1986.
  11. B. Marlin and R. Zemel. Collaborative prediction and ranking with non-random missing data. In ACM Conference on Recommender Systems (RecSys), 2009.
  12. B. Marlin, R. Zemel, S. Roweis, and M. Slaney. Collaborative filtering and the missing at random assumption. In Conference on Uncertainty in Artificial Intelligence (UAI), 2007.
  13. A. Paterek. Improving regularized singular value decomposition for collaborative filtering. KDD Cup 2007.
  14. D. B. Rubin. Inference and missing data. Biometrika, 63:581--592, 1976.
  15. R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted Boltzmann machines for collaborative filtering. In International Conference on Machine Learning (ICML), 2007.
  16. N. Srebro and T. Jaakkola. Weighted low-rank approximations. In International Conference on Machine Learning (ICML), pages 720--727, 2003.
  17. H. Steck. Hinge rank loss and the area under the ROC curve. In European Conference on Machine Learning (ECML), 2007.
  18. M. Weimer, A. Karatzoglou, Q. Le, and A. Smola. CofiRank - maximum margin matrix factorization for collaborative ranking. In Advances in Neural Information Processing Systems (NIPS), 2008.
  19. S. Wu and P. Flach. A scored AUC metric for classifier evaluation and selection. In ROCML Workshop at ICML, 2005.

Published in

KDD '10: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
July 2010, 1240 pages
ISBN: 9781450300551
DOI: 10.1145/1835804
Copyright © 2010 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)
