Training and testing of recommender systems on data missing not at random

Author:
Harald Steck

Bell Labs, Alcatel-Lucent, Murray Hill, NJ, USA


KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, July 2010, Pages 713–722
https://doi.org/10.1145/1835804.1835895

Published: 25 July 2010

ABSTRACT

Users typically rate only a small fraction of all available items. We show that the absence of ratings carries useful information for improving the top-k hit rate concerning all items, a natural accuracy measure for recommendations. To test recommender systems, we present two performance measures that can be estimated, under mild assumptions, without bias from data even when ratings are missing not at random (MNAR). To achieve optimal test results, we present appropriate surrogate objective functions for efficient training on MNAR data. Their main property is that they account for all ratings, whether observed or missing in the data. Concerning the top-k hit rate on test data, our experiments indicate dramatic improvements over even sophisticated methods that are optimized on observed ratings only.
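
The abstract names the top-k hit rate concerning all items but does not define it here. As a minimal sketch, assuming it means the fraction of a user's relevant held-out items that appear among the k highest-scored items when every item is ranked, it could be computed as below. The function name, the `exclude` parameter, and the toy scores are hypothetical illustrations and are not taken from the paper.

```python
def top_k_hit_rate(scores, relevant, k, exclude=()):
    """Fraction of relevant held-out items ranked within the top k of all items.

    Assumed definition for illustration only (not confirmed by the abstract).
    scores:   dict mapping item id -> predicted score for one user
    relevant: set of held-out items the user found relevant
    exclude:  items to drop from the ranking (e.g. the user's training items)
    """
    if not relevant:
        raise ValueError("need at least one relevant held-out item")
    excluded = set(exclude)
    # Rank every candidate item, whether it was ever rated or not.
    candidates = [item for item in scores if item not in excluded]
    # Sort by descending score; break ties by item id for determinism.
    ranked = sorted(candidates, key=lambda item: (-scores[item], str(item)))
    top_k = set(ranked[:k])
    return len(top_k & set(relevant)) / len(relevant)


# Toy example: five items, two of which are relevant to this user.
scores = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.3, "e": 0.1}
relevant = {"a", "d"}
rate_at_2 = top_k_hit_rate(scores, relevant, k=2)  # only "a" is in the top 2
rate_at_4 = top_k_hit_rate(scores, relevant, k=4)  # "a" and "d" are in the top 4
```

The point of ranking over all items, rather than only over the items a user happened to rate, is that the measure then reflects what a deployed recommender would actually show.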

Supplemental Material

kdd2010_steck_ttrs_01.mov (mov, 145.5 MB)

Published in
KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
July 2010, 1240 pages
ISBN: 9781450300551
DOI: 10.1145/1835804
General Chairs: Bharat Rao (Siemens), Balaji Krishnapuram (Siemens)
Program Chairs: Andrew Tomkins (Google Inc.), Qiang Yang (Hong Kong University of Science and Technology)
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
recommender systems
