Abstract
Modern recommender systems rely on user preference data to understand, analyze and provide items of interest to users. However, in some domains, collecting and sharing such data can be problematic: it may be expensive to gather data from several users, or it may be undesirable to share real user data for privacy reasons. We therefore propose a new model for generating realistic preference data. Our Sparse Probabilistic User Preference (SPUP) model produces synthetic data by sparsifying an initially dense user preference matrix generated by a standard matrix factorization model. The model incorporates aggregate statistics of the original data, such as user activity level and item popularity, as well as their interaction, to produce realistic data. We show empirically that our model can reproduce real-world datasets from different domains to a high degree of fidelity according to several measures. Our model can be used by both researchers and practitioners to generate new datasets or to extend existing ones, enabling sound testing of new models and providing an improved form of bootstrapping in cases where limited data is available.
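The two-stage idea in the abstract (generate a dense preference matrix with matrix factorization, then sparsify it using user activity and item popularity) can be illustrated with a deliberately simplified sketch. It is not the SPUP model: the N(0, 1) factor priors, the rating cut points in `to_stars`, and the rule that user `u` keeps `budgets[u]` items drawn in proportion to item popularity are our own assumptions. The interaction between activity and popularity mentioned above is omitted, and all function names are hypothetical.

```python
import random

# A simplified two-stage generator in the spirit of the abstract: a dense
# matrix from Gaussian matrix factorization, then a sparsification step.
# All names, priors, and thresholds here are illustrative assumptions.


def dense_preferences(n_users, n_items, k, rng):
    """Return a dense n_users x n_items matrix R = U V^T with N(0, 1) factors."""
    U = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_items)]
    return [[sum(a * b for a, b in zip(u, v)) for v in V] for u in U]


def to_stars(score, thresholds=(-1.0, -0.3, 0.3, 1.0)):
    """Discretize a real-valued score into a 1-5 rating (hypothetical cut points)."""
    return 1 + sum(score > t for t in thresholds)


def sparsify(R, budgets, popularity, rng):
    """Keep budgets[u] distinct items per user, drawn without replacement with
    probability proportional to item popularity (assumed strictly positive).
    Returns a dict mapping (user, item) to a 1-5 rating."""
    observed = {}
    for u, row in enumerate(R):
        candidates = list(range(len(row)))
        weights = list(popularity)
        for _ in range(min(budgets[u], len(candidates))):
            # Pick a position among the remaining candidates, weighted by popularity.
            j = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
            item = candidates.pop(j)
            weights.pop(j)
            observed[(u, item)] = to_stars(row[item])
    return observed


if __name__ == "__main__":
    rng = random.Random(0)
    R = dense_preferences(n_users=4, n_items=10, k=3, rng=rng)
    budgets = [1, 2, 5, 3]  # hypothetical per-user activity levels
    popularity = [10, 8, 5, 3, 2, 1, 1, 1, 1, 1]  # hypothetical item popularity
    obs = sparsify(R, budgets, popularity, rng)
    print(len(obs))  # 11: one observed rating per unit of user budget
```

Because each user's quota is fixed by `budgets`, the row counts of the sparse matrix reproduce the assumed activity levels exactly, while popular items are more likely, but not guaranteed, to be observed.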
Notes
1. The idea of combining user budgets and item popularity has also been exploited for sampling preference matrices in the context of stochastic variational inference [17].
2. Recent work has proposed Poisson-observation matrix factorization models [18]. Using such models would remove the need for this discretization step, but this choice is largely independent of our proposed approach.
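The point of Note 2 can be illustrated with a simplified Poisson factorization sampler: because observations are integer counts drawn directly from a Poisson distribution, no discretization step is needed, and zero counts naturally play the role of missing entries. This is a minimal sketch assuming fixed Gamma(shape, rate) priors on the factors; the hierarchical priors of [18] are omitted, and the function names and default hyperparameters are hypothetical.

```python
import math
import random

# Simplified, non-hierarchical Poisson factorization: integer counts are drawn
# directly, so no discretization step is needed. Names and hyperparameters
# are illustrative assumptions, not taken from the paper.


def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def poisson_factorization_sample(n_users, n_items, k, rng, shape=0.3, rate=0.3):
    """theta_uk, beta_ik ~ Gamma(shape, rate); y_ui ~ Poisson(theta_u . beta_i).
    Returns a dict mapping (user, item) to a positive count; zeros are unobserved."""
    scale = 1.0 / rate  # random.gammavariate is parameterized by shape and scale
    theta = [[rng.gammavariate(shape, scale) for _ in range(k)] for _ in range(n_users)]
    beta = [[rng.gammavariate(shape, scale) for _ in range(k)] for _ in range(n_items)]
    counts = {}
    for u in range(n_users):
        for i in range(n_items):
            lam = sum(a * b for a, b in zip(theta[u], beta[i]))
            y = poisson(lam, rng)
            if y > 0:
                counts[(u, i)] = y
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    counts = poisson_factorization_sample(n_users=50, n_items=40, k=3, rng=rng)
    density = len(counts) / (50 * 40)
    print(f"{len(counts)} nonzero entries, density {density:.2f}")
```

Small Gamma shape values push many factor entries toward zero, which tends to yield sparse count matrices; this is a property of the assumed priors, not a result reported in the paper.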
References
1. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009)
2. Harper, F.M., Konstan, J.A.: The MovieLens datasets: history and context. ACM Trans. Interact. Intell. Syst. 5(4), Article 19 (2015)
3. ICAPS: International Planning Competition (IPC). http://www.icaps-conference.org/index.php/Main/Competitions
4. Cassandra, T.: POMDP file repository. http://www.pomdp.org/examples/
5. RL-Glue: Reinforcement learning glue. http://glue.rl-community.org/
6. Cointet, J.P., Roth, C.: How realistic should knowledge diffusion models be? J. Artif. Soc. Soc. Simul. 10(3), 1–11 (2007)
7. Leskovec, J.: Dynamics of large networks. Ph.D. thesis, Carnegie Mellon University (2008)
8. Rubin, D.B.: Discussion: statistical disclosure limitation. J. Off. Stat. 9(2), 461–468 (1993)
9. Salakhutdinov, R., Mnih, A.: Probabilistic matrix factorization. In: NIPS, pp. 1257–1264 (2008)
10. Pasinato, M., Mello, C.E., Aufaure, M.A., Zimbrão, G.: Generating synthetic data for context-aware recommender systems. In: BRICS-CCI & CBIC (2013)
11. Tso, K.H.L., Schmidt-Thieme, L.: Empirical analysis of attribute-aware recommender system algorithms using synthetic data. J. Comput. 1(4), 18–29 (2006)
12. Caron, F., Fox, E.B.: Sparse graphs using exchangeable random measures. arXiv e-prints, January 2014
13. Newman, M.E.J., Strogatz, S.H., Watts, D.J.: Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E 64(2), 026118 (2001)
14. Hu, Y., Koren, Y., Volinsky, C.: Collaborative filtering for implicit feedback datasets. In: ICDM 2008, pp. 263–272. IEEE (2008)
15. Aldous, D.J.: Representations for partially exchangeable arrays of random variables. J. Multivar. Anal. 11(4), 581–598 (1981)
16. Hoover, D.N.: Relations on probability spaces and arrays of random variables. Technical report, Institute for Advanced Study, Princeton, NJ (1979)
17. Hernández-Lobato, J.M., Houlsby, N., Ghahramani, Z.: Stochastic inference for scalable probabilistic modeling of binary matrices. In: ICML (2014)
18. Gopalan, P., Hofman, J.M., Blei, D.M.: Scalable recommendation with hierarchical Poisson factorization. In: UAI (2015)
19. Bertin-Mahieux, T., Ellis, D.P.W., Whitman, B., Lamere, P.: The million song dataset. In: Proceedings of the 12th ISMIR (2011)
20. Tang, J., Gao, H., Liu, H.: mTrust: discerning multi-faceted trust in a connected world. In: ACM International Conference on Web Search and Data Mining (WSDM) (2012)
21. Tang, J., Gao, H., Liu, H., Das Sarma, A.: eTrust: understanding trust evolution in an online world. In: Proceedings of the 18th ACM SIGKDD, pp. 253–261. ACM (2012)
22. Ziegler, C.-N., McNee, S.M., Konstan, J.A., Lausen, G.: Improving recommendation lists through topic diversification. In: Proceedings of WWW (2005)
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Smith, M., Charlin, L., Pineau, J. (2017). A Sparse Probabilistic Model of User Preference Data. In: Mouhoub, M., Langlais, P. (eds) Advances in Artificial Intelligence. Canadian AI 2017. Lecture Notes in Computer Science, vol 10233. Springer, Cham. https://doi.org/10.1007/978-3-319-57351-9_36
DOI: https://doi.org/10.1007/978-3-319-57351-9_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57350-2
Online ISBN: 978-3-319-57351-9
eBook Packages: Computer Science, Computer Science (R0)