ABSTRACT
Recommender systems are ubiquitous on the modern internet, where they help users find items they might like. A widely deployed recommendation approach is item-based collaborative filtering. This approach relies on analyzing large item cooccurrence matrices that denote how many users interacted with a pair of items. The potentially quadratic number of item pairs to compare poses a scalability bottleneck in analyzing such item cooccurrences. This problem intensifies in real-world use cases with incrementally growing datasets, especially when the recommendation model is regularly recomputed from scratch. We highlight the connection between the growing cost of item-based recommendation and densification processes in common interaction datasets. Based on our findings, we propose an efficient incremental algorithm for item-based collaborative filtering that relies on cooccurrence analysis. This approach restricts the number of interactions to consider from 'power users' and 'ubiquitous items', which guarantees a provably constant amount of work per processed user-item interaction. We discuss efficient implementations of our algorithm on a single machine as well as on a distributed stream processing engine, and present an extensive experimental evaluation. Our results confirm the asymptotic benefits of the incremental approach. Furthermore, we find that our implementation is an order of magnitude faster than existing open-source recommender libraries on many datasets, and at the same time scales to high-dimensional datasets that these existing recommenders fail to process.
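The idea of bounding the work per interaction can be illustrated with a minimal sketch. It is not the algorithm from the paper: the class name `IncrementalCooccurrence`, the parameters `max_per_user` and `max_per_item`, and the simple "skip once the cap is reached" rule are all assumptions made for illustration. The sketch only shows why capping the interactions taken from 'power users' and 'ubiquitous items' keeps each update at a constant cost.

```python
from collections import defaultdict


class IncrementalCooccurrence:
    """Hypothetical sketch: incremental cooccurrence counts with interaction caps."""

    def __init__(self, max_per_user=500, max_per_item=500):
        # Both caps are illustrative assumptions, not values from the paper.
        self.max_per_user = max_per_user
        self.max_per_item = max_per_item
        self.history = defaultdict(list)      # user -> items kept so far
        self.item_count = defaultdict(int)    # item -> interactions kept so far
        self.cooc = defaultdict(lambda: defaultdict(int))  # item -> item -> count

    def add(self, user, item):
        """Process one user-item interaction; return the number of pairs updated."""
        seen = self.history[user]
        if item in seen:
            return 0  # repeated interaction, nothing new to count
        # Skip interactions beyond the caps ('power users', 'ubiquitous items').
        if len(seen) >= self.max_per_user:
            return 0
        if self.item_count[item] >= self.max_per_item:
            return 0
        # At most max_per_user pairs are touched, so the work is bounded.
        for other in seen:
            self.cooc[item][other] += 1
            self.cooc[other][item] += 1
        seen.append(item)
        self.item_count[item] += 1
        return len(seen) - 1


model = IncrementalCooccurrence(max_per_user=2, max_per_item=10)
for user, item in [("u1", "a"), ("u1", "b"), ("u1", "c"), ("u2", "a"), ("u2", "b")]:
    model.add(user, item)
print(model.cooc["a"]["b"])
```

In this sketch the third interaction of `u1` is dropped because the user has already reached the cap of two interactions, so item `c` never enters the cooccurrence counts. Dropping later interactions is the simplest possible policy; a sampling-based policy would treat early and late interactions more evenly.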