DOI: 10.1145/3335783.3335784
research-article

Efficient Incremental Cooccurrence Analysis for Item-Based Collaborative Filtering

Published: 23 July 2019

ABSTRACT

Recommender systems are ubiquitous on the modern internet, where they help users find items they might like. A widely deployed recommendation approach is item-based collaborative filtering. This approach relies on analyzing large item cooccurrence matrices that record how many users interacted with each pair of items. The potentially quadratic number of item pairs to compare poses a scalability bottleneck in analyzing such item cooccurrences. This problem intensifies in real-world use cases with incrementally growing datasets, especially when the recommendation model is regularly recomputed from scratch. We highlight the connection between the growing cost of item-based recommendation and densification processes in common interaction datasets. Based on our findings, we propose an efficient incremental algorithm for item-based collaborative filtering that builds on cooccurrence analysis. This approach restricts the number of interactions to consider from 'power users' and 'ubiquitous items' to guarantee a provably constant amount of work per processed user-item interaction. We discuss efficient implementations of our algorithm on a single machine as well as on a distributed stream processing engine, and present an extensive experimental evaluation. Our results confirm the asymptotic benefits of the incremental approach. Furthermore, we find that our implementation is an order of magnitude faster than existing open-source recommender libraries on many datasets, and at the same time scales to high-dimensional datasets that these existing recommenders fail to process.
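To make the core idea concrete, the following is a minimal Python sketch of incremental cooccurrence counting with caps on 'power users' and 'ubiquitous items'. It is an illustration under stated assumptions, not the paper's implementation: the class name `IncrementalCooccurrence`, the parameters `max_per_user` and `max_per_item`, and the rule of simply ignoring interactions beyond a cap are all hypothetical choices made here, since the abstract does not specify how interactions are selected or how cooccurrence counts are turned into similarity scores.

```python
from collections import defaultdict


class IncrementalCooccurrence:
    """Illustrative incremental item-cooccurrence counter (hypothetical API)."""

    def __init__(self, max_per_user=500, max_per_item=500):
        # Assumed caps: interactions beyond these limits are ignored.
        self.max_per_user = max_per_user
        self.max_per_item = max_per_item
        self.history = defaultdict(list)      # user -> accepted items
        self.item_count = defaultdict(int)    # item -> accepted interactions
        self.cooc = defaultdict(lambda: defaultdict(int))  # item -> item -> count

    def observe(self, user, item):
        """Process one user-item interaction; return True if it was counted."""
        seen = self.history[user]
        if item in seen:
            return False  # repeated interaction, nothing new to count
        if len(seen) >= self.max_per_user:
            return False  # 'power user' cap reached
        if self.item_count[item] >= self.max_per_item:
            return False  # 'ubiquitous item' cap reached
        # At most max_per_user updates, so the work per interaction is bounded.
        for other in seen:
            self.cooc[item][other] += 1
            self.cooc[other][item] += 1
        seen.append(item)
        self.item_count[item] += 1
        return True

    def most_cooccurring(self, item, n=10):
        """Return up to n (item, count) pairs, highest count first."""
        neighbours = self.cooc.get(item, {})
        ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], str(kv[0])))
        return ranked[:n]
```

Because each accepted interaction touches at most `max_per_user` entries of the cooccurrence table, the cost of `observe` does not grow with the size of the dataset, which is the property the abstract describes. A production variant would likely sample interactions rather than drop everything past the cap, and would score item pairs with a similarity measure instead of raw counts.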

  • Published in

    SSDBM '19: Proceedings of the 31st International Conference on Scientific and Statistical Database Management
    July 2019
    244 pages
    ISBN: 9781450362160
    DOI: 10.1145/3335783

    Copyright © 2019 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 56 of 146 submissions, 38%
