research-article

Efficient Incremental Cooccurrence Analysis for Item-Based Collaborative Filtering

Authors:
Sebastian Schelter (New York University), Ufuk Celebi (Freie Universität Berlin), Ted Dunning (MapR Technologies)

SSDBM '19: Proceedings of the 31st International Conference on Scientific and Statistical Database Management, July 2019, Pages 61–72
https://doi.org/10.1145/3335783.3335784

Published: 23 July 2019

ABSTRACT

Recommender systems are ubiquitous in the modern internet, where they help users find items they might like. A widely deployed recommendation approach is item-based collaborative filtering. This approach relies on analyzing large item cooccurrence matrices that denote how many users interacted with a pair of items. The potentially quadratic number of item pairs to compare poses a scalability bottleneck in analyzing such item cooccurrences. Additionally, this problem intensifies in real-world use cases with incrementally growing datasets, especially when the recommendation model is regularly recomputed from scratch. We highlight the connection between the growing cost of item-based recommendation and densification processes in common interaction datasets. Based on our findings, we propose an efficient incremental algorithm for item-based collaborative filtering that relies on cooccurrence analysis. This approach restricts the number of interactions considered from 'power users' and 'ubiquitous items' to guarantee a provably constant amount of work per processed user-item interaction. We discuss efficient implementations of our algorithm on a single machine as well as on a distributed stream processing engine, and present an extensive experimental evaluation. Our results confirm the asymptotic benefits of the incremental approach. Furthermore, we find that our implementation is an order of magnitude faster than existing open-source recommender libraries on many datasets, and at the same time scales to high-dimensional datasets which these existing recommenders fail to process.
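
The idea in the abstract of bounding the work per interaction can be illustrated with a small sketch. It is not the authors' implementation: the class name, the parameters `max_per_user` and `max_per_item`, and the policy of simply ignoring interactions beyond the caps are assumptions made for illustration only. It maintains item-item cooccurrence counts incrementally, and because a user's kept history never exceeds `max_per_user`, each new interaction touches a bounded number of counters.

```python
from collections import defaultdict


class IncrementalCooccurrence:
    """Toy incremental item-item cooccurrence counter with interaction caps.

    Hypothetical sketch: names, default caps, and the drop policy are
    illustrative assumptions, not taken from the paper.
    """

    def __init__(self, max_per_user=500, max_per_item=500):
        self.max_per_user = max_per_user   # cap for 'power users'
        self.max_per_item = max_per_item   # cap for 'ubiquitous items'
        self.user_history = defaultdict(list)              # user -> kept items
        self.item_count = defaultdict(int)                 # item -> kept interactions
        self.cooc = defaultdict(lambda: defaultdict(int))  # item -> item -> count

    def add(self, user, item):
        """Process one (user, item) interaction; return True if it was counted."""
        history = self.user_history[user]
        if item in history:
            return False  # repeated interaction, nothing new to count
        if len(history) >= self.max_per_user:
            return False  # user already reached the cap
        if self.item_count[item] >= self.max_per_item:
            return False  # item already reached the cap
        # At most max_per_user symmetric updates per interaction.
        for other in history:
            self.cooc[item][other] += 1
            self.cooc[other][item] += 1
        history.append(item)
        self.item_count[item] += 1
        return True

    def top_k(self, item, k=10):
        """Return up to k (item, count) pairs with the highest cooccurrence."""
        neighbours = self.cooc.get(item, {})
        ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], str(kv[0])))
        return ranked[:k]


model = IncrementalCooccurrence(max_per_user=2, max_per_item=10)
for user, item in [("u1", "a"), ("u1", "b"), ("u1", "c"), ("u2", "a"), ("u2", "b")]:
    model.add(user, item)
print(model.top_k("a"))
```

In this toy run the third interaction of `u1` is dropped because of the per-user cap, so item `c` never enters the counts. Ranking by raw counts is a simplification as well; a production system would typically apply a similarity or significance score on top of the counts.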

Published in
SSDBM '19: Proceedings of the 31st International Conference on Scientific and Statistical Database Management
July 2019
244 pages
ISBN: 9781450362160
DOI: 10.1145/3335783
Conference Chair: Carlos Maltzahn
Program Chair: Tanu Malik
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 56 of 146 submissions, 38%
