DOI: 10.1145/3366423.3380283
research-article
Open access

Clustering and Constructing User Coresets to Accelerate Large-scale Top-K Recommender Systems

Published: 20 April 2020

Abstract

Top-K recommender systems aim to generate a few satisfactory personalized recommendations for practical applications such as item recommendation in e-commerce and link prediction in social networks. However, the numbers of users and items can be enormous, which leads to a vast number of potential recommendations and makes evaluating and ranking all possibilities a bottleneck. Existing Maximum Inner Product Search (MIPS) based methods treat the item ranking problem for each user independently and do not exploit the relationships between users. In this paper, we propose a novel model for clustering and navigating for top-K recommenders (CANTOR) to expedite the computation of top-K recommendations based on latent factor models. A clustering-based framework is first presented that leverages user relationships to partition users into affinity groups, each of which contains users with similar preferences. CANTOR then derives a coreset of representative vectors for each affinity group by constructing a set cover with a theoretically guaranteed bound on the difference from the user latent vectors. Using the representative vectors in the coreset, approximate nearest neighbor search is applied to obtain a small set of candidate items for each affinity group, which is then used when computing recommendations for each user in that group. This approach can significantly reduce computation without compromising the quality of the recommendations. Extensive experiments are conducted on six publicly available large-scale real-world datasets for item recommendation and personalized link prediction. The experimental results demonstrate that CANTOR significantly speeds up matrix factorization models while maintaining high precision. For instance, CANTOR achieves a 355.1x speedup for inferring recommendations in a million-user network with 99.5% precision@1 relative to the original system, while the state-of-the-art method obtains only a 93.7x speedup with 99.0% precision@1.
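
The pipeline described in the abstract (cluster users, build a per-group coreset by set cover, retrieve candidates for the coreset, then rank only those candidates per user) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: plain k-means stands in for the clustering step, a greedy cover under Euclidean distance with a threshold `eps` stands in for the coreset construction, and brute-force inner products stand in for approximate nearest neighbor search. The names `kmeans`, `greedy_cover`, `recommend`, and the parameters `eps`, `n_cand`, and `n_groups` are hypothetical and chosen only for this example.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # fancy indexing copies
    for _ in range(iters):
        # Squared distance from every point to every center: shape (n, k).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def greedy_cover(U, eps):
    """Greedy set cover: pick rows of U until every row is within eps of a pick."""
    dist = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=2)
    covers = dist <= eps                    # covers[i, j]: picking i covers j
    uncovered = np.ones(len(U), dtype=bool)
    picked = []
    while uncovered.any():
        gain = (covers & uncovered).sum(axis=1)  # newly covered rows per candidate
        best = int(gain.argmax())
        picked.append(best)
        uncovered &= ~covers[best]
    return picked

def recommend(P, Q, n_groups=4, eps=1.0, n_cand=50, top_k=5):
    """Approximate top_k items per user (rows of P) over items (rows of Q)."""
    labels = kmeans(P, n_groups)
    recs = np.empty((len(P), top_k), dtype=int)
    for g in np.unique(labels):
        members = np.flatnonzero(labels == g)
        group = P[members]
        reps = group[greedy_cover(group, eps)]       # hypothetical coreset
        # Brute-force top-n_cand items per representative (ANN stand-in).
        scores = reps @ Q.T
        top = np.argpartition(-scores, n_cand - 1, axis=1)[:, :n_cand]
        cand = np.unique(top)                        # shared candidate set
        # Each user in the group ranks only the shared candidates.
        s = group @ Q[cand].T
        order = np.argsort(-s, axis=1)[:, :top_k]
        recs[members] = cand[order]
    return recs

# Synthetic latent factors, used only to exercise the sketch.
rng = np.random.default_rng(0)
P = rng.normal(size=(400, 8))     # user latent vectors
Q = rng.normal(size=(2000, 8))    # item latent vectors
approx = recommend(P, Q)
exact = np.argsort(-(P @ Q.T), axis=1)[:, :5]
print("precision@1 vs. exhaustive ranking:", np.mean(approx[:, 0] == exact[:, 0]))
```

In this sketch, a larger `eps` yields fewer representatives per group (less retrieval work, potentially lower precision), while a larger `n_cand` widens the candidate set each user ranks. A real system would replace the brute-force retrieval with an approximate nearest neighbor index.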

Published In

WWW '20: Proceedings of The Web Conference 2020
April 2020
3143 pages
ISBN: 9781450370233
DOI: 10.1145/3366423

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Approximate nearest neighbor search
2. Large-scale top-K recommender systems
3. Latent factor models

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

WWW '20: The Web Conference 2020
April 20 - 24, 2020
Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
