Scalable Cross-Entropy Loss for Sequential Recommendations with Large Item Catalogs

Authors:

Gleb Mezentsev,

Ivan Oseledets,

Evgeny Frolov

RecSys '24: Proceedings of the 18th ACM Conference on Recommender Systems

Pages 475 - 485

https://doi.org/10.1145/3640457.3688140

Published: 08 October 2024

Abstract

Scalability plays a crucial role in productionizing modern recommender systems. Even lightweight architectures may suffer from high computational overhead due to intermediate calculations, limiting their practicality in real-world applications. Specifically, applying the full Cross-Entropy (CE) loss often yields state-of-the-art recommendation quality, but it suffers from excessive GPU memory utilization when dealing with large item catalogs. This paper introduces a novel Scalable Cross-Entropy (SCE) loss function for the sequential learning setup. It approximates the CE loss for datasets with large catalogs, improving both time efficiency and memory usage without compromising recommendation quality. Unlike traditional negative sampling methods, our approach uses a selective, GPU-efficient computation strategy that focuses on the most informative elements of the catalog, particularly those most likely to be false positives. This is achieved by approximating the softmax distribution over a subset of the model outputs through maximum inner product search. Experimental results on multiple datasets demonstrate that SCE reduces peak memory usage by a factor of up to 100 compared to the alternatives while retaining or even exceeding their metric values. The proposed approach also opens new perspectives for large-scale developments in other domains, such as large language models.
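The idea of approximating the softmax over a subset of model outputs can be illustrated with a small NumPy sketch. This is not the authors' SCE implementation: it is a simplified stand-in that assumes the subset consists of the target item plus the `k` highest-scoring non-target items, and it uses brute-force inner products in place of a real maximum inner product search. The function names (`full_ce`, `subset_ce`) and all shapes are hypothetical. Because it still builds the full logits matrix, the sketch shows only the nature of the approximation, not the memory savings reported in the paper.

```python
import numpy as np

def logsumexp(x, axis=-1):
    # Numerically stable log(sum(exp(x))) along the given axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def full_ce(hidden, item_emb, targets):
    # Full cross-entropy: softmax over the entire catalog.
    logits = hidden @ item_emb.T                      # (n, n_items)
    pos = logits[np.arange(len(targets)), targets]    # target logits
    return float(np.mean(logsumexp(logits) - pos))

def subset_ce(hidden, item_emb, targets, k):
    # Assumption: approximate the softmax using the target plus the k
    # highest-scoring non-target items (the likeliest false positives).
    # Brute-force scoring stands in for an actual MIPS index here.
    logits = hidden @ item_emb.T
    rows = np.arange(len(targets))
    pos = logits[rows, targets]
    masked = logits.copy()
    masked[rows, targets] = -np.inf                   # exclude the target itself
    k = max(1, min(k, item_emb.shape[0] - 1))
    top = np.argpartition(-masked, k - 1, axis=1)[:, :k]
    neg = np.take_along_axis(masked, top, axis=1)     # (n, k) hardest negatives
    sub = np.concatenate([pos[:, None], neg], axis=1) # (n, k + 1)
    return float(np.mean(logsumexp(sub) - pos))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(8, 16))      # hypothetical sequence states
    item_emb = rng.normal(size=(500, 16))  # hypothetical item embeddings
    targets = rng.integers(0, 500, size=8)
    print(full_ce(hidden, item_emb, targets))
    print(subset_ce(hidden, item_emb, targets, k=50))
```

In this sketch the subset loss never exceeds the full loss, since its normalizer sums over fewer items, and the two coincide once `k` covers every non-target item. Keeping the highest-scoring negatives is what makes the truncated normalizer a reasonable proxy: those items contribute most to the softmax denominator.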

Cited By

Feng D, Li S, Xiang Y, Zheng J. (2025). A user-embedded temporal attention neural network for IoT trajectories prediction. PeerJ Computer Science 11 (e2681). Online publication date: 11-Feb-2025
https://doi.org/10.7717/peerj-cs.2681
Frolov E, Matveeva T, Mirvakhabova L, Oseledets I. (2024). Self-Attentive Sequential Recommendations with Hyperbolic Representations. Proceedings of the 18th ACM Conference on Recommender Systems (981-986). Online publication date: 8-Oct-2024
https://dl.acm.org/doi/10.1145/3640457.3688180

Index Terms

Scalable Cross-Entropy Loss for Sequential Recommendations with Large Item Catalogs
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems

Published In

RecSys '24: Proceedings of the 18th ACM Conference on Recommender Systems

October 2024

1438 pages

ISBN:9798400705052

DOI:10.1145/3640457

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

RecSys '24

RecSys '24: 18th ACM Conference on Recommender Systems

October 14 - 18, 2024

Bari, Italy

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%

Bibliometrics

Article Metrics

Total Citations: 2
Total Downloads: 344
Downloads (Last 12 months): 344
Downloads (Last 6 weeks): 33

Reflects downloads up to 18 Feb 2025
