DOI: 10.1145/3640457.3688140
research-article

Scalable Cross-Entropy Loss for Sequential Recommendations with Large Item Catalogs

Published: 08 October 2024

Abstract

Scalability plays a crucial role in productionizing modern recommender systems. Even lightweight architectures may suffer from high computational overhead due to intermediate calculations, limiting their practicality in real-world applications. Specifically, applying the full Cross-Entropy (CE) loss often yields state-of-the-art recommendation quality. Still, it suffers from excessive GPU memory utilization when dealing with large item catalogs, since a score must be computed and stored for every catalog item at every position of every training sequence. This paper introduces a novel Scalable Cross-Entropy (SCE) loss function for the sequential learning setup. It approximates the CE loss for datasets with large catalogs, improving both time efficiency and memory usage without compromising recommendation quality. Unlike traditional negative sampling methods, our approach uses a selective, GPU-efficient computation strategy that focuses on the most informative elements of the catalog, particularly those most likely to be false positives. This is achieved by approximating the softmax distribution over a subset of the model outputs through maximum inner product search. Experimental results on multiple datasets demonstrate that SCE reduces peak memory usage by a factor of up to 100 compared to the alternatives while retaining or even exceeding their metric values. The proposed approach also opens new perspectives for large-scale developments in other domains, such as large language models.
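
The abstract contrasts full CE, which normalizes over every catalog item, with a loss normalized over a small set of the highest-scoring (hardest) items. The NumPy sketch below illustrates only that general idea and is not the authors' SCE implementation: the helper names (`full_ce`, `topk_negative_scores`, `topk_ce`, `logits_bytes`) and the `k` and `chunk` parameters are hypothetical, and exact brute-force top-k search over catalog chunks stands in for the approximate maximum inner product search used by SCE. Each row of `hidden` is assumed to be the representation of one sequence position, scored against an item embedding matrix by inner product.

```python
import numpy as np


def _mean_ce(logits: np.ndarray, pos: np.ndarray) -> float:
    """Mean of logsumexp(logits) - pos over rows, computed stably."""
    m = logits.max(axis=1, keepdims=True)
    log_z = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    return float(np.mean(log_z - pos))


def full_ce(hidden: np.ndarray, item_emb: np.ndarray, targets: np.ndarray) -> float:
    """Full CE: softmax over the whole catalog.

    hidden:   (N, d) representations, one row per sequence position
    item_emb: (C, d) item embedding matrix
    targets:  (N,)   index of the true next item for each row
    Materializes an (N, C) score matrix, so memory grows linearly with C.
    """
    logits = hidden @ item_emb.T
    pos = logits[np.arange(len(targets)), targets]
    return _mean_ce(logits, pos)


def topk_negative_scores(hidden: np.ndarray, item_emb: np.ndarray,
                         targets: np.ndarray, k: int, chunk: int = 1024) -> np.ndarray:
    """Hypothetical helper: scores of the k highest-scoring non-target items per row.

    Scans the catalog in chunks and keeps a running top-k, so only buffers of
    shape (N, k + chunk) exist at once instead of (N, C). Exact search here
    stands in for approximate maximum inner product search.
    """
    n = hidden.shape[0]
    best = np.full((n, k), -np.inf)
    for start in range(0, item_emb.shape[0], chunk):
        block = hidden @ item_emb[start:start + chunk].T        # (N, <= chunk)
        ids = np.arange(start, start + block.shape[1])
        # The target item is not a negative: drop it from the candidate pool.
        block = np.where(ids[None, :] == targets[:, None], -np.inf, block)
        pool = np.concatenate([best, block], axis=1)
        top = np.argpartition(-pool, k - 1, axis=1)[:, :k]     # k largest per row
        best = np.take_along_axis(pool, top, axis=1)
    return best


def topk_ce(hidden: np.ndarray, item_emb: np.ndarray, targets: np.ndarray,
            k: int, chunk: int = 1024) -> float:
    """CE normalized over the target plus its k hardest negatives only."""
    pos = np.einsum("nd,nd->n", hidden, item_emb[targets])
    neg = topk_negative_scores(hidden, item_emb, targets, k, chunk)
    logits = np.concatenate([pos[:, None], neg], axis=1)        # (N, 1 + k)
    return _mean_ce(logits, pos)


def logits_bytes(n_rows: int, n_cols: int, bytes_per_value: int = 4) -> int:
    """Memory needed for a dense score matrix (float32 by default)."""
    return n_rows * n_cols * bytes_per_value
```

With `k = C - 1` the truncated loss reduces to full CE, and for smaller `k` it can only be lower, because its normalizer sums over a subset of the same terms. For a hypothetical batch of 128 sequences of length 200 (25,600 positions) and a 1,000,000-item catalog, `logits_bytes(25_600, 1_000_000)` gives 102,400,000,000 bytes (about 102 GB) for the full float32 score matrix, whereas the chunked search with `k=256` and `chunk=4096` works with buffers of shape 25,600 × 4,352 (roughly 0.45 GB in float32). This sketch still computes every inner product, so it saves memory but not compute; the time savings the abstract reports rely on SCE's own selection strategy.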

Cited By

  • (2025) A user-embedded temporal attention neural network for IoT trajectories prediction. PeerJ Computer Science 11 (e2681). https://doi.org/10.7717/peerj-cs.2681. Online publication date: 11-Feb-2025.
  • (2024) Self-Attentive Sequential Recommendations with Hyperbolic Representations. Proceedings of the 18th ACM Conference on Recommender Systems, 981–986. https://doi.org/10.1145/3640457.3688180. Online publication date: 8-Oct-2024.

    Published In

    RecSys '24: Proceedings of the 18th ACM Conference on Recommender Systems
    October 2024
    1438 pages

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Sequential recommendation
    2. cross-entropy loss
    3. negative sampling

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate 254 of 1,295 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 344
    • Downloads (last 6 weeks): 33
    Reflects downloads up to 18 Feb 2025.
