abstract

Nonlinear Bandits Exploration for Recommendations

Authors:
Yi Su

Deepmind, Google, USA

Deepmind, Google, USA

0009-0000-9207-1719
View Profile

,
Minmin Chen

Deepmind, Google, USA

Deepmind, Google, USA

0000-0002-7342-9022
View Profile

RecSys '23: Proceedings of the 17th ACM Conference on Recommender SystemsSeptember 2023Pages 1054–1057https://doi.org/10.1145/3604915.3610245

Published:14 September 2023Publication History

RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems

Pages 1054–1057

ABSTRACT

The paradigm of framing recommendations as (sequential) decision-making processes has gained significant interest. To achieve long-term user satisfaction, these interactive systems need to strike a balance between exploitation (recommending high-reward items) and exploration (exploring uncertain regions for potentially better items). Classical bandit algorithms like Upper-Confidence-Bound and Thompson Sampling, and their contextual extensions with linear payoffs have exhibited strong theoretical guarantees and empirical success in managing the exploration-exploitation trade-off. Building efficient exploration-based systems for deep neural network powered real-world, large-scale industrial recommender systems remains under studied. In addition, these systems are often multi-stage, multi-objective and response time sensitive. In this talk, we share our experience in addressing these challenges in building exploration based industrial recommender systems. Specifically, we adopt the Neural Linear Bandit algorithm, which effectively combines the representation power of deep neural networks, with the simplicity of linear bandits to incorporate exploration in DNN based recommender systems. We introduce exploration capability to both the nomination and ranking stage of the industrial recommender system. In the context of the ranking stage, we delve into the extension of this algorithm to accommodate the multi-task setup, enabling exploration in systems with multiple objectives. Moving on to the nomination stage, we will address the development of efficient bandit algorithms tailored to factorized bi-linear models. These algorithms play a crucial role in facilitating maximum inner product search, which is commonly employed in large-scale retrieval systems. We validate our algorithms and present findings from real-world live experiments.

References

Shipra Agrawal and Navin Goyal. 2013. Thompson sampling for contextual bandits with linear payoffs. In International conference on machine learning. PMLR, 127–135.Google Scholar
Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine learning 47, 2 (2002), 235–256.Google ScholarDigital Library
Walid Bendada, Guillaume Salha, and Théo Bontempelli. 2020. Carousel personalization in music streaming apps with contextual bandits. In Proceedings of the 14th ACM Conference on Recommender Systems. 420–425.Google ScholarDigital Library
Djallel Bouneffouf, Irina Rish, and Charu Aggarwal. 2020. Survey on applications of multi-armed and contextual bandits. In 2020 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1–8.Google ScholarDigital Library
Olivier Chapelle and Lihong Li. 2011. An empirical evaluation of thompson sampling. Advances in neural information processing systems 24 (2011).Google Scholar
Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, and Ed H Chi. 2019. Top-k off-policy correction for a REINFORCE recommender system. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. 456–464.Google ScholarDigital Library
Wei Chu, Lihong Li, Lev Reyzin, and Robert Schapire. 2011. Contextual bandits with linear payoff functions. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, 208–214.Google Scholar
Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems. 39–46.Google ScholarDigital Library
Qin Ding, Cho-Jui Hsieh, and James Sharpnack. 2021. An efficient algorithm for generalized linear bandit: Online stochastic gradient descent and thompson sampling. In International Conference on Artificial Intelligence and Statistics. PMLR, 1585–1593.Google Scholar
Sarah Filippi, Olivier Cappe, Aurélien Garivier, and Csaba Szepesvári. 2010. Parametric bandits: The generalized linear case. Advances in Neural Information Processing Systems 23 (2010).Google Scholar
Yingqiang Ge, Shuya Zhao, Honglu Zhou, Changhua Pei, Fei Sun, Wenwu Ou, and Yongfeng Zhang. 2020. Understanding echo chambers in e-commerce recommender systems. In Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval. 2261–2270.Google ScholarDigital Library
Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating large-scale inference with anisotropic vector quantization. In International Conference on Machine Learning. PMLR, 3887–3896.Google Scholar
Ray Jiang, Silvia Chiappa, Tor Lattimore, András György, and Pushmeet Kohli. 2019. Degenerate feedback loops in recommender systems. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. 383–390.Google ScholarDigital Library
Jaya Kawale, Hung H Bui, Branislav Kveton, Long Tran-Thanh, and Sanjay Chawla. 2015. Efficient Thompson Sampling for Online Matrix-Factorization Recommendation. Advances in neural information processing systems 28 (2015).Google Scholar
Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37.Google ScholarDigital Library
Lihong Li, Wei Chu, John Langford, and Robert E Schapire. 2010. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web. 661–670.Google ScholarDigital Library
Jiaqi Ma, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, and Ed H Chi. 2018. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 1930–1939.Google ScholarDigital Library
Sashank J Reddi, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Jiecao Chen, and Sanjiv Kumar. 2019. Stochastic negative mining for learning with large output spaces. In The 22nd International Conference on Artificial Intelligence and Statistics. PMLR, 1940–1949.Google Scholar
Carlos Riquelme, George Tucker, and Jasper Snoek. 2018. Deep bayesian bandits showdown. In International conference on learning representations, Vol. 9.Google Scholar
Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web. 285–295.Google ScholarDigital Library
Ayan Sinha, David F Gleich, and Karthik Ramani. 2016. Deconvolving feedback loops in recommender systems. Advances in neural information processing systems 29 (2016).Google Scholar
William R Thompson. 1933. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 3-4 (1933), 285–294.Google ScholarCross Ref
Ji Yang, Xinyang Yi, Derek Zhiyuan Cheng, Lichan Hong, Yang Li, Simon Xiaoming Wang, Taibai Xu, and Ed H Chi. 2020. Mixed negative sampling for learning two-tower neural networks in recommendations. In Companion Proceedings of the Web Conference 2020. 441–447.Google ScholarDigital Library
Zhe Zhao, Lichan Hong, Li Wei, Jilin Chen, Aniruddh Nath, Shawn Andrews, Aditee Kumthekar, Maheswaran Sathiamoorthy, Xinyang Yi, and Ed Chi. 2019. Recommending what video to watch next: a multitask ranking system. In Proceedings of the 13th ACM Conference on Recommender Systems. 43–51.Google ScholarDigital Library
Shi Zong, Hao Ni, Kenny Sung, Nan Rosemary Ke, Zheng Wen, and Branislav Kveton. 2016. Cascading bandits for large-scale recommendation problems. Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence (2016), 835–844.Google ScholarDigital Library

Index Terms

Nonlinear Bandits Exploration for Recommendations
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems

Recommendations

Personalized Recommendations for Music Genre Exploration
UMAP '19: Proceedings of the 27th ACM Conference on User Modeling, Adaptation and Personalization

Most recommender systems generate recommendations to match the user's current preference. However, users sometimes might have the goal to develop new preferences away from their current preference and use the recommender to guide them towards it. In ...
Read More
Naïve filterbots for robust cold-start recommendations
KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

The goal of a recommender system is to suggest items of interest to a user based on historical behavior of a community of users. Given detailed enough history, item-based collaborative filtering (CF) often performs as well or better than almost any ...
Read More
Mixing bandits: a recipe for improved cold-start recommendations in a social network
SNAKDD '13: Proceedings of the 7th Workshop on Social Network Mining and Analysis

Recommending items to new or "cold-start" users is a challenging problem for recommender systems. Collaborative filtering approaches fail when the preference history of users is not available. A promising direction that has been explored recently [12] ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems
September 2023
1406 pages
ISBN:9798400702419
DOI:10.1145/3604915
Editors:
Jie Zhang,
Li Chen,
Shlomo Berkovsky,
Min Zhang,
Tommaso di Noia,
Justin Basilico,
Luiz Pizzato,
Yang Song
Copyright © 2023 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 September 2023
Check for updates
Author Tags
Exploration
Recommender Systems
Qualifiers
- abstract
- Research
- Refereed limited
Conference

Acceptance Rates
Overall Acceptance Rate254of1,295submissions,20%
Upcoming Conference
RecSys '24

Sponsor:

sigchi

18th ACM Conference on Recommender Systems

October 14 - 18, 2024

Bari , Italy
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 0
  Total Citations
  View Citations
- 398
  Total Downloads
- Downloads (Last 12 months)398
- Downloads (Last 6 weeks)25
Other Metrics
View Author Metrics
Cited By
This publication has not been cited yet

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format .

View HTML Format

Nonlinear Bandits Exploration for Recommendations

RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Personalized Recommendations for Music Genre Exploration

Naïve filterbots for robust cold-start recommendations

Mixing bandits: a recipe for improved cold-start recommendations in a social network