DOI: 10.1145/3604915.3608794

Research Article | Public Access

Incentivizing Exploration in Linear Contextual Bandits under Information Gap

Published: 14 September 2023

Abstract

Contextual bandit algorithms are widely used for interactive recommendation, where users are typically assumed to cooperate by exploring whatever the system recommends. In this paper, we relax this strong assumption and study incentivized exploration with myopic users, who are only interested in recommendations with the currently highest estimated reward. To attain long-term optimality, the system must therefore offer compensation to incentivize users to accept exploratory recommendations. We consider a new and practically motivated setting in which the context features used by the user are more informative than those available to the system: for example, features derived from users' private information are not accessible to the system. We develop an effective solution for incentivized exploration under such an information gap, and prove that the method achieves sublinear regret and sublinear compensation. We analyze, both theoretically and empirically, the additional compensation incurred by the information gap, compared with the case where the system observes the same context features as the user, i.e., without an information gap. Moreover, we provide a compensation lower bound for this problem.
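To make the incentive mechanism concrete, the sketch below simulates one plausible instantiation of this setting in Python/NumPy. It is our illustration, not the paper's algorithm: the system runs a LinUCB-style learner on a truncated feature vector, the myopic user maintains a least-squares estimate over the full (more informative) features, and the system pays the user's perceived reward gap whenever its exploratory recommendation differs from the user's myopic pick. All names, dimensions, and the specific compensation rule are assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch only -- not the paper's algorithm. The system runs a
# LinUCB-style linear bandit on partial features, while a myopic user ranks
# arms with her own least-squares estimate over richer private features.
# The system pays the user's perceived reward gap to incentivize exploration.

rng = np.random.default_rng(0)
d_sys, d_user, K, T = 4, 6, 5, 2000    # feature dims (system/user), arms, rounds
theta = rng.normal(size=d_user)        # true parameter over the full features

A, b = np.eye(d_sys), np.zeros(d_sys)        # system's ridge statistics
A_u, b_u = np.eye(d_user), np.zeros(d_user)  # user's statistics (richer view)
alpha, total_comp = 1.0, 0.0                 # exploration weight, payments

for t in range(T):
    X_full = rng.normal(size=(K, d_user))  # full contexts, visible to the user
    X_sys = X_full[:, :d_sys]              # system observes only a subspace

    # System: upper-confidence-bound scores on its partial view.
    th_sys = np.linalg.solve(A, b)
    A_inv = np.linalg.inv(A)
    ucb = X_sys @ th_sys + alpha * np.sqrt(
        np.einsum("ij,jk,ik->i", X_sys, A_inv, X_sys))
    a_sys = int(np.argmax(ucb))            # exploratory recommendation

    # User: myopic pick under her current estimate on the full features.
    est_user = X_full @ np.linalg.solve(A_u, b_u)
    a_user = int(np.argmax(est_user))

    # Compensation covers the user's perceived loss from following a_sys.
    total_comp += max(est_user[a_user] - est_user[a_sys], 0.0)

    # The (compensated) user plays a_sys; both sides update their estimates.
    r = X_full[a_sys] @ theta + rng.normal(scale=0.1)
    A += np.outer(X_sys[a_sys], X_sys[a_sys])
    b += r * X_sys[a_sys]
    A_u += np.outer(X_full[a_sys], X_full[a_sys])
    b_u += r * X_full[a_sys]

print(f"total compensation over {T} rounds: {total_comp:.2f}")
```

Under this rule the per-round payment equals the user's estimated loss from complying, so a myopic user is always willing to follow the recommendation; the sublinear regret and compensation guarantees stated in the abstract concern the paper's actual algorithm, not this simplified rule.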


Information

Published In

RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems
September 2023
1406 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Incentivized exploration
  2. information gap
  3. linear bandits

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

RecSys '23: Seventeenth ACM Conference on Recommender Systems
September 18 - 22, 2023
Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate: 254 of 1,295 submissions, 20%
