DOI: 10.1145/2911451.2911528
research-article

Contextual Bandits in a Collaborative Environment

Published: 07 July 2016

Abstract

Contextual bandit algorithms provide principled online learning solutions for balancing exploration and exploitation using side information. They have been used extensively in practical scenarios such as display advertising and content recommendation. A common practice is to estimate the unknown bandit parameters of each user independently. This ignores dependencies among users and thus leads to suboptimal solutions, especially in applications with strong social components.
In this paper, we develop a collaborative contextual bandit algorithm in which the adjacency graph among users is leveraged to share context and payoffs among neighboring users during online updating. We rigorously prove an improved upper regret bound for the proposed collaborative bandit algorithm compared to conventional independent bandit algorithms. Extensive experiments on synthetic data and three large-scale real-world datasets verify the improvement of our proposed algorithm over several state-of-the-art contextual bandit algorithms.
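
To make the idea of sharing context and payoffs among neighboring users concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes a linear payoff model, a LinUCB-style upper confidence bound, and a hypothetical weight matrix `weights` in which `weights[u][v]` says how strongly user `u`'s feedback is credited to user `v`. The class name `GraphSharedLinUCB` and the parameters `alpha` and `ridge` are illustrative choices, not names from the paper.

```python
import math


class GraphSharedLinUCB:
    """Per-user LinUCB where each observation is also credited, with a
    graph weight, to the observing user's neighbors (illustrative only)."""

    def __init__(self, weights, dim, alpha=1.0, ridge=1.0):
        self.weights = weights  # weights[u][v]: share of u's feedback given to v (assumed)
        self.dim = dim
        self.alpha = alpha      # exploration strength (assumed hyperparameter)
        n_users = len(weights)
        # A_inv[v] is the inverse of (ridge * I + weighted sum of x x^T) for user v.
        self.A_inv = [[[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                       for i in range(dim)] for _ in range(n_users)]
        self.b = [[0.0] * dim for _ in range(n_users)]

    def _matvec(self, M, x):
        return [sum(M[i][j] * x[j] for j in range(self.dim)) for i in range(self.dim)]

    def theta(self, user):
        """Ridge-regression estimate of the user's parameter vector."""
        return self._matvec(self.A_inv[user], self.b[user])

    def score(self, user, x):
        """Upper confidence bound on the payoff of context x for this user."""
        Ax = self._matvec(self.A_inv[user], x)
        mean = sum(t * xi for t, xi in zip(self.theta(user), x))
        width = math.sqrt(max(sum(a * xi for a, xi in zip(Ax, x)), 0.0))
        return mean + self.alpha * width

    def select(self, user, contexts):
        """Return the index of the candidate context with the highest score."""
        return max(range(len(contexts)), key=lambda k: self.score(user, contexts[k]))

    def update(self, user, x, reward):
        """Share (x, reward) with every user v that has weights[user][v] > 0."""
        for v, w in enumerate(self.weights[user]):
            if w <= 0.0:
                continue
            Ax = self._matvec(self.A_inv[v], x)
            denom = 1.0 + w * sum(a * xi for a, xi in zip(Ax, x))
            # Sherman-Morrison update of the inverse for A_v <- A_v + w * x x^T.
            for i in range(self.dim):
                for j in range(self.dim):
                    self.A_inv[v][i][j] -= w * Ax[i] * Ax[j] / denom
            for i in range(self.dim):
                self.b[v][i] += w * reward * x[i]


# Two users; only user 0 ever gives feedback.
shared = GraphSharedLinUCB([[1.0, 0.5], [0.5, 1.0]], dim=2)
independent = GraphSharedLinUCB([[1.0, 0.0], [0.0, 1.0]], dim=2)
for model in (shared, independent):
    for _ in range(20):
        model.update(0, [1.0, 0.0], 1.0)

print(shared.theta(1))       # user 1 has learned from its neighbor
print(independent.theta(1))  # user 1 has learned nothing
```

With an identity weight matrix the sketch reduces to independent per-user estimation, the common practice described above. With off-diagonal weights, a user who has given no feedback still inherits a partial estimate from its neighbors.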

Information & Contributors

Information

Published In

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
July 2016
1296 pages
ISBN: 9781450340694
DOI: 10.1145/2911451
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. collaborative contextual bandits
  2. online recommendations
  3. reinforcement learning

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation

Conference

SIGIR '16

Acceptance Rates

SIGIR '16 Paper Acceptance Rate 62 of 341 submissions, 18%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 134
  • Downloads (Last 6 weeks): 5
Reflects downloads up to 28 Feb 2025

Citations

Cited By

  • (2025) A Differentially Private Approach for Budgeted Combinatorial Multi-Armed Bandits. IEEE Transactions on Dependable and Secure Computing 22:1 (424-439). DOI: 10.1109/TDSC.2024.3401836. Online publication date: Jan-2025
  • (2024) Dynamic Strategy Optimizer (DSO): Application In Enhancing New User Engagement in Hybrid Recommender System. 2024 IEEE 15th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON) (750-756). DOI: 10.1109/UEMCON62879.2024.10754751. Online publication date: 17-Oct-2024
  • (2024) Online Learning and Detecting Corrupted Users for Conversational Recommendation Systems. IEEE Transactions on Knowledge and Data Engineering 36:12 (8939-8953). DOI: 10.1109/TKDE.2024.3448250. Online publication date: Dec-2024
  • (2024) Conversational Recommendation With Online Learning and Clustering on Misspecified Users. IEEE Transactions on Knowledge and Data Engineering 36:12 (7825-7838). DOI: 10.1109/TKDE.2024.3423442. Online publication date: Dec-2024
  • (2024) A systematic literature review of recent advances on context-aware recommender systems. Artificial Intelligence Review 58:1. DOI: 10.1007/s10462-024-10939-4. Online publication date: 16-Nov-2024
  • (2023) Online corrupted user detection and regret minimization. Proceedings of the 37th International Conference on Neural Information Processing Systems (33262-33287). DOI: 10.5555/3666122.3667567. Online publication date: 10-Dec-2023
  • (2023) Noise-adaptive thompson sampling for linear contextual bandits. Proceedings of the 37th International Conference on Neural Information Processing Systems (23630-23657). DOI: 10.5555/3666122.3667148. Online publication date: 10-Dec-2023
  • (2023) Follow-ups also matter. Proceedings of the 37th International Conference on Neural Information Processing Systems (12774-12796). DOI: 10.5555/3666122.3666684. Online publication date: 10-Dec-2023
  • (2023) Online clustering of bandits with misspecified user models. Proceedings of the 37th International Conference on Neural Information Processing Systems (3785-3818). DOI: 10.5555/3666122.3666288. Online publication date: 10-Dec-2023
  • (2023) SGNR: A Social Graph Neural Network Based Interactive Recommendation Scheme for E-Commerce. Tsinghua Science and Technology 28:4 (786-798). DOI: 10.26599/TST.2022.9010050. Online publication date: Aug-2023
