ABSTRACT
In this paper, we study `networked bandits', a new bandit problem in which a set of interrelated arms varies over time and, when contextual information is used to select one arm, other correlated arms are invoked as well. This problem remains under-investigated despite its applicability to many practical settings. In social networks, for instance, an arm can obtain payoffs not only from the selected user but also from that user's relations, since content is often shared through the network. We examine whether it is possible to obtain multiple payoffs from several correlated arms based on these relationships. In particular, we formalize the networked bandit problem and propose an algorithm that considers not only the selected arm but also the relationships between arms. Our algorithm follows the `optimism in the face of uncertainty' principle, in that it selects an arm according to integrated confidence sets constructed from historical data. We analyze its performance in simulation experiments and on two real-world offline datasets. The experimental results demonstrate our algorithm's effectiveness in the networked bandit setting.
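The paper's exact algorithm is not reproduced here, but the idea in the abstract can be illustrated with a minimal sketch. The sketch below assumes a LinUCB-style disjoint linear model per arm (a standard OFU construction; see Li et al., WWW 2010) and a hypothetical `neighbours` map encoding the network: an arm's score aggregates the upper confidence bounds of the arm itself and its related arms, reflecting that selecting one arm also collects payoffs from its relations. All class and parameter names are illustrative, not the authors'.

```python
import numpy as np

class NetworkedLinUCB:
    """Illustrative OFU-style sketch for the networked bandit setting.

    Simplifying assumption: each arm keeps a disjoint ridge-regression
    model, and an arm's selection score sums the confidence-bound scores
    of the arm and its network neighbours.
    """

    def __init__(self, n_arms, dim, neighbours, alpha=1.0):
        self.alpha = alpha                      # exploration weight
        self.neighbours = neighbours            # dict: arm -> list of related arms
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # reward-weighted features

    def _ucb(self, arm, x):
        # Upper confidence bound for one arm's linear payoff at context x.
        A_inv = np.linalg.inv(self.A[arm])
        theta = A_inv @ self.b[arm]
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def select(self, contexts):
        # Score each arm by the combined optimism of itself and its neighbours.
        scores = [
            self._ucb(a, contexts[a])
            + sum(self._ucb(j, contexts[j]) for j in self.neighbours.get(a, []))
            for a in range(len(self.A))
        ]
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        # Standard ridge-regression update for an observed (context, payoff) pair.
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

In this toy form the network only enters through the scoring rule; the paper's integrated confidence sets would additionally propagate observed payoffs from the invoked neighbouring arms back into their models.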
Index Terms
- Networked bandits with disjoint linear payoffs