ABSTRACT
In this paper, we study `networked bandits', a new bandit problem in which a set of interrelated arms varies over time and, when contextual information is used to select one arm, other correlated arms are invoked as well. This problem remains under-investigated despite its applicability to many practical settings. In social networks, for instance, an arm can obtain payoffs not only from the selected user but also from that user's relations, since content is often shared through the network. We examine whether it is possible to obtain multiple payoffs from several correlated arms based on these relationships. In particular, we formalize the networked bandit problem and propose an algorithm that considers not only the selected arm but also the relationships between arms. Our algorithm follows the `optimism in the face of uncertainty' principle, in that it selects an arm according to integrated confidence sets constructed from historical data. We analyze its performance in simulation experiments and on two real-world offline datasets. The experimental results demonstrate our algorithm's effectiveness in the networked bandit setting.
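The paper's exact algorithm is not reproduced here, but the idea in the abstract can be illustrated with a minimal sketch. The sketch below assumes a LinUCB-style disjoint linear model per arm (a standard OFU construction; see Li et al., WWW 2010) and a hypothetical `neighbours` map encoding the network: an arm's score aggregates the upper confidence bounds of the arm itself and its related arms, reflecting that selecting one arm also collects payoffs from its relations. All class and parameter names are illustrative, not the authors'.

```python
import numpy as np

class NetworkedLinUCB:
    """Illustrative OFU-style sketch for the networked bandit setting.

    Simplifying assumption: each arm keeps a disjoint ridge-regression
    model, and an arm's selection score sums the confidence-bound scores
    of the arm and its network neighbours.
    """

    def __init__(self, n_arms, dim, neighbours, alpha=1.0):
        self.alpha = alpha                      # exploration weight
        self.neighbours = neighbours            # dict: arm -> list of related arms
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # reward-weighted features

    def _ucb(self, arm, x):
        # Upper confidence bound for one arm's linear payoff at context x.
        A_inv = np.linalg.inv(self.A[arm])
        theta = A_inv @ self.b[arm]
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def select(self, contexts):
        # Score each arm by the combined optimism of itself and its neighbours.
        scores = [
            self._ucb(a, contexts[a])
            + sum(self._ucb(j, contexts[j]) for j in self.neighbours.get(a, []))
            for a in range(len(self.A))
        ]
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        # Standard ridge-regression update for an observed (context, payoff) pair.
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

In this toy form the network only enters through the scoring rule; the paper's integrated confidence sets would additionally propagate observed payoffs from the invoked neighbouring arms back into their models.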
Index Terms
- Networked bandits with disjoint linear payoffs