Abstract
"Introduction to Multi-Armed Bandits" is a broad and accessible textbook that emphasizes connections to economics and operations research.
- Berry, D. A. and Fristedt, B. 1985. Bandit Problems: Sequential Allocation of Experiments. Springer, Heidelberg, Germany.
- Bubeck, S. and Cesa-Bianchi, N. 2012. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends in Machine Learning 5, 1, 1--122. Published with Now Publishers (Boston, MA, USA). Also available at https://arxiv.org/abs/1204.5721.
- Cesa-Bianchi, N. and Lugosi, G. 2006. Prediction, Learning, and Games. Cambridge University Press, Cambridge, UK.
- Gittins, J., Glazebrook, K., and Weber, R. 2011. Multi-Armed Bandit Allocation Indices, 2nd ed. John Wiley & Sons, Hoboken, NJ, USA. The first edition, single-authored by John Gittins, was published in 1989.
- Hazan, E. 2015. Introduction to Online Convex Optimization. Foundations and Trends in Optimization 2, 3--4, 157--325. Published with Now Publishers (Boston, MA, USA). Also available at https://arxiv.org/abs/1909.05207.
- Lattimore, T. and Szepesvari, C. 2020. Bandit Algorithms. Cambridge University Press, Cambridge, UK. Draft versions available at https://banditalgs.com/ since 2018.
- Russo, D., Van Roy, B., Kazerouni, A., Osband, I., and Wen, Z. 2018. A Tutorial on Thompson Sampling. Foundations and Trends in Machine Learning 11, 1, 1--96. Published with Now Publishers (Boston, MA, USA). Also available at https://arxiv.org/abs/1707.02038.
- Slivkins, A. 2019. Introduction to Multi-Armed Bandits. Foundations and Trends in Machine Learning 12, 1--2 (Nov.), 1--286. Published with Now Publishers (Boston, MA, USA). Also available at https://arxiv.org/abs/1904.07272.
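As a flavor of the algorithms surveyed in these texts (in particular, the Thompson sampling tutorial by Russo et al.), here is a minimal sketch of Thompson sampling for Bernoulli bandits. This is an illustrative toy implementation, not taken from any of the cited works; the function name and parameters are made up for this example.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Bernoulli Thompson sampling with independent Beta(1, 1) priors per arm.

    true_means: unknown success probability of each arm (used only to
    simulate rewards); horizon: number of rounds. Returns total reward.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # posterior for arm i is Beta(1 + successes, 1 + failures)
    failures = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # Sample a plausible mean from each arm's posterior; play the argmax.
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i])
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Draw a Bernoulli reward from the chosen arm and update its posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward
```

Because the posterior of a clearly inferior arm concentrates below that of the best arm, the algorithm's pulls increasingly favor the best arm, which is the mechanism behind its regret guarantees discussed in the tutorials above.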