DOI: 10.1145/3292500.3330862

Scaling Multi-Armed Bandit Algorithms

Published: 25 July 2019

Abstract

The Multi-Armed Bandit (MAB) is a fundamental model capturing the dilemma between exploration and exploitation in sequential decision making. At every time step, the decision maker selects a set of arms and observes a reward from each of the chosen arms. In this paper, we present a variant of the problem, which we call the Scaling MAB (S-MAB): the goal of the decision maker is not only to maximize the cumulative rewards, i.e., to choose the arms with the highest expected reward, but also to decide how many arms to select so that, in expectation, the cost of selecting arms does not exceed the rewards. This problem is relevant to many real-world applications, e.g., online advertising, financial investments, or data stream monitoring. We propose an extension of Thompson Sampling, which has strong theoretical guarantees and is reported to perform well in practice. Our extension dynamically controls the number of arms to draw. Furthermore, we combine the proposed method with ADWIN, a state-of-the-art change detector, to deal with non-stationary environments. We illustrate the benefits of our contribution via a real-world use case on predictive maintenance.
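The abstract names three ingredients: Thompson Sampling played over several arms at once, a rule that scales how many arms to pull against the cost of pulling them, and a change detector (ADWIN) for drifting reward distributions. As a rough illustration of how these pieces fit together, here is a minimal, self-contained Python sketch. It is not the paper's algorithm: the class name, the threshold-based scaling rule, and the fixed split-window change check standing in for ADWIN are all illustrative assumptions.

```python
import random


class ScalingBanditSketch:
    """Multi-play Thompson Sampling over Bernoulli arms, with a cost-based
    rule for how many arms to pull and a naive split-window change check.
    The scaling and reset rules here are illustrative, not the paper's."""

    def __init__(self, n_arms, cost_per_pull, window=100, drift_threshold=0.3):
        self.n_arms = n_arms
        self.cost = cost_per_pull                  # expected cost of one pull
        self.alpha = [1.0] * n_arms                # Beta posterior: successes + 1
        self.beta = [1.0] * n_arms                 # Beta posterior: failures + 1
        self.k = 1                                 # current number of arms to pull
        self.window = window
        self.drift_threshold = drift_threshold
        self.recent = [[] for _ in range(n_arms)]  # sliding reward window per arm

    def select_arms(self):
        # Thompson Sampling with multiple plays: draw one sample from each
        # arm's Beta posterior and pull the k arms with the largest samples.
        samples = [random.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return sorted(range(self.n_arms), key=samples.__getitem__, reverse=True)[: self.k]

    def update(self, arm, reward):
        # Conjugate update for a Bernoulli reward in {0, 1}.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward
        self.recent[arm].append(reward)
        if len(self.recent[arm]) > self.window:
            self.recent[arm].pop(0)
            self._check_change(arm)

    def _rescale(self):
        # Hypothetical scaling rule: pull one more arm while the posterior
        # mean of the next-best arm still beats the pull cost, one fewer
        # when the worst arm currently pulled no longer pays for itself.
        means = sorted((a / (a + b) for a, b in zip(self.alpha, self.beta)), reverse=True)
        if self.k < self.n_arms and means[self.k] > self.cost:
            self.k += 1
        elif self.k > 1 and means[self.k - 1] < self.cost:
            self.k -= 1

    def _check_change(self, arm):
        # Crude stand-in for ADWIN: compare the means of the two window
        # halves and, on a large gap, forget the arm's posterior so the
        # learner re-explores it.
        half = self.window // 2
        old, new = self.recent[arm][:half], self.recent[arm][half:]
        if abs(sum(old) / len(old) - sum(new) / len(new)) > self.drift_threshold:
            self.alpha[arm] = self.beta[arm] = 1.0
            self.recent[arm].clear()

    def step(self, pull):
        # One round: choose arms, observe a reward per chosen arm, rescale.
        for arm in self.select_arms():
            self.update(arm, pull(arm))
        self._rescale()


if __name__ == "__main__":
    # Toy environment: ten Bernoulli arms; five have mean reward above the
    # pull cost of 0.5, so k should settle around five.
    probs = [0.1, 0.2, 0.3, 0.4, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9]
    bandit = ScalingBanditSketch(n_arms=10, cost_per_pull=0.5)
    for _ in range(5000):
        bandit.step(lambda i: 1 if random.random() < probs[i] else 0)
    print("arms pulled per round:", bandit.k)
```

ADWIN proper maintains a window of adaptive size and tests every split point against a Hoeffding-style bound before shrinking it; the fixed half-window comparison above only mimics that idea, and the paper's actual rule for choosing the number of arms should be taken from the full text rather than from this sketch.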



Published In

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019
3305 pages
ISBN: 9781450362016
DOI: 10.1145/3292500
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. adaptive windowing
  2. bandit algorithms
  3. data stream monitoring
  4. predictive maintenance
  5. thompson sampling

Qualifiers

  • Research-article

Funding Sources

  • German Federal Ministry of Education and Research

Conference

KDD '19

Acceptance Rates

KDD '19 Paper Acceptance Rate 110 of 1,200 submissions, 9%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


