DOI: 10.1145/3292500.3330862

Scaling Multi-Armed Bandit Algorithms

Published: 25 July 2019

Abstract

The Multi-Armed Bandit (MAB) is a fundamental model capturing the dilemma between exploration and exploitation in sequential decision making. At every time step, the decision maker selects a set of arms and observes a reward from each of the chosen arms. In this paper, we present a variant of the problem, which we call the Scaling MAB (S-MAB): the goal of the decision maker is not only to maximize the cumulative rewards, i.e., to choose the arms with the highest expected reward, but also to decide how many arms to select so that, in expectation, the cost of selecting arms does not exceed the rewards. This problem is relevant to many real-world applications, e.g., online advertising, financial investments, or data stream monitoring. We propose an extension of Thompson Sampling, which has strong theoretical guarantees and is reported to perform well in practice. Our extension dynamically controls the number of arms to draw. Furthermore, we combine the proposed method with ADWIN, a state-of-the-art change detector, to deal with non-stationary environments. We illustrate the benefits of our contribution via a real-world use case on predictive maintenance.
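The abstract names three ingredients: Thompson Sampling played over several arms at once, a rule that scales how many arms to pull against the cost of pulling them, and a change detector (ADWIN) for drifting reward distributions. As a rough illustration of how these pieces fit together, here is a minimal, self-contained Python sketch. It is not the paper's algorithm: the class name, the threshold-based scaling rule, and the fixed split-window change check standing in for ADWIN are all illustrative assumptions.

```python
import random


class ScalingBanditSketch:
    """Multi-play Thompson Sampling over Bernoulli arms, with a cost-based
    rule for how many arms to pull and a naive split-window change check.
    The scaling and reset rules here are illustrative, not the paper's."""

    def __init__(self, n_arms, cost_per_pull, window=100, drift_threshold=0.3):
        self.n_arms = n_arms
        self.cost = cost_per_pull                  # expected cost of one pull
        self.alpha = [1.0] * n_arms                # Beta posterior: successes + 1
        self.beta = [1.0] * n_arms                 # Beta posterior: failures + 1
        self.k = 1                                 # current number of arms to pull
        self.window = window
        self.drift_threshold = drift_threshold
        self.recent = [[] for _ in range(n_arms)]  # sliding reward window per arm

    def select_arms(self):
        # Thompson Sampling with multiple plays: draw one sample from each
        # arm's Beta posterior and pull the k arms with the largest samples.
        samples = [random.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return sorted(range(self.n_arms), key=samples.__getitem__, reverse=True)[: self.k]

    def update(self, arm, reward):
        # Conjugate update for a Bernoulli reward in {0, 1}.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward
        self.recent[arm].append(reward)
        if len(self.recent[arm]) > self.window:
            self.recent[arm].pop(0)
            self._check_change(arm)

    def _rescale(self):
        # Hypothetical scaling rule: pull one more arm while the posterior
        # mean of the next-best arm still beats the pull cost, one fewer
        # when the worst arm currently pulled no longer pays for itself.
        means = sorted((a / (a + b) for a, b in zip(self.alpha, self.beta)), reverse=True)
        if self.k < self.n_arms and means[self.k] > self.cost:
            self.k += 1
        elif self.k > 1 and means[self.k - 1] < self.cost:
            self.k -= 1

    def _check_change(self, arm):
        # Crude stand-in for ADWIN: compare the means of the two window
        # halves and, on a large gap, forget the arm's posterior so the
        # learner re-explores it.
        half = self.window // 2
        old, new = self.recent[arm][:half], self.recent[arm][half:]
        if abs(sum(old) / len(old) - sum(new) / len(new)) > self.drift_threshold:
            self.alpha[arm] = self.beta[arm] = 1.0
            self.recent[arm].clear()

    def step(self, pull):
        # One round: choose arms, observe a reward per chosen arm, rescale.
        for arm in self.select_arms():
            self.update(arm, pull(arm))
        self._rescale()


if __name__ == "__main__":
    # Toy environment: ten Bernoulli arms; five have mean reward above the
    # pull cost of 0.5, so k should settle around five.
    probs = [0.1, 0.2, 0.3, 0.4, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9]
    bandit = ScalingBanditSketch(n_arms=10, cost_per_pull=0.5)
    for _ in range(5000):
        bandit.step(lambda i: 1 if random.random() < probs[i] else 0)
    print("arms pulled per round:", bandit.k)
```

ADWIN proper maintains a window of adaptive size and tests every split point against a Hoeffding-style bound before shrinking it; the fixed half-window comparison above only mimics that idea, and the paper's actual rule for choosing the number of arms should be taken from the full text rather than from this sketch.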



Published In

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019
3305 pages
ISBN: 9781450362016
DOI: 10.1145/3292500
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. adaptive windowing
  2. bandit algorithms
  3. data stream monitoring
  4. predictive maintenance
  5. thompson sampling

Qualifiers

  • Research-article

Funding Sources

  • German Federal Ministry of Education and Research

Conference

KDD '19

Acceptance Rates

KDD '19 Paper Acceptance Rate 110 of 1,200 submissions, 9%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


