DOI: 10.1145/1250790.1250807
Article

Approximation algorithms for budgeted learning problems

Published: 11 June 2007

Abstract

We present the first approximation algorithms for a large class of budgeted learning problems. A classic example is the budgeted multi-armed bandit problem. In this problem, each arm of the bandit has an unknown reward distribution, and a prior on that distribution is specified as input. Knowledge about the underlying distribution can be refined in an exploration phase by playing the arm and observing the rewards; however, there is a budget on the total number of plays allowed during exploration. After the exploration phase, the arm with the highest posterior expected reward is chosen for exploitation. The goal is to design the adaptive exploration phase, subject to the budget constraint on the number of plays, so as to maximize the expected reward of the arm chosen for exploitation. While this problem is reasonably well understood in the infinite-horizon discounted-reward setting, the budgeted version is NP-hard. For this problem and several generalizations, we provide approximate policies that achieve reward within a constant factor of that of the optimal policy. Our algorithms use a novel linear-program rounding technique based on stochastic packing.
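The problem setup in the abstract can be illustrated with a minimal sketch: Bernoulli arms with Beta priors, a fixed exploration budget, and a final commitment to the arm with the highest posterior mean. The round-robin exploration policy below is a hypothetical stand-in chosen for simplicity, not the paper's LP-rounding algorithm.

```python
import random

def budgeted_bandit(true_means, budget, seed=0):
    """Budgeted multi-armed bandit setup: explore under a play budget,
    then exploit the single arm with the highest posterior mean.

    Arms are Bernoulli with unknown means; the prior on each mean is
    Beta(1, 1).  Exploration is simple round-robin -- an illustrative
    stand-in policy, not the paper's approximation algorithm."""
    rng = random.Random(seed)
    n = len(true_means)
    alpha = [1] * n  # Beta posterior parameter: 1 + observed successes
    beta = [1] * n   # Beta posterior parameter: 1 + observed failures

    # Exploration phase: exactly `budget` plays, updating the posterior
    # of the played arm after each observed reward.
    for t in range(budget):
        i = t % n  # round-robin choice of which arm to play
        reward = 1 if rng.random() < true_means[i] else 0
        alpha[i] += reward
        beta[i] += 1 - reward

    # Exploitation phase: commit to the arm whose posterior mean is highest.
    posterior_means = [a / (a + b) for a, b in zip(alpha, beta)]
    best = max(range(n), key=lambda i: posterior_means[i])
    return best, posterior_means
```

With a budget large relative to the number of arms, the posterior means concentrate near the true means and the chosen arm is almost surely the best one; the interesting regime studied in the paper is when the budget is scarce and plays must be allocated adaptively.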



Published In

STOC '07: Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
June 2007, 734 pages
ISBN: 9781595936318
DOI: 10.1145/1250790

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. algorithms
    2. approximation
    3. learning


Conference

STOC07: Symposium on Theory of Computing
June 11 - 13, 2007
San Diego, California, USA

    Acceptance Rates

    Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

Cited By

• (2023) Dynamically Interrupting Deadlocks in Game Learning Using Multisampling Multiarmed Bandits. IEEE Transactions on Games 15(3):360-367. DOI: 10.1109/TG.2022.3177598
• (2023) Handwritten Digit Recognition Based on Deep Learning Algorithms. 2023 International Conference on Internet of Things, Robotics and Distributed Computing (ICIRDC), 476-481. DOI: 10.1109/ICIRDC62824.2023.00093
• (2022) MAB-Based Reinforced Worker Selection Framework for Budgeted Spatial Crowdsensing. IEEE Transactions on Knowledge and Data Engineering 34(3):1303-1316. DOI: 10.1109/TKDE.2020.2992531
• (2022) Stochastic Graph Exploration with Limited Resources. Approximation and Online Algorithms, 172-189. DOI: 10.1007/978-3-031-18367-6_9
• (2021) Fast rates for prediction with limited expert advice. Proceedings of the 35th International Conference on Neural Information Processing Systems, 23582-23591. DOI: 10.5555/3540261.3542067
• (2020) Bayesian Incentive-Compatible Bandit Exploration. Operations Research 68(4):1132-1161. DOI: 10.1287/opre.2019.1949
• (2020) Predict and Match. Proceedings of the ACM on Measurement and Analysis of Computing Systems 4(1):1-23. DOI: 10.1145/3379470
• (2020) Cost-aware Cascading Bandits. IEEE Transactions on Signal Processing. DOI: 10.1109/TSP.2020.3001388
• (2019) Procrastinating with confidence. Proceedings of the 33rd International Conference on Neural Information Processing Systems, 8883-8893. DOI: 10.5555/3454287.3455084
• (2019) Bandits with Global Convex Constraints and Objective. Operations Research 67(5):1486-1502. DOI: 10.1287/opre.2019.1840
