DOI: 10.1145/1250790.1250807
Article

Approximation algorithms for budgeted learning problems

Published: 11 June 2007

Abstract

We present the first approximation algorithms for a large class of budgeted learning problems. A classic example is the budgeted multi-armed bandit problem. In this problem, each arm of the bandit has an unknown reward distribution, and a prior on that distribution is specified as input. Knowledge about the underlying distribution can be refined in an exploration phase by playing the arm and observing the rewards; however, there is a budget on the total number of plays allowed during exploration. After the exploration phase, the arm with the highest posterior expected reward is chosen for exploitation. The goal is to design the adaptive exploration phase, subject to the budget constraint on the number of plays, so as to maximize the expected reward of the arm chosen for exploitation. While this problem is reasonably well understood in the infinite-horizon discounted-reward setting, the budgeted version is NP-hard. For this problem and several generalizations, we provide approximate policies that achieve reward within a constant factor of that of the optimal policy. Our algorithms use a novel linear-program rounding technique based on stochastic packing.
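The problem setup in the abstract can be illustrated with a minimal sketch: Bernoulli arms with Beta priors, a fixed exploration budget, and a final commitment to the arm with the highest posterior mean. The round-robin exploration policy below is a hypothetical stand-in chosen for simplicity, not the paper's LP-rounding algorithm.

```python
import random

def budgeted_bandit(true_means, budget, seed=0):
    """Budgeted multi-armed bandit setup: explore under a play budget,
    then exploit the single arm with the highest posterior mean.

    Arms are Bernoulli with unknown means; the prior on each mean is
    Beta(1, 1).  Exploration is simple round-robin -- an illustrative
    stand-in policy, not the paper's approximation algorithm."""
    rng = random.Random(seed)
    n = len(true_means)
    alpha = [1] * n  # Beta posterior parameter: 1 + observed successes
    beta = [1] * n   # Beta posterior parameter: 1 + observed failures

    # Exploration phase: exactly `budget` plays, updating the posterior
    # of the played arm after each observed reward.
    for t in range(budget):
        i = t % n  # round-robin choice of which arm to play
        reward = 1 if rng.random() < true_means[i] else 0
        alpha[i] += reward
        beta[i] += 1 - reward

    # Exploitation phase: commit to the arm whose posterior mean is highest.
    posterior_means = [a / (a + b) for a, b in zip(alpha, beta)]
    best = max(range(n), key=lambda i: posterior_means[i])
    return best, posterior_means
```

With a budget large relative to the number of arms, the posterior means concentrate near the true means and the chosen arm is almost surely the best one; the interesting regime studied in the paper is when the budget is scarce and plays must be allocated adaptively.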



Published In

STOC '07: Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
June 2007, 734 pages
ISBN: 9781595936318
DOI: 10.1145/1250790

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. algorithms
    2. approximation
    3. learning


Conference

STOC07: Symposium on Theory of Computing
June 11 - 13, 2007
San Diego, California, USA

    Acceptance Rates

    Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

Cited By

• (2023) Dynamically Interrupting Deadlocks in Game Learning Using Multisampling Multiarmed Bandits. IEEE Transactions on Games 15(3):360-367. DOI: 10.1109/TG.2022.3177598
• (2023) Handwritten Digit Recognition Based on Deep Learning Algorithms. 2023 International Conference on Internet of Things, Robotics and Distributed Computing (ICIRDC), 476-481. DOI: 10.1109/ICIRDC62824.2023.00093
• (2022) MAB-Based Reinforced Worker Selection Framework for Budgeted Spatial Crowdsensing. IEEE Transactions on Knowledge and Data Engineering 34(3):1303-1316. DOI: 10.1109/TKDE.2020.2992531
• (2022) Stochastic Graph Exploration with Limited Resources. Approximation and Online Algorithms, 172-189. DOI: 10.1007/978-3-031-18367-6_9
• (2021) Fast rates for prediction with limited expert advice. Proceedings of the 35th International Conference on Neural Information Processing Systems, 23582-23591. DOI: 10.5555/3540261.3542067
• (2020) Bayesian Incentive-Compatible Bandit Exploration. Operations Research 68(4):1132-1161. DOI: 10.1287/opre.2019.1949
• (2020) Predict and Match. Proceedings of the ACM on Measurement and Analysis of Computing Systems 4(1):1-23. DOI: 10.1145/3379470
• (2020) Cost-aware Cascading Bandits. IEEE Transactions on Signal Processing. DOI: 10.1109/TSP.2020.3001388
• (2019) Procrastinating with confidence. Proceedings of the 33rd International Conference on Neural Information Processing Systems, 8883-8893. DOI: 10.5555/3454287.3455084
• (2019) Bandits with Global Convex Constraints and Objective. Operations Research 67(5):1486-1502. DOI: 10.1287/opre.2019.1840
