
Multi-armed Bandits with Metric Switching Costs

  • Conference paper

Automata, Languages and Programming (ICALP 2009)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5556)

Abstract

In this paper we consider the stochastic multi-armed bandit problem with metric switching costs. Given a set of locations (arms) in a metric space, prior information about the reward available at each location, the cost of obtaining a sample (play) at each location, and rules for updating the priors based on observed samples, the task is to maximize a given objective function subject to a budget L on the total distance traveled and a budget C on the total cost of plays. This fundamental and well-studied problem models several optimization problems in robot navigation, sensor networks, labor economics, and related areas.

In this paper we develop a general duality-based framework that yields the first O(1) approximation for bandits with metric switching costs; the constants involved are quite small. Since these problems are Max-SNP hard, a constant-factor approximation is the best possible. The overall technique and the ensuing structural results are of independent interest in the context of bandit problems with complicated side constraints. Our techniques also improve the approximation ratio for the budgeted learning problem from 4 to 3 + ε.
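To make the problem model concrete, the following is a minimal toy sketch of an instance: arms are points on the real line (a simple metric space), each with a Beta prior on a Bernoulli reward probability, and a policy spends a travel budget L to move between arms and a play budget C to sample them. The myopic policy shown here is an illustrative baseline of my own, not the paper's LP/duality-based algorithm; all names and the distance-penalty heuristic are assumptions for illustration.

```python
import random

class Arm:
    """An arm at a point in the metric space, with a Beta prior
    on its (unknown) Bernoulli reward probability."""
    def __init__(self, position, alpha=1.0, beta=1.0):
        self.position = position  # location on the real line
        self.alpha = alpha        # Beta prior: pseudo-successes
        self.beta = beta          # Beta prior: pseudo-failures

    def mean(self):
        # Posterior mean of the reward probability.
        return self.alpha / (self.alpha + self.beta)

    def update(self, reward):
        # Bayesian update of the prior after one observed play.
        if reward:
            self.alpha += 1
        else:
            self.beta += 1

def greedy_policy(arms, L, C, true_probs, rng):
    """Myopic baseline (NOT the paper's algorithm): repeatedly play
    the reachable arm maximizing posterior mean minus a small
    distance penalty, until the travel budget L or the play
    budget C runs out."""
    pos = 0.0
    total_reward = 0
    while C >= 1:
        # Only arms within the remaining travel budget are feasible.
        candidates = [a for a in arms if abs(a.position - pos) <= L]
        if not candidates:
            break
        best = max(candidates,
                   key=lambda a: a.mean() - 0.01 * abs(a.position - pos))
        L -= abs(best.position - pos)   # pay the metric switching cost
        pos = best.position
        C -= 1                          # pay the cost of one play
        reward = 1 if rng.random() < true_probs[id(best)] else 0
        best.update(reward)
        total_reward += reward
    return total_reward

rng = random.Random(0)
arms = [Arm(position=p) for p in (0.0, 2.0, 5.0)]
true_probs = {id(a): q for a, q in zip(arms, (0.2, 0.8, 0.5))}
reward = greedy_policy(arms, L=10.0, C=20, true_probs=true_probs, rng=rng)
print(reward)
```

Such a myopic rule can be arbitrarily bad compared to the optimum; the point of the paper is that an LP-relaxation/duality approach gives a constant-factor guarantee despite the coupling between the distance and play budgets.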




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guha, S., Munagala, K. (2009). Multi-armed Bandits with Metric Switching Costs. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds) Automata, Languages and Programming. ICALP 2009. Lecture Notes in Computer Science, vol 5556. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02930-1_41


  • DOI: https://doi.org/10.1007/978-3-642-02930-1_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02929-5

  • Online ISBN: 978-3-642-02930-1

  • eBook Packages: Computer Science (R0)
