Abstract
In this paper we consider the stochastic multi-armed bandit problem with metric switching costs. Given a set of locations (arms) in a metric space, prior information about the reward available at each location, a cost for obtaining a sample (play) at each location, and rules for updating the priors based on observed samples, the task is to maximize a given objective function subject to a distance budget L and a play-cost budget C. This fundamental and well-studied problem models several optimization problems in robot navigation, sensor networks, labor economics, and related areas.
In this paper we develop a general duality-based framework that provides the first O(1) approximation for metric switching costs; the actual constants are quite small. Since these problems are Max-SNP hard, this result is asymptotically the best possible. The overall technique and the ensuing structural results are of independent interest in the context of bandit problems with complicated side constraints. Our techniques also improve the approximation ratio of the budgeted learning problem from 4 to 3 + ε.
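To make the model concrete, the following is a minimal toy sketch (not the paper's duality-based algorithm) of the setting described above: arms at points on a line (the metric), Bernoulli rewards with Beta priors updated per play, a distance budget L consumed by switching between arms, and a play budget C. The greedy index policy used here is purely illustrative; the function name and scoring rule are assumptions of this sketch.

```python
import random

def run_toy_instance(positions, true_probs, L, C, seed=0):
    """Toy metric-bandit instance: arms at `positions` on a line,
    unknown Bernoulli reward probabilities `true_probs`, distance
    budget L, play budget C. Illustrative greedy policy only."""
    rng = random.Random(seed)
    # Beta(1, 1) priors per arm, stored as [successes + 1, failures + 1].
    post = [[1, 1] for _ in positions]
    cur = positions[0]          # start at the first arm's location
    total_reward = 0
    while C > 0:
        # Greedy score: posterior mean, discounted by the travel needed
        # to reach the arm (an ad-hoc trade-off, not the paper's index).
        def score(i):
            mean = post[i][0] / (post[i][0] + post[i][1])
            travel = abs(positions[i] - cur)
            return mean / (1 + travel)
        feasible = [i for i in range(len(positions))
                    if abs(positions[i] - cur) <= L]
        if not feasible:
            break
        i = max(feasible, key=score)
        L -= abs(positions[i] - cur)   # pay the switching (distance) cost
        cur = positions[i]
        C -= 1                         # pay the play cost
        reward = 1 if rng.random() < true_probs[i] else 0
        total_reward += reward
        post[i][0 if reward else 1] += 1   # Bayesian update of the prior
    return total_reward

print(run_toy_instance([0.0, 2.0, 5.0], [0.2, 0.8, 0.5], L=6.0, C=20))
```

The sketch only shows how the two budgets and the prior updates interact; the paper's contribution is an O(1)-approximate policy for this setting, which a greedy heuristic like the one above does not guarantee.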
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Guha, S., Munagala, K. (2009). Multi-armed Bandits with Metric Switching Costs. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds) Automata, Languages and Programming. ICALP 2009. Lecture Notes in Computer Science, vol 5556. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02930-1_41
Print ISBN: 978-3-642-02929-5
Online ISBN: 978-3-642-02930-1