Abstract
In this paper we consider the stochastic multi-armed bandit problem with metric switching costs. Given a set of locations (arms) in a metric space, prior information about the reward available at each location, a cost for obtaining a sample (play) at each location, and rules for updating the priors based on observed samples, the task is to maximize a given objective function subject to a distance budget L and a play-cost budget C. This fundamental and well-studied problem models several optimization problems in robot navigation, sensor networks, labor economics, and related areas.
In this paper we develop a general duality-based framework that provides the first O(1) approximation for metric switching costs; the actual constants are quite small. Since these problems are Max-SNP hard, this result is asymptotically the best possible. The overall technique and the ensuing structural results are of independent interest in the context of bandit problems with complicated side constraints. Our techniques also improve the approximation ratio of the budgeted learning problem from 4 to 3 + ε.
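To make the model concrete, the following is a minimal toy sketch (not the paper's duality-based algorithm) of the setting described above: arms at points on a line (the metric), Bernoulli rewards with Beta priors updated per play, a distance budget L consumed by switching between arms, and a play budget C. The greedy index policy used here is purely illustrative; the function name and scoring rule are assumptions of this sketch.

```python
import random

def run_toy_instance(positions, true_probs, L, C, seed=0):
    """Toy metric-bandit instance: arms at `positions` on a line,
    unknown Bernoulli reward probabilities `true_probs`, distance
    budget L, play budget C. Illustrative greedy policy only."""
    rng = random.Random(seed)
    # Beta(1, 1) priors per arm, stored as [successes + 1, failures + 1].
    post = [[1, 1] for _ in positions]
    cur = positions[0]          # start at the first arm's location
    total_reward = 0
    while C > 0:
        # Greedy score: posterior mean, discounted by the travel needed
        # to reach the arm (an ad-hoc trade-off, not the paper's index).
        def score(i):
            mean = post[i][0] / (post[i][0] + post[i][1])
            travel = abs(positions[i] - cur)
            return mean / (1 + travel)
        feasible = [i for i in range(len(positions))
                    if abs(positions[i] - cur) <= L]
        if not feasible:
            break
        i = max(feasible, key=score)
        L -= abs(positions[i] - cur)   # pay the switching (distance) cost
        cur = positions[i]
        C -= 1                         # pay the play cost
        reward = 1 if rng.random() < true_probs[i] else 0
        total_reward += reward
        post[i][0 if reward else 1] += 1   # Bayesian update of the prior
    return total_reward

print(run_toy_instance([0.0, 2.0, 5.0], [0.2, 0.8, 0.5], L=6.0, C=20))
```

The sketch only shows how the two budgets and the prior updates interact; the paper's contribution is an O(1)-approximate policy for this setting, which a greedy heuristic like the one above does not guarantee.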
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Guha, S., Munagala, K. (2009). Multi-armed Bandits with Metric Switching Costs. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds) Automata, Languages and Programming. ICALP 2009. Lecture Notes in Computer Science, vol 5556. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02930-1_41
Print ISBN: 978-3-642-02929-5
Online ISBN: 978-3-642-02930-1