Abstract
The max k-armed bandit problem is a recently introduced online optimization problem with practical applications to heuristic search. Given a set of k slot machines, each yielding payoff from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the maximum payoff received over a series of n trials. Previous work on the max k-armed bandit problem has assumed that payoffs are drawn from generalized extreme value (GEV) distributions. In this paper we present a simple algorithm, based on an algorithm for the classical k-armed bandit problem, that solves the max k-armed bandit problem effectively without making strong distributional assumptions. We demonstrate the effectiveness of our approach by applying it to the task of selecting among priority dispatching rules for the resource-constrained project scheduling problem with maximal time lags (RCPSP/max).
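To make the objective concrete, here is a minimal Python sketch of the problem setting only. It is not the algorithm proposed in the paper: the allocation rule shown is a plain round-robin baseline, and every name in it (`run_max_bandit`, `round_robin`, the example arms) is hypothetical and chosen for illustration. The point it shows is that the score of a run is the single largest payoff observed over n trials, not the sum of payoffs as in the classical k-armed bandit.

```python
import random
from typing import Callable, List, Sequence

# An arm is modeled as a function that draws one payoff from its
# (unknown to the player) distribution. This representation is an assumption.
Arm = Callable[[random.Random], float]
Policy = Callable[[int, List[int], List[float]], int]


def run_max_bandit(arms: Sequence[Arm], n: int, choose: Policy, seed: int = 0) -> float:
    """Allocate n trials among the arms using `choose`.

    Returns the maximum payoff received, which is the quantity the
    max k-armed bandit problem asks us to maximize.
    """
    rng = random.Random(seed)
    pulls = [0] * len(arms)                      # trials given to each arm so far
    best_per_arm = [float("-inf")] * len(arms)   # best payoff seen from each arm
    best = float("-inf")                         # best payoff seen overall
    for t in range(n):
        i = choose(t, pulls, best_per_arm)
        payoff = arms[i](rng)
        pulls[i] += 1
        best_per_arm[i] = max(best_per_arm[i], payoff)
        best = max(best, payoff)
    return best


def round_robin(t: int, pulls: List[int], best_per_arm: List[float]) -> int:
    """Baseline policy (illustrative only): cycle through the arms in order."""
    return t % len(pulls)


if __name__ == "__main__":
    # Two hypothetical arms: similar typical payoffs, but the second has a
    # heavier upper tail, which matters when only the maximum counts.
    example_arms = [
        lambda rng: rng.uniform(0.0, 1.0),
        lambda rng: rng.gauss(0.3, 0.5),
    ]
    print(run_max_bandit(example_arms, n=100, choose=round_robin))
```

A smarter policy would replace `round_robin` and use `pulls` and `best_per_arm` to shift trials toward arms that look more likely to produce a large payoff.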
Keywords
- Generalized Extreme Value
- Feasible Schedule
- Slot Machine
- Generalized Extreme Value Distribution
- Project Scheduling Problem
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Streeter, M.J., Smith, S.F. (2006). A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem. In: Benhamou, F. (eds) Principles and Practice of Constraint Programming - CP 2006. CP 2006. Lecture Notes in Computer Science, vol 4204. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11889205_40
DOI: https://doi.org/10.1007/11889205_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46267-5
Online ISBN: 978-3-540-46268-2
eBook Packages: Computer Science, Computer Science (R0)