Abstract
In this paper we present an evolutionary approach to the finite-horizon, undiscounted multi-armed bandit problem. The evolutionary algorithm we designed exhibits a number of novel features dictated by its intended application to the bandit problem. We also present five efficient ad hoc techniques from the literature for solving the multi-armed bandit problem. To gain insight into the behaviour of these algorithms and compare their performance, we carried out a series of simulation experiments. We present the numerical results and discuss how performance is affected by parameter selection. The paper concludes with a number of empirical suggestions on how to select a suitable evolutionary algorithm parameter set for a particular bandit task, along with some directions for future research.
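To make the problem setting concrete, the following is a minimal sketch of the finite-horizon, undiscounted bandit loop under one common ad hoc technique, ε-greedy (named here for illustration only; the paper's five techniques are not specified in the abstract, and the Bernoulli arm means are hypothetical):

```python
import random

def epsilon_greedy(arm_means, horizon, epsilon=0.1, seed=0):
    """Play a finite-horizon, undiscounted bandit with the epsilon-greedy rule.

    arm_means: hypothetical true success probabilities of Bernoulli arms,
    standing in for the unknown reward distributions.
    Returns total (undiscounted) reward collected over `horizon` pulls.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms          # number of pulls per arm
    estimates = [0.0] * n_arms     # running mean reward per arm
    total = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the sample mean for the pulled arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total

total = epsilon_greedy([0.2, 0.5, 0.8], horizon=1000)
```

Because the horizon is finite and rewards are undiscounted, performance is simply the sum of rewards over the horizon, which is the quantity the compared algorithms trade off against exploration cost.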


Cite this article
Koulouriotis, D.E., Xanthopoulos, A. A comparative study of ad hoc techniques and evolutionary methods for multi-armed bandit problems. Oper Res Int J 8, 105–122 (2008). https://doi.org/10.1007/s12351-008-0007-5