A comparative study of ad hoc techniques and evolutionary methods for multi-armed bandit problems

Original Paper

Operational Research

Abstract

In this paper we present an evolutionary approach to the finite-horizon, undiscounted multi-armed bandit problem. The evolutionary algorithm we designed has a number of novel features dictated by its intended application to the bandit problem. We also present five efficient ad hoc techniques from the literature for solving the multi-armed bandit problem: greedy selection, ε-greedy selection, softmax selection, reinforcement comparison and the pursuit method. To gain insight into the behaviour of these algorithms and to compare their performance, we carried out a series of simulation experiments. We present the numerical results and then discuss how performance is affected by parameter selection. The paper concludes with a number of empirical suggestions on how to select a suitable evolutionary algorithm parameter set for a particular bandit task, along with some directions for future research.
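
For readers new to the setting, the following Python sketch shows what a finite-horizon, undiscounted bandit experiment of this kind involves: a policy makes a fixed number of plays (the horizon) and is judged by the plain, undiscounted sum of its rewards. It also reports the two measures named in the appendix charts, interpreted here as average reward per play and the percentage of plays on the optimal arm, each averaged over independent runs. The Gaussian testbed (arm means drawn from N(0, 1), unit-variance reward noise), the horizon, the number of arms and every function name are assumptions for illustration, not the paper's experimental design. The policy shown is ε-greedy with sample-average estimates; `epsilon=0` reduces it to pure greedy selection.

```python
import random


def make_bandit(k, rng):
    """Hypothetical testbed: k arms whose true means are drawn from N(0, 1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(k)]


def pull(means, arm, rng):
    """One play: the arm's true mean plus unit-variance Gaussian noise (assumed)."""
    return rng.gauss(means[arm], 1.0)


def run_epsilon_greedy(means, horizon, epsilon, rng):
    """Play `horizon` rounds with epsilon-greedy selection (epsilon=0 is greedy).

    The objective is the undiscounted sum of rewards over the finite horizon.
    Returns (total_reward, fraction_of_plays_on_the_optimal_arm).
    """
    k = len(means)
    best = max(range(k), key=lambda a: means[a])
    estimates = [0.0] * k  # sample-average value estimate per arm
    counts = [0] * k
    total, optimal_plays = 0.0, 0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: uniformly random arm
        else:
            top = max(estimates)
            # exploit: pick among the highest estimates, breaking ties at random
            arm = rng.choice([a for a in range(k) if estimates[a] == top])
        reward = pull(means, arm, rng)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total += reward
        optimal_plays += int(arm == best)
    return total, optimal_plays / horizon


def evaluate(epsilon, runs=100, k=10, horizon=1000, seed=0):
    """Average reward per play and % of optimal plays, averaged over runs."""
    rng = random.Random(seed)
    avg_reward, pct_optimal = 0.0, 0.0
    for _ in range(runs):
        means = make_bandit(k, rng)
        total, frac = run_epsilon_greedy(means, horizon, epsilon, rng)
        avg_reward += total / horizon / runs
        pct_optimal += 100.0 * frac / runs
    return avg_reward, pct_optimal


if __name__ == "__main__":
    for eps in (0.0, 0.01, 0.1):
        r, p = evaluate(eps, runs=50)
        print(f"epsilon={eps}: average reward {r:.3f}, optimal arm on {p:.1f}% of plays")
```

Pure greedy (`epsilon=0`) can lock onto an arm whose first rewards happened to look good, which is the usual motivation for adding an exploration rate.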

Fig. 1
Fig. 2

Author information

Correspondence to D. E. Koulouriotis.

Appendix

Fig. 3 Greedy selection: average reward and % average optimality charts

Fig. 4 ε-greedy selection: average reward and % average optimality charts

Fig. 5 Softmax selection: % average optimality and average reward charts
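
Softmax selection replaces ε-greedy's uniform exploration with graded probabilities: arm a is chosen with probability exp(Q(a)/τ) / Σ_b exp(Q(b)/τ), where Q(a) is the current value estimate of arm a and the temperature τ controls how random the choice is. A minimal Python sketch, assuming Gaussian arms with unit-variance noise and sample-average estimates; the function names and parameter values are illustrative, not taken from the paper:

```python
import math
import random


def softmax_probs(estimates, tau):
    """Gibbs/Boltzmann probabilities exp(Q/tau) / sum(exp(Q/tau)).

    Subtracting the largest estimate first avoids overflow and leaves the
    probabilities unchanged.
    """
    m = max(estimates)
    weights = [math.exp((q - m) / tau) for q in estimates]
    z = sum(weights)
    return [w / z for w in weights]


def run_softmax(means, horizon, tau, rng):
    """Softmax selection with sample-average estimates on Gaussian arms.

    Returns (total_reward, fraction_of_plays_on_the_optimal_arm).
    """
    k = len(means)
    best = max(range(k), key=lambda a: means[a])
    estimates, counts = [0.0] * k, [0] * k
    total, optimal_plays = 0.0, 0
    for _ in range(horizon):
        arm = rng.choices(range(k), weights=softmax_probs(estimates, tau))[0]
        reward = rng.gauss(means[arm], 1.0)  # assumed unit-variance reward noise
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
        optimal_plays += int(arm == best)
    return total, optimal_plays / horizon


if __name__ == "__main__":
    rng = random.Random(0)
    means = [rng.gauss(0.0, 1.0) for _ in range(10)]  # hypothetical 10-armed task
    for tau in (0.01, 0.1, 1.0):
        total, frac = run_softmax(means, 1000, tau, rng)
        print(f"tau={tau}: total reward {total:.1f}, optimal arm on {100 * frac:.1f}% of plays")
```

A small τ makes the choice nearly greedy, while a large τ makes it nearly uniform, so τ plays the role that ε plays in ε-greedy selection.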

Fig. 6 Reinforcement comparison: % average optimality and average reward charts
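
Reinforcement comparison, in the form given in Sutton and Barto's reinforcement learning textbook, does not estimate arm values directly. Each arm carries a preference p(a), arms are drawn with softmax probabilities over the preferences, and after reward r the chosen arm's preference moves by β(r - r̄) while the reference reward r̄ moves by α(r - r̄). The Python sketch below assumes that formulation, Gaussian arms with unit-variance noise and illustrative step sizes α and β; the variant and parameter values used in the paper may differ.

```python
import math
import random


def run_reinforcement_comparison(means, horizon, alpha, beta, rng, ref_init=0.0):
    """Reinforcement comparison on Gaussian arms (sketch).

    Each arm has a preference p[a]; arms are drawn with softmax(p). After a
    reward r on the chosen arm:
        p[arm] += beta * (r - ref)     # compare r with the reference reward
        ref    += alpha * (r - ref)    # move the reference toward recent rewards
    Returns (total_reward, fraction_of_plays_on_the_optimal_arm).
    """
    k = len(means)
    best = max(range(k), key=lambda a: means[a])
    prefs = [0.0] * k
    ref = ref_init
    total, optimal_plays = 0.0, 0
    for _ in range(horizon):
        m = max(prefs)
        weights = [math.exp(p - m) for p in prefs]  # unnormalised softmax weights
        arm = rng.choices(range(k), weights=weights)[0]
        reward = rng.gauss(means[arm], 1.0)  # assumed unit-variance reward noise
        prefs[arm] += beta * (reward - ref)
        ref += alpha * (reward - ref)
        total += reward
        optimal_plays += int(arm == best)
    return total, optimal_plays / horizon


if __name__ == "__main__":
    rng = random.Random(0)
    means = [rng.gauss(0.0, 1.0) for _ in range(10)]  # hypothetical 10-armed task
    total, frac = run_reinforcement_comparison(means, 1000, 0.1, 0.1, rng)
    print(f"total reward {total:.1f}, optimal arm on {100 * frac:.1f}% of plays")
```

Because an arm is only rewarded relative to r̄, an arm that beats the running average gains preference even if its absolute rewards are small.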

Fig. 7 Pursuit method: % average optimality and average reward charts
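
The pursuit method, as described in Sutton and Barto's textbook, keeps both value estimates Q and explicit selection probabilities π. After every play the probabilities are pushed a fraction β toward the currently greedy arm g: π(g) ← π(g) + β(1 - π(g)), and π(a) ← π(a) + β(0 - π(a)) for every other arm, which keeps π summing to one. This Python sketch assumes sample-average estimates, uniform initial probabilities, ties broken toward the lowest arm index, and Gaussian arms with unit-variance noise; all names and parameter values are illustrative.

```python
import random


def run_pursuit(means, horizon, beta, rng):
    """Pursuit method on Gaussian arms (sketch).

    Keeps sample-average estimates q and selection probabilities pi. After
    each play, pi moves a fraction beta toward the greedy arm:
        pi[greedy] += beta * (1 - pi[greedy])
        pi[a]      += beta * (0 - pi[a])   for every other arm a
    Returns (total_reward, fraction_of_plays_on_the_optimal_arm).
    """
    k = len(means)
    best = max(range(k), key=lambda a: means[a])
    q, counts = [0.0] * k, [0] * k
    pi = [1.0 / k] * k  # start from uniform selection probabilities
    total, optimal_plays = 0.0, 0
    for _ in range(horizon):
        arm = rng.choices(range(k), weights=pi)[0]
        reward = rng.gauss(means[arm], 1.0)  # assumed unit-variance reward noise
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]
        greedy = max(range(k), key=lambda a: q[a])  # ties go to the lowest index
        for a in range(k):
            target = 1.0 if a == greedy else 0.0
            pi[a] += beta * (target - pi[a])
        total += reward
        optimal_plays += int(arm == best)
    return total, optimal_plays / horizon


if __name__ == "__main__":
    rng = random.Random(0)
    means = [rng.gauss(0.0, 1.0) for _ in range(10)]  # hypothetical 10-armed task
    total, frac = run_pursuit(means, 1000, 0.01, rng)
    print(f"total reward {total:.1f}, optimal arm on {100 * frac:.1f}% of plays")
```

Since the probabilities of non-greedy arms shrink geometrically, a large β commits quickly but risks abandoning a good arm before it has been sampled enough.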

Fig. 8 Evolutionary algorithms: % average optimality and average reward charts, panels a and b (parameter set numbers correspond to those in Table 4)
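
The evolutionary algorithm compared in these charts, including its novel features and the parameter sets of Table 4, is not described on this page. Purely to illustrate how an evolutionary loop can act as a bandit policy, the hypothetical Python sketch below uses a plain generational scheme: each individual is an arm index, evaluating an individual spends one play of the finite horizon, fitness is that single noisy reward, parents are picked by binary tournament, and mutation reassigns an arm at random. It is not the authors' algorithm; the representation, operators, Gaussian testbed, function name and parameter values are all assumptions.

```python
import random


def run_simple_ea(means, horizon, pop_size, mutation_rate, rng):
    """Generic generational EA on a Gaussian bandit (illustrative only).

    Each individual is an arm index. Evaluating an individual costs one play
    of the finite horizon, and its fitness is that single noisy reward.
    Parents are chosen by binary tournament; a child copies its parent's arm
    and, with probability mutation_rate, switches to a uniformly random arm.
    Returns (total_reward, fraction_of_plays_on_the_optimal_arm).
    """
    k = len(means)
    best = max(range(k), key=lambda a: means[a])
    population = [rng.randrange(k) for _ in range(pop_size)]
    total, optimal_plays, plays = 0.0, 0, 0
    while plays < horizon:
        fitness = []
        for arm in population:
            if plays == horizon:
                break  # the horizon can end part-way through a generation
            reward = rng.gauss(means[arm], 1.0)  # assumed unit-variance noise
            fitness.append(reward)
            total += reward
            optimal_plays += int(arm == best)
            plays += 1
        evaluated = population[: len(fitness)]  # only individuals that were played
        children = []
        for _ in range(pop_size):
            i, j = rng.randrange(len(evaluated)), rng.randrange(len(evaluated))
            parent = evaluated[i] if fitness[i] >= fitness[j] else evaluated[j]
            if rng.random() < mutation_rate:
                parent = rng.randrange(k)  # mutation: jump to a random arm
            children.append(parent)
        population = children
    return total, optimal_plays / horizon


if __name__ == "__main__":
    rng = random.Random(0)
    means = [rng.gauss(0.0, 1.0) for _ in range(10)]  # hypothetical 10-armed task
    total, frac = run_simple_ea(means, 1000, 20, 0.05, rng)
    print(f"total reward {total:.1f}, optimal arm on {100 * frac:.1f}% of plays")
```

Because fitness is a single noisy reward, selection can favour a worse arm by chance; the mutation rate keeps some exploration alive at the cost of a steady share of random plays.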

About this article

Cite this article

Koulouriotis, D.E., Xanthopoulos, A. A comparative study of ad hoc techniques and evolutionary methods for multi-armed bandit problems. Oper Res Int J 8, 105–122 (2008). https://doi.org/10.1007/s12351-008-0007-5

  • DOI: https://doi.org/10.1007/s12351-008-0007-5
