A comparative study of ad hoc techniques and evolutionary methods for multi-armed bandit problems

Koulouriotis, D. E.; Xanthopoulos, A.

doi:10.1007/s12351-008-0007-5

Original Paper
Published: 21 February 2008

Operational Research, Volume 8, pages 105–122 (2008)

Abstract

In this paper we present an evolutionary approach for the finite-horizon, undiscounted multi-armed bandit problem. The evolutionary algorithm that we designed exhibits a number of novel features dictated by its intended application to the bandit problem. We also present five efficient ad hoc techniques from the literature for solving the multi-armed bandit problem. To gain insight into the behaviour of the presented algorithms and to compare their performance, we carried out a series of simulation experiments. We present the numerical results and then discuss how performance is affected by parameter selection. The paper concludes with a number of empirical suggestions on how to select a suitable evolutionary algorithm parameter set for a particular bandit task, along with some directions for future research.
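To make the problem setting concrete, the following is a minimal sketch of a finite-horizon, undiscounted bandit run: the agent makes a fixed number of pulls and is scored by the plain sum of rewards, with no discount factor. It assumes Bernoulli-distributed arm rewards and uses epsilon-greedy action selection as an illustrative ad hoc rule; the abstract does not name the five techniques the paper compares, so this is not claimed to be one of them. The function name `epsilon_greedy_total_reward`, the arm probabilities, `epsilon`, and `horizon` values are hypothetical.

```python
import random


def epsilon_greedy_total_reward(arm_probs, horizon, epsilon=0.1, seed=None):
    """Play a Bernoulli bandit for `horizon` pulls and return the undiscounted total reward.

    Assumptions (illustrative only): each arm pays 1 with probability arm_probs[i],
    else 0, and actions are chosen by an epsilon-greedy rule.
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms        # number of times each arm has been pulled
    estimates = [0.0] * n_arms   # sample-mean reward observed for each arm
    total = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit: best estimate so far
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward  # undiscounted: every pull within the horizon counts equally
    return total


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.8]  # hypothetical arm success probabilities
    print(epsilon_greedy_total_reward(probs, horizon=1000, epsilon=0.1, seed=42))
```

Swapping the explore/exploit rule inside the loop for a different selection strategy leaves the finite horizon and the undiscounted sum unchanged, which is what allows different techniques to be compared on the same task by total collected reward.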

Author information

Authors and Affiliations

Department of Production and Management Engineering, School of Engineering, Democritus University of Thrace, Xanthi, Greece
D. E. Koulouriotis & A. Xanthopoulos

Corresponding author

Correspondence to D. E. Koulouriotis.

About this article

Cite this article

Koulouriotis, D.E., Xanthopoulos, A. A comparative study of ad hoc techniques and evolutionary methods for multi-armed bandit problems. Oper Res Int J 8, 105–122 (2008). https://doi.org/10.1007/s12351-008-0007-5

Received: 26 January 2007
Revised: 21 June 2007
Accepted: 21 June 2007
Published: 21 February 2008
Issue Date: August 2008
DOI: https://doi.org/10.1007/s12351-008-0007-5
