The Bayesian Learning Automaton — Empirical Evaluation with Two-Armed Bernoulli Bandit Problems

  • Conference paper
Research and Development in Intelligent Systems XXV (SGAI 2008)

Abstract

The two-armed Bernoulli bandit (TABB) problem is a classical optimization problem where an agent sequentially pulls one of two arms attached to a gambling machine, with each pull resulting either in a reward or a penalty. The reward probabilities of each arm are unknown, and thus one must balance between exploiting existing knowledge about the arms, and obtaining new information.
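The environment described above can be sketched as follows. This is a minimal illustration, not code from the paper; the class and method names are ours, and the reward probabilities are hidden from the agent, which observes only the outcome of each pull.

```python
import random

class TwoArmedBernoulliBandit:
    """Gambling machine with two arms; each pull yields a reward (1) or a penalty (0)."""

    def __init__(self, p1, p2):
        self.probs = (p1, p2)  # reward probabilities, unknown to the agent

    def pull(self, arm):
        # Bernoulli trial: reward with probability probs[arm], penalty otherwise
        return 1 if random.random() < self.probs[arm] else 0
```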

In recent decades, several computationally efficient algorithms for tackling this problem have emerged, with Learning Automata (LA) being known for their ε-optimality, and confidence-interval-based algorithms for their logarithmically growing regret. Applications include treatment selection in clinical trials, route selection in adaptive routing, and plan exploration in games such as Go. The TABB has also been extensively studied from a Bayesian perspective; in general, however, such analysis leads to computationally inefficient solution policies.

This paper introduces the Bayesian Learning Automaton (BLA). The BLA is inherently Bayesian in nature, yet relies simply on counting rewards/penalties and on random sampling from a pair of twin beta distributions. Extensive experiments demonstrate that, in contrast to most LA, the BLA does not rely on external learning speed/accuracy control. It also outperforms recently proposed confidence-interval-based algorithms. We thus believe that the BLA opens up for improved performance in an extensive number of applications, and that it forms the basis for a new avenue of research.
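The counting-and-sampling scheme described in the abstract can be sketched as follows. This is our reading of the description, not the paper's own code: each arm keeps a Beta posterior whose parameters are its reward and penalty counts (plus a uniform prior), and at each step the arm whose posterior sample is larger is pulled. The function and parameter names (`bla`, `bandit_pull`, `horizon`) are illustrative.

```python
import random

def bla(bandit_pull, horizon=1000):
    """Bayesian Learning Automaton for a two-armed Bernoulli bandit.

    bandit_pull(arm) must return 1 for a reward and 0 for a penalty.
    """
    alpha = [1, 1]  # reward counts + 1 (first Beta parameter per arm)
    beta = [1, 1]   # penalty counts + 1 (second Beta parameter per arm)
    total_reward = 0
    for _ in range(horizon):
        # Random sampling from the pair of twin beta distributions
        samples = [random.betavariate(alpha[i], beta[i]) for i in (0, 1)]
        arm = 0 if samples[0] >= samples[1] else 1
        r = bandit_pull(arm)
        total_reward += r
        if r:
            alpha[arm] += 1  # reward observed
        else:
            beta[arm] += 1   # penalty observed
    return total_reward, alpha, beta
```

Note that no learning-rate or exploration parameter appears anywhere: the balance between exploration and exploitation emerges from the posterior sampling itself, which matches the abstract's claim that the BLA needs no external learning speed/accuracy control.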





Copyright information

© 2009 Springer-Verlag London Limited

About this paper

Cite this paper

Granmo, OC. (2009). The Bayesian Learning Automaton — Empirical Evaluation with Two-Armed Bernoulli Bandit Problems. In: Bramer, M., Petridis, M., Coenen, F. (eds) Research and Development in Intelligent Systems XXV. SGAI 2008. Springer, London. https://doi.org/10.1007/978-1-84882-171-2_17

  • DOI: https://doi.org/10.1007/978-1-84882-171-2_17

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84882-170-5

  • Online ISBN: 978-1-84882-171-2

  • eBook Packages: Computer Science (R0)
