Abstract
The two-armed Bernoulli bandit (TABB) problem is a classical optimization problem where an agent sequentially pulls one of two arms attached to a gambling machine, with each pull resulting either in a reward or a penalty. The reward probabilities of each arm are unknown, and thus one must balance between exploiting existing knowledge about the arms, and obtaining new information.
Over the last decades, several computationally efficient algorithms for tackling this problem have emerged, with Learning Automata (LA) known for their ε-optimality, and confidence interval based algorithms for their logarithmically growing regret. Applications include treatment selection in clinical trials, route selection in adaptive routing, and plan exploration in games such as Go. The TABB has also been studied extensively from a Bayesian perspective; in general, however, such analysis leads to computationally inefficient solution policies.
This paper introduces the Bayesian Learning Automaton (BLA). The BLA is inherently Bayesian in nature, yet relies simply on counting rewards/penalties and on random sampling from a pair of twin beta distributions. Extensive experiments demonstrate that, in contrast to most LA, the BLA does not rely on external learning speed/accuracy control. It also outperforms recently proposed confidence interval based algorithms. We thus believe that the BLA opens up for improved performance in an extensive number of applications, and that it forms the basis for a new avenue of research.
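The counting-and-sampling scheme the abstract describes can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: class and variable names are hypothetical, and it assumes uniform Beta(1, 1) priors, with each arm's reward/penalty counts serving as the parameters of its beta distribution. Arm selection draws one random sample from each of the two beta distributions and pulls the arm with the larger draw.

```python
import random


class BayesianLearningAutomaton:
    """Sketch of the counting/beta-sampling scheme (hypothetical names)."""

    def __init__(self):
        # Per-arm (rewards + 1, penalties + 1) counts; Beta(1, 1) priors assumed.
        self.counts = [[1, 1], [1, 1]]

    def select_arm(self):
        # Sample once from each arm's beta distribution; pull the larger draw.
        draws = [random.betavariate(a, b) for a, b in self.counts]
        return 0 if draws[0] >= draws[1] else 1

    def update(self, arm, rewarded):
        # Counting rewards/penalties is the only bookkeeping required.
        if rewarded:
            self.counts[arm][0] += 1
        else:
            self.counts[arm][1] += 1


# Usage against a simulated two-armed Bernoulli bandit with made-up
# reward probabilities; the agent should concentrate pulls on arm 0.
random.seed(0)
probs = [0.9, 0.6]
bla = BayesianLearningAutomaton()
pulls = [0, 0]
for _ in range(5000):
    arm = bla.select_arm()
    pulls[arm] += 1
    bla.update(arm, random.random() < probs[arm])
```

Because the beta samples spread out when an arm's counts are low and concentrate as evidence accumulates, exploration tapers off automatically, which is consistent with the abstract's claim that no external learning speed/accuracy control is needed.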
Copyright information
© 2009 Springer-Verlag London Limited
Cite this paper
Granmo, OC. (2009). The Bayesian Learning Automaton — Empirical Evaluation with Two-Armed Bernoulli Bandit Problems. In: Bramer, M., Petridis, M., Coenen, F. (eds) Research and Development in Intelligent Systems XXV. SGAI 2008. Springer, London. https://doi.org/10.1007/978-1-84882-171-2_17
DOI: https://doi.org/10.1007/978-1-84882-171-2_17
Publisher Name: Springer, London
Print ISBN: 978-1-84882-170-5
Online ISBN: 978-1-84882-171-2