Abstract
The two-armed Bernoulli bandit (TABB) problem is a classical optimization problem where an agent sequentially pulls one of two arms attached to a gambling machine, with each pull resulting either in a reward or a penalty. The reward probabilities of each arm are unknown, and thus one must balance between exploiting existing knowledge about the arms, and obtaining new information.
Over the last decades, several computationally efficient algorithms for tackling this problem have emerged, with Learning Automata (LA) known for their ε-optimality, and confidence interval based algorithms for their logarithmically growing regret. Applications include treatment selection in clinical trials, route selection in adaptive routing, and plan exploration in games such as Go. The TABB has also been studied extensively from a Bayesian perspective; in general, however, such analysis leads to computationally inefficient solution policies.
This paper introduces the Bayesian Learning Automaton (BLA). The BLA is inherently Bayesian in nature, yet relies simply on counting rewards/penalties and on random sampling from a pair of twin beta distributions. Extensive experiments demonstrate that, in contrast to most LA, the BLA does not rely on external learning speed/accuracy control. It also outperforms recently proposed confidence interval based algorithms. We thus believe that the BLA opens up for improved performance in an extensive number of applications, and that it forms the basis for a new avenue of research.
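The counting-and-sampling scheme the abstract describes can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: class and variable names are hypothetical, and it assumes uniform Beta(1, 1) priors, with each arm's reward/penalty counts serving as the parameters of its beta distribution. Arm selection draws one random sample from each of the two beta distributions and pulls the arm with the larger draw.

```python
import random


class BayesianLearningAutomaton:
    """Sketch of the counting/beta-sampling scheme (hypothetical names)."""

    def __init__(self):
        # Per-arm (rewards + 1, penalties + 1) counts; Beta(1, 1) priors assumed.
        self.counts = [[1, 1], [1, 1]]

    def select_arm(self):
        # Sample once from each arm's beta distribution; pull the larger draw.
        draws = [random.betavariate(a, b) for a, b in self.counts]
        return 0 if draws[0] >= draws[1] else 1

    def update(self, arm, rewarded):
        # Counting rewards/penalties is the only bookkeeping required.
        if rewarded:
            self.counts[arm][0] += 1
        else:
            self.counts[arm][1] += 1


# Usage against a simulated two-armed Bernoulli bandit with made-up
# reward probabilities; the agent should concentrate pulls on arm 0.
random.seed(0)
probs = [0.9, 0.6]
bla = BayesianLearningAutomaton()
pulls = [0, 0]
for _ in range(5000):
    arm = bla.select_arm()
    pulls[arm] += 1
    bla.update(arm, random.random() < probs[arm])
```

Because the beta samples spread out when an arm's counts are low and concentrate as evidence accumulates, exploration tapers off automatically, which is consistent with the abstract's claim that no external learning speed/accuracy control is needed.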
Copyright information
© 2009 Springer-Verlag London Limited
Cite this paper
Granmo, OC. (2009). The Bayesian Learning Automaton — Empirical Evaluation with Two-Armed Bernoulli Bandit Problems. In: Bramer, M., Petridis, M., Coenen, F. (eds) Research and Development in Intelligent Systems XXV. SGAI 2008. Springer, London. https://doi.org/10.1007/978-1-84882-171-2_17
DOI: https://doi.org/10.1007/978-1-84882-171-2_17
Publisher Name: Springer, London
Print ISBN: 978-1-84882-170-5
Online ISBN: 978-1-84882-171-2