Abstract
We propose a model – the “tug-of-war (TOW) model” – for conducting unique parallel searches using many nonlocally correlated search agents. The model is based on a property of the single-celled amoeba, the true slime mold Physarum, which maintains a constant intracellular resource volume while collecting environmental information by concurrently expanding and shrinking its branches. This conservation law entails a “nonlocal correlation” among the branches: a volume increment in one branch is immediately compensated by volume decrements in the other branches. Such nonlocal correlation has been shown to be useful for decision making under a dilemma. The multi-armed bandit problem asks for the optimal strategy for maximizing the total reward when two demands are incompatible: exploring unfamiliar options to estimate their reward probabilities, and exploiting the option currently believed to be best. Our model manages this “exploration–exploitation dilemma” efficiently and performs well: its average accuracy rate is higher than those of well-known algorithms such as the modified ε-greedy algorithm and the modified softmax algorithm.
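The conservation-based mechanism described above can be illustrated with a minimal sketch for a two-armed bandit. The coupled displacements `x[0]` and `x[1]` below stand in for the branch volumes: their sum is held at zero, so every increment of one arm’s variable is compensated by an equal decrement of the other’s. The specific update rule (+1 on reward, −ω on no reward) and the parameter ω are illustrative assumptions for this sketch, not the authors’ exact dynamics.

```python
import random


def tow_bandit(probs, steps=1000, omega=1.0, seed=0):
    """Toy tug-of-war-style agent for a two-armed bandit.

    probs  -- reward probabilities of the two arms
    omega  -- penalty weight on an unrewarded play (illustrative choice)
    Returns the fraction of plays that were rewarded.
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]  # conserved resource: x[0] + x[1] == 0 at all times
    hits = 0
    for _ in range(steps):
        # Play the arm whose branch is currently "pulled" harder.
        arm = 0 if x[0] >= x[1] else 1
        rewarded = rng.random() < probs[arm]
        delta = 1.0 if rewarded else -omega
        x[arm] += delta
        x[1 - arm] -= delta  # nonlocal compensation keeps the sum at zero
        hits += rewarded
    return hits / steps
```

Because a loss on one arm immediately strengthens the other arm’s pull, the agent keeps probing the alternative without any explicit exploration parameter; with arms of probability 0.8 and 0.2 the hit rate settles near that of the better arm.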
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Kim, SJ., Aono, M., Hara, M. (2010). Tug-of-War Model for Multi-armed Bandit Problem. In: Calude, C.S., Hagiya, M., Morita, K., Rozenberg, G., Timmis, J. (eds) Unconventional Computation. UC 2010. Lecture Notes in Computer Science, vol 6079. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13523-1_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13522-4
Online ISBN: 978-3-642-13523-1