Abstract
We propose a model – the “tug-of-war (TOW) model” – for conducting unique parallel searches using many nonlocally correlated search agents. The model is based on a property of the single-celled amoeba, the true slime mold Physarum, which maintains a constant intracellular resource volume while collecting environmental information by concurrently expanding and shrinking its branches. This conservation law entails a “nonlocal correlation” among the branches: a volume increment in one branch is immediately compensated by volume decrements in the other branches. Such nonlocal correlation has been shown to be useful for decision making under a dilemma. The multi-armed bandit problem asks for the optimal strategy for maximizing the total reward when two demands are incompatible: exploring unfamiliar options to estimate their reward probabilities, and exploiting the option currently believed to be best. Our model manages this “exploration–exploitation dilemma” efficiently and performs well: its average accuracy rate is higher than those of well-known algorithms such as the modified ε-greedy algorithm and the modified softmax algorithm.
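The conservation-based mechanism described above can be illustrated with a minimal sketch for a two-armed bandit. The coupled displacements `x[0]` and `x[1]` below stand in for the branch volumes: their sum is held at zero, so every increment of one arm’s variable is compensated by an equal decrement of the other’s. The specific update rule (+1 on reward, −ω on no reward) and the parameter ω are illustrative assumptions for this sketch, not the authors’ exact dynamics.

```python
import random


def tow_bandit(probs, steps=1000, omega=1.0, seed=0):
    """Toy tug-of-war-style agent for a two-armed bandit.

    probs  -- reward probabilities of the two arms
    omega  -- penalty weight on an unrewarded play (illustrative choice)
    Returns the fraction of plays that were rewarded.
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]  # conserved resource: x[0] + x[1] == 0 at all times
    hits = 0
    for _ in range(steps):
        # Play the arm whose branch is currently "pulled" harder.
        arm = 0 if x[0] >= x[1] else 1
        rewarded = rng.random() < probs[arm]
        delta = 1.0 if rewarded else -omega
        x[arm] += delta
        x[1 - arm] -= delta  # nonlocal compensation keeps the sum at zero
        hits += rewarded
    return hits / steps
```

Because a loss on one arm immediately strengthens the other arm’s pull, the agent keeps probing the alternative without any explicit exploration parameter; with arms of probability 0.8 and 0.2 the hit rate settles near that of the better arm.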
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Kim, SJ., Aono, M., Hara, M. (2010). Tug-of-War Model for Multi-armed Bandit Problem. In: Calude, C.S., Hagiya, M., Morita, K., Rozenberg, G., Timmis, J. (eds) Unconventional Computation. UC 2010. Lecture Notes in Computer Science, vol 6079. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13523-1_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13522-4
Online ISBN: 978-3-642-13523-1