Abstract
We use simulated soccer to study multiagent learning. Each team's players (agents) share an action set and a policy but may behave differently due to position-dependent inputs. All agents making up a team are rewarded or punished collectively when goals are scored. We conduct simulations with varying team sizes and compare two learning algorithms: TD-Q learning with linear neural networks (TD-Q) and Probabilistic Incremental Program Evolution (PIPE). TD-Q is based on evaluation functions (EFs) mapping input/action pairs to expected reward, while PIPE searches policy space directly. PIPE uses an adaptive probability distribution to synthesize programs that calculate action probabilities from current inputs. Our results show that TD-Q has difficulty learning appropriate shared EFs. PIPE, however, does not depend on EFs and finds good policies faster and more reliably.
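The TD-Q approach described above can be illustrated with a minimal sketch: a linear evaluation function with one weight vector per action, updated by a TD(0) rule toward the expected reward. All class and parameter names below are illustrative assumptions, not the paper's actual implementation; the multiagent policy-sharing setup and the soccer environment are omitted.

```python
import numpy as np

# Minimal sketch of TD-Q learning with a linear evaluation function (EF).
# The EF maps input/action pairs to expected reward, as in the abstract.
# All names and hyperparameters here are illustrative assumptions.

class LinearTDQ:
    def __init__(self, n_inputs, n_actions, lr=0.1, gamma=0.95, eps=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = np.zeros((n_actions, n_inputs))  # one weight vector per action
        self.lr, self.gamma, self.eps = lr, gamma, eps

    def q_values(self, x):
        # Linear EF: Q(x, a) = w_a . x
        return self.W @ x

    def act(self, x):
        # Epsilon-greedy action selection over the shared EF
        if self.rng.random() < self.eps:
            return int(self.rng.integers(len(self.W)))
        return int(np.argmax(self.q_values(x)))

    def update(self, x, a, r, x_next, done):
        # TD(0) target: r + gamma * max_a' Q(x', a'), or just r at episode end
        target = r if done else r + self.gamma * np.max(self.q_values(x_next))
        td_error = target - self.q_values(x)[a]
        # Gradient step on the chosen action's weight vector
        self.W[a] += self.lr * td_error * x
```

In the paper's setting, all agents on a team would share one such EF (one set of weights), while each agent feeds it its own position-dependent input vector; the shared, collectively rewarded EF is precisely what the abstract reports as difficult to learn.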
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Sałustowicz, R., Wiering, M., Schmidhuber, J. (1997). On learning soccer strategies. In: Gerstner, W., Germond, A., Hasler, M., Nicoud, JD. (eds) Artificial Neural Networks — ICANN'97. ICANN 1997. Lecture Notes in Computer Science, vol 1327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020247
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63631-1
Online ISBN: 978-3-540-69620-9