Reinforcement Learning Soccer Teams with Incomplete World Models

Wiering, Marco; Sałustowicz, Rafał; Schmidhuber, Jürgen

doi:10.1023/A:1008921914343

Published: July 1999

Autonomous Robots, Volume 7, pages 77–88 (1999)

Marco Wiering¹,
Rafał Sałustowicz¹ &
Jürgen Schmidhuber¹

Abstract

We use reinforcement learning (RL) to compute strategies for multiagent soccer teams. RL may profit significantly from world models (WMs) estimating state transition probabilities and rewards. In high-dimensional, continuous input spaces, however, learning accurate WMs is intractable. Here we show that incomplete WMs can help to quickly find good action selection policies. Our approach is based on a novel combination of CMACs and prioritized sweeping-like algorithms. Variants thereof outperform both Q(λ)-learning with CMACs and the evolutionary method Probabilistic Incremental Program Evolution (PIPE), which performed best in previous comparisons.
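
For readers unfamiliar with CMACs, the following is a minimal illustrative sketch of a generic CMAC (tile coding) function approximator over a single continuous input. It is not the authors' implementation: the class name `CMAC`, its parameters (`n_tilings`, `n_tiles`, `lr`), the one-dimensional input, and the choice to sum the active weights are all assumptions made for illustration. The abstract does not specify how the paper configures its CMACs or how it combines them with prioritized sweeping.

```python
import numpy as np


class CMAC:
    """Hypothetical minimal CMAC (tile coding) over a 1-D input in [lo, hi]."""

    def __init__(self, n_tilings=4, n_tiles=8, lo=0.0, hi=1.0, lr=0.1):
        self.n_tilings = n_tilings
        self.n_tiles = n_tiles
        self.lo, self.hi = lo, hi
        self.lr = lr
        self.width = (hi - lo) / n_tiles
        # One extra tile per tiling so that shifted inputs still map to a valid cell.
        self.weights = np.zeros((n_tilings, n_tiles + 1))
        # Each tiling is shifted by a different fraction of one tile width.
        self.offsets = np.arange(n_tilings) * self.width / n_tilings

    def active_tiles(self, x):
        """Return the index of the single active tile in each tiling."""
        x = min(max(x, self.lo), self.hi)  # clip input to the covered range
        idx = ((x - self.lo + self.offsets) / self.width).astype(int)
        return np.minimum(idx, self.n_tiles)

    def predict(self, x):
        """Sum the weights of the active tiles (an assumed combination rule)."""
        rows = np.arange(self.n_tilings)
        return self.weights[rows, self.active_tiles(x)].sum()

    def update(self, x, target):
        """Move the prediction at x a fraction lr of the way toward target."""
        rows = np.arange(self.n_tilings)
        error = target - self.predict(x)
        self.weights[rows, self.active_tiles(x)] += self.lr * error / self.n_tilings


cmac = CMAC()
for _ in range(200):
    cmac.update(0.3, 1.0)
```

Because nearby inputs share tiles, training at 0.3 also raises the prediction at 0.31, while a distant input such as 0.9 is unaffected. This coarse generalization is what makes tile coding usable in continuous input spaces.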

Author information

Authors and Affiliations

IDSIA, Corso Elvezia 36, 6900, Lugano, Switzerland
Marco Wiering, Rafał Sałustowicz & Jürgen Schmidhuber

About this article

Cite this article

Wiering, M., Sałustowicz, R. & Schmidhuber, J. Reinforcement Learning Soccer Teams with Incomplete World Models. Autonomous Robots 7, 77–88 (1999). https://doi.org/10.1023/A:1008921914343

Issue Date: July 1999
DOI: https://doi.org/10.1023/A:1008921914343
