Reinforcement learning for robot soccer

Published in: Autonomous Robots 27, 55–73 (2009)

Abstract

Batch reinforcement learning methods provide a powerful framework for learning efficiently and effectively in autonomous robots. This paper reviews recent work by the authors on applying reinforcement learning successfully in a challenging and complex domain. It discusses several variants of the general batch learning framework, tailored in particular to the use of multilayer perceptrons for approximating value functions over continuous state spaces. The batch learning framework is used to learn crucial skills for our soccer-playing robots competing in the RoboCup competitions, as demonstrated in three case studies.

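The batch approach the abstract describes can be made concrete as fitted Q iteration: collect a fixed set of transitions, then repeatedly re-fit a multilayer perceptron to Bellman targets computed over the entire batch. The sketch below is a minimal illustration only; the discount factor, action set, network size, and the scikit-learn regressor are assumptions chosen for readability, and they stand in for, rather than reproduce, the authors' implementation (which, among other differences, trains its networks with RPROP rather than a stochastic-gradient solver).

```python
# Minimal, illustrative sketch of batch (fitted) Q iteration with a
# multilayer perceptron as value-function approximator. All names and
# hyperparameters below are assumptions for illustration, not the
# paper's actual robot setup. Terminal-state handling is omitted.

import numpy as np
from sklearn.neural_network import MLPRegressor

GAMMA = 0.95            # discount factor (assumed)
ACTIONS = [0, 1, 2]     # illustrative discrete action set (assumed)

def fitted_q_iteration(transitions, n_iterations=20):
    """Learn Q(s, a) from a fixed batch of (s, a, r, s') tuples.

    The defining trait of batch RL: the transition set is collected
    once and reused in every dynamic-programming iteration.
    """
    # Network input is the concatenated (state, action) vector.
    inputs = np.array([np.append(s, a) for s, a, _, _ in transitions])
    rewards = np.array([r for _, _, r, _ in transitions])

    q = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=1000)
    q.fit(inputs, rewards)  # iteration 0: Q approximates immediate reward

    for _ in range(n_iterations):
        # Regression targets: r + gamma * max_a' Q(s', a'),
        # computed over the whole batch before each re-fit.
        targets = np.array([
            r + GAMMA * max(
                q.predict(np.append(s2, a2).reshape(1, -1))[0]
                for a2 in ACTIONS)
            for _, _, r, s2 in transitions
        ])
        q.fit(inputs, targets)  # re-fit the MLP on the entire batch

    return q
```

Because the transition set is fixed, every physical interaction with the robot is reused in each iteration, which is what makes the batch framework data-efficient enough for learning on real hardware.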

Author information

Correspondence to Martin Riedmiller.

About this article

Cite this article

Riedmiller, M., Gabel, T., Hafner, R. et al. Reinforcement learning for robot soccer. Auton Robot 27, 55–73 (2009). https://doi.org/10.1007/s10514-009-9120-4
