Policy gradient learning for quadruped soccer robots
Introduction
In order for robots to be useful in real-world applications, they must adapt to novel and changing environments. In most domains, the robot should respond to changes in its surroundings by adapting both its low-level skills (e.g., vision, motion control and basic behaviors) and the higher-level skills that depend on them (e.g., plans or strategies built as combinations of basic behaviors).
In recent years, machine learning techniques have been used in several robotic applications, both for finding optimal parameters of specific behaviors and for determining the best combination of actions required to accomplish a complex task. In fact, machine learning approaches generate solutions with little human interaction, and explore the search space of possible solutions in a systematic way, whereas humans are often biased towards a small part of the space.
Despite a significant body of work on the subject, several issues need to be addressed, mainly due to the difficulties associated with using machine learning in the real world. Compared to other scenarios, learning on physical robots presents additional challenges: it must be effective with small amounts of data and must converge in a short time [1].
Building on these considerations, in this article we describe a reinforcement learning algorithm that simultaneously learns the best strategy (i.e., the best composition of basic behaviors) and the best parameters for the behaviors in that strategy. The proposed algorithm, named PG–RC, is based on the Policy Gradient (PG) learning technique for mobile robots, first introduced in [2], and exploits additional information on the system properties: the relevance of parameters, and the contiguities between strategies.
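The PG technique of [2] estimates the fitness gradient by finite differences: each parameter of every test policy is perturbed by one of three values (a small negative offset, zero, or a small positive offset), and the average fitness within each perturbation class estimates the partial derivative along that dimension. The following is a simplified sketch of one such iteration, assuming a scalar fitness function supplied by the caller; it illustrates the scheme only and is not the authors' implementation:

```python
import random

def policy_gradient_step(params, fitness, num_policies=8, eps=0.05, step=0.1):
    """One finite-difference policy gradient iteration (simplified sketch).

    Each of `num_policies` random test policies shifts every parameter by
    -eps, 0, or +eps; the per-class average fitness estimates the partial
    derivative in each dimension, and the parameters move one fixed-size
    step along the estimated gradient.
    """
    n = len(params)
    perturbations = [[random.choice((-eps, 0.0, eps)) for _ in range(n)]
                     for _ in range(num_policies)]
    scores = [fitness([p + d for p, d in zip(params, delta)])
              for delta in perturbations]

    def avg(xs):
        return sum(xs) / len(xs) if xs else None

    gradient = []
    for dim in range(n):
        groups = {-eps: [], 0.0: [], eps: []}
        for delta, s in zip(perturbations, scores):
            groups[delta[dim]].append(s)
        a_minus, a_zero, a_plus = avg(groups[-eps]), avg(groups[0.0]), avg(groups[eps])
        if a_minus is None or a_plus is None:
            gradient.append(0.0)  # not enough samples in this dimension
        elif a_zero is not None and a_zero >= a_minus and a_zero >= a_plus:
            gradient.append(0.0)  # current value already looks best
        else:
            gradient.append(a_plus - a_minus)

    # Normalise and take a fixed-size step along the estimated gradient.
    norm = sum(g * g for g in gradient) ** 0.5
    if norm == 0.0:
        return params
    return [p + step * g / norm for p, g in zip(params, gradient)]
```

On a physical robot each `fitness` evaluation is a real experiment, which is why keeping the number of PG iterations small, as PG–RC aims to do, matters so much.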
The contribution of this paper is twofold. Firstly, parameter and behavior learning are concurrently handled by the same algorithm. Secondly, the PG–RC algorithm guarantees faster training than the PG algorithm. The method has been tested in the application example of an attacker soccer robot in the RoboCup Standard Platform League (four-legged division).
Section snippets
Related work
Robot learning is a growing area of research at the intersection of robotics and machine learning. We can distinguish between different levels of learning: low level, for sensing and control issues, and high level, for cognitive and behavior issues. We include in the first class (parameter learning) all problems where learning aims at fine-tuning the parameters used by the low-level algorithms, and in the second class (behavior learning) all problems where learning aims at finding the optimal
Problem definition
In this paper, we consider learning a complex task composed of different behaviors. More specifically, we consider situations where a task can be accomplished by applying different strategies, and each strategy is a composition of different behaviors.
The learning problem we focus on is the following. A set of different strategies for accomplishing a certain task is given. Each strategy is implemented through a combination of behaviors, each one characterized
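The structure just described (a task admitting several strategies, each strategy a composition of parameterized behaviors) can be modeled with simple containers. The names and parameter values below are illustrative placeholders, not the paper's actual behaviors:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Behavior:
    """A basic behavior, characterized by a set of tunable parameters."""
    name: str
    params: Dict[str, float] = field(default_factory=dict)

@dataclass
class Strategy:
    """A strategy is a composition (here, an ordered list) of behaviors."""
    name: str
    behaviors: List[Behavior]

    def parameter_vector(self) -> List[float]:
        # Flatten all behavior parameters into one vector for the learner.
        return [v for b in self.behaviors for v in b.params.values()]

# Illustrative attacker strategy: walk to the ball, then kick it.
walk = Behavior("walk_to_ball", {"max_speed": 0.8, "turn_gain": 1.2})
kick = Behavior("head_kick", {"approach_dist": 0.15})
attacker = Strategy("walk_then_kick", [walk, kick])
```

Flattening each strategy's parameters into one vector is what lets a single gradient learner tune a whole strategy at once.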
Complex task learning with policy gradient
A solution to the problem defined in the previous section can be obtained by running a Policy Gradient learning task for each strategy and then selecting the strategy–parameter pair that returns the highest fitness value for the task. This procedure requires one execution of the PG algorithm per strategy. The algorithm presented in this article is an approximation of the above solution, which guarantees a higher convergence rate when only a small number of experiments is available.
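The exhaustive baseline described above (one full PG run per strategy, then selecting the best pair) can be sketched as follows; `run_pg` and `fitness` are placeholders supplied by the caller, not functions defined in the paper:

```python
def best_strategy_and_params(strategies, run_pg, fitness):
    """Naive baseline: optimise each strategy independently with PG,
    then keep the (strategy, parameters) pair with the highest fitness.

    This costs one full PG run per strategy, which is the expense the
    approximate algorithm in the article is designed to avoid.
    """
    best = None
    for strategy, initial_params in strategies:
        tuned = run_pg(strategy, initial_params)  # one PG execution
        score = fitness(strategy, tuned)
        if best is None or score > best[2]:
            best = (strategy, tuned, score)
    return best[0], best[1]
```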
Experiments with a soccer robot
Concurrent behavior and parameter learning is useful in many complex, high-dimensional systems. Here, we present an example of application of the proposed method: a robot playing soccer within the RoboCup Standard Platform League competitions. One of the main tasks to be accomplished in this scenario is to approach the ball and kick it toward the opponent's goal. Many strategies can be defined to accomplish this task, but a winning strategy is difficult to identify since it depends on many factors:
Conclusions
In this paper, we presented a method for concurrently learning the best strategy and the optimal parameters, by extending the policy gradient reinforcement learning algorithm. The proposed method guarantees fast convergence by exploiting information on the system properties while learning. In particular, the contiguities between strategies and the parameter relevance are estimated and used by the algorithm during training. Parameter relevance enables a reduction of the search space size during
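How the relevance estimate is computed is not detailed in this excerpt, but its role in shrinking the search space can be illustrated: if low-relevance parameters are never perturbed, the gradient search explores only the dimensions that matter. The sketch below assumes relevance scores are already available and the threshold is arbitrary:

```python
def mask_perturbation(delta, relevance, threshold=0.1):
    """Zero out perturbations of low-relevance parameters, so the
    gradient search only explores the relevant dimensions.
    (Relevance scores are assumed given here; the paper estimates
    them during training.)"""
    return [d if r >= threshold else 0.0 for d, r in zip(delta, relevance)]
```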
References (8)
- et al., Machine learning with AIBO robots in the four-legged league of RoboCup, IEEE Transactions on Systems, Man and Cybernetics, Part C (2006)
- N. Kohl, P. Stone, Policy gradient reinforcement learning for fast quadrupedal locomotion, in: Proc. of IEEE...
- P. Fidelman, P. Stone, The chin pinch: a case study in skill learning on a legged robot, in: Proc. of 10th...
- P. Stone, M. Veloso, Layered learning, in: Proc. of 11th European Conference on Machine Learning,...
A. Cherubini received the M.Sc. degree (“Laurea”) in Mechanical Engineering in 2001 from the University of Rome “La Sapienza”, the M.Sc. degree in Control Systems in 2003 from the University of Sheffield, UK, and the Ph.D. degree in Systems Engineering in 2008 from the University of Rome “La Sapienza”.
During his Ph.D. programme (2004–2007), he was a visiting scientist at the Lagadic group at INRIA Rennes - Bretagne Atlantique in Rennes (France), where he is currently working as post-doctoral fellow.
His research interests include: visual servoing for mobile robotic applications, robot learning, nonholonomic robot navigation, assistive robotics and legged locomotion.
F. Giannone received the B.Sc. degree in Computer Engineering from Università di Roma “La Sapienza” in December 2005. Her main interests are machine learning and behavior modeling through Petri nets applied to Cognitive Robotics.
L. Iocchi received his Master (Laurea) degree in 1995 and his Ph.D. in 1999 from Sapienza University of Rome. He is currently Assistant Professor at the Department of Computer and Systems Science, Sapienza University of Rome, Italy. His main research interests are in the areas of cognitive robotics, action planning, multi-robot coordination, robot perception, robot learning, stereo vision, and vision-based applications. He is author of more than 100 refereed papers in international journals and conferences.
D. Nardi is Full Professor at Dipartimento di Informatica e Sistemistica, Sapienza University of Rome, Italy. His main research interests include various aspects of knowledge representation and reasoning, such as description logics and nonmonotonic reasoning, cognitive robotics, multi-agent and multi-robot systems.
P.F. Palamara received his B.Sc. (2005) and M.Sc. (2008) in Computer Science from the University of Rome “Sapienza”, and an M.Sc. in Computer Science from Columbia University (2009). He is currently a doctoral student in the Itsik Pe’er Lab of Computational Genetics, Columbia University. His main research interests are Computational Genetics, Machine Learning and Cognitive Robotics.