Policy gradient learning for quadruped soccer robots
Introduction
In order for robots to be useful in real-world applications, they must adapt to novel and changing environments. In most domains, the robot should respond to changes in its surroundings by adapting both its low-level skills (e.g., vision, motion control and basic behaviors) and the higher-level skills that depend on them (e.g., plans or strategies built as combinations of basic behaviors).
In recent years, machine learning techniques have been used in several robotic applications, both for finding optimal parameters of specific behaviors and for determining the best combination of actions required to accomplish a complex task. In fact, machine learning approaches generate solutions with little human interaction, and explore the search space of possible solutions in a systematic way, whereas humans are often biased towards a small part of the space.
Despite a significant body of work on the subject, several issues need to be addressed, mainly due to the difficulties associated with using machine learning in the real world. Compared to other scenarios, learning on physical robots presents additional challenges: it must be effective with small amounts of data and must converge in a short time [1].
Building on these considerations, in this article we describe a reinforcement learning algorithm that simultaneously learns the best strategy (i.e., the best composition of basic behaviors) and the best parameters for the behaviors in that strategy. The proposed algorithm, named PG–RC, is based on the Policy Gradient (PG) learning technique for mobile robots, first introduced in [2], and exploits additional information on the system properties: the relevance of parameters, and the contiguities between strategies.
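The PG technique of [2] estimates the fitness gradient by finite differences: each parameter of every test policy is perturbed by one of three values (a small negative offset, zero, or a small positive offset), and the average fitness within each perturbation class estimates the partial derivative along that dimension. The following is a simplified sketch of one such iteration, assuming a scalar fitness function supplied by the caller; it illustrates the scheme only and is not the authors' implementation:

```python
import random

def policy_gradient_step(params, fitness, num_policies=8, eps=0.05, step=0.1):
    """One finite-difference policy gradient iteration (simplified sketch).

    Each of `num_policies` random test policies shifts every parameter by
    -eps, 0, or +eps; the per-class average fitness estimates the partial
    derivative in each dimension, and the parameters move one fixed-size
    step along the estimated gradient.
    """
    n = len(params)
    perturbations = [[random.choice((-eps, 0.0, eps)) for _ in range(n)]
                     for _ in range(num_policies)]
    scores = [fitness([p + d for p, d in zip(params, delta)])
              for delta in perturbations]

    def avg(xs):
        return sum(xs) / len(xs) if xs else None

    gradient = []
    for dim in range(n):
        groups = {-eps: [], 0.0: [], eps: []}
        for delta, s in zip(perturbations, scores):
            groups[delta[dim]].append(s)
        a_minus, a_zero, a_plus = avg(groups[-eps]), avg(groups[0.0]), avg(groups[eps])
        if a_minus is None or a_plus is None:
            gradient.append(0.0)  # not enough samples in this dimension
        elif a_zero is not None and a_zero >= a_minus and a_zero >= a_plus:
            gradient.append(0.0)  # current value already looks best
        else:
            gradient.append(a_plus - a_minus)

    # Normalise and take a fixed-size step along the estimated gradient.
    norm = sum(g * g for g in gradient) ** 0.5
    if norm == 0.0:
        return params
    return [p + step * g / norm for p, g in zip(params, gradient)]
```

On a physical robot each `fitness` evaluation is a real experiment, which is why keeping the number of PG iterations small, as PG–RC aims to do, matters so much.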
The contribution of this paper is twofold. Firstly, parameter and behavior learning are concurrently handled by the same algorithm. Secondly, the PG–RC algorithm guarantees faster training than the PG algorithm. The method has been tested in the application example of an attacker soccer robot in the RoboCup Standard Platform League (four-legged division).
Section snippets
Related work
Robot learning is a growing area of research at the intersection of robotics and machine learning. We can distinguish between different levels of learning: low level, for sensing and control issues, and high level, for cognitive and behavior issues. We include in the first class (parameter learning) all problems where learning aims at fine-tuning the parameters used by the low-level algorithms, and in the second class (behavior learning) all problems where learning aims at finding the optimal
Problem definition
In this paper, we consider learning a complex task composed of different behaviors. More specifically, we consider situations where a task can be accomplished by applying different strategies, and each strategy is a composition of different behaviors.
The learning problem we focus on is the following. A set of different strategies for accomplishing a certain task is given. Each strategy is implemented through a combination of behaviors, each one characterized
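The structure just described (a task admitting several strategies, each strategy a composition of parameterized behaviors) can be modeled with simple containers. The names and parameter values below are illustrative placeholders, not the paper's actual behaviors:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Behavior:
    """A basic behavior, characterized by a set of tunable parameters."""
    name: str
    params: Dict[str, float] = field(default_factory=dict)

@dataclass
class Strategy:
    """A strategy is a composition (here, an ordered list) of behaviors."""
    name: str
    behaviors: List[Behavior]

    def parameter_vector(self) -> List[float]:
        # Flatten all behavior parameters into one vector for the learner.
        return [v for b in self.behaviors for v in b.params.values()]

# Illustrative attacker strategy: walk to the ball, then kick it.
walk = Behavior("walk_to_ball", {"max_speed": 0.8, "turn_gain": 1.2})
kick = Behavior("head_kick", {"approach_dist": 0.15})
attacker = Strategy("walk_then_kick", [walk, kick])
```

Flattening each strategy's parameters into one vector is what lets a single gradient learner tune a whole strategy at once.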
Complex task learning with policy gradient
A solution to the problem defined in the previous section can be obtained by running a Policy Gradient learning task for each strategy and then selecting the strategy–parameter pair that returns the highest fitness value for the task. This procedure requires one execution of the PG algorithm per strategy. The algorithm presented in this article is an approximation of the above solution, which guarantees a higher convergence rate when only a small number of experiments is available.
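The exhaustive baseline described above (one full PG run per strategy, then selecting the best pair) can be sketched as follows; `run_pg` and `fitness` are placeholders supplied by the caller, not functions defined in the paper:

```python
def best_strategy_and_params(strategies, run_pg, fitness):
    """Naive baseline: optimise each strategy independently with PG,
    then keep the (strategy, parameters) pair with the highest fitness.

    This costs one full PG run per strategy, which is the expense the
    approximate algorithm in the article is designed to avoid.
    """
    best = None
    for strategy, initial_params in strategies:
        tuned = run_pg(strategy, initial_params)  # one PG execution
        score = fitness(strategy, tuned)
        if best is None or score > best[2]:
            best = (strategy, tuned, score)
    return best[0], best[1]
```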
Experiments with a soccer robot
Concurrent behavior and parameter learning is useful in many complex, high-dimensional systems. Here, we present an example of application of the proposed method: a robot playing soccer within the RoboCup Standard Platform League competitions. One of the main tasks to be accomplished in this scenario is to approach the ball and kick it toward the opponent's goal. Many strategies can be defined to accomplish this task, but a winning strategy is difficult to identify since it depends on many factors:
Conclusions
In this paper, we presented a method for concurrently learning the best strategy and the optimal parameters, by extending the policy gradient reinforcement learning algorithm. The proposed method guarantees fast convergence by exploiting information on the system properties while learning. In particular, the contiguities between strategies and the parameter relevance are estimated and used by the algorithm during training. Parameter relevance enables a reduction of the search space size during
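How the relevance estimate is computed is not detailed in this excerpt, but its role in shrinking the search space can be illustrated: if low-relevance parameters are never perturbed, the gradient search explores only the dimensions that matter. The sketch below assumes relevance scores are already available and the threshold is arbitrary:

```python
def mask_perturbation(delta, relevance, threshold=0.1):
    """Zero out perturbations of low-relevance parameters, so the
    gradient search only explores the relevant dimensions.
    (Relevance scores are assumed given here; the paper estimates
    them during training.)"""
    return [d if r >= threshold else 0.0 for d, r in zip(delta, relevance)]
```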
References (8)
- et al., Machine learning with AIBO robots in the four-legged league of RoboCup, IEEE Transactions on Systems, Man and Cybernetics, Part C (2006)
- N. Kohl, P. Stone, Policy gradient reinforcement learning for fast quadrupedal locomotion, in: Proc. of IEEE...
- P. Fidelman, P. Stone, The chin pinch: a case study in skill learning on a legged robot, in: Proc. of 10th...
- P. Stone, M. Veloso, Layered learning, in: Proc. of 11th European Conference on Machine Learning,...
A. Cherubini received the M.Sc. degree (“Laurea”) in Mechanical Engineering in 2001 from the University of Rome “La Sapienza”, the M.Sc. degree in Control Systems in 2003 from the University of Sheffield, UK, and the Ph.D. degree in Systems Engineering in 2008 from the University of Rome “La Sapienza”.
During his Ph.D. programme (2004–2007), he was a visiting scientist at the Lagadic group at INRIA Rennes - Bretagne Atlantique in Rennes (France), where he is currently working as post-doctoral fellow.
His research interests include: visual servoing for mobile robotic applications, robot learning, nonholonomic robot navigation, assistive robotics and legged locomotion.
F. Giannone received the B.Sc. degree in Computer Engineering from Università di Roma “La Sapienza” in December 2005. Her main interests are machine learning and behavior modeling through Petri nets applied to Cognitive Robotics.
L. Iocchi received his Master (Laurea) degree in 1995 and his Ph.D. in 1999 from Sapienza University of Rome. He is currently Assistant Professor at the Department of Computer and Systems Science, Sapienza University of Rome, Italy. His main research interests are in the areas of cognitive robotics, action planning, multi-robot coordination, robot perception, robot learning, stereo vision, and vision-based applications. He is author of more than 100 refereed papers in international journals and conferences.
D. Nardi is Full Professor at Dipartimento di Informatica e Sistemistica, Sapienza University of Rome, Italy. His main research interests include various aspects of knowledge representation and reasoning, such as description logics and nonmonotonic reasoning, cognitive robotics, multi-agent and multi-robot systems.
P.F. Palamara received his B.Sc. (2005) and M.Sc. (2008) in Computer Science from the University of Rome “Sapienza”, and an M.Sc. in Computer Science from Columbia University (2009). He is currently a doctoral student in the Itsik Pe’er Lab of Computational Genetics, Columbia University. His main research interests are Computational Genetics, Machine Learning and Cognitive Robotics.