Contextual Direct Policy Search

With Regularized Covariance Matrix Estimation

Journal of Intelligent & Robotic Systems (2019)

Abstract

Stochastic search and optimization techniques are used in a vast number of areas, ranging from refining the design of vehicles and determining the effectiveness of new drugs to developing efficient strategies in games and learning proper behaviors in robotics. However, the resulting solutions are specialized for the specific problem they solve, and if the problem's context changes even slightly, they cannot adapt properly. In fact, they require complete re-learning to perform correctly in new, unseen scenarios, regardless of how similar those scenarios are to previously learned environments. Contextual algorithms have recently emerged as a solution to this problem. They learn the policy for a task as a function of a given context, so that widely different contexts belonging to the same task are learned simultaneously. Even so, the state-of-the-art algorithms of this class converge prematurely and simply cannot compete with algorithms that learn a policy for a single context. We describe the Contextual Relative Entropy Policy Search (CREPS) algorithm, which belongs to this class of contextual algorithms, and extend it with a technique that substantially improves its performance, yielding Contextual Relative Entropy Policy Search with Covariance Matrix Adaptation (CREPS-CMA). We propose two variants and demonstrate their behavior on a set of classic contextual optimization problems and on complex simulated robot tasks.
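
To make the setting concrete, the sketch below (plain NumPy, not taken from the paper) illustrates the general idea behind contextual policy search with a regularized covariance update: a Gaussian search distribution whose mean is linear in context features, refit by weighted maximum likelihood, with the covariance estimate shrunk toward the previous one so exploration does not collapse prematurely. The feature map phi, the exponential weighting (a stand-in for the KL-bounded REPS weights), the shrinkage factor lam, and the toy reward are all illustrative assumptions, not the authors' exact CREPS-CMA update.

```python
import numpy as np

def phi(s):
    """Illustrative context features: a bias term plus the raw context."""
    return np.concatenate(([1.0], s))

rng = np.random.default_rng(0)
dim_theta, dim_ctx = 3, 1
n_samples, n_iters, lam = 50, 30, 0.7   # lam: shrinkage toward the old covariance

A = np.zeros((dim_theta, dim_ctx + 1))  # mean of pi(theta | s) is linear in phi(s)
Sigma = np.eye(dim_theta)               # shared exploration covariance

def reward(theta, s):
    """Toy contextual objective: move theta toward a context-dependent target."""
    return -np.sum((theta - np.full(dim_theta, s[0])) ** 2)

for _ in range(n_iters):
    contexts = rng.uniform(-1.0, 1.0, size=(n_samples, dim_ctx))
    feats = np.array([phi(s) for s in contexts])            # shape (N, dim_ctx + 1)
    means = feats @ A.T                                      # shape (N, dim_theta)
    thetas = means + rng.multivariate_normal(np.zeros(dim_theta), Sigma, n_samples)
    R = np.array([reward(th, s) for th, s in zip(thetas, contexts)])

    # Exponential weighting of samples (a stand-in for the KL-constrained REPS weights).
    w = np.exp((R - R.max()) / (R.std() + 1e-8))
    w /= w.sum()

    # Weighted maximum-likelihood fit of the context-dependent mean.
    W = np.diag(w)
    A = np.linalg.solve(feats.T @ W @ feats + 1e-6 * np.eye(dim_ctx + 1),
                        feats.T @ W @ thetas).T

    # Weighted ML covariance, regularized by shrinking toward the previous Sigma
    # so the search distribution does not collapse after a few updates.
    diffs = thetas - feats @ A.T
    Sigma = lam * Sigma + (1.0 - lam) * diffs.T @ (diffs * w[:, None])
```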

References

  1. Abdolmaleki, A., Lau, N., Reis, L.P., Peters, J., Neumann, G.: Contextual policy search for linear and nonlinear generalization of a humanoid walking controller. J. Intell. Robot. Syst. 10, 1–16 (2016)

  2. Abdolmaleki, A., Lioutikov, R., Peters, J., Lau, N., Reis, L., Neumann, G.: Regularized Covariance Estimation for Weighted Maximum Likelihood Policy Search Methods. In: Advances in Neural Information Processing Systems (NIPS). MIT Press (2015)

  3. Abdolmaleki, A., Lau, N., Reis, L., Neumann, G.: Regularized covariance estimation for weighted maximum likelihood policy search methods. In: Proceedings of the International Conference on Humanoid Robots (HUMANOIDS) (2015)

  4. Abdolmaleki, A., Lau, N., Reis, L., Peters, J., Neumann, G.: Contextual Policy Search for Generalizing a Parameterized Biped Walking Controller. In: IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC) (2015)

  5. Abdolmaleki, A., Simoes, D., Lau, N., Reis, L.P., Neumann, G.: Contextual Relative Entropy Policy Search with Covariance Matrix Adaptation. In: 2016 IEEE International Conference On Autonomous Robot Systems and Competitions (ICARSC), pp. 94–99. IEEE (2016)

  6. Boyd, S., Vandenberghe, L.: Convex optimization. University Press, Cambridge (2004)

  7. Broomhead, D.S., Lowe, D.: Radial Basis Functions, Multi-Variable Functional Interpolation and Adaptive Networks. Tech. rep., DTIC Document (1988)

  8. Da Silva, B., Konidaris, G., Barto, A.: Learning parameterized skills. International Conference on Machine Learning (ICML) (2012)

  9. Daniel, C., Neumann, G., Peters, J.: Hierarchical Relative Entropy Policy Search. In: International Conference on Artificial Intelligence and Statistics (AISTATS) (2012)

  10. Deisenroth, M.P., Englert, P., Peters, J., Fox, D.: Multi-task Policy Search for Robotics. In: IEEE International Conference on Robotics and Automation (ICRA) (2014)

  11. Ha, S., Liu, C.: Evolutionary optimization for parameterized whole-body dynamic motor skills. In: Proceedings of IEEE International Conference on Robotics and Automation (ICRA) (2016)

  12. Hansen, N., Muller, S., Koumoutsakos, P.: Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). Evolutionary Computation (2003)

  13. Hansen, N., Ostermeier, A.: Completely derandomized self-adaptation in evolution strategies. Evol. Comput. 9(2), 159–195 (2001)

  14. Igel, C., Suttorp, T., Hansen, N.: A computational efficient covariance matrix update and a (1 + 1)-CMA for evolution strategies. In: Proceedings of the 8th annual conference on Genetic and evolutionary computation (2006)

  15. Ijspeert, A., Schaal, S.: Learning Attractor Landscapes for Learning Motor Primitives. In: Advances in Neural Information Processing Systems 15 (NIPS) (2003)

  16. Kober, J., Oztop, E., Peters, J.: Reinforcement Learning to adjust Robot Movements to New Situations. In: Proceedings of the Robotics: Science and Systems Conference (RSS) (2010)

  17. Kober, J., Peters, J.: Policy Search for Motor Primitives in Robotics. Mach. Learn. 8, 1–33 (2010)

  18. Kupcsik, A., Deisenroth, M.P., Peters, J., Neumann, G.: Data-Efficient contextual policy search for robot movement skills. In: Proceedings of the National Conference on Artificial Intelligence (AAAI) (2013)

  19. Mannor, S., Rubinstein, R., Gat, Y.: The Cross Entropy method for Fast Policy Search. In: Proceedings of the 20th International Conference on Machine Learning (ICML) (2003)

  20. Molga, M., Smutnicki, C.: Test Functions for Optimization Needs. In: http://www.zsd.ict.pwr.wroc.pl/files/docs/functions.pdf (2005)

  21. Niehaus, C., Röfer, T., Laue, T.: Gait optimization on a humanoid robot using particle swarm optimization. In: Proceedings of the Second Workshop on Humanoid Soccer Robots, pp. 1–7 (2007)

  22. Peters, J., Mülling, K., Altun, Y.: Relative Entropy Policy Search. In: Proceedings of the 24th National Conference on Artificial Intelligence (AAAI). AAAI Press (2010)

  23. Rückstieß, T., Felder, M., Schmidhuber, J.: State-dependent Exploration for Policy Gradient Methods. In: Proceedings of the European Conference on Machine Learning (ECML) (2008)

  24. Stulp, F., Raiola, G., Hoarau, A., Ivaldi, S., Sigaud, O.: Learning Compact Parameterized Skills with a Single Regression. In: IEEE-RAS International Conference on Humanoid Robots (Humanoids) (2013)

  25. Stulp, F., Sigaud, O.: Path Integral Policy Improvement with Covariance Matrix Adaptation. In: International Conference on Machine Learning (ICML) (2012)

  26. Suganthan, P.N., Hansen, N., Liang, J.J., Deb, K., Chen, Y.P., Auger, A., Tiwari, S.: Problem Definitions and Evaluation Criteria for the CEC 2005 Special Session on Real-Parameter Optimization. Tech. rep., Nanyang Technological University, Singapore (2005)

  27. Sun, Y., Wierstra, D., Schaul, T., Schmidhuber, J.: Efficient Natural Evolution Strategies. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation (GECCO). https://doi.org/10.1145/1569901.1569976 (2009)

  28. Theodorou, E., Buchli, J., Schaal, S.: A Generalized Path Integral Control Approach to Reinforcement Learning. The Journal of Machine Learning Research (2010)

  29. Wang, J.M., Fleet, D.J., Hertzmann, A.: Optimizing walking controllers. ACM Trans. Graph. (TOG) 28(5), 168 (2009)

  30. Wierstra, D., Schaul, T., Peters, J., Schmidhuber, J.: Fitness Expectation Maximization. In: International Conference on Parallel Problem Solving from Nature, pp. 337–346. Springer (2008)

Author information

Corresponding author

Correspondence to Abbas Abdolmaleki.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This paper is an extended version of an ICARSC 2016 paper [5]. The second author is supported by Fundação para a Ciência e a Tecnologia under grant PD/BD/113963/2015. The work was also partially funded by the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 and by FCT – Portuguese Foundation for Science and Technology under projects PEst-OE/EEI/UI0027/2013 and UID/CEC/00127/2013 (IEETA). The work was also funded by project EuRoC, reference 608849 from call FP7-2013-NMP-ICT-FOF.

About this article

Cite this article

Abdolmaleki, A., Simões, D., Lau, N. et al. Contextual Direct Policy Search. J Intell Robot Syst 96, 141–157 (2019). https://doi.org/10.1007/s10846-018-0968-4
