Contextual Direct Policy Search

With Regularized Covariance Matrix Estimation

Journal of Intelligent & Robotic Systems (2019)

Abstract

Stochastic search and optimization techniques are used in a vast number of areas, ranging from refining the design of vehicles and determining the effectiveness of new drugs to developing efficient strategies in games and learning proper behaviors in robotics. However, the resulting solutions are specialized for the specific problem they solve, and if the problem's context changes even slightly, they cannot adapt properly. In fact, they require complete re-learning to perform correctly in new, unseen scenarios, regardless of how similar those scenarios are to previously learned environments. Contextual algorithms have recently emerged as a solution to this problem. They learn the policy for a task as a function of a given context, so that widely different contexts belonging to the same task are learned simultaneously. Even so, the state-of-the-art algorithms of this class converge prematurely and simply cannot compete with algorithms that learn a policy for a single context. We describe the Contextual Relative Entropy Policy Search (CREPS) algorithm, which belongs to this class of contextual algorithms, and extend it with a technique that substantially improves its performance, yielding Contextual Relative Entropy Policy Search with Covariance Matrix Adaptation (CREPS-CMA). We propose two variants and demonstrate their behavior on a set of classic contextual optimization problems and on complex simulated robot tasks.
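
To make the setting concrete, the sketch below (plain NumPy, not taken from the paper) illustrates the general idea behind contextual policy search with a regularized covariance update: a Gaussian search distribution whose mean is linear in context features, refit by weighted maximum likelihood, with the covariance estimate shrunk toward the previous one so exploration does not collapse prematurely. The feature map phi, the exponential weighting (a stand-in for the KL-bounded REPS weights), the shrinkage factor lam, and the toy reward are all illustrative assumptions, not the authors' exact CREPS-CMA update.

```python
import numpy as np

def phi(s):
    """Illustrative context features: a bias term plus the raw context."""
    return np.concatenate(([1.0], s))

rng = np.random.default_rng(0)
dim_theta, dim_ctx = 3, 1
n_samples, n_iters, lam = 50, 30, 0.7   # lam: shrinkage toward the old covariance

A = np.zeros((dim_theta, dim_ctx + 1))  # mean of pi(theta | s) is linear in phi(s)
Sigma = np.eye(dim_theta)               # shared exploration covariance

def reward(theta, s):
    """Toy contextual objective: move theta toward a context-dependent target."""
    return -np.sum((theta - np.full(dim_theta, s[0])) ** 2)

for _ in range(n_iters):
    contexts = rng.uniform(-1.0, 1.0, size=(n_samples, dim_ctx))
    feats = np.array([phi(s) for s in contexts])            # shape (N, dim_ctx + 1)
    means = feats @ A.T                                      # shape (N, dim_theta)
    thetas = means + rng.multivariate_normal(np.zeros(dim_theta), Sigma, n_samples)
    R = np.array([reward(th, s) for th, s in zip(thetas, contexts)])

    # Exponential weighting of samples (a stand-in for the KL-constrained REPS weights).
    w = np.exp((R - R.max()) / (R.std() + 1e-8))
    w /= w.sum()

    # Weighted maximum-likelihood fit of the context-dependent mean.
    W = np.diag(w)
    A = np.linalg.solve(feats.T @ W @ feats + 1e-6 * np.eye(dim_ctx + 1),
                        feats.T @ W @ thetas).T

    # Weighted ML covariance, regularized by shrinking toward the previous Sigma
    # so the search distribution does not collapse after a few updates.
    diffs = thetas - feats @ A.T
    Sigma = lam * Sigma + (1.0 - lam) * diffs.T @ (diffs * w[:, None])
```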

References

  1. Abdolmaleki, A., Lau, N., Reis, L.P., Peters, J., Neumann, G.: Contextual policy search for linear and nonlinear generalization of a humanoid walking controller. J. Intell. Robot. Syst. 10, 1–16 (2016)

  2. Abdolmaleki, A., Lioutikov, R., Peters, J., Lau, N., Reis, L., Neumann, G.: Regularized Covariance Estimation for Weighted Maximum Likelihood Policy Search Methods. In: Advances in Neural Information Processing Systems (NIPS). MIT Press (2015)

  3. Abdolmaleki, A., Lau, N., Reis, L., Neumann, G.: Regularized covariance estimation for weighted maximum likelihood policy search methods. In: Proceedings of the International Conference on Humanoid Robots (HUMANOIDS) (2015)

  4. Abdolmaleki, A., Lau, N., Reis, L., Peters, J., Neumann, G.: Contextual Policy Search for Generalizing a Parameterized Biped Walking Controller. In: IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC) (2015)

  5. Abdolmaleki, A., Simoes, D., Lau, N., Reis, L.P., Neumann, G.: Contextual Relative Entropy Policy Search with Covariance Matrix Adaptation. In: 2016 IEEE International Conference On Autonomous Robot Systems and Competitions (ICARSC), pp. 94–99. IEEE (2016)

  6. Boyd, S., Vandenberghe, L.: Convex optimization. University Press, Cambridge (2004)

  7. Broomhead, D.S., Lowe, D.: Radial Basis Functions, Multi-Variable Functional Interpolation and Adaptive Networks. Tech. rep., DTIC Document (1988)

  8. Da Silva, B., Konidaris, G., Barto, A.: Learning parameterized skills. International Conference on Machine Learning (ICML) (2012)

  9. Daniel, C., Neumann, G., Peters, J.: Hierarchical Relative Entropy Policy Search. In: International Conference on Artificial Intelligence and Statistics (AISTATS) (2012)

  10. Deisenroth, M.P., Englert, P., Peters, J., Fox, D.: Multi-task Policy Search for Robotics. In: IEEE International Conference on Robotics and Automation (ICRA) (2014)

  11. Ha, S., Liu, C.: Evolutionary optimization for parameterized whole-body dynamic motor skills. In: Proceedings of IEEE International Conference on Robotics and Automation (ICRA) (2016)

  12. Hansen, N., Muller, S., Koumoutsakos, P.: Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). Evolutionary Computation (2003)

  13. Hansen, N., Ostermeier, A.: Completely derandomized self-adaptation in evolution strategies. Evol. Comput. 9(2), 159–195 (2001)

  14. Igel, C., Suttorp, T., Hansen, N.: A computational efficient covariance matrix update and a (1 + 1)-CMA for evolution strategies. In: Proceedings of the 8th annual conference on Genetic and evolutionary computation (2006)

  15. Ijspeert, A., Schaal, S.: Learning Attractor Landscapes for Learning Motor Primitives. In: Advances in Neural Information Processing Systems 15 (NIPS) (2003)

  16. Kober, J., Oztop, E., Peters, J.: Reinforcement Learning to adjust Robot Movements to New Situations. In: Proceedings of the Robotics: Science and Systems Conference (RSS) (2010)

  17. Kober, J., Peters, J.: Policy Search for Motor Primitives in Robotics. Mach. Learn. 8, 1–33 (2010)

  18. Kupcsik, A., Deisenroth, M.P., Peters, J., Neumann, G.: Data-Efficient contextual policy search for robot movement skills. In: Proceedings of the National Conference on Artificial Intelligence (AAAI) (2013)

  19. Mannor, S., Rubinstein, R., Gat, Y.: The Cross Entropy method for Fast Policy Search. In: Proceedings of the 20th International Conference on Machine Learning (ICML) (2003)

  20. Molga, M., Smutnicki, C.: Test Functions for Optimization Needs. In: http://www.zsd.ict.pwr.wroc.pl/files/docs/functions.pdf (2005)

  21. Niehaus, C., Röfer, T., Laue, T.: Gait optimization on a humanoid robot using particle swarm optimization. In: Proceedings of the Second Workshop on Humanoid Soccer Robots, pp. 1–7 (2007)

  22. Peters, J., Mülling, K., Altun, Y.: Relative Entropy Policy Search. In: Proceedings of the 24th National Conference on Artificial Intelligence (AAAI). AAAI Press (2010)

  23. Rückstieß, T., Felder, M., Schmidhuber, J.: State-dependent Exploration for Policy Gradient Methods. In: Proceedings of the European Conference on Machine Learning (ECML) (2008)

  24. Stulp, F., Raiola, G., Hoarau, A., Ivaldi, S., Sigaud, O.: Learning Compact Parameterized Skills with a Single Regression. In: IEEE-RAS International Conference on Humanoid Robots (Humanoids) (2013)

  25. Stulp, F., Sigaud, O.: Path Integral Policy Improvement with Covariance Matrix Adaptation. In: International Conference on Machine Learning (ICML) (2012)

  26. Suganthan, P.N., Hansen, N., Liang, J.J., Deb, K., Chen, Y.P., Auger, A., Tiwari, S.: Problem Definitions and Evaluation Criteria for the CEC 2005 Special Session on Real-Parameter Optimization. Tech. rep., Nanyang Technological University, Singapore (2005)

  27. Sun, Y., Wierstra, D., Schaul, T., Schmidhuber, J.: Efficient Natural Evolution Strategies. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation (GECCO). https://doi.org/10.1145/1569901.1569976 (2009)

  28. Theodorou, E., Buchli, J., Schaal, S.: A Generalized Path Integral Control Approach to Reinforcement Learning. The Journal of Machine Learning Research (2010)

  29. Wang, J.M., Fleet, D.J., Hertzmann, A.: Optimizing walking controllers. ACM Trans. Graph. (TOG) 28(5), 168 (2009)

  30. Wierstra, D., Schaul, T., Peters, J., Schmidhuber, J.: Fitness Expectation Maximization. In: International Conference on Parallel Problem Solving from Nature, pp. 337–346. Springer (2008)

Author information

Corresponding author

Correspondence to Abbas Abdolmaleki.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This paper is an extended version of an ICARSC 2016 paper [5]. The second author is supported by Fundação para a Ciência e a Tecnologia under grant PD/BD/113963/2015. The work was also partially funded by the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 and by FCT – Portuguese Foundation for Science and Technology under projects PEst-OE/EEI/UI0027/2013 and UID/CEC/00127/2013 (IEETA). The work was also funded by project EuRoC, reference 608849 from call FP7-2013-NMP-ICT-FOF.

About this article

Cite this article

Abdolmaleki, A., Simões, D., Lau, N. et al. Contextual Direct Policy Search. J Intell Robot Syst 96, 141–157 (2019). https://doi.org/10.1007/s10846-018-0968-4
