Abstract
Stochastic search and optimization techniques are used in a vast number of areas, from refining vehicle designs and determining the effectiveness of new drugs to developing efficient strategies in games and learning suitable behaviors in robotics. However, these techniques specialize in the specific problem they solve, and if the problem’s context changes even slightly, they cannot adapt properly. In fact, they require complete re-learning to perform correctly in new, unseen scenarios, regardless of how similar those scenarios are to previously learned environments. Contextual algorithms have recently emerged as solutions to this problem. They learn a context-dependent policy for a task, such that widely different contexts belonging to the same task are learned simultaneously. That being said, the state-of-the-art proposals of this class of algorithms converge prematurely, and simply cannot compete with algorithms that learn a policy for a single context. We describe the Contextual Relative Entropy Policy Search (CREPS) algorithm, which belongs to the aforementioned class of contextual algorithms, and extend it with a covariance matrix adaptation technique that severely increases its performance, yielding Contextual Relative Entropy Policy Search with Covariance Matrix Adaptation (CREPS-CMA). We propose two variants, and demonstrate their behavior on a set of classic contextual optimization problems and on complex simulated robot tasks.
References
Abdolmaleki, A., Lau, N., Reis, L.P., Peters, J., Neumann, G.: Contextual policy search for linear and nonlinear generalization of a humanoid walking controller. J. Intell. Robot. Syst. 10, 1–16 (2016)
Abdolmaleki, A., Lioutikov, R., Peters, J., Lau, N., Reis, L., Neumann, G.: Regularized Covariance Estimation for Weighted Maximum Likelihood Policy Search Methods. In: Advances in Neural Information Processing Systems (NIPS). MIT Press (2015)
Abdolmaleki, A., Lau, N., Reis, L., Neumann, G.: Regularized covariance estimation for weighted maximum likelihood policy search methods. In: Proceedings of the International Conference on Humanoid Robots (HUMANOIDS) (2015)
Abdolmaleki, A., Lau, N., Reis, L., Peters, J., Neumann, G.: Contextual Policy Search for Generalizing a Parameterized Biped Walking Controller. In: IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC) (2015)
Abdolmaleki, A., Simoes, D., Lau, N., Reis, L.P., Neumann, G.: Contextual Relative Entropy Policy Search with Covariance Matrix Adaptation. In: 2016 IEEE International Conference On Autonomous Robot Systems and Competitions (ICARSC), pp. 94–99. IEEE (2016)
Boyd, S., Vandenberghe, L.: Convex optimization. University Press, Cambridge (2004)
Broomhead, D.S., Lowe, D.: Radial Basis Functions, Multi-Variable Functional Interpolation and Adaptive Networks. Tech. rep., DTIC Document (1988)
Da Silva, B., Konidaris, G., Barto, A.: Learning parameterized skills. International Conference on Machine Learning (ICML) (2012)
Daniel, C., Neumann, G., Peters, J.: Hierarchical Relative Entropy Policy Search. In: International Conference on Artificial Intelligence and Statistics (AISTATS) (2012)
Deisenroth, M.P., Englert, P., Peters, J., Fox, D.: Multi-task Policy Search for Robotics. In: IEEE International Conference on Robotics and Automation (ICRA) (2014)
Ha, S., Liu, C.: Evolutionary optimization for parameterized whole-body dynamic motor skills. In: Proceedings of IEEE International Conference on Robotics and Automation (ICRA) (2016)
Hansen, N., Müller, S., Koumoutsakos, P.: Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). Evolutionary Computation (2003)
Hansen, N., Ostermeier, A.: Completely derandomized self-adaptation in evolution strategies. Evol. Comput. 9(2), 159–195 (2001)
Igel, C., Suttorp, T., Hansen, N.: A computational efficient covariance matrix update and a (1 + 1)-CMA for evolution strategies. In: Proceedings of the 8th annual conference on Genetic and evolutionary computation (2006)
Ijspeert, A., Schaal, S.: Learning Attractor Landscapes for Learning Motor Primitives. In: Advances in Neural Information Processing Systems 15 (NIPS) (2003)
Kober, J., Oztop, E., Peters, J.: Reinforcement Learning to adjust Robot Movements to New Situations. In: Proceedings of the Robotics: Science and Systems Conference (RSS) (2010)
Kober, J., Peters, J.: Policy Search for Motor Primitives in Robotics. Mach. Learn. 8, 1–33 (2010)
Kupcsik, A., Deisenroth, M.P., Peters, J., Neumann, G.: Data-Efficient contextual policy search for robot movement skills. In: Proceedings of the National Conference on Artificial Intelligence (AAAI) (2013)
Mannor, S., Rubinstein, R., Gat, Y.: The Cross Entropy method for Fast Policy Search. In: Proceedings of the 20th International Conference on Machine Learning (ICML) (2003)
Molga, M., Smutnicki, C.: Test Functions for Optimization Needs. In: http://www.zsd.ict.pwr.wroc.pl/files/docs/functions.pdf (2005)
Niehaus, C., Röfer, T., Laue, T.: Gait optimization on a humanoid robot using particle swarm optimization. In: Proceedings of the Second Workshop on Humanoid Soccer Robots in conjunction with the 2007 IEEE-RAS International Conference on Humanoid Robots, pp. 1–7 (2007)
Peters, J., Mülling, K., Altun, Y.: Relative Entropy Policy Search. In: Proceedings of the 24th National Conference on Artificial Intelligence (AAAI). AAAI Press (2010)
Rückstieß, T., Felder, M., Schmidhuber, J.: State-dependent Exploration for Policy Gradient Methods. In: Proceedings of the European Conference on Machine Learning (ECML) (2008)
Stulp, F., Raiola, G., Hoarau, A., Ivaldi, S., Sigaud, O.: Learning Compact Parameterized Skills with a Single Regression. In: IEEE-RAS International Conference on Humanoid Robots (Humanoids) (2013)
Stulp, F., Sigaud, O.: Path Integral Policy Improvement with Covariance Matrix Adaptation. In: International Conference on Machine Learning (ICML) (2012)
Suganthan, P.N., Hansen, N., Liang, J.J., Deb, K., Chen, Y.P., Auger, A., Tiwari, S.: Problem Definitions and Evaluation Criteria for the CEC 2005 Special Session on Real-Parameter Optimization. Tech. rep., Nanyang Technological University, Singapore (2005)
Sun, Y., Wierstra, D., Schaul, T., Schmidhuber, J.: Efficient Natural Evolution Strategies. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation (GECCO). https://doi.org/10.1145/1569901.1569976 (2009)
Theodorou, E., Buchli, J., Schaal, S.: A Generalized Path Integral Control Approach to Reinforcement Learning. The Journal of Machine Learning Research (2010)
Wang, J.M., Fleet, D.J., Hertzmann, A.: Optimizing walking controllers. ACM Trans. Graph. (TOG) 28(5), 168 (2009)
Wierstra, D., Schaul, T., Peters, J., Schmidhuber, J.: Fitness Expectation Maximization. In: International Conference on Parallel Problem Solving from Nature, pp. 337–346. Springer (2008)
Additional information
This paper is an extended version of an ICARSC 2016 paper [5]. The second author is supported by Fundação para a Ciência e a Tecnologia under grant PD/BD/113963/2015. The work was also partially funded by the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 and by FCT – Portuguese Foundation for Science and Technology under projects PEst-OE/EEI/UI0027/2013 and UID/CEC/00127/2013 (IEETA). The work was also funded by project EuRoC, reference 608849 from call FP7-2013-NMP-ICT-FOF.
Cite this article
Abdolmaleki, A., Simões, D., Lau, N. et al. Contextual Direct Policy Search. J Intell Robot Syst 96, 141–157 (2019). https://doi.org/10.1007/s10846-018-0968-4