ABSTRACT
Model-based Relative Entropy Policy Search (MORE) is a population-based stochastic search algorithm with desirable properties: a well-defined policy search objective, i.e., it optimizes the expected return, and exact closed-form information-theoretic update rules. This is in contrast with existing population-based methods, often referred to as evolutionary strategies, such as CMA-ES. While these methods work very well in practice, their updates of the search distribution are often based on heuristics, and they do not optimize the expected return of the population but instead implicitly optimize the return of elite samples, which may yield a poor expected return and unreliable or risky solutions. We show that the MORE algorithm can be improved with distinct updates based on coordinate ascent on the mean and covariance of the search distribution, which considerably improves the convergence speed while maintaining the exact closed-form updates. In this way, we can match the performance of the elite samples of CMA-ES while also achieving a considerably better sample average. We evaluate our new algorithm on simulated robotic tasks and compare it to the state-of-the-art CMA-ES.
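The structure described above — a Gaussian search distribution whose mean and covariance are updated in two distinct coordinate steps, weighting the whole population by return rather than keeping only elite samples — can be illustrated with a minimal sketch. This is not the authors' exact closed-form MORE update (which fits a quadratic surrogate model and solves an information-theoretic dual); the function and parameter names are illustrative only.

```python
# Illustrative sketch: population-based stochastic search with separate
# (coordinate-wise) updates of the mean and covariance of a Gaussian
# search distribution. All samples contribute via exponential weighting
# of their returns, rather than a hard elite cutoff.
import numpy as np

def coordinate_search(objective, dim, iters=100, pop_size=50, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    cov = np.eye(dim)
    for _ in range(iters):
        samples = rng.multivariate_normal(mean, cov, size=pop_size)
        returns = np.array([objective(s) for s in samples])
        # Softmax weighting of returns; the temperature adapts to the
        # current return spread so selection pressure is maintained.
        temperature = returns.std() + 1e-8
        w = np.exp((returns - returns.max()) / temperature)
        w /= w.sum()
        # Coordinate step 1: update the mean, covariance held fixed.
        mean = w @ samples
        # Coordinate step 2: update the covariance around the new mean,
        # with a small jitter to keep it positive definite.
        diffs = samples - mean
        cov = diffs.T @ (diffs * w[:, None]) + 1e-6 * np.eye(dim)
    return mean

# Maximize the expected return of a simple quadratic objective.
best = coordinate_search(lambda x: -np.sum((x - 2.0) ** 2), dim=3)
```

Updating the mean first and then re-estimating the covariance around the new mean is what distinguishes the coordinate-ascent scheme from a joint update of both distribution parameters.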
- Abbas Abdolmaleki, Rudolf Lioutikov, Jan R. Peters, Nuno Lau, Luís Paulo Reis, and Gerhard Neumann. 2015. Model-based relative entropy stochastic search. Advances in Neural Information Processing Systems 28 (2015), 3537--3545.
- Marc Peter Deisenroth, Gerhard Neumann, Jan Peters, et al. 2013. A survey on policy search for robotics. Foundations and Trends in Robotics 2, 1-2 (2013), 1--142.
- Nikolaus Hansen, Youhei Akimoto, and Petr Baudis. 2019. CMA-ES/pycma on GitHub. Zenodo. https://doi.org/10.5281/zenodo.2559634
- Nikolaus Hansen and Andreas Ostermeier. 2001. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation 9, 2 (2001), 159--195.
- Auke Jan Ijspeert, Jun Nakanishi, Heiko Hoffmann, Peter Pastor, and Stefan Schaal. 2013. Dynamical movement primitives: learning attractor models for motor behaviors. Neural Computation 25, 2 (2013), 328--373.
- Jens Kober and Jan Peters. 2011. Policy search for motor primitives in robotics. Machine Learning 84, 1 (2011), 171--203.
- Shie Mannor, Reuven Y. Rubinstein, and Yohai Gat. 2003. The cross entropy method for fast policy search. In Proceedings of the 20th International Conference on Machine Learning (ICML-03). 512--519.
- Emanuel Todorov, Tom Erez, and Yuval Tassa. 2012. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 5026--5033.
- Daan Wierstra, Tom Schaul, Tobias Glasmachers, Yi Sun, Jan Peters, and Jürgen Schmidhuber. 2014. Natural evolution strategies. The Journal of Machine Learning Research 15, 1 (2014), 949--980.
Index Terms
- Coordinate ascent MORE with adaptive entropy control for population-based regret minimization