ABSTRACT
Model-based Relative Entropy Policy Search (MORE) is a population-based stochastic search algorithm with desirable properties: a well-defined policy search objective, i.e., it optimizes the expected return, and exact closed-form information-theoretic update rules. This is in contrast with existing population-based methods, often referred to as evolutionary strategies, such as CMA-ES. While these methods work very well in practice, their updates of the search distribution are often based on heuristics, and they do not optimize the expected return of the population but instead implicitly optimize the return of elite samples, which may yield a poor expected return and unreliable or risky solutions. We show that the MORE algorithm can be improved with distinct updates based on coordinate ascent on the mean and covariance of the search distribution, which considerably improves the convergence speed while maintaining the exact closed-form updates. In this way, we can match the performance of the elite samples of CMA-ES while also achieving a considerably better sample average. We evaluate our new algorithm on simulated robotic tasks and compare it to the state-of-the-art CMA-ES.
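The structure described above — a Gaussian search distribution whose mean and covariance are updated in two distinct coordinate steps, weighting the whole population by return rather than keeping only elite samples — can be illustrated with a minimal sketch. This is not the authors' exact closed-form MORE update (which fits a quadratic surrogate model and solves an information-theoretic dual); the function and parameter names are illustrative only.

```python
# Illustrative sketch: population-based stochastic search with separate
# (coordinate-wise) updates of the mean and covariance of a Gaussian
# search distribution. All samples contribute via exponential weighting
# of their returns, rather than a hard elite cutoff.
import numpy as np

def coordinate_search(objective, dim, iters=100, pop_size=50, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    cov = np.eye(dim)
    for _ in range(iters):
        samples = rng.multivariate_normal(mean, cov, size=pop_size)
        returns = np.array([objective(s) for s in samples])
        # Softmax weighting of returns; the temperature adapts to the
        # current return spread so selection pressure is maintained.
        temperature = returns.std() + 1e-8
        w = np.exp((returns - returns.max()) / temperature)
        w /= w.sum()
        # Coordinate step 1: update the mean, covariance held fixed.
        mean = w @ samples
        # Coordinate step 2: update the covariance around the new mean,
        # with a small jitter to keep it positive definite.
        diffs = samples - mean
        cov = diffs.T @ (diffs * w[:, None]) + 1e-6 * np.eye(dim)
    return mean

# Maximize the expected return of a simple quadratic objective.
best = coordinate_search(lambda x: -np.sum((x - 2.0) ** 2), dim=3)
```

Updating the mean first and then re-estimating the covariance around the new mean is what distinguishes the coordinate-ascent scheme from a joint update of both distribution parameters.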
- Abbas Abdolmaleki, Rudolf Lioutikov, Jan R. Peters, Nuno Lau, Luís Paulo Reis, and Gerhard Neumann. 2015. Model-based relative entropy stochastic search. Advances in Neural Information Processing Systems 28 (2015), 3537--3545.
- Marc Peter Deisenroth, Gerhard Neumann, Jan Peters, et al. 2013. A survey on policy search for robotics. Foundations and Trends in Robotics 2, 1-2 (2013), 1--142.
- Nikolaus Hansen, Youhei Akimoto, and Petr Baudis. 2019. CMA-ES/pycma on GitHub. Zenodo. https://doi.org/10.5281/zenodo.2559634
- Nikolaus Hansen and Andreas Ostermeier. 2001. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation 9, 2 (2001), 159--195.
- Auke Jan Ijspeert, Jun Nakanishi, Heiko Hoffmann, Peter Pastor, and Stefan Schaal. 2013. Dynamical movement primitives: learning attractor models for motor behaviors. Neural Computation 25, 2 (2013), 328--373.
- Jens Kober and Jan Peters. 2011. Policy search for motor primitives in robotics. Machine Learning 84, 1 (2011), 171--203.
- Shie Mannor, Reuven Y. Rubinstein, and Yohai Gat. 2003. The cross entropy method for fast policy search. In Proceedings of the 20th International Conference on Machine Learning (ICML-03). 512--519.
- Emanuel Todorov, Tom Erez, and Yuval Tassa. 2012. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 5026--5033.
- Daan Wierstra, Tom Schaul, Tobias Glasmachers, Yi Sun, Jan Peters, and Jürgen Schmidhuber. 2014. Natural evolution strategies. The Journal of Machine Learning Research 15, 1 (2014), 949--980.
Index Terms
- Coordinate ascent MORE with adaptive entropy control for population-based regret minimization