DOI: 10.1145/3628797.3629006
Research article

Understanding the Role of Population Experiences in Proximal Distilled Evolutionary Reinforcement Learning

Published: 07 December 2023

Abstract

Evolutionary Reinforcement Learning (ERL) combines the sample efficiency of Reinforcement Learning (RL) with the exploration capabilities of the population-based search of Evolutionary Computation. These methods have shown promising performance on many continuous control tasks. However, they can also exhibit instability. Several works have shown that experiences collected by the population individuals cause a state-distribution shift in the RL policy update process. A simple remedy has been proposed to alleviate this issue: the experience transitions are kept in two separate replay buffers, one for the RL policy and one for the population, and samples from the two buffers are mixed at a fixed ratio when updating the RL policy. The effectiveness of this approach has been demonstrated empirically on an ERL method in which Evolution Strategies (ES) assists an external RL agent. However, there has been no thorough investigation of how this method performs on Genetic Algorithm (GA) based ERL. In this paper, we analyze the influence of off-policy data coming from the GA population on the RL policy, and how the mixing method performs on a state-of-the-art ERL method, namely Proximal Distilled Evolutionary Reinforcement Learning (PDERL).
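To make the two-buffer mixing scheme described above concrete, the following is a minimal Python sketch. It assumes a fixed mixing ratio; the names `ReplayBuffer`, `sample_mixed_batch`, `rl_buffer`, `pop_buffer`, and `mix_ratio` are illustrative and are not identifiers from the paper or the PDERL codebase.

```python
import random
from collections import deque


class ReplayBuffer:
    """FIFO buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, n):
        # Sample with replacement so a sparsely filled buffer can still
        # supply its share of the minibatch.
        return [random.choice(self.storage) for _ in range(n)]


def sample_mixed_batch(rl_buffer, pop_buffer, batch_size, mix_ratio=0.5):
    """Build a minibatch that mixes the RL agent's own transitions with
    transitions gathered by the evolutionary population.

    mix_ratio is the fixed fraction of the batch drawn from the RL buffer;
    the remainder comes from the population buffer.
    """
    n_rl = int(round(batch_size * mix_ratio))
    n_pop = batch_size - n_rl
    batch = rl_buffer.sample(n_rl) + pop_buffer.sample(n_pop)
    random.shuffle(batch)  # avoid any ordering bias between the two sources
    return batch
```

In an ERL training loop of this kind, transitions collected by the RL actor would be added to `rl_buffer`, transitions collected while evaluating population individuals would go to `pop_buffer`, and each gradient update of the RL policy would be computed on a batch returned by `sample_mixed_batch`.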



Published In

SOICT '23: Proceedings of the 12th International Symposium on Information and Communication Technology
December 2023
1058 pages
ISBN: 9798400708916
DOI: 10.1145/3628797

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. continuous control
  2. evolutionary reinforcement learning
  3. policy search
  4. variation operators

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • The VNUHCM-University of Information Technology's Scientific Research Support Fund

Conference

SOICT 2023

Acceptance Rates

Overall Acceptance Rate 147 of 318 submissions, 46%

