Two-Stage Evolutionary Reinforcement Learning for Enhancing Exploration and Exploitation
DOI:
https://doi.org/10.1609/aaai.v38i18.30079
Keywords:
SO: Evolutionary Computation, ML: Evolutionary Learning, ML: Reinforcement Learning
Abstract
The integration of Evolutionary Algorithms (EA) and Reinforcement Learning (RL) has emerged as a promising approach for tackling some challenges in RL, such as sparse rewards, lack of exploration, and brittle convergence properties. However, existing methods often employ actor networks as the individuals of the EA, which may constrain their exploratory capabilities, because the entire actor population stops evolving when the critic network in RL falls into a local optimum. To alleviate this issue, this paper introduces a Two-stage Evolutionary Reinforcement Learning (TERL) framework that maintains a population containing both actor and critic networks. TERL divides the learning process into two stages. In the initial stage, individuals independently learn actor-critic networks, which are optimized alternately by RL and Particle Swarm Optimization (PSO). This dual optimization fosters greater exploration, curbing susceptibility to local optima. Information shared through a common replay buffer and the PSO algorithm substantially mitigates the computational load of training multiple agents. In the subsequent stage, TERL shifts to a refined exploitation phase. Here, only the best individual undergoes further refinement, while the remaining individuals continue PSO-based optimization. This allocates more computational resources to the best individual to yield superior performance. Empirical assessments, conducted across a range of continuous control problems, validate the efficacy of the proposed TERL paradigm.
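The two-stage schedule described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract gives no update rules or hyperparameters, so everything below is an assumption made for illustration. In particular, `fitness` is a hypothetical stand-in for episodic return, `local_refine` is a hypothetical stand-in for an RL gradient update, individuals are plain parameter vectors rather than actor-critic networks, the shared replay buffer is not modeled, and the PSO step uses the textbook velocity update with arbitrarily chosen coefficients.

```python
import random


def fitness(params):
    # Toy stand-in for episodic return (assumption): higher is better, peak at the origin.
    return -sum(x * x for x in params)


def local_refine(params, lr=0.1):
    # Toy stand-in for an RL update (assumption): one gradient-ascent step on `fitness`.
    return [x - lr * 2.0 * x for x in params]


def pso_step(positions, velocities, personal_best, global_best, rng,
             skip=None, w=0.7, c1=1.5, c2=1.5):
    # Textbook PSO update: inertia + pull toward personal best + pull toward global best.
    for i, pos in enumerate(positions):
        if i == skip:
            continue  # this individual is left to the refinement step only
        for d in range(len(pos)):
            r1, r2 = rng.random(), rng.random()
            velocities[i][d] = (w * velocities[i][d]
                                + c1 * r1 * (personal_best[i][d] - pos[d])
                                + c2 * r2 * (global_best[d] - pos[d]))
            pos[d] += velocities[i][d]


def two_stage_search(pop_size=6, dim=4, stage1_iters=30, stage2_iters=30, seed=0):
    rng = random.Random(seed)
    positions = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    velocities = [[0.0] * dim for _ in range(pop_size)]
    personal_best = [p[:] for p in positions]
    global_best = max(positions, key=fitness)[:]
    initial_best = fitness(global_best)

    def update_bests():
        nonlocal global_best
        for i, pos in enumerate(positions):
            if fitness(pos) > fitness(personal_best[i]):
                personal_best[i] = pos[:]
            if fitness(pos) > fitness(global_best):
                global_best = pos[:]

    for it in range(stage1_iters + stage2_iters):
        if it < stage1_iters:
            # Stage 1 (exploration): every individual is refined, then PSO moves everyone.
            for i in range(pop_size):
                positions[i] = local_refine(positions[i])
            skip = None
        else:
            # Stage 2 (exploitation): only the current best individual is refined;
            # the remaining individuals are moved by PSO alone.
            skip = max(range(pop_size), key=lambda i: fitness(positions[i]))
            positions[skip] = local_refine(positions[skip])
        update_bests()
        pso_step(positions, velocities, personal_best, global_best, rng, skip=skip)
        update_bests()

    return initial_best, fitness(global_best)


if __name__ == "__main__":
    start, end = two_stage_search()
    print("best fitness before:", start, "after:", end)
```

Excluding the refined individual from the PSO move in the second stage is one possible reading of "only the best individual undergoes further refinement"; the paper itself should be consulted for how the two optimizers are actually interleaved.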
Published
2024-03-24
How to Cite
Zhu, Q., Wu, X., Lin, Q., & Chen, W.-N. (2024). Two-Stage Evolutionary Reinforcement Learning for Enhancing Exploration and Exploitation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(18), 20892-20900. https://doi.org/10.1609/aaai.v38i18.30079
Section
AAAI Technical Track on Search and Optimization