Two-Stage Evolutionary Reinforcement Learning for Enhancing Exploration and Exploitation

Authors

  • Qingling Zhu, Shenzhen University
  • Xiaoqiang Wu, Shenzhen University
  • Qiuzhen Lin, Shenzhen University
  • Wei-Neng Chen, South China University of Technology

DOI:

https://doi.org/10.1609/aaai.v38i18.30079

Keywords:

SO: Evolutionary Computation, ML: Evolutionary Learning, ML: Reinforcement Learning

Abstract

The integration of Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) has emerged as a promising approach for tackling some challenges in RL, such as sparse rewards, lack of exploration, and brittle convergence properties. However, existing methods often employ actor networks as the individuals of the EA, which may constrain their exploratory capabilities, as the entire actor population stops evolving once the critic network in RL falls into a local optimum. To alleviate this issue, this paper introduces a Two-stage Evolutionary Reinforcement Learning (TERL) framework that maintains a population containing both actor and critic networks. TERL divides the learning process into two stages. In the initial stage, individuals independently learn actor-critic networks, which are optimized alternately by RL and Particle Swarm Optimization (PSO). This dual optimization fosters greater exploration, curbing susceptibility to local optima. Information shared through a common replay buffer and the PSO algorithm substantially mitigates the computational load of training multiple agents. In the subsequent stage, TERL shifts to a refined exploitation phase: only the best individual undergoes further refinement, while the remaining individuals continue PSO-based optimization. This allocates more computational resources to the best individual, yielding superior performance. Empirical assessments, conducted across a range of continuous control problems, validate the efficacy of the proposed TERL paradigm.
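To make the two-stage alternation concrete, below is a minimal, illustrative Python sketch, not the paper's implementation. The toy `fitness` and `rl_step` functions are hypothetical stand-ins for environment rollouts and actor-critic gradient updates (e.g., TD3/SAC-style), the shared replay buffer is omitted, and all hyperparameters are arbitrary; only the overall structure (stage 1: every individual alternates RL and PSO updates; stage 2: RL refinement for the best individual, PSO for the rest) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM = 8            # stand-in for flattened actor-critic parameters
POP_SIZE = 5
STAGE1_ITERS = 100
STAGE2_ITERS = 100

def fitness(theta):
    # Placeholder for episodic return from environment rollouts.
    return -np.sum((theta - 1.0) ** 2)

def rl_step(theta, lr=0.05):
    # Placeholder for an actor-critic RL update; here we simply
    # ascend the analytic gradient of the toy fitness.
    grad = -2.0 * (theta - 1.0)
    return theta + lr * grad

# Population of individuals, each an actor-critic parameter vector.
pos = rng.normal(size=(POP_SIZE, DIM))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_f = np.array([fitness(p) for p in pos])

def pso_step(i, w=0.7, c1=1.5, c2=1.5):
    # Standard PSO velocity/position update toward personal and global bests.
    g = pbest[np.argmax(pbest_f)]
    r1, r2 = rng.random(DIM), rng.random(DIM)
    vel[i] = w * vel[i] + c1 * r1 * (pbest[i] - pos[i]) + c2 * r2 * (g - pos[i])
    pos[i] = pos[i] + vel[i]

def update_pbest(i):
    f = fitness(pos[i])
    if f > pbest_f[i]:
        pbest_f[i], pbest[i] = f, pos[i].copy()

# Stage 1: every individual alternates RL-style updates with PSO moves.
for t in range(STAGE1_ITERS):
    for i in range(POP_SIZE):
        pos[i] = rl_step(pos[i]) if t % 2 == 0 else pos[i]
        if t % 2 == 1:
            pso_step(i)
        update_pbest(i)

# Stage 2: only the best individual receives further RL refinement,
# concentrating compute on it; the rest continue PSO-based search.
for t in range(STAGE2_ITERS):
    best = int(np.argmax(pbest_f))
    for i in range(POP_SIZE):
        if i == best:
            pos[i] = rl_step(pos[i])
        else:
            pso_step(i)
        update_pbest(i)

print("best fitness:", pbest_f.max())
```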

Published

2024-03-24

How to Cite

Zhu, Q., Wu, X., Lin, Q., & Chen, W.-N. (2024). Two-Stage Evolutionary Reinforcement Learning for Enhancing Exploration and Exploitation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(18), 20892-20900. https://doi.org/10.1609/aaai.v38i18.30079

Section

AAAI Technical Track on Search and Optimization