Differential evolution with hybrid parameters and mutation strategies based on reinforcement learning

https://doi.org/10.1016/j.swevo.2022.101194

Abstract

Differential evolution (DE) has recently attracted a lot of attention as a simple and powerful numerical optimization approach for solving various real-world applications. However, the performance of DE is significantly influenced by the configuration of its control parameters and mutation strategy. To address this issue, we propose a reinforcement learning-based hybrid parameters and mutation strategies differential evolution (RL-HPSDE) in this paper. RL-HPSDE is built on the Q-learning framework, in which each individual in the population is regarded as an agent. The results of a dynamic fitness landscape analysis of the optimization problem are used to represent the environmental states, and an ensemble of parameter settings and mutation strategies serves as the agent's candidate actions. Furthermore, a reward function is designed to guide the agent toward the optimal action strategy. Based on the reinforcement learning experience stored in its Q table, each agent can adaptively select an optimal combination of mutation strategy and parameters to generate offspring individuals in each generation. The proposed algorithm is evaluated on the CEC2017 single-objective test function set and compared with several well-known DE variants. Empirical studies suggest that the proposed RL-HPSDE algorithm is competitive with all of the compared algorithms.

Introduction

Differential evolution is a well-known evolutionary algorithm (EA) proposed by Price and Storn based on the idea of recombining individual differences, and it is suitable for solving various kinds of optimization problems. As an important member of the EA family, DE is a population-based heuristic algorithm that mainly consists of three operations: mutation, crossover, and selection [1]. Because of its simple structure and strong convergence ability, DE has been widely used to solve engineering optimization problems in practical applications such as feature selection [2], multilevel image thresholding segmentation [3], drug synergy prediction [4], and robot control [5].
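For concreteness, the following is a minimal sketch of the classic DE/rand/1/bin variant built from the three operations just described. The scale factor F, crossover rate CR, population size, and the sphere test function are illustrative choices only, not settings used in this paper.

```python
import numpy as np

def de_rand_1_bin(fobj, bounds, pop_size=30, F=0.5, CR=0.9, max_gen=200, seed=0):
    """Minimal classic DE (DE/rand/1/bin) for minimization."""
    rng = np.random.default_rng(seed)
    dim = len(bounds)
    lower, upper = np.asarray(bounds, dtype=float).T
    # Initialization: uniform random population inside the box constraints.
    pop = lower + rng.random((pop_size, dim)) * (upper - lower)
    fitness = np.array([fobj(x) for x in pop])
    for _ in range(max_gen):
        for i in range(pop_size):
            # Mutation: v = x_r1 + F * (x_r2 - x_r3), with r1 != r2 != r3 != i.
            r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i],
                                    size=3, replace=False)
            v = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lower, upper)
            # Binomial crossover: inherit at least one component from the mutant.
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            u = np.where(cross, v, pop[i])
            # Greedy selection between parent and trial vector.
            fu = fobj(u)
            if fu <= fitness[i]:
                pop[i], fitness[i] = u, fu
    best = np.argmin(fitness)
    return pop[best], fitness[best]

# Example: 10-dimensional sphere function.
if __name__ == "__main__":
    sphere = lambda x: float(np.sum(x ** 2))
    x_best, f_best = de_rand_1_bin(sphere, [(-100, 100)] * 10)
    print(f_best)
```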

The performance of DE largely relies on the combination of mutation strategy and control parameters. Generally, the most suitable parameter and mutation strategy configurations differ across optimization problems. The reason is that some mutation strategies efficiently solve multimodal functions, while others are better suited to composite functions. Likewise, some control parameter settings effectively improve convergence speed, while others enhance exploration ability. In addition, related research indicates that even for the same problem, the most suitable mutation strategy and parameters differ across evolutionary stages [6]. Hence, research on adjusting the mutation strategy and control parameters of DE has attracted much attention. However, the traditional empirical and trial-and-error methods for finding feasible parameters and strategies are inefficient and time-consuming in most cases, particularly when solving diverse real-world optimization problems [7].

As a result, approaches for the adaptive selection of mutation strategies and the tuning of parameters have received much attention [8]. Consequently, a large number of enhanced DE variants have been proposed, such as JADE (with a new mutation strategy and adaptive parameters) [9], EPSDE (with an ensemble of mutation strategies and parameters) [10], MPEDE (with adaptive parameters and multi-population methods) [11], SHADE (with a historical memory storage mechanism) [12], LSHADE (with linear population size reduction) [13], TVDE (with a time-varying strategy) [14], CSDE (with combined mutation strategies) [15], MPPCEDE (with multi-population and multi-strategy) [16], QLDE (with a Q-learning based parameter tuning strategy) [17], and DEDQN (with a mixed mutation strategy) [18]. Hence, designing an algorithmic framework that selects the parameters and mutation strategy at each evolutionary stage is critical. However, most current approaches lack learning or feedback mechanisms that adjust the configuration based on the evolutionary process, and problems with different characteristics may require different parameters and strategies to search effectively for the optimal solution. Therefore, an ideal DE variant is expected to deal with a variety of challenges in solving numerical optimization problems.

Motivated by these self-learning ideas, a reinforcement learning-based hybrid parameters and mutation strategies differential evolution, named RL-HPSDE, is presented. The reinforcement learning (RL) component uses a Q-learning algorithm as a decision controller that adaptively selects parameters and mutation strategies for DE. In RL-HPSDE, the agent learns the fitness landscape features of the optimization problem and then determines which action to take during the evolutionary process. Based on the Q-learning algorithm, the agent selects the optimal action from its Q table, according to the results of the optimization problem's dynamic fitness landscape analysis, to generate offspring individuals. After each action is executed, a designed reward function updates the Q table. Through this reinforcement learning mechanism, the agent can efficiently search for the optimal solution. The CEC2017 function set is used to evaluate the performance of the RL-HPSDE algorithm, and the results are compared with five other well-known DE variants in terms of accuracy, stability, and convergence.
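As an illustration of the Q-learning controller described above, the sketch below maintains a tabular Q function over discrete landscape-based states and candidate (mutation strategy, F, CR) actions, with an epsilon-greedy selection rule and the standard temporal-difference update. The number of states, the specific action set, and the learning parameters are assumptions for illustration; the paper's actual definitions appear in Section 4.

```python
import numpy as np

class QLearningController:
    """Tabular Q-learning controller sketch for selecting DE actions.

    Illustrative assumptions (not taken from the paper): four discrete
    landscape-based states, actions defined as (mutation strategy, F, CR)
    combinations, epsilon-greedy selection, and the alpha/gamma/epsilon values.
    """

    def __init__(self, n_states=4, n_actions=4, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
        self.q = np.zeros((n_states, n_actions))   # Q table: states x actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = np.random.default_rng(seed)

    def select(self, state):
        # Epsilon-greedy: mostly exploit the best-known action for this state.
        if self.rng.random() < self.eps:
            return int(self.rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[state]))

    def update(self, state, action, reward, next_state):
        # Standard Q-learning temporal-difference update of the Q table.
        td_target = reward + self.gamma * np.max(self.q[next_state])
        self.q[state, action] += self.alpha * (td_target - self.q[state, action])

# Hypothetical action set: each entry pairs a mutation strategy with (F, CR).
ACTIONS = [("DE/rand/1", 0.5, 0.9),
           ("DE/best/1", 0.8, 0.9),
           ("DE/current-to-best/1", 0.5, 0.5),
           ("DE/rand/2", 0.9, 0.2)]

# Usage: after generating an offspring with ACTIONS[a], reward the agent (for
# example, 1 for a fitness improvement and 0 otherwise) and update the table.
controller = QLearningController()
a = controller.select(state=0)
controller.update(state=0, action=a, reward=1.0, next_state=1)
```

In RL-HPSDE each individual in the population would hold such a table as its own agent; a single controller is shown here for brevity.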

The rest of the paper is organized as follows: Section 2 presents the classical DE, a few variations of DE, and the Q-learning algorithm. Section 3 introduces two dynamic fitness landscape analysis techniques. The proposed RL-HPSDE algorithm is presented in Section 4. The experimental simulation results and discussion are shown in Section 5. Section 6 draws the conclusions and presents possible future research directions.

Section snippets

Preliminaries and related work

First, we describe the framework of the classical DE algorithm. Next, some improved DE variants are briefly reviewed. Finally, Q-learning, one of the reinforcement learning algorithms, is introduced.

Dynamic fitness landscape analysis

In most cases, measuring statistical features of an optimization problem's fitness landscape helps judge whether its optimal solution is easy or difficult to find. Many landscape analysis methods have been proposed, including measures of modality, ruggedness, information content, and dynamic severity. However, the search for the optimal solution is a dynamic process, and the landscape information observed by the population changes as the search progresses. Therefore, analyzing an optimization problem's dynamic
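The snippet above is truncated, so the following is only an illustrative sketch of one static measure it mentions: the information content (first entropic measure) of a fitness sequence sampled along a random walk, a common proxy for ruggedness. The walk scheme, step size, and the sphere function are assumptions; the paper's dynamic versions of such measures are not reproduced here.

```python
import numpy as np
from itertools import product

def information_content(fitness_walk, eps=0.0):
    """First entropic measure H(eps) of a fitness sequence from a random walk.

    Encodes consecutive fitness differences as -1/0/1 symbols and measures the
    entropy of unequal symbol pairs, a standard proxy for landscape ruggedness.
    """
    diffs = np.diff(np.asarray(fitness_walk, dtype=float))
    symbols = np.where(diffs > eps, 1, np.where(diffs < -eps, -1, 0))
    counts = {pq: 0 for pq in product((-1, 0, 1), repeat=2) if pq[0] != pq[1]}
    pairs = list(zip(symbols[:-1], symbols[1:]))
    for p, q in pairs:
        if p != q:
            counts[(int(p), int(q))] += 1
    n = len(pairs)
    h = 0.0
    for c in counts.values():
        if c > 0:
            prob = c / n
            h -= prob * np.log(prob) / np.log(6)   # base-6 log: six unequal pair types
    return h

# Example: fitness values sampled along a simple random walk on the sphere function.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-5.0, 5.0, size=10)
    walk = []
    for _ in range(200):
        x = np.clip(x + rng.normal(scale=0.1, size=x.size), -5.0, 5.0)
        walk.append(float(np.sum(x ** 2)))
    print(f"H(0) = {information_content(walk):.3f}")   # values near 1 indicate ruggedness
```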

The proposed algorithm

This section presents the RL-HPSDE algorithm. First, we design the main innovative components of the Q-learning scheme, including the state representation, the action strategies, and the reward function. Then, a population renewal mechanism is introduced. Finally, the general framework of RL-HPSDE is outlined.
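Section 4 itself is not shown in this preview, so the following end-to-end sketch is only a speculative illustration of how the components listed above could interact within one generation: a coarse population state, one Q table per individual (agent), epsilon-greedy selection of a (strategy, F, CR) action, a fitness-improvement reward, and greedy survivor selection. Every concrete choice below (the two strategies, the three-level state, the 0/1 reward) is an assumption for illustration, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM, NP, GENS = 10, 30, 100
LOWER, UPPER = -100.0, 100.0
fobj = lambda x: float(np.sum(x ** 2))             # placeholder objective (sphere)

# Hypothetical action set: (mutation strategy, F, CR) combinations.
ACTIONS = [("rand/1", 0.5, 0.9), ("rand/1", 0.9, 0.1),
           ("current-to-best/1", 0.5, 0.9), ("current-to-best/1", 0.9, 0.1)]
N_STATES, ALPHA, GAMMA, EPS = 3, 0.1, 0.9, 0.1

pop = rng.uniform(LOWER, UPPER, (NP, DIM))
fit = np.array([fobj(x) for x in pop])
q_tables = np.zeros((NP, N_STATES, len(ACTIONS)))  # one Q table per individual (agent)

def coarse_state(fit):
    # Illustrative state: relative spread of fitness values, binned into three levels.
    spread = (fit.max() - fit.min()) / (abs(fit.mean()) + 1e-12)
    return 0 if spread < 0.1 else (1 if spread < 1.0 else 2)

def mutate(i, strategy, F):
    best = pop[np.argmin(fit)]
    r = rng.choice([j for j in range(NP) if j != i], size=3, replace=False)
    if strategy == "rand/1":
        return pop[r[0]] + F * (pop[r[1]] - pop[r[2]])
    return pop[i] + F * (best - pop[i]) + F * (pop[r[0]] - pop[r[1]])

state = coarse_state(fit)
for _ in range(GENS):
    next_pop, next_fit = pop.copy(), fit.copy()
    for i in range(NP):
        # Epsilon-greedy action selection from this individual's Q table.
        if rng.random() < EPS:
            a = int(rng.integers(len(ACTIONS)))
        else:
            a = int(np.argmax(q_tables[i, state]))
        strategy, F, CR = ACTIONS[a]
        # Mutation and binomial crossover with the chosen strategy and parameters.
        v = np.clip(mutate(i, strategy, F), LOWER, UPPER)
        cross = rng.random(DIM) < CR
        cross[rng.integers(DIM)] = True
        u = np.where(cross, v, pop[i])
        fu = fobj(u)
        reward = 1.0 if fu < fit[i] else 0.0       # illustrative 0/1 improvement reward
        if fu <= fit[i]:                           # greedy survivor selection
            next_pop[i], next_fit[i] = u, fu
        new_state = coarse_state(next_fit)
        # Q-learning update for this agent.
        q_tables[i, state, a] += ALPHA * (reward + GAMMA * np.max(q_tables[i, new_state])
                                          - q_tables[i, state, a])
    pop, fit = next_pop, next_fit
    state = coarse_state(fit)

print(f"best fitness after {GENS} generations: {fit.min():.3e}")
```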

Experimental study

This section describes the experimental setup, the parameter settings of the compared algorithms, and the time complexity analysis, and then discusses the proposed algorithm's performance relative to the other DE algorithms.

Conclusions

RL-HPSDE, a differential evolution algorithm that combines reinforcement learning with hybrid parameters and mutation strategies, is presented in this paper; it realizes the adaptive selection of parameters and mutation strategies through a reinforcement learning algorithm. The environmental states consist of the optimization problem's dynamic fitness landscape analysis results, which are used to determine the evolution direction for DE. Since reinforcement learning with hybrid parameters and

CRediT authorship contribution statement

Zhiping Tan: Conceptualization, Methodology, Software, Writing – original draft. Yu Tang: Writing – review & editing. Kangshun Li: Supervision, Resources. Huasheng Huang: Formal analysis, Data curation. Shaoming Luo: Investigation, Visualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors acknowledge support from the Planned Science and Technology Project of Guangdong Province, China (Grant nos. 2019B020216001, 2019A050510045, and 2021A0505030075), the Fundamental and Applied Basic Research Fund of Guangdong Province (Grant no. 2021A1515110637), the National Natural Science Foundation of China (Grant no. 32071895), the Natural Science Foundation of Guangdong Province, China (Grant nos. 2020B1515120070, and 2021A1515010824), the Key Project of Universities in

References (39)

  • W. Deng et al., Differential evolution algorithm with wavelet basis function and optimal mutation strategy for complex optimization problem, Appl. Soft Comput. (2021).
  • Y. Li et al., An improved differential evolution algorithm with dual mutation strategies collaboration, Expert Syst. Appl. (2020).
  • Z. Meng et al., Parameters with adaptive learning mechanism (PALM) for the enhancement of differential evolution, Knowl.-Based Syst. (2018).
  • Z. Tan et al., Differential evolution with adaptive mutation strategy based on fitness landscape analysis, Inform. Sci. (2021).
  • K.M. Malan et al., A survey of techniques for characterising fitness landscapes and some possible ways forward, Inform. Sci. (2013).
  • Y. Huang et al., A fitness landscape ruggedness multiobjective differential evolution algorithm with a reinforcement learning strategy, Appl. Soft Comput. (2020).
  • R. Storn et al., Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces, J. Global Optim. (1997).
  • M. Kaur et al., Drug synergy prediction using dynamic mutation based differential evolution, Curr. Pharm. Des. (2021).
  • M. Pant et al., Differential evolution: a review of more than two decades of research, Eng. Appl. Artif. Intell. (2020).