Abstract
Parameter control methods for metaheuristics based on reinforcement learning proposed so far usually suffer from three shortcomings: (1) their training processes are highly time-consuming and cannot benefit from parallel or distributed platforms; (2) they are sensitive to their hyperparameters, so the quality of the final results depends heavily on the chosen values; and (3) only limited benchmarks have been used to assess their generality. This paper addresses these issues by proposing a methodology for training out-of-the-box parameter control policies for mono-objective, non-niching evolutionary and swarm-based algorithms using distributed reinforcement learning with population-based training. The proposed methodology can be applied to any mono-objective optimization problem and to any mono-objective, non-niching evolutionary or swarm-based algorithm. Extensive experiments show that the proposed method satisfactorily mitigates all the aforementioned issues, outperforming constant, random, and human-designed policies in several different scenarios.
Data Availability
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
Code Availability
The following link takes the reader to a git repository for the implementation of the proposed training method used in our experiments: https://github.com/lacerdamarcelo/rl_based_parameter_control_ea_si.
References
Aine, S., Kumar, R., & Chakrabarti, P. P. (2006). Adaptive parameter control of evolutionary algorithms under time constraints. In A. Tiwari, R. Roy, J. Knowles, E. Avineri, & K. Dahal (Eds.), Applications of Software Computing. Berlin: Springer.
Aleti, A., & Moser, I. (2016). A systematic literature review of adaptive parameter control methods for evolutionary algorithms. ACM Computing Survey, 49(3), 56–15635. https://doi.org/10.1145/2996355
Aleti, A., Moser, I., Meedeniya, I., & Grunske, L. (2014). Choosing the appropriate forecasting model for predictive parameter control. Evolutionary Computation, 22(2), 319–349.
Aleti, A., & Moser, I. (2013). Entropy-based adaptive range parameter control for evolutionary algorithms. In: Proceedings of the 15th annual conference on genetic and evolutionary computation. GECCO ’13. ACM, NY, USA, pp. 1501–1508. https://doi.org/10.1145/2463372.2463560.
Aleti, A., Moser, I., & Mostaghim, S. (2012). Adaptive range parameter control. In: 2012 IEEE congress on evolutionary computation, pp. 1–8 https://doi.org/10.1109/CEC.2012.6256567
Aleti, A., & Moser, I. (2011). Predictive parameter control. In: Proceedings of the 13th annual conference on genetic and evolutionary computation. GECCO ’11. ACM, NY, pp. 561–568. https://doi.org/10.1145/2001576.2001653.
Antoniou, M., Hribar, R., & Papa, G. (2021). Parameter control in evolutionary optimisation. In M. Vasile (Ed.), Optimization under uncertainty with applications to aerospace engineering (pp. 357–385). Cham: Springer. https://doi.org/10.1007/978-3-030-60166-9_11.
Awad, N. H., Ali, M. Z., Suganthan, P. N., Liang, J. J., & Qu, B. Y. (2016). Problem definitions and evaluation criteria for the CEC 2017 special session and competition on single objective real-parameter numerical optimization. Technical report, Nanyang Technological University, Singapore.
Balaprakash, P., Birattari, M., & Stützle, T. (2007a). Improvement strategies for the f-race algorithm: Sampling design and iterative refinement. In T. Bartz-Beielstein, M. J. Blesa Aguilera, C. Blum, B. Naujoks, A. Roli, G. Rudolph, & M. Sampels (Eds.), Hybrid Metaheuristics (pp. 108–122). Berlin, Heidelberg: Springer.
Balaprakash, P., Birattari, M., & Stützle, T. (2007b). Improvement strategies for the f-race algorithm: Sampling design and iterative refinement. In: Hybrid Metaheuristics. Springer, Berlin, pp. 108–122.
Bielza, C., del Pozo, J. A. F., & Larrañaga, P. (2013). Parameter control of genetic algorithms by learning and simulation of bayesian networks - a case study for the optimal ordering of tables. Journal of Computer Science and Technology, 28(4), 720–731.
Birattari, M., Yuan, Z., Balaprakash, P., & Stützle, T. (2010). F-race and iterated F-race: An overview. In: Experimental methods for the analysis of optimization algorithms. Springer, Singapore.
Birattari, M., Stützle, T., Paquete, L., & Varrentrapp, K. (2002) A racing algorithm for configuring metaheuristics. In: Proceedings of the 4th annual conference on genetic and evolutionary computation. GECCO’02. Morgan Kaufmann Publishers Inc., San Francisco, CA, pp. 11–18.
Bonabeau, E., Dorigo, M., & Theraulaz, G. (1999). From Natural to Artificial Swarm Intelligence. USA: Oxford University Press Inc.
Chatzinikolaou, N. (2011). Coordinating evolution: An open, peer-to-peer architecture for a self-adapting genetic algorithm. In: Enterprise information systems, vol. 73. Springer, Berlin.
Das, S., Mullick, S. S., & Suganthan, P. N. (2016). Recent advances in differential evolution - an updated survey. Swarm and Evolutionary Computation, 27, 1–30. https://doi.org/10.1016/j.swevo.2016.01.004
Dorigo, M. (1992). Optimization, learning and natural algorithms. PhD thesis, Politecnico di Milano, Italy.
Eberhart, R. C. (2007). Computational Intelligence: Concepts to Implementations. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Eiben, A. E., Hinterding, R., & Michalewicz, Z. (1999). Parameter control in evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 3(2), 124–141. https://doi.org/10.1109/4235.771166
Eiben, A. E., & Smith, J. E. (2015). Introduction to evolutionary computing (2nd ed.). Singapore: Springer.
Eiben, A.E., Horvath, M., Kowalczyk, W., & Schut, M.C. (2007). Reinforcement learning for online control of evolutionary algorithms. In: Proceedings of the 4th international conference on engineering self-organising systems. ESOA’06, pp. 151–160. Springer, Berlin. http://dl.acm.org/citation.cfm?id=1763581.1763595
Engelbrecht, A. P. (2007). Computational intelligence: An introduction (2nd ed.). Hoboken: Wiley Publishing.
Filho, C.J.A.B., de Lima Neto, F.B., Lins, A.J.C.C., Nascimento, A.I.S., & Lima, M.P. (2008). A novel search algorithm based on fish school behavior. In: 2008 IEEE International conference on systems, Man and Cybernetics, Melbourne, pp. 2646–2651. https://doi.org/10.1109/ICSMC.2008.4811695
Filho, C. J. A. B., de Lima Neto, F. B., Lins, A. J. C. C., Nascimento, A. I. S., & Lima, M. P. (2009). Fish school search. In R. Chiong (Ed.), Nature-inspired algorithms for optimisation (pp. 261–277). Berlin: Springer.
Filho, C.J.A.B., Neto, F.B.L., Sousa, M.F.C., Pontes, M.R., & Madeiro, S.S. (2009). On the influence of the swimming operators in the fish school search algorithm. In: 2009 IEEE International Conference on Systems, Man and Cybernetics, Melbourne, pp. 5012–5017.
Fortnow, L. (2009). The status of the P versus NP problem. Communications of the ACM, 52(9), 78–86. https://doi.org/10.1145/1562164.1562186
Foulds, L. (1983). The heuristic problem-solving approach. Journal of the Operational Research Society, 34, 927–934.
Fujimoto, S., van Hoof, H., & Meger, D. (2018). Addressing Function Approximation Error in Actor-Critic Methods. In International conference on machine learning. PMLR, NY, pp. 1587–1596.
Guan, Y., Yang, L., & Sheng, W. (2017). Population control in evolutionary algorithms: Review and comparison. In Bio-inspired computing: Theories and applications (pp. 161–174).
Horgan, D., Quan, J., Budden, D., Barth-Maron, G., Hessel, M., van Hasselt, H., & Silver, D. (2018). Distributed Prioritized Experience Replay. arXiv preprint arXiv:1803.00933
Hristakeva, M. (2004) Solving the 0–1 knapsack problem with genetic algorithms. In Midwest instruction and computing symposium, pp. 16–17
Ilavarasi, K., & Joseph, K.S. (2014). Variants of travelling salesman problem: A survey. In: International conference on information communication and embedded systems (ICICES2014), pp. 1–7.
Jaderberg, M., Dalibard, V., Osindero, S., Czarnecki, W.M., Donahue, J., Razavi, A., Vinyals, O., Green, T., Dunning, I., Simonyan, K., Fernando, C., & Kavukcuoglu, K. (2017). Population based training of neural networks. arXiv preprint arXiv:1711.09846
Karafotias, G., Hoogendoorn, M., & Eiben, A. E. (2015a). Parameter control in evolutionary algorithms: Trends and challenges. IEEE Transactions on Evolutionary Computation, 19(2), 167–187. https://doi.org/10.1109/TEVC.2014.2308294
Karafotias, G., Smit, S.K., & Eiben, A.E. (2012). A generic approach to parameter control. In: Proceedings of the 2012 European conference on the applications of evolutionary computation. EvoApplications ’12.
Karafotias, G., Eiben, A.E., & Hoogendoorn, M. (2014a). Generic parameter control with reinforcement learning. In: Proceedings of the 2014 annual conference on genetic and evolutionary computation. GECCO ’14, pp. 1319–1326.
Karafotias, G., Hoogendoorn, M., & Weel, B. (2014b). Comparing generic parameter controllers for EAs. In: Proceedings of the 2014 IEEE symposium series on computational intelligence. SSCI ’14, pp. 16–53.
Karafotias, G., Hoogendoorn, M., & Eiben, A.E. (2015b). Evaluating reward definitions for parameter control. In: Proceedings of the 2015 European conference on the applications of evolutionary computation. EvoApplications ’15, pp. 667–680.
Kennedy, J., & Eberhart, R.C. (1995). Particle swarm optimization. In: Proceedings of the IEEE international conference on neural networks, pp. 1942–1948.
Kingma, D.P., & Ba, J. (2014) Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980
de Lacerda, M.G.P., de Andrade Amorim Neto, H., Ludermir, T.B., Kuchen, H., & de Lima Neto, F.B. (2018). Population size control for efficiency and efficacy optimization in population based metaheuristics. In: 2018 IEEE congress on evolutionary computation (CEC), pp. 1–8. https://doi.org/10.1109/CEC.2018.8477792
Leung, S. W., Yuen, S. Y., & Chow, C. K. (2012). Parameter control system of evolutionary algorithm that is aided by the entire search history. Applied Soft Computing, 12(9), 3063–3078. https://doi.org/10.1016/j.asoc.2012.05.008
Liang, E., Liaw, R., Moritz, P., Nishihara, R., Fox, R., Goldberg, K., Gonzalez, J.E., Jordan, M.I., & Stoica, I. (2017). RLlib: Abstractions for Distributed Reinforcement Learning. In International conference on machine learning. PMLR, NY, pp. 3053–3062
Liashchynskyi, P., & Liashchynskyi, P. (2019). Grid search, random search, genetic algorithm: A big comparison for NAS. arXiv preprint arXiv:1912.06059
Lynn, N., & Suganthan, P. N. (2015a). Heterogeneous comprehensive learning particle swarm optimization with enhanced exploration and exploitation. Swarm and Evolutionary Computation, 24, 11–24.
Lynn, N., & Suganthan, P. N. (2015b). Heterogeneous comprehensive learning particle swarm optimization with enhanced exploration and exploitation. Swarm and Evolutionary Computation, 24, 11–24. https://doi.org/10.1016/j.swevo.2015.05.002
Maturana, J., & Saubion, F. (2008). On the design of adaptive control strategies for evolutionary algorithms. In: Proceedings of the Evolution Artificielle, 8th international conference on artificial evolution. EA’07. Springer, Berlin, pp. 303–315. http://dl.acm.org/citation.cfm?id=1793671.1793702
Mersmann, O., Bischl, B., Trautmann, H., Preuss, M., Weihs, C., & Rudolph, G. (2011). Exploratory landscape analysis. In: Proceedings of the 13th annual conference on genetic and evolutionary computation. GECCO ’11, pp. 829–836. Association for Computing Machinery, New York, USA. https://doi.org/10.1145/2001576.2001690.
Michalewicz, Z., & Arabas, J. (1994). Genetic algorithms for the 0/1 knapsack problem. In Z. W. Ras & M. Zemankova (Eds.), Methodologies for Intelligent Systems (pp. 134–143). Berlin: Springer.
Miguel de Gomez, A., & Toosi, F. (2021). Continuous parameter control in genetic algorithms using policy gradient reinforcement learning. In: Proceedings of the 13th international joint conference on computational intelligence (IJCCI 2021), pp. 115–122.
Nocedal, J., & Wright, S. J. (2006). Numerical optimization (2nd ed.). New York: Springer.
Panigrahi, B. K., Shi, Y., & Lim, M.-H. (2011). Handbook of Swarm intelligence: Concepts, principles and applications (1st ed.). Singapore: Springer.
Parker-Holder, J., Nguyen, V., & Roberts, S. (2021). Provably efficient online hyperparameter optimization with population-based bandits. Advances in Neural Information Processing Systems, 33, 17200–17211.
Parpinelli, R. S., Plichoski, G. F., & da Silva, R. S. (2019). A review of techniques for on-line control of parameters in swarm intelligence and evolutionary computation algorithms. International Journal of Bio-inspired Computation, 13(1), 1–17.
de Lacerda, M. G. P., de Araujo Pessoa, L. F., de Lima Neto, F. B., Ludermir, T. B., & Kuchen, H. (2021). A systematic literature review on general parameter control for evolutionary and swarm-based algorithms. Swarm and Evolutionary Computation, 60, 100777. https://doi.org/10.1016/j.swevo.2020.100777
Pisinger, D. (2005). Where are the hard knapsack problems? Computers & Operations Research, 32(9), 2271–2284. https://doi.org/10.1016/j.cor.2004.03.002
Quevedo, J., Abdelatti, M., Imani, F., & Sodhi, M. (2021). Using reinforcement learning for tuning genetic algorithms. In: Proceedings of the genetic and evolutionary computation conference companion. GECCO ’21. Association for Computing Machinery, New York, NY, pp. 1503–1507. https://doi.org/10.1145/3449726.3463203.
Rost, A., Petrova, I., & Buzdalova, A. (2016). Adaptive parameter selection in evolutionary algorithms by reinforcement learning with dynamic discretization of parameter range. In: Proceedings of the 2016 on genetic and evolutionary computation. GECCO ’16.
Rummery, G. A., & Niranjan, M. (1994). On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department.
Schuchardt, J., Golkov, V., & Cremers, D. (2019). Learning to Evolve. arXiv preprint arXiv:1905.03389
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347
Sharma, M., Komninos, A., Ibanez, M.L., & Kazakov, D. (2019). Deep reinforcement learning based parameter control in differential evolution. In: Proceedings of the genetic and evolutionary computation conference. GECCO ’19. ACM, New York.
Silver, E. (2004). An overview of heuristic solution methods. Journal of the Operational Research Society, 55(9), 936–956.
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., & Hassabis, D. (2017) Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv preprint arXiv:1712.01815
Storn, R., & Price, K. (1997). Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341–359. https://doi.org/10.1023/A:1008202821328
Sutton, R. S., & Barto, A. G. (2018a). Reinforcement Learning: An Introduction. Cambridge: A Bradford Book.
Sutton, R. S., & Barto, A. G. (2018b). Reinforcement learning: An introduction (2nd ed.). Cambridge: The MIT Press.
Szepesvari, C. (2010). Algorithms for reinforcement learning. San Rafael: Morgan & Claypool Publishers.
Talbi, E.-G. (2009). Metaheuristics: From design to implementation. Hoboken: Wiley Publishing.
Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3), 279–292. https://doi.org/10.1007/BF00992698
Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82.
Zhang, J., Chen, W.-N., Zhan, Z.-H., Yu, W.-J., Li, Y.-L., Chen, N., & Zhou, Q. (2012). A survey on algorithm adaptation in evolutionary computation. Frontiers of Electrical and Electronic Engineering, 7(1), 16–31. https://doi.org/10.1007/s11460-012-0192-0
Acknowledgements
The authors of this paper would like to thank CNPq and CAPES (Brazil) for funding the research that originated this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 TD3’s hyperparameters
- Update delay between policy and Q-function parameters: 2 (i.e., for each policy update, the Q-function is updated twice);
- Target noise (i.e., variance of the Gaussian noise \(\epsilon\)): 0.2;
- Target noise clip (i.e., c): 0.5;
- Standard deviation of the zero-mean Gaussian noise added to the actions: 0.1;
- \(\gamma\): 0.99;
- Initial random steps (i.e., number of steps with random decisions executed before the algorithm starts learning): 45,000;
- Adam \(\beta _1\): 0.9;
- Adam \(\beta _2\): 0.999;
- Adam \(\epsilon\): \(10^{-7}\).
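As a minimal sketch (not the authors' implementation), the target-policy smoothing implied by the target-noise and clip values above can be written as follows; the function name and action bounds are illustrative:

```python
import random

# TD3 hyperparameters from the list above
TARGET_NOISE = 0.2       # scale of the Gaussian noise epsilon (listed above as its variance)
TARGET_NOISE_CLIP = 0.5  # clip bound c

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def smoothed_target_action(target_policy_action, action_low=-1.0, action_high=1.0):
    """TD3 target-policy smoothing: add clipped Gaussian noise to the
    target policy's action, then clip back to the valid action range."""
    eps = clip(random.gauss(0.0, TARGET_NOISE), -TARGET_NOISE_CLIP, TARGET_NOISE_CLIP)
    return clip(target_policy_action + eps, action_low, action_high)
```

With an action of 0.0, the smoothed target action always stays within \(\pm c = \pm 0.5\).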
1.2 PBT’s hyperparameters
- Perturbation interval: 4;
- Quantile fraction: 0.125;
- Resample probability: 0.5.
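To show how these three values interact, the following is a hedged sketch of one PBT exploit/explore step (the function names and the perturbation factors 0.8/1.2 are common PBT conventions, not taken from the paper's implementation):

```python
import random

QUANTILE_FRACTION = 0.125   # top/bottom fraction used by exploit
RESAMPLE_PROBABILITY = 0.5  # explore: resample a fresh value vs. perturb the copied one
PERTURB_FACTORS = (0.8, 1.2)

def pbt_exploit_explore(population, sample_fn):
    """One PBT step over a population of (score, hyperparams) pairs.
    Members in the bottom quantile copy a top-quantile member (exploit)
    and then mutate each hyperparameter (explore). `sample_fn(name)`
    draws a fresh value for hyperparameter `name`."""
    ranked = sorted(population, key=lambda p: p[0], reverse=True)
    k = max(1, int(len(ranked) * QUANTILE_FRACTION))
    top, bottom = ranked[:k], ranked[-k:]
    new_population = ranked[:-k]  # survivors keep their configurations
    for _score, _params in bottom:
        src_score, src_params = random.choice(top)  # exploit: copy a top performer
        params = dict(src_params)
        for name in params:                         # explore: resample or perturb
            if random.random() < RESAMPLE_PROBABILITY:
                params[name] = sample_fn(name)
            else:
                params[name] *= random.choice(PERTURB_FACTORS)
        new_population.append((src_score, params))
    return new_population
```

With a population of 8 and quantile fraction 0.125, exactly one worker is replaced per perturbation interval.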
1.3 I/F-race’s hyperparameters
- Number of parameter configurations evaluated for each new F-Race process: 48;
- Minimum number of F-Race iterations before it starts removing bad setups: 20;
- Parameter setup generator standard deviation: 0.3 × the range of values of the parameter;
- Minimum number of configurations in the F-Race pool: 10;
- Maximum number of F-Race iterations: 30.
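The sampling rule above (standard deviation equal to 0.3 times the parameter's range) can be sketched as follows; the function name and the clipping to the parameter's bounds are illustrative assumptions, not details from the paper:

```python
import random

STD_FRACTION = 0.3  # parameter setup generator std = 0.3 * parameter range

def sample_candidate(elite_config, param_ranges):
    """Draw a new parameter configuration around an elite one: one
    Gaussian draw per parameter, clipped to the parameter's range."""
    candidate = {}
    for name, (lo, hi) in param_ranges.items():
        std = STD_FRACTION * (hi - lo)
        value = random.gauss(elite_config[name], std)
        candidate[name] = min(hi, max(lo, value))
    return candidate
```

For example, sampling around an elite DE configuration with F = 2.0 over the range [0.01, 4] uses a standard deviation of 0.3 × 3.99 ≈ 1.2.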
1.4 Human-designed parameter control policies
- HCLPSO (Lynn & Suganthan, 2015b):
  - w: linear decrease from 0.99 to 0.2;
  - c: linear decrease from 3 to 1.5;
  - Candidate solution step: linear decrease from 0.1 to 0.000001;
  - c1: linear decrease from 2.5 to 0.5;
  - c2: linear increase from 0.5 to 2.5;
  - m: 5.
- FSS (Filho et al., 2009):
  - Candidate solution step: linear decrease from 0.1 to 0.000001;
  - Volitive step: twice the candidate solution step;
  - Maximum weight: 5000.
- DE (Das et al., 2016):
  - F: 2;
  - Crossover probability: 0.5.
- ACO (Das et al., 2016):
  - \(\alpha\): 1;
  - \(\beta\): 2;
  - \(\rho\): 0.98;
  - Probability of using the best ant ever to update the pheromone trail instead of the best ant in the iteration: linear increase from 0 to 1.
- GA (Michalewicz & Arabas, 1994; Hristakeva, 2004):
  - Mutation probability: 0.1;
  - Crossover probability: 0.75;
  - Elitism size: 2.
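The linear schedules above all follow the same interpolation rule; a minimal sketch (the function name is illustrative):

```python
def linear_schedule(start, end, t, t_max):
    """Value of a linearly interpolated parameter at iteration t:
    t = 0 gives `start`, t = t_max gives `end`."""
    return start + (end - start) * (t / t_max)

# HCLPSO inertia weight w: linear decrease from 0.99 to 0.2
w_first = linear_schedule(0.99, 0.2, 0, 100)    # 0.99 at the first iteration
w_last = linear_schedule(0.99, 0.2, 100, 100)   # 0.2 at the last iteration
```

The same function covers the increasing schedules (e.g., c2 from 0.5 to 2.5) by passing end > start.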
1.5 Sampling interval for the random parameter control policy
- HCLPSO:
  - w: [0.2, 0.99];
  - c: [1.5, 3];
  - c1: [0.5, 2.5];
  - c2: [0.5, 2.5];
  - m: 5.
- FSS:
  - Candidate solution step: [0, 0.1];
  - Volitive step: [\(-\)0.2, 0.2] (the RL algorithm decides whether to contract or not).
- DE:
  - F: [0.01, 4];
  - Crossover probability: [0.01, 1].
- ACO:
  - \(\alpha\): [0, 4];
  - \(\beta\): [0, 4];
  - \(\rho\): [0, 1];
  - Probability of using the best ant ever to update the pheromone trail instead of the best ant in the iteration: [0, 1].
- GA:
  - Mutation probability: [0.001, 1];
  - Crossover probability: [0.001, 1];
  - Elitism size: [1, 5].
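The random parameter control policy simply draws each parameter uniformly from its sampling interval at every decision point; a minimal sketch, using the DE intervals above (the dictionary keys are illustrative names):

```python
import random

# Sampling intervals for DE under the random policy (from the list above)
DE_INTERVALS = {
    "F": (0.01, 4.0),
    "crossover_probability": (0.01, 1.0),
}

def random_policy(intervals):
    """Random parameter control policy: draw each parameter uniformly
    from its sampling interval, independently at every decision point."""
    return {name: random.uniform(lo, hi) for name, (lo, hi) in intervals.items()}
```

Integer-valued parameters such as the GA elitism size would additionally be rounded to the nearest integer in the interval.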
1.6 Fully detailed experimental results
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
de Lacerda, M.G.P., de Lima Neto, F.B., Ludermir, T.B. et al. Out-of-the-box parameter control for evolutionary and swarm-based algorithms with distributed reinforcement learning. Swarm Intell 17, 173–217 (2023). https://doi.org/10.1007/s11721-022-00222-z
DOI: https://doi.org/10.1007/s11721-022-00222-z