DOI: 10.1145/3638529.3654090

Accelerate Evolution Strategy by Proximal Policy Optimization

Published: 14 July 2024

Abstract

A pivotal challenge in meta-heuristic optimization is the lack of knowledge inheritance in heuristic rules. For example, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) must regenerate historical data from scratch for its adaptation mechanisms on each new instance, demanding an extensive number of fitness evaluations. This severely limits the practicality of meta-heuristics, especially under restricted evaluation budgets, and hinders efficient navigation of high-dimensional spaces. To overcome this, we integrate Proximal Policy Optimization (PPO) with a vanilla Evolution Strategy (ES), forming the novel PPO-ES approach. Unlike adaptive ES variants such as CMA-ES, which rely on cumulative path calculation and covariance matrix adaptation, PPO-ES leverages PPO's capability for dynamic step-size adjustment. Our method streamlines the optimization process and incorporates a carefully designed reward system to address the scalability challenge, significantly enhancing the adaptability and efficiency of ES. PPO-ES, trained on a subset of the bbob benchmarks, was tested on both these and the remaining unseen problems, and further validated on the much higher-dimensional bbob-largescale benchmarks. Results show that PPO-ES achieves faster or comparable convergence relative to CMA-ES. Our code is available online: https://github.com/burningxt/PPO-ES_GECCO24.
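The paper's trained PPO policy and its reward design are not reproduced here; the following is a minimal sketch, in Python, of the control loop the abstract describes: a (1+λ) evolution strategy whose step size σ is rescaled each generation by a factor chosen by an external policy. The `heuristic_policy` stand-in (a 1/5th-success-rule-like heuristic substituting for the trained PPO network), the observation tuple, and all function names are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def sphere(x):
    """Simple test objective: f(x) = sum of squares (minimum 0 at the origin)."""
    return sum(v * v for v in x)

def heuristic_policy(state):
    # Hypothetical stand-in for the trained PPO policy: observe the recent
    # success rate and return a multiplicative step-size update
    # (expand sigma after frequent improvement, shrink it otherwise).
    success_rate, _log_sigma = state
    return 1.5 if success_rate > 0.2 else 0.8

def ppo_es_sketch(f, x0, sigma=1.0, generations=200, lam=10, policy=heuristic_policy):
    """(1+lambda)-ES with an external policy controlling the step size."""
    x, fx = list(x0), f(x0)
    for _ in range(generations):
        # Sample lambda offspring from an isotropic Gaussian around the parent.
        offspring = []
        for _ in range(lam):
            y = [xi + sigma * random.gauss(0.0, 1.0) for xi in x]
            offspring.append((f(y), y))
        best_f, best_y = min(offspring)
        # Build the policy's observation and apply its step-size multiplier.
        successes = sum(1 for fy, _ in offspring if fy < fx)
        state = (successes / lam, math.log(sigma))
        sigma *= policy(state)
        # Elitist selection: keep the parent unless an offspring improves on it.
        if best_f < fx:
            x, fx = best_y, best_f
    return x, fx

random.seed(0)
x_best, f_best = ppo_es_sketch(sphere, [3.0, -2.0, 1.5])
print(f_best)
```

In the actual method, the heuristic above would be replaced by a PPO agent trained across benchmark problems, so the step-size schedule is learned once and reused on new instances instead of being re-derived from scratch.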

Supplemental Material: PDF file


Published In

GECCO '24: Proceedings of the Genetic and Evolutionary Computation Conference
July 2024, 1657 pages
ISBN: 9798400704949
DOI: 10.1145/3638529

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. evolution strategy
  2. reinforcement learning
  3. black-box optimization
  4. large-scale optimization
  5. dynamic algorithm configuration

Qualifiers

  • Research-article

Funding Sources

  • Key Research Project of Zhejiang Lab

Conference

GECCO '24: Genetic and Evolutionary Computation Conference
July 14-18, 2024, Melbourne, VIC, Australia

Acceptance Rates

Overall acceptance rate: 1,669 of 4,410 submissions (38%)

