DOI: 10.1145/3638529.3654090

Accelerate Evolution Strategy by Proximal Policy Optimization

Published: 14 July 2024

Abstract

A pivotal challenge in meta-heuristic optimization is the lack of knowledge inheritance in heuristic rules. For example, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) must regenerate historical data from scratch for its adaptation mechanisms on each new instance, demanding an extensive number of fitness evaluations. This severely limits the practicality of meta-heuristics, especially under restricted evaluation budgets, and hinders efficient navigation of high-dimensional spaces. To overcome this, we integrate Proximal Policy Optimization (PPO) with a vanilla Evolution Strategy (ES), forming the novel PPO-ES approach. Unlike adaptive ES variants such as CMA-ES, which rely on cumulative path calculation and covariance matrix adaptation, PPO-ES leverages PPO's capability for dynamic step-size adjustment. Our method streamlines the optimization process and incorporates a carefully designed reward system to address the scalability challenge, significantly enhancing the adaptability and efficiency of ES. PPO-ES, trained on a subset of the bbob benchmarks, was tested on both these and the remaining unseen problems, and further validated on the much higher-dimensional bbob-largescale benchmarks. Results show that PPO-ES achieves faster or comparable convergence relative to CMA-ES. Our code is available online: https://github.com/burningxt/PPO-ES_GECCO24.
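The paper's trained PPO policy and its reward design are not reproduced here; the following is a minimal sketch, in Python, of the control loop the abstract describes: a (1+λ) evolution strategy whose step size σ is rescaled each generation by a factor chosen by an external policy. The `heuristic_policy` stand-in (a 1/5th-success-rule-like heuristic substituting for the trained PPO network), the observation tuple, and all function names are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def sphere(x):
    """Simple test objective: f(x) = sum of squares (minimum 0 at the origin)."""
    return sum(v * v for v in x)

def heuristic_policy(state):
    # Hypothetical stand-in for the trained PPO policy: observe the recent
    # success rate and return a multiplicative step-size update
    # (expand sigma after frequent improvement, shrink it otherwise).
    success_rate, _log_sigma = state
    return 1.5 if success_rate > 0.2 else 0.8

def ppo_es_sketch(f, x0, sigma=1.0, generations=200, lam=10, policy=heuristic_policy):
    """(1+lambda)-ES with an external policy controlling the step size."""
    x, fx = list(x0), f(x0)
    for _ in range(generations):
        # Sample lambda offspring from an isotropic Gaussian around the parent.
        offspring = []
        for _ in range(lam):
            y = [xi + sigma * random.gauss(0.0, 1.0) for xi in x]
            offspring.append((f(y), y))
        best_f, best_y = min(offspring)
        # Build the policy's observation and apply its step-size multiplier.
        successes = sum(1 for fy, _ in offspring if fy < fx)
        state = (successes / lam, math.log(sigma))
        sigma *= policy(state)
        # Elitist selection: keep the parent unless an offspring improves on it.
        if best_f < fx:
            x, fx = best_y, best_f
    return x, fx

random.seed(0)
x_best, f_best = ppo_es_sketch(sphere, [3.0, -2.0, 1.5])
print(f_best)
```

In the actual method, the heuristic above would be replaced by a PPO agent trained across benchmark problems, so the step-size schedule is learned once and reused on new instances instead of being re-derived from scratch.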

Supplemental Material: PDF file


Published In

GECCO '24: Proceedings of the Genetic and Evolutionary Computation Conference
July 2024, 1657 pages
ISBN: 9798400704949
DOI: 10.1145/3638529

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. evolution strategy
  2. reinforcement learning
  3. black-box optimization
  4. large-scale optimization
  5. dynamic algorithm configuration

Qualifiers

  • Research-article

Funding Sources

  • Key Research Project of Zhejiang Lab

Conference

GECCO '24: Genetic and Evolutionary Computation Conference
July 14-18, 2024, Melbourne, VIC, Australia

Acceptance Rates

Overall acceptance rate: 1,669 of 4,410 submissions (38%)

