DOI: 10.1145/3628797.3629006
Research article

Understanding the Role of Population Experiences in Proximal Distilled Evolutionary Reinforcement Learning

Published: 07 December 2023

Abstract

Evolutionary Reinforcement Learning (ERL) combines the sample efficiency of Reinforcement Learning (RL) with the exploration capabilities of the population-based search of Evolutionary Computation. These methods have shown promising performance on many continuous control tasks. However, they can also exhibit instability. Several works have shown that experiences collected by the population individuals cause a state-distribution shift in the RL policy update process. A simple remedy has been proposed to alleviate this issue: the experience transitions are kept in two separate replay buffers, one for the RL policy and one for the population, and samples from the two buffers are mixed at a fixed ratio when updating the RL policy. The effectiveness of this approach has been demonstrated empirically on an ERL method in which Evolution Strategies (ES) assists an external RL agent. However, there has been no thorough investigation of how this method performs on Genetic Algorithm (GA) based ERL. In this paper, we analyze the influence of off-policy data coming from the GA population on the RL policy, and how the mixing method performs on a state-of-the-art ERL method, namely Proximal Distilled Evolutionary Reinforcement Learning (PDERL).
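To make the two-buffer mixing scheme described above concrete, the following is a minimal Python sketch. It assumes a fixed mixing ratio; the names `ReplayBuffer`, `sample_mixed_batch`, `rl_buffer`, `pop_buffer`, and `mix_ratio` are illustrative and are not identifiers from the paper or the PDERL codebase.

```python
import random
from collections import deque


class ReplayBuffer:
    """FIFO buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, n):
        # Sample with replacement so a sparsely filled buffer can still
        # supply its share of the minibatch.
        return [random.choice(self.storage) for _ in range(n)]


def sample_mixed_batch(rl_buffer, pop_buffer, batch_size, mix_ratio=0.5):
    """Build a minibatch that mixes the RL agent's own transitions with
    transitions gathered by the evolutionary population.

    mix_ratio is the fixed fraction of the batch drawn from the RL buffer;
    the remainder comes from the population buffer.
    """
    n_rl = int(round(batch_size * mix_ratio))
    n_pop = batch_size - n_rl
    batch = rl_buffer.sample(n_rl) + pop_buffer.sample(n_pop)
    random.shuffle(batch)  # avoid any ordering bias between the two sources
    return batch
```

In an ERL training loop of this kind, transitions collected by the RL actor would be added to `rl_buffer`, transitions collected while evaluating population individuals would go to `pop_buffer`, and each gradient update of the RL policy would be computed on a batch returned by `sample_mixed_batch`.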



Published In

SOICT '23: Proceedings of the 12th International Symposium on Information and Communication Technology
December 2023
1058 pages
ISBN: 9798400708916
DOI: 10.1145/3628797

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. continuous control
  2. evolutionary reinforcement learning
  3. policy search
  4. variation operators

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • The VNUHCM-University of Information Technology's Scientific Research Support Fund

Conference

SOICT 2023

Acceptance Rates

Overall Acceptance Rate 147 of 318 submissions, 46%

