DOI: 10.1145/3520304.3533980
research-article

Safety-informed mutations for evolutionary deep reinforcement learning

Published: 19 July 2022

Abstract

Evolutionary Algorithms have been combined with Deep Reinforcement Learning (DRL) to address the limitations of the two approaches while leveraging their benefits. In this paper, we discuss objective-informed mutations that bias the evolutionary population toward exploring the desired objective. We focus on Safe DRL domains to show how these mutations exploit visited unsafe states to search for safer actions. Empirical evidence on a 12-degrees-of-freedom locomotion benchmark and a practical navigation task confirms that we improve the safety of the policy while maintaining a return comparable to that of the original DRL algorithm.
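
As a rough illustration of what a mutation that exploits visited unsafe states could look like, the sketch below perturbs the weights of a toy linear policy and keeps a perturbation only if it lowers a cost measured on a buffer of previously visited unsafe states. This is not the operator proposed in the paper. The linear policy, the `unsafe_cost` function, the `action_limit` threshold, and all parameter names are assumptions made purely for illustration.

```python
import random


def policy_action(weights, state):
    """Toy linear policy (assumption): action = dot(weights, state)."""
    return sum(w * s for w, s in zip(weights, state))


def unsafe_cost(weights, unsafe_states, action_limit=1.0):
    """Hypothetical safety cost: how far the action magnitude exceeds
    `action_limit`, summed over the stored unsafe states."""
    return sum(
        max(0.0, abs(policy_action(weights, s)) - action_limit)
        for s in unsafe_states
    )


def safety_informed_mutation(weights, unsafe_states,
                             n_candidates=20, sigma=0.1, rng=None):
    """Sample Gaussian perturbations of the parent weights and return the
    candidate with the lowest cost on the unsafe-state buffer.
    The parent is returned unchanged if no candidate improves on it."""
    if rng is None:
        rng = random.Random(0)
    best = list(weights)
    best_cost = unsafe_cost(best, unsafe_states)
    for _ in range(n_candidates):
        candidate = [w + rng.gauss(0.0, sigma) for w in weights]
        cost = unsafe_cost(candidate, unsafe_states)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best


if __name__ == "__main__":
    parent = [2.0, -1.5, 0.5]
    # Buffer of states in which the parent policy was observed to be unsafe.
    unsafe_states = [[1.0, 0.0, 1.0], [0.5, -1.0, 0.0], [1.0, 1.0, 1.0]]
    child = safety_informed_mutation(parent, unsafe_states)
    print("parent cost:", unsafe_cost(parent, unsafe_states))
    print("child cost: ", unsafe_cost(child, unsafe_states))
```

Because the parent is kept whenever no candidate improves on it, this particular sketch never increases the hypothetical cost. A real Safe DRL setting would also have to weigh the change in return, which the sketch ignores.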

Cited By

  • (2023) Online Safety Property Collection and Refinement for Safe Deep Reinforcement Learning in Mapless Navigation. 2023 IEEE International Conference on Robotics and Automation (ICRA), 7133-7139. DOI: 10.1109/ICRA48891.2023.10161312. Online publication date: 29-May-2023
  • (2022) Strategies for Scaleable Communication and Coordination in Multi-Agent (UAV) Systems. Aerospace 9:9 (488). DOI: 10.3390/aerospace9090488. Online publication date: 31-Aug-2022

Published In

GECCO '22: Proceedings of the Genetic and Evolutionary Computation Conference Companion
July 2022
2395 pages
ISBN:9781450392686
DOI:10.1145/3520304

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep
  2. evolutionary algorithms
  3. mutations
  4. reinforcement learning
  5. robotics

Qualifiers

  • Research-article

Conference

GECCO '22

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 16
  • Downloads (last 6 weeks): 1

Reflects downloads up to 17 Jan 2025.
