research-article

Safety-informed mutations for evolutionary deep reinforcement learning

Authors:

Enrico Marchesini,

Christopher Amato

GECCO '22: Proceedings of the Genetic and Evolutionary Computation Conference Companion

Pages 1966 - 1970

https://doi.org/10.1145/3520304.3533980

Published: 19 July 2022

Abstract

Evolutionary Algorithms have been combined with Deep Reinforcement Learning (DRL) to address the limitations of the two approaches while leveraging their benefits. In this paper, we discuss objective-informed mutations that bias the evolutionary population toward exploring the desired objective. We focus on Safe DRL domains to show how these mutations exploit visited unsafe states to search for safer actions. Empirical evidence on a locomotion benchmark with 12 degrees of freedom and on a practical navigation task confirms that we improve the safety of the policy while maintaining a return comparable to that of the original DRL algorithm.
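The abstract gives no algorithmic detail, so the following is only a minimal sketch of one way a mutation could use visited unsafe states to look for safer actions. It is not the authors' operator. Everything in it is an assumption made for illustration: the linear policy, the names `policy_action`, `unsafe_cost`, and `safety_informed_mutation`, the Gaussian perturbation, and the rule of keeping a candidate only when it lowers the average cost on the stored unsafe states.

```python
import random


def policy_action(weights, state):
    """Hypothetical linear policy: the action is the dot product of weights and state."""
    return sum(w * s for w, s in zip(weights, state))


def unsafe_cost(weights, unsafe_states, cost_fn):
    """Average cost of the policy's actions over previously visited unsafe states."""
    total = sum(cost_fn(s, policy_action(weights, s)) for s in unsafe_states)
    return total / len(unsafe_states)


def safety_informed_mutation(weights, unsafe_states, cost_fn,
                             sigma=0.1, n_candidates=20, rng=None):
    """Assumed scheme: sample Gaussian perturbations of the parent weights and
    return the candidate with the lowest cost on the visited unsafe states.
    The parent is returned unchanged if no candidate improves on it."""
    rng = rng or random.Random(0)
    best = list(weights)
    best_cost = unsafe_cost(best, unsafe_states, cost_fn)
    for _ in range(n_candidates):
        candidate = [w + rng.gauss(0.0, sigma) for w in weights]
        cost = unsafe_cost(candidate, unsafe_states, cost_fn)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


if __name__ == "__main__":
    # Toy setup (assumption): large action magnitudes are costly in unsafe states.
    parent = [1.0, -0.5]
    visited_unsafe = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    toy_cost = lambda state, action: abs(action)
    child, child_cost = safety_informed_mutation(parent, visited_unsafe, toy_cost)
    print("parent cost:", unsafe_cost(parent, visited_unsafe, toy_cost))
    print("child cost:", child_cost)
```

In this sketch the mutated child is never worse than its parent on the stored unsafe states. Return is not evaluated here at all; in an evolutionary DRL loop it would be checked separately before the child replaces any policy.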


Cited By

Marzari L., Marchesini E., Farinelli A. (2023). Online Safety Property Collection and Refinement for Safe Deep Reinforcement Learning in Mapless Navigation. 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 7133-7139. Online publication date: 29-May-2023.
https://doi.org/10.1109/ICRA48891.2023.10161312
Ponniah J., Dantsker O. (2022). Strategies for Scaleable Communication and Coordination in Multi-Agent (UAV) Systems. Aerospace 9:9 (488). Online publication date: 31-Aug-2022.
https://doi.org/10.3390/aerospace9090488

Index Terms

Safety-informed mutations for evolutionary deep reinforcement learning
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Reinforcement learning
    2. Machine learning approaches
      1. Bio-inspired approaches
        Evolutionary robotics

Published In

GECCO '22: Proceedings of the Genetic and Evolutionary Computation Conference Companion

July 2022

2395 pages

ISBN:9781450392686

DOI:10.1145/3520304

Editor: Jonathan E. Fieldsend (University of Exeter)
General Chair: Markus Wagner (The University of Adelaide)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGEVO: ACM Special Interest Group on Genetic and Evolutionary Computation

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

GECCO '22

Sponsor:

SIGEVO

GECCO '22: Genetic and Evolutionary Computation Conference

July 9 - 13, 2022

Boston, Massachusetts

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
