ABSTRACT
Safe learning techniques are learning frameworks that take safety into consideration during the training process. Safe reinforcement learning (SRL) combines reinforcement learning (RL) with safety mechanisms such as action masking and run-time assurance to protect an agent while it explores its environment. This protection, however, can severely hinder an agent's ability to learn an optimal policy, as the safety systems exacerbate an already difficult exploration challenge for RL agents. An alternative to RL is an optimization approach known as the genetic algorithm (GA), which uses operators that mimic biological evolution to evolve better policies. By combining safety mechanisms with genetic algorithms, this work demonstrates a novel approach to safe learning called Self-Preserving Genetic Algorithms (SPGA).
To highlight the training benefits of SPGA compared to SRL in discrete action spaces, this demonstration trains and deploys an SPGA agent with action masking (SPGA-AM) and an SRL agent with action masking (SRL-AM) in real time in the CartPole-v0 environment with a safety boundary condition b = 0.75. After training, each learned policy is tested in a CartPole-v0 environment with an extended maximum-timesteps value (T = 200 → T = 1000). After the demo, users will have a better understanding of SPGA and SRL training, as well as the benefits of using SPGA to train in discrete action spaces.
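The core idea of combining a genetic algorithm with action masking can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the inlined cart-pole dynamics (standard Gym CartPole constants, used here so the example runs without a `gym` dependency), the linear policy parameterization, the one-step-lookahead mask in `masked_action`, and all GA hyperparameters in `evolve` are assumptions of this sketch.

```python
import math
import random

# Standard Gym CartPole-v0 physical constants (Euler integration).
GRAVITY, MASS_CART, MASS_POLE, LENGTH, FORCE, TAU = 9.8, 1.0, 0.1, 0.5, 10.0, 0.02

def step(state, action):
    """One Euler step of cart-pole dynamics; action 1 pushes right, 0 left."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if action == 1 else -FORCE
    costh, sinth = math.cos(theta), math.sin(theta)
    total_mass = MASS_CART + MASS_POLE
    pm_length = MASS_POLE * LENGTH
    temp = (force + pm_length * theta_dot ** 2 * sinth) / total_mass
    theta_acc = (GRAVITY * sinth - costh * temp) / (
        LENGTH * (4.0 / 3.0 - MASS_POLE * costh ** 2 / total_mass))
    x_acc = temp - pm_length * theta_acc * costh / total_mass
    return (x + TAU * x_dot, x_dot + TAU * x_acc,
            theta + TAU * theta_dot, theta_dot + TAU * theta_acc)

def masked_action(state, action, b=0.75):
    """Action mask: if the proposed push would carry the cart past the
    safety boundary |x| <= b on the following step, force the opposite push."""
    nxt = step(state, action)
    if abs(nxt[0] + TAU * nxt[1]) > b:
        return 1 - action
    return action

def fitness(weights, max_t=200, b=0.75):
    """Episode return (timesteps survived) for a linear policy w . state >= 0,
    evaluated with the action mask active, as in SPGA-AM."""
    state, total = (0.0, 0.0, 0.01, 0.0), 0
    for _ in range(max_t):
        action = 1 if sum(w * s for w, s in zip(weights, state)) >= 0 else 0
        state = step(state, masked_action(state, action, b))
        if abs(state[0]) > b or abs(state[2]) > 12 * math.pi / 180:
            break  # boundary or pole-angle failure
        total += 1
    return total

def evolve(pop_size=20, generations=30, seed=0):
    """Simple generational GA: truncation selection, one-point crossover,
    Gaussian point mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:pop_size // 4]          # keep the fittest quarter
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, 4)        # one-point crossover
            child = p1[:cut] + p2[cut:]
            child[rng.randrange(4)] += rng.gauss(0, 0.2)  # point mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))
```

Because the mask is applied inside the fitness evaluation, every candidate in the population respects the boundary b during training, while selection pressure still rewards policies that balance the pole for the full episode.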