ABSTRACT
Safe learning techniques are learning frameworks that take safety into consideration during the training process. Safe reinforcement learning (SRL) combines reinforcement learning (RL) with safety mechanisms such as action masking and run-time assurance to protect an agent while it explores its environment. This protection, however, can severely hinder an agent's ability to learn an optimal policy, as the safety systems exacerbate an already difficult exploration challenge for RL agents. An alternative to RL is an optimization approach known as the genetic algorithm (GA), which uses operators that mimic biological evolution to evolve better policies. By combining safety mechanisms with genetic algorithms, this work demonstrates a novel approach to safe learning called Self-Preserving Genetic Algorithms (SPGA).
To highlight the training benefits of SPGA compared to SRL in discrete action spaces, this demonstration trains and deploys an SPGA agent with action masking (SPGA-AM) and an SRL agent with action masking (SRL-AM) in real time in the CartPole-v0 environment with a safety boundary condition b = 0.75. After training, each learned policy is tested in a CartPole-v0 environment with an extended maximum-timesteps value (T = 200 → T = 1000). After the demo, users will have a better understanding of SPGA and SRL training, as well as the benefits of using SPGA to train in discrete action spaces.
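The core idea of combining a genetic algorithm with action masking can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the inlined cart-pole dynamics (standard Gym CartPole constants, used here so the example runs without a `gym` dependency), the linear policy parameterization, the one-step-lookahead mask in `masked_action`, and all GA hyperparameters in `evolve` are assumptions of this sketch.

```python
import math
import random

# Standard Gym CartPole-v0 physical constants (Euler integration).
GRAVITY, MASS_CART, MASS_POLE, LENGTH, FORCE, TAU = 9.8, 1.0, 0.1, 0.5, 10.0, 0.02

def step(state, action):
    """One Euler step of cart-pole dynamics; action 1 pushes right, 0 left."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if action == 1 else -FORCE
    costh, sinth = math.cos(theta), math.sin(theta)
    total_mass = MASS_CART + MASS_POLE
    pm_length = MASS_POLE * LENGTH
    temp = (force + pm_length * theta_dot ** 2 * sinth) / total_mass
    theta_acc = (GRAVITY * sinth - costh * temp) / (
        LENGTH * (4.0 / 3.0 - MASS_POLE * costh ** 2 / total_mass))
    x_acc = temp - pm_length * theta_acc * costh / total_mass
    return (x + TAU * x_dot, x_dot + TAU * x_acc,
            theta + TAU * theta_dot, theta_dot + TAU * theta_acc)

def masked_action(state, action, b=0.75):
    """Action mask: if the proposed push would carry the cart past the
    safety boundary |x| <= b on the following step, force the opposite push."""
    nxt = step(state, action)
    if abs(nxt[0] + TAU * nxt[1]) > b:
        return 1 - action
    return action

def fitness(weights, max_t=200, b=0.75):
    """Episode return (timesteps survived) for a linear policy w . state >= 0,
    evaluated with the action mask active, as in SPGA-AM."""
    state, total = (0.0, 0.0, 0.01, 0.0), 0
    for _ in range(max_t):
        action = 1 if sum(w * s for w, s in zip(weights, state)) >= 0 else 0
        state = step(state, masked_action(state, action, b))
        if abs(state[0]) > b or abs(state[2]) > 12 * math.pi / 180:
            break  # boundary or pole-angle failure
        total += 1
    return total

def evolve(pop_size=20, generations=30, seed=0):
    """Simple generational GA: truncation selection, one-point crossover,
    Gaussian point mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:pop_size // 4]          # keep the fittest quarter
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, 4)        # one-point crossover
            child = p1[:cut] + p2[cut:]
            child[rng.randrange(4)] += rng.gauss(0, 0.2)  # point mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))
```

Because the mask is applied inside the fitness evaluation, every candidate in the population respects the boundary b during training, while selection pressure still rewards policies that balance the pole for the full episode.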