Research Article · Free Access
DOI: 10.1145/3641513.3650137

Safe Controller Synthesis for Nonlinear Systems Using Bayesian Optimization Enhanced Reinforcement Learning

Published: 14 May 2024

ABSTRACT

Formal synthesis of safe controllers is essential for safety-critical cyber-physical systems. In this paper, we propose a novel counterexample-guided approach for synthesizing safe controllers of nonlinear systems using Bayesian optimization enhanced reinforcement learning, which improves the efficiency of the training process while ensuring the safety property. First, we use the control barrier function technique to establish a constrained Markov decision process, which enables us to learn an initial controller with minimal safety violations. We then design a counterexample-guided policy refinement using Bayesian optimization to fine-tune the initial controller based on failure trajectories. Finally, we introduce a compensatory mechanism that corrects the tuned controller to guarantee the safety property. We implement the CEGRLPR tool and evaluate its performance on a set of benchmarks. The experimental results demonstrate the effectiveness and efficiency of our approach.
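The abstract outlines a three-step pipeline: learn an initial policy on a control-barrier-function (CBF) constrained MDP, refine it with counterexample-guided Bayesian optimization, and wrap it in a compensatory mechanism that enforces safety at execution time. The sketch below illustrates that loop on a hypothetical 1-D system; it is not the authors' CEGRLPR implementation. The toy dynamics, the quadratic barrier, the linear-gain policy, and the GP-with-expected-improvement instantiation of the Bayesian optimization step are all our assumptions.

```python
# Illustrative sketch only -- NOT the paper's CEGRLPR tool. All dynamics,
# barrier choices, and hyperparameters below are assumptions for a 1-D toy.
import math
import numpy as np

DT, ALPHA, HORIZON = 0.1, 0.2, 100       # step size, CBF decay rate, rollout length

def h(x):                                 # barrier function: safe set {x : |x| <= 1}
    return 1.0 - x * x

def step(x, u):                           # toy unstable dynamics: x+ = x + dt * (x + u)
    return x + DT * (x + u)

def cbf_filter(x, u_nom):
    """Compensatory mechanism (stand-in): minimally shift u_nom so the discrete
    CBF condition h(x+) >= (1 - ALPHA) * h(x) holds; closed form because the
    dynamics are 1-D and affine in u."""
    bound = math.sqrt(1.0 - (1.0 - ALPHA) * (1.0 - x * x))   # |x+| must stay below this
    xn = x * (1.0 + DT) + DT * u_nom                          # successor under u_nom
    if abs(xn) <= bound:
        return u_nom                                          # already safe: no correction
    return (float(np.clip(xn, -bound, bound)) - x * (1.0 + DT)) / DT

def violation_cost(k, seeds):
    """BO objective: total (capped) safety violation of the unfiltered policy
    u = -k * x over the counterexample initial states."""
    cost = 0.0
    for x0 in seeds:
        x = x0
        for _ in range(HORIZON):
            x = step(x, -k * x)
            cost += min(max(0.0, -h(x)), 5.0)
    return cost

def gp_posterior(X, y, Xs, ls=1.0, noise=1e-4):
    """GP regression with an RBF kernel; returns posterior mean and std."""
    kern = lambda A, B: np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)
    K = kern(X, X) + noise * np.eye(len(X))
    Ks = kern(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    V = np.linalg.solve(K, Ks)
    sd = np.sqrt(np.clip(1.0 - np.sum(Ks * V, axis=0), 1e-12, None))
    return mu, sd

def expected_improvement(mu, sd, best):   # EI acquisition, minimization form
    z = (best - mu) / sd
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    return (best - mu) * Phi + sd * np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

# Step 1: a deliberately unsafe initial gain, standing in for the CMDP-trained policy.
k0 = 0.5
# Step 2a: collect counterexamples -- initial states whose unfiltered rollouts fail.
seeds = [x0 for x0 in np.linspace(-0.95, 0.95, 9) if violation_cost(k0, [x0]) > 0]
# Step 2b: counterexample-guided Bayesian optimization of the gain.
cand = np.linspace(0.0, 5.0, 101)         # candidate gains
X = [0.1, 0.5, 0.9]                       # initial design (all unsafe on purpose)
y = [violation_cost(k, seeds) for k in X]
for _ in range(10):
    ya = np.array(y)
    yn = (ya - ya.mean()) / (ya.std() + 1e-9)                # standardize the objective
    mu, sd = gp_posterior(np.array(X), yn, cand)
    k_next = float(cand[np.argmax(expected_improvement(mu, sd, yn.min()))])
    X.append(k_next)
    y.append(violation_cost(k_next, seeds))
k_star = X[int(np.argmin(y))]
# Step 3: the compensatory filter keeps even the bad initial gain inside the safe set.
x = 0.9
for _ in range(HORIZON):
    x = step(x, cbf_filter(x, -k0 * x))
    assert h(x) >= 0.0
print(f"tuned gain k* = {k_star:.2f}, residual violation = {min(y):.3f}")
```

In this toy setting the filter's closed-form projection plays the role of the compensatory mechanism; in higher dimensions the same pointwise CBF condition is typically enforced by solving a small quadratic program over the control input at each step.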


• Published in

  HSCC '24: Proceedings of the 27th ACM International Conference on Hybrid Systems: Computation and Control
  May 2024, 307 pages
  ISBN: 9798400705229
  DOI: 10.1145/3641513
  Copyright © 2024 ACM
  Publisher: Association for Computing Machinery, New York, NY, United States


  Acceptance Rates

  Overall acceptance rate: 153 of 373 submissions, 41%
