Research Article · Free Access
DOI: 10.1145/3641513.3650137

Safe Controller Synthesis for Nonlinear Systems Using Bayesian Optimization Enhanced Reinforcement Learning

Published: 14 May 2024

ABSTRACT

Formal synthesis of safe controllers is essential for safety-critical cyber-physical systems. In this paper, we propose a novel counterexample-guided approach for synthesizing safe controllers of nonlinear systems using Bayesian optimization enhanced reinforcement learning, which improves the efficiency of the training process while ensuring the safety property. First, we use the control barrier function technique to establish a constrained Markov decision process, which enables us to learn an initial controller with minimal safety violations. We then design a counterexample-guided policy refinement using Bayesian optimization to fine-tune the initial controller based on failure trajectories. Finally, we introduce a compensatory mechanism that corrects the tuned controller to guarantee the safety property. We implement the CEGRLPR tool and evaluate its performance on a set of benchmarks. The experimental results demonstrate the effectiveness and efficiency of our approach.
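The abstract outlines a three-step pipeline: learn an initial policy on a control-barrier-function (CBF) constrained MDP, refine it with counterexample-guided Bayesian optimization, and wrap it in a compensatory mechanism that enforces safety at execution time. The sketch below illustrates that loop on a hypothetical 1-D system; it is not the authors' CEGRLPR implementation. The toy dynamics, the quadratic barrier, the linear-gain policy, and the GP-with-expected-improvement instantiation of the Bayesian optimization step are all our assumptions.

```python
# Illustrative sketch only -- NOT the paper's CEGRLPR tool. All dynamics,
# barrier choices, and hyperparameters below are assumptions for a 1-D toy.
import math
import numpy as np

DT, ALPHA, HORIZON = 0.1, 0.2, 100       # step size, CBF decay rate, rollout length

def h(x):                                 # barrier function: safe set {x : |x| <= 1}
    return 1.0 - x * x

def step(x, u):                           # toy unstable dynamics: x+ = x + dt * (x + u)
    return x + DT * (x + u)

def cbf_filter(x, u_nom):
    """Compensatory mechanism (stand-in): minimally shift u_nom so the discrete
    CBF condition h(x+) >= (1 - ALPHA) * h(x) holds; closed form because the
    dynamics are 1-D and affine in u."""
    bound = math.sqrt(1.0 - (1.0 - ALPHA) * (1.0 - x * x))   # |x+| must stay below this
    xn = x * (1.0 + DT) + DT * u_nom                          # successor under u_nom
    if abs(xn) <= bound:
        return u_nom                                          # already safe: no correction
    return (float(np.clip(xn, -bound, bound)) - x * (1.0 + DT)) / DT

def violation_cost(k, seeds):
    """BO objective: total (capped) safety violation of the unfiltered policy
    u = -k * x over the counterexample initial states."""
    cost = 0.0
    for x0 in seeds:
        x = x0
        for _ in range(HORIZON):
            x = step(x, -k * x)
            cost += min(max(0.0, -h(x)), 5.0)
    return cost

def gp_posterior(X, y, Xs, ls=1.0, noise=1e-4):
    """GP regression with an RBF kernel; returns posterior mean and std."""
    kern = lambda A, B: np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)
    K = kern(X, X) + noise * np.eye(len(X))
    Ks = kern(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    V = np.linalg.solve(K, Ks)
    sd = np.sqrt(np.clip(1.0 - np.sum(Ks * V, axis=0), 1e-12, None))
    return mu, sd

def expected_improvement(mu, sd, best):   # EI acquisition, minimization form
    z = (best - mu) / sd
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    return (best - mu) * Phi + sd * np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

# Step 1: a deliberately unsafe initial gain, standing in for the CMDP-trained policy.
k0 = 0.5
# Step 2a: collect counterexamples -- initial states whose unfiltered rollouts fail.
seeds = [x0 for x0 in np.linspace(-0.95, 0.95, 9) if violation_cost(k0, [x0]) > 0]
# Step 2b: counterexample-guided Bayesian optimization of the gain.
cand = np.linspace(0.0, 5.0, 101)         # candidate gains
X = [0.1, 0.5, 0.9]                       # initial design (all unsafe on purpose)
y = [violation_cost(k, seeds) for k in X]
for _ in range(10):
    ya = np.array(y)
    yn = (ya - ya.mean()) / (ya.std() + 1e-9)                # standardize the objective
    mu, sd = gp_posterior(np.array(X), yn, cand)
    k_next = float(cand[np.argmax(expected_improvement(mu, sd, yn.min()))])
    X.append(k_next)
    y.append(violation_cost(k_next, seeds))
k_star = X[int(np.argmin(y))]
# Step 3: the compensatory filter keeps even the bad initial gain inside the safe set.
x = 0.9
for _ in range(HORIZON):
    x = step(x, cbf_filter(x, -k0 * x))
    assert h(x) >= 0.0
print(f"tuned gain k* = {k_star:.2f}, residual violation = {min(y):.3f}")
```

In this toy setting the filter's closed-form projection plays the role of the compensatory mechanism; in higher dimensions the same pointwise CBF condition is typically enforced by solving a small quadratic program over the control input at each step.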


• Published in

  HSCC '24: Proceedings of the 27th ACM International Conference on Hybrid Systems: Computation and Control
  May 2024, 307 pages
  ISBN: 9798400705229
  DOI: 10.1145/3641513
  Copyright © 2024 ACM
  Publisher: Association for Computing Machinery, New York, NY, United States


  Acceptance Rates

  Overall acceptance rate: 153 of 373 submissions, 41%
