Formal Methods Assisted Training of Safe Reinforcement Learning Agents

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11460)

Abstract

Reinforcement learning (RL) is emerging as a powerful machine learning paradigm for developing autonomous safety-critical systems; RL enables a system to learn optimal control strategies by interacting with its environment. However, there is widespread apprehension about deploying such systems in the real world, since rigorously ensuring that they have learned safe strategies, by interacting with an environment that is representative of the real world, remains a challenge. Hence, there is a surge of interest in establishing safety-focused RL techniques.

In this paper, we present a safety-assured training approach that augments standard RL with formal analysis and simulation technology. The benefits of coupling these techniques are three-fold: the formal analysis tools (SMT solvers) guide the system to learn strategies that rigorously uphold specified safety properties; the sophisticated simulators provide a wide range of quantifiable, realistic learning environments; and the adequacy of the safety properties can be assessed as the agent explores complex environments. We illustrate this approach using the Flappy Bird game.
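
To make the coupling concrete, the sketch below shows one way an SMT-based safety check could gate the agent's actions during training. This is a minimal sketch in Python, assuming a simplified deterministic Flappy Bird model and the Z3 Python bindings; the dynamics constants, the state encoding, and the helper name action_is_safe are illustrative assumptions, not the authors' implementation.

# Minimal sketch of an SMT-guided action filter, assuming a simplified
# deterministic Flappy Bird model; constants and names are illustrative,
# not the authors' implementation.
from z3 import Solver, Real, Or, sat

GRAVITY = -1.0       # per-step change in vertical velocity (assumed units)
FLAP_IMPULSE = 4.0   # upward velocity immediately after a flap (assumed)
HORIZON = 5          # number of look-ahead steps the solver checks

def action_is_safe(y0, v0, gap_low, gap_high, flap):
    """Return True iff taking `flap` now keeps the bird inside the pipe gap
    [gap_low, gap_high] for HORIZON steps under y' = y + v, v' = v + GRAVITY."""
    s = Solver()
    y = [Real(f"y_{k}") for k in range(HORIZON + 1)]
    v = [Real(f"v_{k}") for k in range(HORIZON + 1)]
    # Post-action state: a flap resets the velocity, otherwise gravity applies.
    s.add(y[0] == y0, v[0] == (FLAP_IMPULSE if flap else v0 + GRAVITY))
    # Unroll the deterministic dynamics over the look-ahead horizon.
    for k in range(HORIZON):
        s.add(y[k + 1] == y[k] + v[k], v[k + 1] == v[k] + GRAVITY)
    # Ask Z3 for any reachable state outside the gap; unsat means the action
    # provably keeps the bird inside the gap for the whole horizon.
    s.add(Or([Or(y[k] < gap_low, y[k] > gap_high) for k in range(1, HORIZON + 1)]))
    return s.check() != sat

In a Q-learning loop, the agent's greedy action would be passed through such a check before execution; if the solver finds a trajectory that leaves the gap, the action is overridden (and can additionally be penalized), so the agent only ever executes actions consistent with the stated safety property.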



Author information


Correspondence to Anitha Murugesan.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Murugesan, A., Moghadamfalahi, M., Chattopadhyay, A. (2019). Formal Methods Assisted Training of Safe Reinforcement Learning Agents. In: Badger, J., Rozier, K. (eds.) NASA Formal Methods. NFM 2019. Lecture Notes in Computer Science, vol. 11460. Springer, Cham. https://doi.org/10.1007/978-3-030-20652-9_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20651-2

  • Online ISBN: 978-3-030-20652-9

  • eBook Packages: Computer Science (R0)
