Task Independent Safety Assessment for Reinforcement Learning

  • Conference paper
Towards Autonomous Robotic Systems (TAROS 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13546)

Abstract

Given the black-box characteristics of a robotic system using reinforcement learning, validating its behaviour is an extensive process, especially if the main focus lies on efficient task execution while predefined requirements or safety standards must also be met for real-world applications. Once a particular system has been verified to behave according to safety requirements, this may no longer hold if the underlying conditions are modified.

As research yields more efficient and performant algorithms rather than safer and more controllable ones, their safe use should be ensured. Our approach enables the use of an algorithm for the execution of the main task while leaving the assessment of the cause-effect relationship in terms of safety to another instance. In this way, the presented approach preserves efficiency at the main task level with as little interference as possible. The tasks of safety assessment and task execution are separated to the extent possible, allowing the initially learned safety assessment to be applied to varying tasks and scenarios with minimal effort. The main challenges are providing sufficient information for a reliable safety assessment and precisely allocating the sparse rewards with respect to the safety requirements. Finally, we evaluate Task Independent Safety Assessment on three different picking, placing and moving tasks. We show that the main task is guided to safe behaviour by the safety instance and even converges towards safe behaviour with an additional learning phase of the main task.
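The separation of task execution and safety assessment described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the architecture, so every name here (`SafetyAssessor`, `select_action`, `workspace_risk`), the 2-D state and action types, the risk threshold and the zero-motion fallback are assumptions chosen purely for illustration. A hand-written risk function stands in for the learned safety assessment.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

State = Tuple[float, float]   # hypothetical 2-D end-effector position
Action = Tuple[float, float]  # hypothetical 2-D displacement


@dataclass
class SafetyAssessor:
    """Task-independent safety instance (stand-in for a learned model)."""
    risk_fn: Callable[[State, Action], float]
    threshold: float = 0.5  # assumed cut-off between safe and unsafe

    def is_safe(self, state: State, action: Action) -> bool:
        return self.risk_fn(state, action) < self.threshold


def select_action(
    state: State,
    candidates: Sequence[Action],
    assessor: SafetyAssessor,
    fallback: Action = (0.0, 0.0),
) -> Action:
    """Return the task policy's highest-ranked candidate the assessor accepts.

    `candidates` is assumed to be ordered by the task policy's preference.
    The fallback (no motion) is used when every candidate is rejected; it is
    an assumption and is not itself checked by the assessor.
    """
    for action in candidates:
        if assessor.is_safe(state, action):
            return action
    return fallback


def workspace_risk(state: State, action: Action) -> float:
    """Toy risk: 1.0 if the next position leaves the unit square, else 0.0."""
    x, y = state[0] + action[0], state[1] + action[1]
    return 0.0 if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 1.0


assessor = SafetyAssessor(risk_fn=workspace_risk)
# The preferred step would leave the workspace, so the second candidate is used.
chosen = select_action((0.9, 0.5), [(0.2, 0.0), (0.05, 0.0)], assessor)
```

Because the assessor only sees states and actions, the same instance could in principle be reused with a different task policy, which is the property the abstract emphasises; how the paper's safety instance is trained and how it feeds back into the main task is not shown here.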

Corresponding author

Correspondence to Mark Jocas.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Jocas, M., Zoghlami, F., Kurrek, P., Gianni, M., Salehi, V. (2022). Task Independent Safety Assessment for Reinforcement Learning. In: Pacheco-Gutierrez, S., Cryer, A., Caliskanelli, I., Tugal, H., Skilton, R. (eds) Towards Autonomous Robotic Systems. TAROS 2022. Lecture Notes in Computer Science, vol 13546. Springer, Cham. https://doi.org/10.1007/978-3-031-15908-4_16

  • DOI: https://doi.org/10.1007/978-3-031-15908-4_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15907-7

  • Online ISBN: 978-3-031-15908-4

  • eBook Packages: Computer Science (R0)
