Abstract
Given the black-box characteristics of a robotic system that uses reinforcement learning, validating its behaviour is an extensive process, especially when the main focus lies on efficient task execution while predefined requirements or safety standards must simultaneously be met for real-world applications. Moreover, once a particular system has been verified to behave according to its safety requirements, this may no longer hold if the underlying conditions are modified.
As research yields algorithms that are more efficient and performant rather than safer and more controllable, their safe use must be ensured separately. Our approach lets one algorithm execute the main task while leaving the assessment of cause-effect relationships with respect to safety to a second instance. In this way, the presented approach preserves efficiency at the main-task level with as little interference as possible. Safety assessment and task execution are separated as far as possible, allowing the initially learned safety assessment to be applied to varying tasks and scenarios with minimal effort. The main challenges are providing sufficient information for a reliable safety assessment and precisely attributing the sparse rewards to the safety requirements. Finally, we evaluate Task Independent Safety Assessment on three different picking, placing and moving tasks. We show that the main task is guided towards safe behaviour by the safety instance, and even converges to safe behaviour when the main task is given an additional learning phase.
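As an informal illustration of the separation the abstract describes, the following minimal Python sketch shows a main-task policy whose proposed actions are screened by a separate, learned safety-assessment instance before execution. All names (SafetyAssessor, MainTaskPolicy, safe_step), the logistic risk model and the 0.5 risk threshold are illustrative assumptions, not the implementation from the paper.

```python
# Minimal sketch: task execution and safety assessment as separate instances.
# The safety assessor learns from sparse binary safety feedback and vetoes
# actions it judges risky; the task policy remains untouched otherwise.
import numpy as np


class SafetyAssessor:
    """Predicts whether a (state, action) pair violates a safety requirement."""

    def __init__(self, dim, lr=0.01):
        self.w = np.zeros(dim)
        self.lr = lr

    def risk(self, features):
        # Estimated probability of a safety violation for this state-action pair.
        return 1.0 / (1.0 + np.exp(-features @ self.w))

    def update(self, features, violated):
        # One logistic-regression step on the sparse safety signal (violated in {0, 1}).
        grad = (self.risk(features) - float(violated)) * features
        self.w -= self.lr * grad


class MainTaskPolicy:
    """Stand-in for any task-level RL policy (e.g. picking, placing, moving)."""

    def propose(self, state, rng):
        return rng.uniform(-1.0, 1.0, size=state.shape)


def safe_step(state, policy, assessor, rng, threshold=0.5):
    """Execute the proposed action only if the safety instance accepts it;
    otherwise fall back to a conservative action (here: do nothing)."""
    action = policy.propose(state, rng)
    features = np.concatenate([state, action])
    if assessor.risk(features) < threshold:
        return action                 # assessed as safe: preserve task efficiency
    return np.zeros_like(action)      # assessed as unsafe: conservative fallback


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    assessor = SafetyAssessor(dim=6)
    policy = MainTaskPolicy()
    state = rng.normal(size=3)
    print(safe_step(state, policy, assessor, rng))
```

Because the assessor only sees state-action features and sparse safety labels, it can in principle be reused across tasks that share the same safety requirements, which is the design intent sketched here.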
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jocas, M., Zoghlami, F., Kurrek, P., Gianni, M., Salehi, V. (2022). Task Independent Safety Assessment for Reinforcement Learning. In: Pacheco-Gutierrez, S., Cryer, A., Caliskanelli, I., Tugal, H., Skilton, R. (eds) Towards Autonomous Robotic Systems. TAROS 2022. Lecture Notes in Computer Science, vol 13546. Springer, Cham. https://doi.org/10.1007/978-3-031-15908-4_16
DOI: https://doi.org/10.1007/978-3-031-15908-4_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15907-7
Online ISBN: 978-3-031-15908-4
eBook Packages: Computer Science, Computer Science (R0)