Abstract
Given the black-box characteristics of a robotic system that uses reinforcement learning, validating its behaviour is an extensive process, especially when the main focus lies on efficient task execution while predefined requirements or safety standards must simultaneously be met for real-world applications. Moreover, once a particular system has been verified to behave according to its safety requirements, this may no longer hold if the underlying conditions are modified.
As research yields algorithms that are more efficient and performant rather than safer and more controllable, their safe use must be ensured separately. Our approach lets one algorithm execute the main task while leaving the assessment of cause-effect relationships with respect to safety to a second instance. In this way, the presented approach preserves efficiency at the main-task level with as little interference as possible. Safety assessment and task execution are separated as far as possible, allowing the initially learned safety assessment to be applied to varying tasks and scenarios with minimal effort. The main challenges are providing sufficient information for a reliable safety assessment and precisely attributing the sparse rewards to the safety requirements. Finally, we evaluate Task Independent Safety Assessment on three different picking, placing and moving tasks. We show that the main task is guided towards safe behaviour by the safety instance, and even converges to safe behaviour when the main task is given an additional learning phase.
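As an informal illustration of the separation the abstract describes, the following minimal Python sketch shows a main-task policy whose proposed actions are screened by a separate, learned safety-assessment instance before execution. All names (SafetyAssessor, MainTaskPolicy, safe_step), the logistic risk model and the 0.5 risk threshold are illustrative assumptions, not the implementation from the paper.

```python
# Minimal sketch: task execution and safety assessment as separate instances.
# The safety assessor learns from sparse binary safety feedback and vetoes
# actions it judges risky; the task policy remains untouched otherwise.
import numpy as np


class SafetyAssessor:
    """Predicts whether a (state, action) pair violates a safety requirement."""

    def __init__(self, dim, lr=0.01):
        self.w = np.zeros(dim)
        self.lr = lr

    def risk(self, features):
        # Estimated probability of a safety violation for this state-action pair.
        return 1.0 / (1.0 + np.exp(-features @ self.w))

    def update(self, features, violated):
        # One logistic-regression step on the sparse safety signal (violated in {0, 1}).
        grad = (self.risk(features) - float(violated)) * features
        self.w -= self.lr * grad


class MainTaskPolicy:
    """Stand-in for any task-level RL policy (e.g. picking, placing, moving)."""

    def propose(self, state, rng):
        return rng.uniform(-1.0, 1.0, size=state.shape)


def safe_step(state, policy, assessor, rng, threshold=0.5):
    """Execute the proposed action only if the safety instance accepts it;
    otherwise fall back to a conservative action (here: do nothing)."""
    action = policy.propose(state, rng)
    features = np.concatenate([state, action])
    if assessor.risk(features) < threshold:
        return action                 # assessed as safe: preserve task efficiency
    return np.zeros_like(action)      # assessed as unsafe: conservative fallback


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    assessor = SafetyAssessor(dim=6)
    policy = MainTaskPolicy()
    state = rng.normal(size=3)
    print(safe_step(state, policy, assessor, rng))
```

Because the assessor only sees state-action features and sparse safety labels, it can in principle be reused across tasks that share the same safety requirements, which is the design intent sketched here.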
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jocas, M., Zoghlami, F., Kurrek, P., Gianni, M., Salehi, V. (2022). Task Independent Safety Assessment for Reinforcement Learning. In: Pacheco-Gutierrez, S., Cryer, A., Caliskanelli, I., Tugal, H., Skilton, R. (eds) Towards Autonomous Robotic Systems. TAROS 2022. Lecture Notes in Computer Science, vol 13546. Springer, Cham. https://doi.org/10.1007/978-3-031-15908-4_16
DOI: https://doi.org/10.1007/978-3-031-15908-4_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15907-7
Online ISBN: 978-3-031-15908-4
eBook Packages: Computer Science, Computer Science (R0)