Abstract
Reinforcement learning (RL) has proven a successful technique for teaching autonomous agents goal-directed behaviour. As RL agents further integrate with our society, they must learn to comply with ethical, social, or legal norms. Defeasible deontic logics are natural formal frameworks to specify and reason about such norms in a transparent way. However, their effective and efficient integration in RL agents remains an open problem. On the other hand, linear temporal logic (LTL) has been successfully employed to synthesize RL policies satisfying, e.g., safety requirements. In this paper, we investigate the extent to which the established machinery for safe reinforcement learning can be leveraged to direct the normative behaviour of RL agents. We analyze some of the difficulties that arise from attempting to represent norms with LTL, provide an algorithm for synthesizing LTL specifications from certain normative systems, and analyze its power and limits with a case study.
This work was supported by the DC-RES run by the TU Wien’s Faculty of Informatics and the FH-Technikum Wien and by the project WWTF MA16–028.
Notes
- 1. Any defeasible deontic logic equipped with a theorem prover could in theory be used.
- 2. An implementation of Algorithms 1 and 2 can be found here: https://github.com/lexeree/normative-player-characters.
- 3. The LTL specifications have not been implemented as shields, since the shielding tool TEMPEST [25] is still under development. We instead manually chose optimal paths from among those paths obeying the compliance specifications.
- 4. Note that Spec. (6) is semantically equivalent to \(G(\lnot at\_danger)\).
- 5. Note that Spec. (7) is semantically equivalent to \(G(at\_danger\rightarrow (\lnot empty \wedge X(empty)))\).
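Notes 4 and 5 describe the compliance specifications as safety-style LTL formulas. As an informal illustration only (not the paper's implementation, which targets shields), the following sketch checks the two formulas over finite traces, using an LTLf-style convention that \(X\,\varphi\) is false at the last position; the proposition names `at_danger` and `empty` follow the notes, while the trace encoding is an assumption of this sketch.

```python
# Minimal finite-trace check for the two safety specs in Notes 4 and 5.
# A trace is a list of sets naming the atomic propositions true at each step.

def holds_g_not_danger(trace):
    """Spec (6) ~ G(!at_danger): at_danger never holds along the trace."""
    return all("at_danger" not in step for step in trace)

def holds_g_danger_implies(trace):
    """Spec (7) ~ G(at_danger -> (!empty & X empty)).
    Finite-trace convention: X phi fails at the last position,
    since there is no next step to evaluate phi on."""
    for i, step in enumerate(trace):
        if "at_danger" in step:
            if "empty" in step:
                return False                      # !empty violated now
            if i + 1 >= len(trace) or "empty" not in trace[i + 1]:
                return False                      # X empty violated
    return True

safe = [set(), {"moving"}, set()]
risky = [set(), {"at_danger"}, {"empty"}]
print(holds_g_not_danger(safe))      # True: at_danger never occurs
print(holds_g_danger_implies(risky)) # True: danger step is followed by empty
```

A full treatment would translate the formulas to automata (e.g., via the Esparza-Křetínský construction cited below) rather than evaluating them trace-by-trace.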
References
Alechina, N., Dastani, M., Logan, B.: Norm specification and verification in multi-agent systems. J. Appl. Logics 5(2), 457 (2018)
Alshiekh, M., Bloem, R., Ehlers, R., Könighofer, B., Niekum, S., Topcu, U.: Safe reinforcement learning via shielding. In: Proceedings of AAAI, pp. 2669–2678 (2018)
Boella, G., van der Torre, L.: Permissions and obligations in hierarchical normative systems. In: Proceedings of ICAIL, pp. 81–82 (2003)
Boella, G., van der Torre, L.: Regulative and constitutive norms in normative multiagent systems. In: Proceedings of KR 2004, pp. 255–266. AAAI Press (2004)
De Giacomo, G., De Masellis, R., Grasso, M., Maggi, F.M., Montali, M.: Monitoring business metaconstraints based on LTL and LDL for finite traces. In: Sadiq, S., Soffer, P., Völzer, H. (eds.) Business Process Management, pp. 1–17 (2014)
De Giacomo, G., Iocchi, L., Favorito, M., Patrizi, F.: Foundations for restraining bolts: reinforcement learning with LTLf/LDLf restraining specifications. In: Proceedings of ICAPS, vol. 29, pp. 128–136 (2019)
Esparza, J., Křetínský, J.: From LTL to deterministic automata: a safraless compositional approach. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 192–208. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_13
Forrester, J.W.: Gentle murder, or the adverbial samaritan. J. Philos. 81(4), 193–197 (1984)
Fu, J., Topcu, U.: Probably approximately correct MDP learning and control with temporal logic constraints. In: Proceedings of RSS (2014)
Governatori, G.: Thou shalt is not you will. In: Proceedings of ICAIL, pp. 63–68 (2015)
Governatori, G.: Practical normative reasoning with defeasible deontic logic. In: d’Amato, C., Theobald, M. (eds.) Reasoning Web 2018. LNCS, vol. 11078, pp. 1–25. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00338-8_1
Governatori, G., Hashmi, M.: No time for compliance. In: Proceedings of EDOC, pp. 9–18. IEEE (2015)
Governatori, G., Hulstijn, J., Riveret, R., Rotolo, A.: Characterising deadlines in temporal modal defeasible logic. In: Orgun, M.A., Thornton, J. (eds.) AI 2007. LNCS (LNAI), vol. 4830, pp. 486–496. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76928-6_50
Governatori, G., Olivieri, F., Rotolo, A., Scannapieco, S.: Computing strong and weak permissions in defeasible logic. J. Philos. Logic 42(6), 799–829 (2013)
Governatori, G., Rotolo, A.: BIO logical agents: norms, beliefs, intentions in defeasible logic. J. Auton. Agents Multi Agent Syst. 17(1), 36–69 (2008)
Hasanbeig, M., Abate, A., Kroening, D.: Cautious reinforcement learning with logical constraints. In: Proceedings of AAMAS, pp. 483–491 (2020)
Hodkinson, I., Reynolds, M.: Temporal logic. In: Blackburn, P., Van Benthem, J., Wolter, F. (eds.) Handbook of Modal Logic, vol. 3, pp. 655–720. Elsevier (2007)
Jansen, N., Könighofer, B., Junges, S., Serban, A., Bloem, R.: Safe Reinforcement Learning Using Probabilistic Shields. In: Proceedings of CONCUR. LIPIcs, vol. 171, pp. 1–16 (2020)
Lam, H.P., Governatori, G.: The making of SPINdle. In: Proceedings of RuleML. LNCS, vol. 5858, pp. 315–322 (2009)
Neufeld, E., Bartocci, E., Ciabattoni, A., Governatori, G.: A normative supervisor for reinforcement learning agents. In: Proceedings of CADE, pp. 565–576 (2021)
Neufeld, E.A., Bartocci, E., Ciabattoni, A., Governatori, G.: Enforcing ethical goals over reinforcement-learning policies. J. Ethics Inf. Technol. 24, 43 (2022). https://doi.org/10.1007/s10676-022-09665-8
Noothigattu, R., et al.: Teaching AI agents ethical values using reinforcement learning and policy orchestration. In: Proceedings of IJCAI, LNCS, vol. 12158, pp. 217–234 (2019)
Panagiotidi, S., Alvarez-Napagao, S., Vázquez-Salceda, J.: Towards the norm-aware agent: bridging the gap between deontic specifications and practical mechanisms for norm monitoring and norm-aware planning. In: Balke, T., Dignum, F., van Riemsdijk, M.B., Chopra, A.K. (eds.) COIN 2013. LNCS (LNAI), vol. 8386, pp. 346–363. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07314-9_19
Pnueli, A.: The temporal logic of programs. In: Proceedings of FOCS, pp. 46–57 (1977)
Pranger, S., Könighofer, B., Posch, L., Bloem, R.: TEMPEST - synthesis tool for reactive systems and shields in probabilistic environments. In: Hou, Z., Ganesh, V. (eds.) ATVA 2021. LNCS, vol. 12971, pp. 222–228. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-88885-5_15
Rodriguez-Soto, M., Lopez-Sanchez, M., Rodriguez Aguilar, J.A.: Multi-objective reinforcement learning for designing ethical environments. In: Proceedings of IJCAI, pp. 545–551 (2021)
Sadigh, D., Kim, E.S., Coogan, S., Sastry, S.S., Seshia, S.A.: A learning based approach to control synthesis of Markov decision processes for linear temporal logic specifications. In: Proceedings of CDC, pp. 1091–1096 (2014)
Searle, J.R.: Speech acts: an essay in the philosophy of language. Cambridge University Press, Cambridge, England (1969)
Sickert, S., Esparza, J., Jaax, S., Křetínský, J.: Limit-deterministic Büchi automata for linear temporal logic. In: Proceedings of CAV, LNCS, vol. 9780, pp. 312–332 (2016)
Watkins, C.J.C.H.: Learning from Delayed Rewards. Ph.D. thesis, King’s College, Cambridge, UK (1989). https://www.cs.rhul.ac.uk/~chrisw/thesis.pdf
Wen, M., Ehlers, R., Topcu, U.: Correct-by-synthesis reinforcement learning with temporal logic constraints. In: Proceedings of IROS, pp. 4983–4990. IEEE (2015)
Wu, Y.H., Lin, S.D.: A low-cost ethics shaping approach for designing reinforcement learning agents. In: Proceedings of AAAI, pp. 1687–1694 (2018)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Neufeld, E.A., Bartocci, E., Ciabattoni, A. (2023). On Normative Reinforcement Learning via Safe Reinforcement Learning. In: Aydoğan, R., Criado, N., Lang, J., Sanchez-Anguix, V., Serramia, M. (eds) PRIMA 2022: Principles and Practice of Multi-Agent Systems. PRIMA 2022. Lecture Notes in Computer Science, vol. 13753. Springer, Cham. https://doi.org/10.1007/978-3-031-21203-1_5
Print ISBN: 978-3-031-21202-4
Online ISBN: 978-3-031-21203-1