
On Normative Reinforcement Learning via Safe Reinforcement Learning

  • Conference paper
PRIMA 2022: Principles and Practice of Multi-Agent Systems (PRIMA 2022)

Abstract

Reinforcement learning (RL) has proven a successful technique for teaching autonomous agents goal-directed behaviour. As RL agents further integrate with our society, they must learn to comply with ethical, social, or legal norms. Defeasible deontic logics are natural formal frameworks to specify and reason about such norms in a transparent way. However, their effective and efficient integration into RL agents remains an open problem. On the other hand, linear temporal logic (LTL) has been successfully employed to synthesize RL policies satisfying, e.g., safety requirements. In this paper, we investigate the extent to which the established machinery for safe reinforcement learning can be leveraged to direct normative behaviour in RL agents. We analyze some of the difficulties that arise from attempting to represent norms with LTL, provide an algorithm for synthesizing LTL specifications from certain normative systems, and analyze its power and limits with a case study.
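
As a concrete illustration of the safe-RL machinery the paper builds on, the following is a minimal sketch of shielding a Q-learning agent with a safety invariant of the form \(G(\lnot unsafe)\): before each step, actions whose successor state would violate the invariant are filtered out, and the learner chooses only among the remaining ones. This is an assumption-laden toy, not the paper's implementation; the environment interface (reset, actions, successor, is_unsafe, step) is hypothetical and assumes deterministic successors.

```python
# Hypothetical sketch of LTL-style shielding in a Q-learning loop.
# All names and the environment interface are assumptions for
# illustration; they are not the paper's implementation.

import random
from collections import defaultdict

def shielded_q_learning(env, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    Q = defaultdict(float)  # (state, action) -> estimated value
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Shield: keep only actions whose (deterministic) successor
            # satisfies the invariant G(!unsafe).
            allowed = [a for a in env.actions(state)
                       if not env.is_unsafe(env.successor(state, a))]
            if not allowed:                   # no compliant action exists
                allowed = env.actions(state)  # fall back to all actions
            # epsilon-greedy choice restricted to the allowed actions
            if random.random() < eps:
                action = random.choice(allowed)
            else:
                action = max(allowed, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(state, action)
            best_next = max((Q[(next_state, a)]
                             for a in env.actions(next_state)), default=0.0)
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q
```

In a stochastic environment a shield must additionally reason about the probability of reaching unsafe states, which is what tools such as TEMPEST [25] address.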

This work was supported by the DC-RES, run by TU Wien's Faculty of Informatics and the FH Technikum Wien, and by the WWTF project MA16–028.


Notes

1. Any defeasible deontic logic equipped with a theorem prover could in theory be used.

2. An implementation of Algorithms 1 and 2 can be found here: https://github.com/lexeree/normative-player-characters.

3. The LTL specifications have not been implemented as shields, since the shielding tool TEMPEST [25] is still under development. We instead manually chose optimal paths from among those obeying the compliance specifications; a minimal trace-checking sketch is given after these notes.

4. Note that Spec. (6) is semantically equivalent to \(G(\lnot at\_danger)\).

5. Note that Spec. (7) is semantically equivalent to \(G(at\_danger\rightarrow (\lnot empty \wedge X(empty)))\).
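
To make the manual compliance checking from Note 3 concrete, here is a small sketch that evaluates a finite trace against the two formulas from Notes 4 and 5. The trace encoding (a list of sets of atomic propositions) and the LTLf-style reading of X (a next step must exist) are assumptions for illustration, not the paper's formal setup.

```python
# Hypothetical compliance check for finite traces against the formulas
# in Notes 4 and 5. A trace is a list of sets of atomic propositions;
# X is read LTLf-style: X(phi) requires that a next step exists.

def holds_spec6(trace):
    """G(!at_danger): at_danger never holds along the trace."""
    return all('at_danger' not in labels for labels in trace)

def holds_spec7(trace):
    """G(at_danger -> (!empty & X(empty)))."""
    for i, labels in enumerate(trace):
        if 'at_danger' in labels:
            if 'empty' in labels:                 # !empty violated now
                return False
            if i + 1 == len(trace) or 'empty' not in trace[i + 1]:
                return False                      # X(empty) violated
    return True

# Example: a trace that visits at_danger once and is emptied right after
# satisfies Spec. (7) but not Spec. (6).
trace = [set(), {'at_danger'}, {'empty'}, set()]
assert holds_spec7(trace) and not holds_spec6(trace)
```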

References

1. Alechina, N., Dastani, M., Logan, B.: Norm specification and verification in multi-agent systems. J. Appl. Logics 5(2), 457 (2018)

  2. Alshiekh, M., Bloem, R., Ehlers, R., Könighofer, B., Niekum, S., Topcu, U.: Safe reinforcement learning via shielding. In: Proceedings of AAAI, pp. 2669–2678 (2018)

  3. Boella, G., van der Torre, L.: Permissions and obligations in hierarchical normative systems. In: Proceedings of ICAIL, pp. 81–82 (2003)

  4. Boella, G., van der Torre, L.: Regulative and constitutive norms in normative multiagent systems. In: Proceedings of KR 2004, pp. 255–266. AAAI Press (2004)

  5. De Giacomo, G., De Masellis, R., Grasso, M., Maggi, F.M., Montali, M.: Monitoring business metaconstraints based on LTL and LDL for finite traces. In: Sadiq, S., Soffer, P., Völzer, H. (eds.) Business Process Management, pp. 1–17 (2014)

  6. De Giacomo, G., Iocchi, L., Favorito, M., Patrizi, F.: Foundations for restraining bolts: reinforcement learning with LTLf/LDLf restraining specifications. In: Proceedings of ICAPS, vol. 29, pp. 128–136 (2019)

  7. Esparza, J., Křetínský, J.: From LTL to deterministic automata: a Safraless compositional approach. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 192–208. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_13

  8. Forrester, J.W.: Gentle murder, or the adverbial Samaritan. J. Philos. 81(4), 193–197 (1984)

  9. Fu, J., Topcu, U.: Probably approximately correct MDP learning and control with temporal logic constraints. In: Proceedings of RSS (2014)

  10. Governatori, G.: Thou shalt is not you will. In: Proceedings of ICAIL, pp. 63–68 (2015)

  11. Governatori, G.: Practical normative reasoning with defeasible deontic logic. In: d’Amato, C., Theobald, M. (eds.) Reasoning Web 2018. LNCS, vol. 11078, pp. 1–25. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00338-8_1

  12. Governatori, G., Hashmi, M.: No time for compliance. In: Proceedings of EDOC, pp. 9–18. IEEE (2015)

  13. Governatori, G., Hulstijn, J., Riveret, R., Rotolo, A.: Characterising deadlines in temporal modal defeasible logic. In: Orgun, M.A., Thornton, J. (eds.) AI 2007. LNCS (LNAI), vol. 4830, pp. 486–496. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76928-6_50

  14. Governatori, G., Olivieri, F., Rotolo, A., Scannapieco, S.: Computing strong and weak permissions in defeasible logic. J. Philos. Logic 42(6), 799–829 (2013)

  15. Governatori, G., Rotolo, A.: BIO logical agents: norms, beliefs, intentions in defeasible logic. J. Auton. Agents Multi Agent Syst. 17(1), 36–69 (2008)

  16. Hasanbeig, M., Abate, A., Kroening, D.: Cautious reinforcement learning with logical constraints. In: Proceedings of AAMAS, pp. 483–491 (2020)

  17. Hodkinson, I., Reynolds, M.: Temporal logic. In: Blackburn, P., van Benthem, J., Wolter, F. (eds.) Handbook of Modal Logic, vol. 3, pp. 655–720. Elsevier (2007)

  18. Jansen, N., Könighofer, B., Junges, S., Serban, A., Bloem, R.: Safe reinforcement learning using probabilistic shields. In: Proceedings of CONCUR. LIPIcs, vol. 171, pp. 1–16 (2020)

  19. Lam, H.P., Governatori, G.: The making of SPINdle. In: Proceedings of RuleML. LNCS, vol. 5858, pp. 315–322 (2009)

  20. Neufeld, E., Bartocci, E., Ciabattoni, A., Governatori, G.: A normative supervisor for reinforcement learning agents. In: Proceedings of CADE, pp. 565–576 (2021)

  21. Neufeld, E.A., Bartocci, E., Ciabattoni, A., Governatori, G.: Enforcing ethical goals over reinforcement-learning policies. J. Ethics Inf. Technol. 24, 43 (2022). https://doi.org/10.1007/s10676-022-09665-8

  22. Noothigattu, R., et al.: Teaching AI agents ethical values using reinforcement learning and policy orchestration. In: Proceedings of IJCAI, LNCS, vol. 12158, pp. 217–234 (2019)

  23. Panagiotidi, S., Alvarez-Napagao, S., Vázquez-Salceda, J.: Towards the norm-aware agent: bridging the gap between deontic specifications and practical mechanisms for norm monitoring and norm-aware planning. In: Balke, T., Dignum, F., van Riemsdijk, M.B., Chopra, A.K. (eds.) COIN 2013. LNCS (LNAI), vol. 8386, pp. 346–363. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07314-9_19

  24. Pnueli, A.: The temporal logic of programs. In: Proceedings of FOCS, pp. 46–57 (1977)

  25. Pranger, S., Könighofer, B., Posch, L., Bloem, R.: TEMPEST - synthesis tool for reactive systems and shields in probabilistic environments. In: Hou, Z., Ganesh, V. (eds.) ATVA 2021. LNCS, vol. 12971, pp. 222–228. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-88885-5_15

  26. Rodriguez-Soto, M., Lopez-Sanchez, M., Rodriguez Aguilar, J.A.: Multi-objective reinforcement learning for designing ethical environments. In: Proceedings of IJCAI, pp. 545–551 (2021)

  27. Sadigh, D., Kim, E.S., Coogan, S., Sastry, S.S., Seshia, S.A.: A learning based approach to control synthesis of Markov decision processes for linear temporal logic specifications. In: Proceedings of CDC, pp. 1091–1096 (2014)

  28. Searle, J.R.: Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge (1969)

  29. Sickert, S., Esparza, J., Jaax, S., Křetínský, J.: Limit-deterministic Büchi automata for linear temporal logic. In: Proceedings of CAV. LNCS, vol. 9780, pp. 312–332 (2016)

  30. Watkins, C.J.C.H.: Learning from Delayed Rewards. Ph.D. thesis, King’s College, Cambridge, UK (1989). https://www.cs.rhul.ac.uk/~chrisw/thesis.pdf

  31. Wen, M., Ehlers, R., Topcu, U.: Correct-by-synthesis reinforcement learning with temporal logic constraints. In: Proceedings of IROS, pp. 4983–4990. IEEE (2015)

  32. Wu, Y.H., Lin, S.D.: A low-cost ethics shaping approach for designing reinforcement learning agents. In: Proceedings of AAAI, pp. 1687–1694 (2018)


Author information

Correspondence to Emery A. Neufeld.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Neufeld, E.A., Bartocci, E., Ciabattoni, A. (2023). On Normative Reinforcement Learning via Safe Reinforcement Learning. In: Aydoğan, R., Criado, N., Lang, J., Sanchez-Anguix, V., Serramia, M. (eds) PRIMA 2022: Principles and Practice of Multi-Agent Systems. PRIMA 2022. Lecture Notes in Computer Science, vol 13753. Springer, Cham. https://doi.org/10.1007/978-3-031-21203-1_5


  • DOI: https://doi.org/10.1007/978-3-031-21203-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21202-4

  • Online ISBN: 978-3-031-21203-1

