Explaining Black Box Reinforcement Learning Agents Through Counterfactual Policies

Movin, Maria; Junior, Guilherme Dinis; Hollmén, Jaakko; Papapetrou, Panagiotis

doi:10.1007/978-3-031-30047-9_25

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13876)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

Despite the increased attention to explainable AI, explainability methods for understanding reinforcement learning (RL) agents have not been extensively studied. Failing to understand an agent’s behavior may reduce productivity in human-agent collaborations or cause mistrust in automated RL systems. RL agents are trained to optimize a long-term cumulative reward, and in this work we formulate a novel problem: how to generate explanations of when an agent could have taken another action to optimize an alternative reward. More concretely, we aim to answer the question: What does an RL agent need to do differently to achieve an alternative target outcome? We introduce the concept of a counterfactual policy, a policy trained to explain in which states a black box agent could have taken an alternative action to achieve another desired outcome. The usefulness of counterfactual policies is demonstrated in two experiments with different use cases, and the results suggest that our solution can provide interpretable explanations.
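
The idea of a counterfactual policy can be illustrated with a minimal sketch. This is not the authors' implementation, and their training procedure may differ substantially. The sketch assumes a hypothetical five-state chain environment with two actions, uses plain value iteration as a stand-in for policy training, and treats the greedy policy under the original reward as the "black box" agent. All names and reward values (value_iteration, disagreement_states, the 1.2 bonus) are illustrative assumptions. The explanation is then the set of states in which the policy trained for the alternative outcome chooses a different action than the black box agent.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, iters=500):
    """Return the greedy policy of a tabular MDP.

    P: (S, A, S) transition probabilities, R: (S, A) expected rewards.
    """
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = R + gamma * (P @ V)  # (S, A) action values
        V = Q.max(axis=1)
    return Q.argmax(axis=1)


def disagreement_states(black_box_policy, counterfactual_policy):
    """States where the counterfactual policy picks another action."""
    return [int(s) for s in np.flatnonzero(black_box_policy != counterfactual_policy)]


# Hypothetical chain MDP: action 0 moves left, action 1 moves right,
# and the agent stays in place when it would leave the chain.
n_states, n_actions = 5, 2
P = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    P[s, 0, max(s - 1, 0)] = 1.0
    P[s, 1, min(s + 1, n_states - 1)] = 1.0

# Rewards are paid on arrival in a state (illustrative values only).
original_bonus = np.array([0.0, 0.0, 0.0, 0.0, 1.0])     # original outcome
alternative_bonus = np.array([1.2, 0.0, 0.0, 0.0, 1.0])  # alternative target outcome

R_original = P @ original_bonus        # (S, A) expected reward
R_alternative = P @ alternative_bonus

black_box = value_iteration(P, R_original)          # stands in for the black box agent
counterfactual = value_iteration(P, R_alternative)  # policy for the alternative outcome

# States where the agent would need to act differently.
explanation = disagreement_states(black_box, counterfactual)
print(explanation)  # -> [0, 1, 2]
```

In this toy setting the black box agent always moves right, while the counterfactual policy moves left in states 0 to 2, so the explanation reads: "to reach the alternative outcome, act differently in states 0, 1, and 2."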

Notes

1. Our code is available at https://github.com/dsv-data-science/rl-counterfactual-policy-explanations.git.

Acknowledgements

Special thanks to docent Jussi Karlgren working at Spotify, who provided us with valuable early feedback on the project and thorough feedback on the final paper.

Author information

Authors and Affiliations

Spotify, Stockholm, Sweden
Maria Movin & Guilherme Dinis Junior
Stockholm University, Stockholm, Sweden
Maria Movin, Guilherme Dinis Junior, Jaakko Hollmén & Panagiotis Papapetrou

Corresponding author

Correspondence to Maria Movin.

Editor information

Editors and Affiliations

Université de Caen Normandie, Caen, France
Bruno Crémilleux
Eindhoven University of Technology, Eindhoven, The Netherlands
Sibylle Hess
UCLouvain, Louvain-la-Neuve, Belgium
Siegfried Nijssen

Cite this paper

Movin, M., Junior, G.D., Hollmén, J., Papapetrou, P. (2023). Explaining Black Box Reinforcement Learning Agents Through Counterfactual Policies. In: Crémilleux, B., Hess, S., Nijssen, S. (eds) Advances in Intelligent Data Analysis XXI. IDA 2023. Lecture Notes in Computer Science, vol 13876. Springer, Cham. https://doi.org/10.1007/978-3-031-30047-9_25

DOI: https://doi.org/10.1007/978-3-031-30047-9_25
Published: 01 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30046-2
Online ISBN: 978-3-031-30047-9
eBook Packages: Computer Science, Computer Science (R0)