Abstract
Causal attribution aided by counterfactual reasoning is recognised as a key feature of human explanation. In this paper we propose a post-hoc contrastive explanation framework for reinforcement learning (RL), based on comparing policies learned under the actual environmental rewards with policies learned under hypothetical (counterfactual) rewards. The framework provides policy-level explanations by accessing the learned Q-functions and identifying intersecting critical states. Global explanations summarise policy behaviour by visualising sub-trajectories anchored at these states, while local explanations contrast the action-values in individual states. We conduct experiments on several grid-world examples, and our results show that it is possible to explain the differences between learned policies via their Q-functions. This demonstrates the potential for more informed human decision-making when deploying policies and highlights the scope for further XAI techniques in RL.
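As a rough illustration only (not the authors' implementation), the sketch below shows one way such a comparison could be set up for tabular Q-functions: a critical-state criterion based on the gap between the best and mean action-value, inspired by Huang et al. (2018), and a local contrast of greedy actions and action-values at the intersecting critical states. All names, thresholds and the exact criterion are assumptions for illustration.

```python
import numpy as np

def greedy_policy(Q):
    """Greedy action per state from a |S| x |A| Q-table."""
    return np.argmax(Q, axis=1)

def critical_states(Q, threshold=0.5):
    """States where acting well matters: large gap between best and mean Q (assumed criterion)."""
    gap = Q.max(axis=1) - Q.mean(axis=1)
    return set(np.flatnonzero(gap > threshold))

def contrastive_summary(Q_actual, Q_counterfactual, threshold=0.5):
    """Intersecting critical states plus local action-value contrasts."""
    pi_a = greedy_policy(Q_actual)
    pi_c = greedy_policy(Q_counterfactual)
    shared = critical_states(Q_actual, threshold) & critical_states(Q_counterfactual, threshold)
    return [
        {
            "state": int(s),
            "action_actual": int(pi_a[s]),
            "action_counterfactual": int(pi_c[s]),
            "q_actual": Q_actual[s].tolist(),
            "q_counterfactual": Q_counterfactual[s].tolist(),
        }
        for s in sorted(shared)
        if pi_a[s] != pi_c[s]  # keep states where the two policies actually disagree
    ]
```

Given two Q-tables of the same shape, `contrastive_summary` returns the states at which both policies regard the choice of action as consequential yet recommend different actions, together with the action-values that a local explanation would present.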
Notes
- 1.
With 90% probability the agent moves one cell in the direction specified by the action (i.e. the action succeeds); with 5% probability each, the agent instead moves one cell clockwise or anti-clockwise of the specified direction (i.e. the action fails). This grid-world was implemented with Minigrid [7]; an illustrative sketch of these dynamics is given below.
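A minimal sketch of the footnote's transition noise, assuming a clockwise ordering of directions; the naming and structure are our own and do not reflect Minigrid's API.

```python
import random

DIRS = ["north", "east", "south", "west"]  # clockwise order (assumed naming)

def noisy_direction(action, rng=random):
    """Return the direction actually moved, given the direction chosen by the agent."""
    i = DIRS.index(action)
    r = rng.random()
    if r < 0.90:
        return DIRS[i]                # action succeeds
    elif r < 0.95:
        return DIRS[(i + 1) % 4]      # slips clockwise
    else:
        return DIRS[(i - 1) % 4]      # slips anti-clockwise
```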
References
Amir, D., Amir, O.: Highlights: summarizing agent behavior to people. In: AAMAS 2018, pp. 1168–1176 (2018)
Anderson, A., et al.: Explaining reinforcement learning to mere mortals: an empirical study. In: IJCAI 2019, pp. 1328–1334 (2019)
Annasamy, R., Sycara, K.: Towards better interpretability in deep Q-networks. In: AAAI 2019, vol. 33, pp. 4561–4569 (2019)
Bellman, R.E.: Dynamic Programming. Princeton University Press (2010)
Chakraborti, T., Kulkarni, A., Sreedharan, S., Smith, D.E., Kambhampati, S.: Explicability? legibility? predictability? transparency? privacy? security? the emerging landscape of interpretable agent behavior. In: ICAPS 2019, vol. 29, pp. 86–96 (2019)
Chakraborti, T., Sreedharan, S., Kambhampati, S.: The emerging landscape of explainable automated planning & decision making. In: IJCAI 2020, pp. 4803–4811 (2020). Survey track
Chevalier-Boisvert, M., Willems, L., Pal, S.: Minimalistic gridworld environment for gymnasium (2018). https://github.com/Farama-Foundation/Minigrid
Christiano, P.F., Leike, J., Brown, T., Martic, M., Legg, S., Amodei, D.: Deep reinforcement learning from human preferences. In: NeurIPS 2017, vol. 30 (2017)
Cruz, F., Dazeley, R., Vamplew, P.: Memory-based explainable reinforcement learning. In: Liu, J., Bailey, J. (eds.) AI 2019. LNCS (LNAI), vol. 11919, pp. 66–77. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-35288-2_6
Gottesman, O., et al.: Interpretable off-policy evaluation in reinforcement learning by highlighting influential transitions. In: ICML 2020, vol. 119, pp. 3658–3667 (2020)
Greydanus, S., Koul, A., Dodge, J., Fern, A.: Visualizing and understanding atari agents. In: ICML 2018, pp. 2877–2886 (2018)
Gunning, D.: DARPA’s explainable artificial intelligence (XAI) program. In: Proceedings of the 24th International Conference on Intelligent User Interfaces, p. ii (2019)
Gupta, P., et al.: Explain your move: understanding agent actions using specific and relevant feature attribution. In: ICLR 2020 (2020)
Hayes, B., Shah, J.A.: Improving robot controller transparency through autonomous policy explanation. In: 2017 12th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 303–312 (2017)
Hoffmann, J., Magazzeni, D.: Explainable AI planning (XAIP): overview and the case of contrastive explanation (extended abstract). In: Krötzsch, M., Stepanova, D. (eds.) Reasoning Web. Explainable Artificial Intelligence. LNCS, vol. 11810, pp. 277–282. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-31423-1_9
Huang, S.H., Bhatia, K., Abbeel, P., Dragan, A.D.: Establishing appropriate trust via critical states. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3929–3936. IEEE (2018)
Huang, S.H., Held, D., Abbeel, P., Dragan, A.D.: Enabling robots to communicate their objectives. Auton. Robot. 43, 309–326 (2017)
Huber, T., Weitz, K., André, E., Amir, O.: Local and global explanations of agent behavior: integrating strategy summaries with saliency maps. Artif. Intell. 301, 103571 (2021)
Hüyük, A., Jarrett, D., Tekin, C., van der Schaar, M.: Explaining by imitating: understanding decisions by interpretable policy learning. In: ICLR 2021 (2021)
Ibarz, B., Leike, J., Pohlen, T., Irving, G., Legg, S., Amodei, D.: Reward learning from human preferences and demonstrations in atari. In: NeurIPS 2018, vol. 31 (2018)
Juozapaitis, Z., Koul, A., Fern, A., Erwig, M., Doshi-Velez, F.: Explainable reinforcement learning via reward decomposition. arXiv (2019)
Karino, I., Ohmura, Y., Kuniyoshi, Y.: Identifying critical states by the action-based variance of expected return. In: Farkaš, I., Masulli, P., Wermter, S. (eds.) ICANN 2020. LNCS, vol. 12396, pp. 366–378. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61609-0_29
Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: a survey. Int. J. Robot. Res. 32, 1238–1274 (2013)
Lage, I., Lifschitz, D., Doshi-Velez, F., Amir, O.: Exploring computational user models for agent policy summarization. In: IJCAI 2019, pp. 1401–1407 (2019)
Lin, Y.C., Hong, Z.W., Liao, Y.H., Shih, M.L., Liu, M.Y., Sun, M.: Tactics of adversarial attack on deep reinforcement learning agents. In: IJCAI 2017, pp. 3756–3762 (2017)
Lipton, P., Knowles, D.: Contrastive explanations, pp. 247–266. Royal Institute of Philosophy Supplements, Cambridge University Press (1991)
Liu, R., Bai, F., Du, Y., Yang, Y.: Meta-reward-net: implicitly differentiable reward learning for preference-based reinforcement learning. In: NeurIPS 2022, vol. 35, pp. 22270–22284 (2022)
Lu, W., Magg, S., Zhao, X., Gromniak, M., Wermter, S.: A closer look at reward decomposition for high-level robotic explanations. arXiv abs/2304.12958 (2023)
Madumal, P., Miller, T., Sonenberg, L., Vetere, F.: Explainable reinforcement learning through a causal lens. In: AAAI 2020, pp. 2493–2500 (2020)
Marcus, G., Davis, E.: Rebooting AI: Building Artificial Intelligence We Can Trust. Pantheon Books, USA (2019)
Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019)
Miller, T.: Contrastive explanation: a structural-model approach. Knowl. Eng. Rev. 36, e14 (2021)
Montavon, G., Samek, W., Müller, K.R.: Methods for interpreting and understanding deep neural networks. arXiv abs/1706.07979 (2017)
Mott, A., Zoran, D., Chrzanowski, M., Wierstra, D., Rezende, D.J.: Towards interpretable reinforcement learning using attention augmented agents. In: NeurIPS 2019, pp. 12360–12369 (2019)
Narayanan, S., Lage, I., Doshi-Velez, F.: (when) are contrastive explanations of reinforcement learning helpful? arXiv abs/2211.07719 (2022)
Olson, M.L., Khanna, R., Neal, L., Li, F., Wong, W.K.: Counterfactual state explanations for reinforcement learning agents via generative deep learning. Artif. Intell. 295, 103455 (2021)
Puiutta, E., Veith, E.M.S.P.: Explainable reinforcement learning: a survey. arXiv abs/2005.06247 (2020)
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Statistics, Wiley (1994)
Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 4th edn. Pearson (2020)
Schrittwieser, J., et al.: Mastering atari, go, chess and shogi by planning with a learned model. Nature 588, 604–609 (2020)
Sequeira, P., Gervasio, M.: Interestingness elements for explainable reinforcement learning: understanding agents’ capabilities and limitations. Artif. Intell. 288, 103367 (2020)
Sequeira, P., Hostetler, J., Gervasio, M.T.: Global and local analysis of interestingness for competency-aware deep reinforcement learning. arXiv abs/2211.06376 (2022)
Shu, T., Xiong, C., Socher, R.: Hierarchical and interpretable skill acquisition in multi-task reinforcement learning. In: ICLR 2018 (2018)
Silver, D., et al.: Mastering the game of go without human knowledge. Nature 550, 354–359 (2017)
Sreedharan, S., Srivastava, S., Kambhampati, S.: TLDR: policy summarization for factored SSP problems using temporal abstractions. In: ICAPS 2020, vol. 30, pp. 272–280 (2020)
Sreedharan, S., Srivastava, S., Kambhampati, S.: Using state abstractions to compute personalized contrastive explanations for AI agent behavior. Artif. Intell. 301, 103570 (2021)
Topin, N., Veloso, M.: Generation of policy-level explanations for reinforcement learning. In: AAAI 2019, pp. 2514–2521 (2019)
Vouros, G.A.: Explainable deep reinforcement learning: state of the art and challenges. ACM Comput. Surv. 55(5) (2022)
Waa, J., Diggelen, J., Bosch, K., Neerincx, M.: Contrastive explanations for reinforcement learning in terms of expected consequences. In: IJCAI 2018 - Explainable Artificial Intelligence (XAI) Workshop (2018)
Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8, 279–292 (1992)
Wells, L., Bednarz, T.: Explainable AI and reinforcement learning-a systematic review of current approaches and trends. Front. Artif. Intell. 4 (2021)
Yau, H., Russell, C., Hadfield, S.: What did you think would happen? Explaining agent behaviour through intended outcomes. In: NeurIPS 2020, vol. 33, pp. 18375–18386 (2020)
Yeh, E., Sequeira, P., Hostetler, J., Gervasio, M.T.: Outcome-guided counterfactuals for reinforcement learning agents from a jointly trained generative latent space. arXiv abs/2207.07710 (2022)
Zahavy, T., Ben-Zrihem, N., Mannor, S.: Graying the black box: understanding DQNs. In: ICML 2016, pp. 1899–1908 (2016)
Zelvelder, A.E., Westberg, M., Främling, K.: Assessing explainability in reinforcement learning. In: Calvaresi, D., Najjar, A., Winikoff, M., Främling, K. (eds.) EXTRAAMAS 2021. LNCS (LNAI), vol. 12688, pp. 223–240. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-82017-6_14
Čyras, K., Rago, A., Albini, E., Baroni, P., Toni, F.: Argumentative XAI: a survey. In: IJCAI 2021, pp. 4392–4399 (2021). Survey Track
Acknowledgement
The authors would like to thank the anonymous reviewers for their valuable comments. This work is partially funded by the EPSRC CHAI project (EP/T026820/1).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, X., McAreavey, K., Liu, W. (2023). Contrastive Visual Explanations for Reinforcement Learning via Counterfactual Rewards. In: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1902. Springer, Cham. https://doi.org/10.1007/978-3-031-44067-0_4
DOI: https://doi.org/10.1007/978-3-031-44067-0_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44066-3
Online ISBN: 978-3-031-44067-0