Contrastive Visual Explanations for Reinforcement Learning via Counterfactual Rewards

  • Conference paper
Explainable Artificial Intelligence (xAI 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1902)


Abstract

Causal attribution aided by counterfactual reasoning is recognised as a key feature of human explanation. In this paper, we propose a post-hoc contrastive explanation framework for reinforcement learning (RL) based on comparing policies learned under actual environmental rewards with policies learned under hypothetical (counterfactual) rewards. The framework provides policy-level explanations by accessing the learned Q-functions and identifying their intersecting critical states. Global explanations summarise policy behaviour by visualising sub-trajectories anchored at these states, while local explanations compare action-values in individual states. We conduct experiments on several grid-world examples. Our results show that it is possible to explain the differences between learned policies using their Q-functions, demonstrating the potential for more informed human decision-making when deploying policies and highlighting the scope for developing further XAI techniques in RL.
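
As a concrete illustration of the comparison described above, the sketch below (ours, not the authors' implementation; all names are illustrative) contrasts two tabular Q-functions, one learned under the actual reward and one under a counterfactual reward. It assumes one common definition of critical states, namely states where the gap between the best and second-best action-value exceeds a threshold (cf. [16]), and reports the intersecting critical states where the two greedy policies disagree.

```python
import numpy as np

def critical_states(Q: np.ndarray, threshold: float) -> set:
    """States whose top-two action-value gap exceeds the threshold (Q is |S| x |A|)."""
    top2 = np.sort(Q, axis=1)[:, -2:]         # two largest action-values per state
    return set(np.flatnonzero(top2[:, 1] - top2[:, 0] > threshold))

def contrastive_local_explanations(Q_actual, Q_counterfactual, threshold=0.1):
    """Intersecting critical states where the two greedy policies choose different actions."""
    shared = critical_states(Q_actual, threshold) & critical_states(Q_counterfactual, threshold)
    explanations = []
    for s in sorted(shared):
        a, a_cf = int(np.argmax(Q_actual[s])), int(np.argmax(Q_counterfactual[s]))
        if a != a_cf:                          # the policies disagree at this critical state
            explanations.append({
                "state": s,
                "actual": (a, Q_actual[s].tolist()),             # greedy action and its action-values
                "counterfactual": (a_cf, Q_counterfactual[s].tolist()),
            })
    return explanations
```

Global explanations would then be produced, as the abstract describes, by visualising sub-trajectories of each policy that pass through these shared critical states.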

Notes

  1. With 90% probability the agent moves one cell in the direction specified by the action (i.e. the action succeeds); with 5% probability each, it instead moves one cell clockwise or anti-clockwise relative to the specified direction (i.e. the action fails). This grid-world was implemented using Minigrid [7].
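
For concreteness, the following minimal sketch (ours, not Minigrid's implementation) samples the slip dynamics described in this note, assuming the four compass directions listed in clockwise order.

```python
import random

DIRECTIONS = ["north", "east", "south", "west"]   # clockwise order

def sample_move(intended: str, rng: random.Random) -> str:
    """Direction actually moved: 90% intended, 5% clockwise slip, 5% anti-clockwise slip."""
    i = DIRECTIONS.index(intended)
    r = rng.random()
    if r < 0.90:
        return DIRECTIONS[i]                  # action succeeds
    elif r < 0.95:
        return DIRECTIONS[(i + 1) % 4]        # slip clockwise
    else:
        return DIRECTIONS[(i - 1) % 4]        # slip anti-clockwise
```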

References

  1. Amir, D., Amir, O.: Highlights: summarizing agent behavior to people. In: AAMAS 2018, pp. 1168–1176 (2018)

  2. Anderson, A., et al.: Explaining reinforcement learning to mere mortals: an empirical study. In: IJCAI 2019, pp. 1328–1334 (2019)

  3. Annasamy, R., Sycara, K.: Towards better interpretability in deep Q-networks. In: AAAI 2019, vol. 33, pp. 4561–4569 (2019)

  4. Bellman, R.E.: Dynamic Programming. Princeton University Press (2010)

  5. Chakraborti, T., Kulkarni, A., Sreedharan, S., Smith, D.E., Kambhampati, S.: Explicability? legibility? predictability? transparency? privacy? security? the emerging landscape of interpretable agent behavior. In: ICAPS 2019, vol. 29, pp. 86–96 (2019)

  6. Chakraborti, T., Sreedharan, S., Kambhampati, S.: The emerging landscape of explainable automated planning & decision making. In: IJCAI 2020, pp. 4803–4811 (2020). Survey track

  7. Chevalier-Boisvert, M., Willems, L., Pal, S.: Minimalistic gridworld environment for Gymnasium (2018). https://github.com/Farama-Foundation/Minigrid

  8. Christiano, P.F., Leike, J., Brown, T., Martic, M., Legg, S., Amodei, D.: Deep reinforcement learning from human preferences. In: NeurIPS 2017, vol. 30 (2017)

  9. Cruz, F., Dazeley, R., Vamplew, P.: Memory-based explainable reinforcement learning. In: Liu, J., Bailey, J. (eds.) AI 2019. LNCS (LNAI), vol. 11919, pp. 66–77. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-35288-2_6

  10. Gottesman, O., et al.: Interpretable off-policy evaluation in reinforcement learning by highlighting influential transitions. In: ICML 2020, vol. 119, pp. 3658–3667 (2020)

  11. Greydanus, S., Koul, A., Dodge, J., Fern, A.: Visualizing and understanding Atari agents. In: ICML 2018, pp. 2877–2886 (2018)

  12. Gunning, D.: DARPA's explainable artificial intelligence (XAI) program. In: Proceedings of the 24th International Conference on Intelligent User Interfaces, p. ii (2019)

  13. Gupta, P., et al.: Explain your move: understanding agent actions using specific and relevant feature attribution. In: ICLR 2020 (2020)

  14. Hayes, B., Shah, J.A.: Improving robot controller transparency through autonomous policy explanation. In: 2017 12th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 303–312 (2017)

  15. Hoffmann, J., Magazzeni, D.: Explainable AI planning (XAIP): overview and the case of contrastive explanation (extended abstract). In: Krötzsch, M., Stepanova, D. (eds.) Reasoning Web. Explainable Artificial Intelligence. LNCS, vol. 11810, pp. 277–282. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-31423-1_9

  16. Huang, S.H., Bhatia, K., Abbeel, P., Dragan, A.D.: Establishing appropriate trust via critical states. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3929–3936. IEEE (2018)

  17. Huang, S.H., Held, D., Abbeel, P., Dragan, A.D.: Enabling robots to communicate their objectives. Auton. Robot. 43, 309–326 (2017)

  18. Huber, T., Weitz, K., André, E., Amir, O.: Local and global explanations of agent behavior: integrating strategy summaries with saliency maps. Artif. Intell. 301, 103571 (2021)

  19. Hüyük, A., Jarrett, D., Tekin, C., van der Schaar, M.: Explaining by imitating: understanding decisions by interpretable policy learning. In: ICLR 2021 (2021)

  20. Ibarz, B., Leike, J., Pohlen, T., Irving, G., Legg, S., Amodei, D.: Reward learning from human preferences and demonstrations in Atari. In: NeurIPS 2018, vol. 31 (2018)

  21. Juozapaitis, Z., Koul, A., Fern, A., Erwig, M., Doshi-Velez, F.: Explainable reinforcement learning via reward decomposition. arXiv (2019)

  22. Karino, I., Ohmura, Y., Kuniyoshi, Y.: Identifying critical states by the action-based variance of expected return. In: Farkaš, I., Masulli, P., Wermter, S. (eds.) ICANN 2020. LNCS, vol. 12396, pp. 366–378. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61609-0_29

  23. Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: a survey. Int. J. Robot. Res. 32, 1238–1274 (2013)

  24. Lage, I., Lifschitz, D., Doshi-Velez, F., Amir, O.: Exploring computational user models for agent policy summarization. In: IJCAI 2019, pp. 1401–1407 (2019)

  25. Lin, Y.C., Hong, Z.W., Liao, Y.H., Shih, M.L., Liu, M.Y., Sun, M.: Tactics of adversarial attack on deep reinforcement learning agents. In: IJCAI 2017, pp. 3756–3762 (2017)

  26. Lipton, P., Knowles, D.: Contrastive Explanations, pp. 247–266. Royal Institute of Philosophy Supplements, Cambridge University Press (1991)

  27. Liu, R., Bai, F., Du, Y., Yang, Y.: Meta-reward-net: implicitly differentiable reward learning for preference-based reinforcement learning. In: NeurIPS 2022, vol. 35, pp. 22270–22284 (2022)

  28. Lu, W., Magg, S., Zhao, X., Gromniak, M., Wermter, S.: A closer look at reward decomposition for high-level robotic explanations. arXiv abs/2304.12958 (2023)

  29. Madumal, P., Miller, T., Sonenberg, L., Vetere, F.: Explainable reinforcement learning through a causal lens. In: AAAI 2020, pp. 2493–2500 (2020)

  30. Marcus, G., Davis, E.: Rebooting AI: Building Artificial Intelligence We Can Trust. Pantheon Books, USA (2019)

  31. Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019)

  32. Miller, T.: Contrastive explanation: a structural-model approach. Knowl. Eng. Rev. 36, e14 (2021)

  33. Montavon, G., Samek, W., Müller, K.R.: Methods for interpreting and understanding deep neural networks. arXiv abs/1706.07979 (2017)

  34. Mott, A., Zoran, D., Chrzanowski, M., Wierstra, D., Rezende, D.J.: Towards interpretable reinforcement learning using attention augmented agents. In: NeurIPS 2019, pp. 12360–12369 (2019)

  35. Narayanan, S., Lage, I., Doshi-Velez, F.: (When) are contrastive explanations of reinforcement learning helpful? arXiv abs/2211.07719 (2022)

  36. Olson, M.L., Khanna, R., Neal, L., Li, F., Wong, W.K.: Counterfactual state explanations for reinforcement learning agents via generative deep learning. Artif. Intell. 295, 103455 (2021)

  37. Puiutta, E., Veith, E.M.S.P.: Explainable reinforcement learning: a survey. arXiv abs/2005.06247 (2020)

  38. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Statistics. Wiley (1994)

  39. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 4th edn. Pearson (2020)

  40. Schrittwieser, J., et al.: Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588, 604–609 (2020)

  41. Sequeira, P., Gervasio, M.: Interestingness elements for explainable reinforcement learning: understanding agents’ capabilities and limitations. Artif. Intell. 288, 103367 (2020)

  42. Sequeira, P., Hostetler, J., Gervasio, M.T.: Global and local analysis of interestingness for competency-aware deep reinforcement learning. arXiv abs/2211.06376 (2022)

  43. Shu, T., Xiong, C., Socher, R.: Hierarchical and interpretable skill acquisition in multi-task reinforcement learning. In: ICLR 2018 (2018)

  44. Silver, D., et al.: Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017)

  45. Sreedharan, S., Srivastava, S., Kambhampati, S.: TLDR: policy summarization for factored SSP problems using temporal abstractions. In: ICAPS 2020, vol. 30, pp. 272–280 (2020)

  46. Sreedharan, S., Srivastava, S., Kambhampati, S.: Using state abstractions to compute personalized contrastive explanations for AI agent behavior. Artif. Intell. 301, 103570 (2021)

  47. Topin, N., Veloso, M.: Generation of policy-level explanations for reinforcement learning. In: AAAI 2019, pp. 2514–2521 (2019)

  48. Vouros, G.A.: Explainable deep reinforcement learning: state of the art and challenges. ACM Comput. Surv. 55(5) (2022)

  49. Waa, J., Diggelen, J., Bosch, K., Neerincx, M.: Contrastive explanations for reinforcement learning in terms of expected consequences. In: IJCAI 2018 - Explainable Artificial Intelligence (XAI) Workshop (2018)

  50. Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8, 279–292 (1992)

  51. Wells, L., Bednarz, T.: Explainable AI and reinforcement learning - a systematic review of current approaches and trends. Front. Artif. Intell. 4 (2021)

  52. Yau, H., Russell, C., Hadfield, S.: What did you think would happen? Explaining agent behaviour through intended outcomes. In: NeurIPS 2020, vol. 33, pp. 18375–18386 (2020)

  53. Yeh, E., Sequeira, P., Hostetler, J., Gervasio, M.T.: Outcome-guided counterfactuals for reinforcement learning agents from a jointly trained generative latent space. arXiv abs/2207.07710 (2022)

  54. Zahavy, T., Ben-Zrihem, N., Mannor, S.: Graying the black box: understanding DQNs. In: ICML 2016, pp. 1899–1908 (2016)

  55. Zelvelder, A.E., Westberg, M., Främling, K.: Assessing explainability in reinforcement learning. In: Calvaresi, D., Najjar, A., Winikoff, M., Främling, K. (eds.) EXTRAAMAS 2021. LNCS (LNAI), vol. 12688, pp. 223–240. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-82017-6_14

  56. Čyras, K., Rago, A., Albini, E., Baroni, P., Toni, F.: Argumentative XAI: a survey. In: IJCAI 2021, pp. 4392–4399 (2021). Survey Track

Acknowledgement

The authors would like to thank the anonymous reviewers for their valuable comments. This work is partially funded by the EPSRC CHAI project (EP/T026820/1).

Author information

Corresponding author

Correspondence to Xiaowei Liu.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, X., McAreavey, K., Liu, W. (2023). Contrastive Visual Explanations for Reinforcement Learning via Counterfactual Rewards. In: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1902. Springer, Cham. https://doi.org/10.1007/978-3-031-44067-0_4

  • DOI: https://doi.org/10.1007/978-3-031-44067-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44066-3

  • Online ISBN: 978-3-031-44067-0

  • eBook Packages: Computer Science, Computer Science (R0)
