Causal Based Action Selection Policy for Reinforcement Learning

  • Conference paper
  • In: Advances in Computational Intelligence (MICAI 2021)

Abstract

Reinforcement learning (RL) is the de facto learning-by-interaction paradigm within machine learning. One of its intrinsic challenges is the trade-off between exploration and exploitation. To address this challenge, we propose improving the reinforcement learning exploration process with an agent that can exploit causal relationships in the world. A causal graphical model is used to restrict the search space: graph queries that check which variables are direct causes of the variables of interest reduce the set of actions the agent can take. Our main contributions are a framework to represent causal information and an algorithm that guides the action selection of a reinforcement learning agent by querying the causal graph. We test our approach on discrete and continuous domains and show that using the causal structure in the Q-learning action selection step leads to a higher jump-start reward and greater stability. Furthermore, better performance is obtained even with partial and spurious relationships in the causal graphical model.
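As a rough illustration of the idea described above (a minimal sketch, not the authors' code: the action-to-variable mapping, the `causal_graph` parent lookup, and the Q-table interface are all hypothetical), the snippet below restricts an \(\epsilon\)-greedy Q-learning policy to actions that intervene on direct causes of a target variable, falling back to the full action set when the graph query returns nothing:

```python
import random

def causal_epsilon_greedy(state, q_table, actions, causal_graph, target, epsilon=0.1):
    """Epsilon-greedy action selection restricted by a causal graph.

    `actions` maps each action to the variable it manipulates;
    `causal_graph` maps each variable to the set of its direct causes
    (its parents in the causal DAG). All names are illustrative.
    """
    # Graph query: keep only actions whose variable is a direct cause of the target.
    candidates = [a for a, var in actions.items()
                  if var in causal_graph.get(target, set())]
    if not candidates:
        candidates = list(actions)  # no usable causal knowledge: use the full set
    if random.random() < epsilon:
        return random.choice(candidates)                      # explore
    return max(candidates, key=lambda a: q_table[state][a])   # exploit
```

Under this reading, the higher jump-start reward reported in the abstract is plausible because early exploration wastes fewer steps on causally irrelevant actions.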


Notes

  1. In this setting, each action is associated with a cost, and the agent cannot spend more than a fixed budget allocated for the whole task (a minimal sketch of this loop follows these notes).

  2. A high-level variable or macro variable is a function over a data structure, which in turn is defined from other variables [8]. These variables can be seen as quantities that summarize information about some aspect of the data structure.

  3. \(\epsilon\) is a value between 0 and 1 that weights the trade-off between exploration and exploitation.

  4. We consider problems where it is not necessary to perform several actions at the same time to affect x.

  5. The software developed is available at https://anonymous.4open.science/r/cbdbb0ba-d371-4e0b-97b6-24613aff69ac/.
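To make the budgeted setting of note 1 concrete, here is a minimal, hypothetical episode loop; the `env` interface and the `action_cost` table are assumptions for illustration, not the paper's implementation. The agent pays a per-action cost and stops once the fixed budget cannot cover the next action.

```python
def run_budgeted_episode(env, policy, action_cost, budget):
    """Roll out one episode under a fixed action budget (note 1).

    `env` is assumed to expose reset() -> state and
    step(action) -> (state, reward, done); all names are illustrative.
    """
    state = env.reset()
    spent, total_reward = 0.0, 0.0
    done = False
    while not done:
        action = policy(state)
        if spent + action_cost[action] > budget:
            break  # the remaining budget cannot cover this action
        spent += action_cost[action]
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward, spent
```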

References

  1. Abel, D., et al.: Goal-based action priors. In: Proceedings of the International Conference on Automated Planning and Scheduling, vol. 25 (2015)

  2. Akkaya, I., et al.: Solving Rubik's Cube with a robot hand. arXiv preprint arXiv:1910.07113 (2019)

  3. Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: A brief survey of deep reinforcement learning. arXiv preprint arXiv:1708.05866 (2017)

  4. Bareinboim, E., Forney, A., Pearl, J.: Bandits with unobserved confounders: a causal approach. In: Conference on Neural Information Processing Systems, pp. 1342–1350 (2015)

  5. Beyret, B., Hernández-Orallo, J., Cheke, L., Halina, M., Shanahan, M., Crosby, M.: The Animal-AI environment: training and testing animal-like artificial cognition (2019)

  6. Brockman, G., et al.: OpenAI Gym (2016)

  7. Campbell, D.T., Cook, T.D.: Quasi-Experimentation: Design & Analysis Issues for Field Settings. Rand McNally College Publishing Company, Chicago (1979)

  8. Chalupka, K., Perona, P., Eberhardt, F.: Visual causal feature learning. In: Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence (UAI 2015), pp. 181–190. AUAI Press, Arlington, Virginia (2015)

  9. Dasgupta, I., et al.: Causal reasoning from meta-reinforcement learning. CoRR abs/1901.08162 (2019). http://arxiv.org/abs/1901.08162

  10. Everitt, T., Hutter, M.: Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective. CoRR abs/1908.04734 (2019). http://arxiv.org/abs/1908.04734

  11. Geibel, P.: Reinforcement learning with bounded risk. In: Proceedings of the Eighteenth International Conference on Machine Learning, pp. 162–169. Morgan Kaufmann (2001)

  12. Gershman, S.J.: Reinforcement learning and causal models. In: The Oxford Handbook of Causal Reasoning, p. 295 (2017)

  13. Gonzalez-Soto, M., Sucar, L.E., Escalante, H.J.: Playing against nature: causal discovery for decision making under uncertainty. CoRR abs/1807.01268 (2018). http://arxiv.org/abs/1807.01268

  14. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)

  15. Gottesman, O., et al.: Evaluating reinforcement learning algorithms in observational health settings (2018)

  16. Hafner, D., Lillicrap, T.P., Ba, J., Norouzi, M.: Dream to control: learning behaviors by latent imagination. In: 8th International Conference on Learning Representations (ICLR). OpenReview.net (2020). https://openreview.net/forum?id=S1lOTC4tDS

  17. Hitchcock, C.: Causal models. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University (2019)

  18. Ho, S.: Causal learning versus reinforcement learning for knowledge learning and problem solving. In: Workshops of the Thirty-First AAAI Conference on Artificial Intelligence. AAAI Workshops, vol. WS-17. AAAI Press (2017)

  19. Lattimore, F., Lattimore, T., Reid, M.D.: Causal bandits: learning good interventions via causal inference. In: Advances in Neural Information Processing Systems, pp. 1181–1189 (2016)

  20. Lu, C., Schölkopf, B., Hernández-Lobato, J.M.: Deconfounding reinforcement learning in observational settings. CoRR abs/1812.10576 (2018). http://arxiv.org/abs/1812.10576

  21. Madumal, P., Miller, T., Sonenberg, L., Vetere, F.: Explainable reinforcement learning through a causal lens. arXiv preprint arXiv:1905.10958 (2019)

  22. Mazumder, S., et al.: Guided exploration in deep reinforcement learning (2019). https://openreview.net/forum?id=SJMeTo09YQ

  23. McFarlane, R.: A Survey of Exploration Strategies in Reinforcement Learning. McGill University (2018). http://www.cs.mcgill.ca/cs526/roger.pdf

  24. Mnih, V., et al.: Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)

  25. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  26. Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations (2017)

  27. Nair, S., Zhu, Y., Savarese, S., Fei-Fei, L.: Causal induction from visual observations for goal directed tasks. arXiv preprint arXiv:1910.01751 (2019)

  28. Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge (2009)

  29. Rezende, D.J., et al.: Causally correct partial models for reinforcement learning. CoRR abs/2002.02836 (2020). https://arxiv.org/abs/2002.02836

  30. Saunders, W., Sastry, G., Stuhlmüller, A., Evans, O.: Trial without error: towards safe reinforcement learning via human intervention. In: André, E., Koenig, S., Dastani, M., Sukthankar, G. (eds.) Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pp. 2067–2069. ACM (2018)

  31. Sen, R., Shanmugam, K., Dimakis, A.G., Shakkottai, S.: Identifying best interventions through online importance sampling. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 3057–3066. JMLR.org (2017)

  32. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. The MIT Press, Cambridge (2018)

  33. Vinyals, O., et al.: Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019)

  34. Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)

  35. Woodward, J.: Making Things Happen: A Theory of Causal Explanation. Oxford University Press, Oxford (2005)

  36. Woodward, J.: Causation and manipulability. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, winter 2016 edn. (2016)

  37. Zhang, J., Bareinboim, E.: Transfer learning in multi-armed bandits: a causal approach. In: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pp. 1778–1780 (2017)


Acknowledgments

This work was partially supported by CONACYT, Project A1-S-43346, and scholarships 725976 (first author) and 754972 (second author).

Author information


Correspondence to Arquímides Méndez-Molina.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Feliciano-Avelino, I., Méndez-Molina, A., Morales, E.F., Sucar, L.E. (2021). Causal Based Action Selection Policy for Reinforcement Learning. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Computational Intelligence. MICAI 2021. Lecture Notes in Computer Science, vol 13067. Springer, Cham. https://doi.org/10.1007/978-3-030-89817-5_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-89817-5_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89816-8

  • Online ISBN: 978-3-030-89817-5

  • eBook Packages: Computer Science, Computer Science (R0)
