Causal Based Action Selection Policy for Reinforcement Learning

  • Conference paper
  • In: Advances in Computational Intelligence (MICAI 2021)

Abstract

Reinforcement learning (RL) is the de facto learning-by-interaction paradigm within machine learning. One of its intrinsic challenges is the trade-off between exploration and exploitation. To address this challenge, we propose improving the reinforcement learning exploration process with an agent that can exploit causal relationships in the world. A causal graphical model is used to restrict the search space: graph queries that check which variables are direct causes of the variables of interest reduce the set of actions the agent can take. Our main contributions are a framework to represent causal information and an algorithm that guides the action selection of a reinforcement learning agent by querying the causal graph. We test our approach on discrete and continuous domains and show that using the causal structure in the Q-learning action selection step leads to a higher jump-start reward and greater stability. Furthermore, better performance is obtained even with partial and spurious relationships in the causal graphical model.
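As a rough illustration of the idea described above (a minimal sketch, not the authors' code: the action-to-variable mapping, the `causal_graph` parent lookup, and the Q-table interface are all hypothetical), the snippet below restricts an \(\epsilon\)-greedy Q-learning policy to actions that intervene on direct causes of a target variable, falling back to the full action set when the graph query returns nothing:

```python
import random

def causal_epsilon_greedy(state, q_table, actions, causal_graph, target, epsilon=0.1):
    """Epsilon-greedy action selection restricted by a causal graph.

    `actions` maps each action to the variable it manipulates;
    `causal_graph` maps each variable to the set of its direct causes
    (its parents in the causal DAG). All names are illustrative.
    """
    # Graph query: keep only actions whose variable is a direct cause of the target.
    candidates = [a for a, var in actions.items()
                  if var in causal_graph.get(target, set())]
    if not candidates:
        candidates = list(actions)  # no usable causal knowledge: use the full set
    if random.random() < epsilon:
        return random.choice(candidates)                      # explore
    return max(candidates, key=lambda a: q_table[state][a])   # exploit
```

Under this reading, the higher jump-start reward reported in the abstract is plausible because early exploration wastes fewer steps on causally irrelevant actions.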


Notes

  1. In this setting, each action is associated with a cost, and the agent cannot spend more than a fixed budget allocated for the whole task (a minimal sketch of this loop follows these notes).

  2. A high-level variable or macro variable is a function over a data structure, which in turn is defined from other variables [8]. These variables can be seen as quantities that summarize information about some aspect of the data structure.

  3. \(\epsilon\) is a value between 0 and 1 that weights the trade-off between exploration and exploitation.

  4. We consider problems where it is not necessary to perform several actions at the same time to affect x.

  5. The software developed is available at https://anonymous.4open.science/r/cbdbb0ba-d371-4e0b-97b6-24613aff69ac/.
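To make the budgeted setting of note 1 concrete, here is a minimal, hypothetical episode loop; the `env` interface and the `action_cost` table are assumptions for illustration, not the paper's implementation. The agent pays a per-action cost and stops once the fixed budget cannot cover the next action.

```python
def run_budgeted_episode(env, policy, action_cost, budget):
    """Roll out one episode under a fixed action budget (note 1).

    `env` is assumed to expose reset() -> state and
    step(action) -> (state, reward, done); all names are illustrative.
    """
    state = env.reset()
    spent, total_reward = 0.0, 0.0
    done = False
    while not done:
        action = policy(state)
        if spent + action_cost[action] > budget:
            break  # the remaining budget cannot cover this action
        spent += action_cost[action]
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward, spent
```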

References

  1. Abel, D., et al.: Goal-based action priors. In: Proceedings of the International Conference on Automated Planning and Scheduling, vol. 25 (2015)

  2. Akkaya, I., et al.: Solving Rubik's Cube with a robot hand. arXiv preprint arXiv:1910.07113 (2019)

  3. Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: A brief survey of deep reinforcement learning. arXiv preprint arXiv:1708.05866 (2017)

  4. Bareinboim, E., Forney, A., Pearl, J.: Bandits with unobserved confounders: a causal approach. In: Conference on Neural Information Processing Systems, pp. 1342–1350 (2015)

  5. Beyret, B., Hernández-Orallo, J., Cheke, L., Halina, M., Shanahan, M., Crosby, M.: The Animal-AI environment: training and testing animal-like artificial cognition (2019)

  6. Brockman, G., et al.: OpenAI Gym (2016)

  7. Campbell, D.T., Cook, T.D.: Quasi-Experimentation: Design & Analysis Issues for Field Settings. Rand McNally College Publishing Company, Chicago (1979)

  8. Chalupka, K., Perona, P., Eberhardt, F.: Visual causal feature learning. In: Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence (UAI 2015), pp. 181–190. AUAI Press, Arlington, Virginia (2015)

  9. Dasgupta, I., et al.: Causal reasoning from meta-reinforcement learning. CoRR abs/1901.08162 (2019). http://arxiv.org/abs/1901.08162

  10. Everitt, T., Hutter, M.: Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective. CoRR abs/1908.04734 (2019). http://arxiv.org/abs/1908.04734

  11. Geibel, P.: Reinforcement learning with bounded risk. In: Proceedings of the Eighteenth International Conference on Machine Learning, pp. 162–169. Morgan Kaufmann (2001)

  12. Gershman, S.J.: Reinforcement learning and causal models. In: The Oxford Handbook of Causal Reasoning, p. 295 (2017)

  13. Gonzalez-Soto, M., Sucar, L.E., Escalante, H.J.: Playing against nature: causal discovery for decision making under uncertainty. CoRR abs/1807.01268 (2018). http://arxiv.org/abs/1807.01268

  14. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)

  15. Gottesman, O., et al.: Evaluating reinforcement learning algorithms in observational health settings (2018)

  16. Hafner, D., Lillicrap, T.P., Ba, J., Norouzi, M.: Dream to control: learning behaviors by latent imagination. In: 8th International Conference on Learning Representations (ICLR). OpenReview.net (2020). https://openreview.net/forum?id=S1lOTC4tDS

  17. Hitchcock, C.: Causal models. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University (2019)

  18. Ho, S.: Causal learning versus reinforcement learning for knowledge learning and problem solving. In: Workshops of the Thirty-First AAAI Conference on Artificial Intelligence. AAAI Workshops, vol. WS-17. AAAI Press (2017)

  19. Lattimore, F., Lattimore, T., Reid, M.D.: Causal bandits: learning good interventions via causal inference. In: Advances in Neural Information Processing Systems, pp. 1181–1189 (2016)

  20. Lu, C., Schölkopf, B., Hernández-Lobato, J.M.: Deconfounding reinforcement learning in observational settings. CoRR abs/1812.10576 (2018). http://arxiv.org/abs/1812.10576

  21. Madumal, P., Miller, T., Sonenberg, L., Vetere, F.: Explainable reinforcement learning through a causal lens. arXiv preprint arXiv:1905.10958 (2019)

  22. Mazumder, S., et al.: Guided exploration in deep reinforcement learning (2019). https://openreview.net/forum?id=SJMeTo09YQ

  23. McFarlane, R.: A Survey of Exploration Strategies in Reinforcement Learning. McGill University (2018). http://www.cs.mcgill.ca/cs526/roger.pdf

  24. Mnih, V., et al.: Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)

  25. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  26. Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations (2017)

  27. Nair, S., Zhu, Y., Savarese, S., Fei-Fei, L.: Causal induction from visual observations for goal directed tasks. arXiv preprint arXiv:1910.01751 (2019)

  28. Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge (2009)

  29. Rezende, D.J., et al.: Causally correct partial models for reinforcement learning. CoRR abs/2002.02836 (2020). https://arxiv.org/abs/2002.02836

  30. Saunders, W., Sastry, G., Stuhlmüller, A., Evans, O.: Trial without error: towards safe reinforcement learning via human intervention. In: André, E., Koenig, S., Dastani, M., Sukthankar, G. (eds.) Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pp. 2067–2069. ACM (2018)

  31. Sen, R., Shanmugam, K., Dimakis, A.G., Shakkottai, S.: Identifying best interventions through online importance sampling. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 3057–3066. JMLR.org (2017)

  32. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. The MIT Press, Cambridge (2018)

  33. Vinyals, O., et al.: Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019)

  34. Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)

  35. Woodward, J.: Making Things Happen: A Theory of Causal Explanation. Oxford University Press, Oxford (2005)

  36. Woodward, J.: Causation and manipulability. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, winter 2016 edn. (2016)

  37. Zhang, J., Bareinboim, E.: Transfer learning in multi-armed bandits: a causal approach. In: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pp. 1778–1780 (2017)


Acknowledgments

This work was partially supported by CONACYT, Project A1-S-43346, and scholarships 725976 (first author) and 754972 (second author).

Author information


Correspondence to Arquímides Méndez-Molina.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Feliciano-Avelino, I., Méndez-Molina, A., Morales, E.F., Sucar, L.E. (2021). Causal Based Action Selection Policy for Reinforcement Learning. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Computational Intelligence. MICAI 2021. Lecture Notes in Computer Science, vol 13067. Springer, Cham. https://doi.org/10.1007/978-3-030-89817-5_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-89817-5_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89816-8

  • Online ISBN: 978-3-030-89817-5

  • eBook Packages: Computer Science, Computer Science (R0)
