Abstract
Reinforcement learning (RL) is the de facto learning-by-interaction paradigm in machine learning. One of its intrinsic challenges is the trade-off between exploration and exploitation. To address this challenge, we propose improving the reinforcement learning exploration process with an agent that can exploit causal relationships in the world. A causal graphical model is used to restrict the search space: the actions available to the agent are reduced through graph queries that check which variables are direct causes of the variables of interest. Our main contributions are a framework to represent causal information and an algorithm that guides the action selection process of a reinforcement learning agent by querying the causal graph. We test our approach on discrete and continuous domains and show that using the causal structure in the Q-learning action selection step leads to a higher jump-start reward and greater stability. Furthermore, we show that performance improves even when the causal graphical model contains only partial or spurious relationships.
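The action-selection idea described in the abstract can be sketched as a causal filter on top of ε-greedy Q-learning: before choosing an action, query the causal graph for actions that are direct causes of the variable of interest, and restrict exploration and exploitation to that set. The graph, the variable names, and the fallback to the full action set below are illustrative assumptions, not the paper's exact algorithm:

```python
import random

# Hypothetical causal graph: each action variable maps to the state
# variables it is a direct cause of (names are illustrative only).
causal_graph = {
    "open_door": ["door_state"],
    "pick_key": ["has_key"],
    "move_left": [],            # no edge to any variable of interest
    "move_right": ["position"],
}

def causally_allowed(actions, target):
    """Graph query: keep only actions that are direct causes of `target`."""
    allowed = [a for a in actions if target in causal_graph.get(a, [])]
    # Assumed fallback: if the query returns nothing, allow all actions.
    return allowed or list(actions)

def select_action(q_values, state, actions, target, epsilon=0.1):
    """Epsilon-greedy selection restricted to causally relevant actions."""
    candidates = causally_allowed(actions, target)
    if random.random() < epsilon:
        return random.choice(candidates)  # explore within the allowed set
    # Exploit: highest Q-value among the causally allowed actions.
    return max(candidates, key=lambda a: q_values.get((state, a), 0.0))
```

For example, with the goal variable `door_state`, the filter discards `pick_key` and `move_left` before the greedy step ever consults the Q-table, which is what shrinks the effective search space.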
Notes
- 1.
In this setting, each action is associated with a cost, and the agent cannot spend more than a fixed budget allocated for the whole task.
- 2.
A high-level variable, or macro variable, is a function over a data structure, which in turn is defined from other variables [8]. Such a variable can be seen as a quantity that summarizes information about some aspect of the data structure.
- 3.
\(\epsilon \) is a value between 0 and 1 that balances exploration against exploitation.
- 4.
We consider problems where it is not necessary to perform several actions at the same time to affect \(x\).
- 5.
The software developed is available at https://anonymous.4open.science/r/cbdbb0ba-d371-4e0b-97b6-24613aff69ac/.
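The role of \(\epsilon \) in note 3 can be illustrated with a minimal ε-greedy rule (the function name and dict-based Q-row are assumptions for illustration): with probability \(\epsilon \) the agent explores by picking a random action, and otherwise it exploits the action with the highest Q-value.

```python
import random

def epsilon_greedy(q_row, epsilon):
    """Pick an action from q_row (a dict mapping action -> Q-value).
    With probability epsilon: explore (uniform random action).
    Otherwise: exploit (action with the highest Q-value)."""
    if random.random() < epsilon:
        return random.choice(list(q_row))
    return max(q_row, key=q_row.get)
```

Setting \(\epsilon = 0\) makes the agent purely greedy, while \(\epsilon = 1\) makes it purely exploratory.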
References
Abel, D., et al.: Goal-based action priors. In: Proceedings of the International Conference on Automated Planning and Scheduling, vol. 25 (2015)
Akkaya, I., et al.: Solving Rubik's Cube with a robot hand. arXiv preprint arXiv:1910.07113 (2019)
Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: A brief survey of deep reinforcement learning. arXiv preprint arXiv:1708.05866 (2017)
Bareinboim, E., Forney, A., Pearl, J.: Bandits with unobserved confounders: A causal approach. In: Conference on Neural Information Processing Systems, pp. 1342–1350 (2015)
Beyret, B., Hernández-Orallo, J., Cheke, L., Halina, M., Shanahan, M., Crosby, M.: The animal-AI environment: training and testing animal-like artificial cognition (2019)
Brockman, G., et al.: OpenAI gym (2016)
Campbell, D.T., Cook, T.D.: Quasi-experimentation: Design & Analysis Issues for Field Settings. Rand McNally College Publishing Company, Chicago (1979)
Chalupka, K., Perona, P., Eberhardt, F.: Visual causal feature learning. In: Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence (UAI 2015), pp. 181–190. AUAI Press, Arlington, Virginia (2015)
Dasgupta, I., et al.: Causal reasoning from meta-reinforcement learning. CoRR abs/1901.08162 (2019). http://arxiv.org/abs/1901.08162
Everitt, T., Hutter, M.: Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective. CoRR abs/1908.04734 (2019). http://arxiv.org/abs/1908.04734
Geibel, P.: Reinforcement learning with bounded risk. In: Proceedings of the Eighteenth International Conference on Machine Learning, pp. 162–169. Morgan Kaufmann (2001)
Gershman, S.J.: Reinforcement Learning and Causal Models. The Oxford handbook of causal reasoning, p. 295 (2017)
Gonzalez-Soto, M., Sucar, L.E., Escalante, H.J.: Playing against nature: causal discovery for decision making under uncertainty. CoRR abs/1807.01268 (2018). http://arxiv.org/abs/1807.01268
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
Gottesman, O., et al.: Evaluating reinforcement learning algorithms in observational health settings (2018)
Hafner, D., Lillicrap, T.P., Ba, J., Norouzi, M.: Dream to control: learning behaviors by latent imagination. In: 8th International Conference on Learning Representations (ICLR). OpenReview.net (2020). https://openreview.net/forum?id=S1lOTC4tDS
Hitchcock, C.: Causal models. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy. Stanford University, Metaphysics Research Lab (2019)
Ho, S.: Causal learning versus reinforcement learning for knowledge learning and problem solving. In: Workshops of the Thirty-First AAAI Conference on Artificial Intelligence. AAAI Workshops, vol. WS-17. AAAI Press (2017)
Lattimore, F., Lattimore, T., Reid, M.D.: Causal bandits: Learning good interventions via causal inference. In: Advances in Neural Information Processing Systems, pp. 1181–1189 (2016)
Lu, C., Schölkopf, B., Hernández-Lobato, J.M.: Deconfounding reinforcement learning in observational settings. CoRR abs/1812.10576 (2018), http://arxiv.org/abs/1812.10576
Madumal, P., Miller, T., Sonenberg, L., Vetere, F.: Explainable reinforcement learning through a causal lens. arXiv preprint arXiv:1905.10958 (2019)
Mazumder, S., et al.: Guided exploration in deep reinforcement learning (2019). https://openreview.net/forum?id=SJMeTo09YQ
McFarlane, R.: A Survey of Exploration Strategies in Reinforcement Learning. McGill University (2018). http://www.cs.mcgill.ca/cs526/roger.pdf
Mnih, V., et al.: Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations (2017)
Nair, S., Zhu, Y., Savarese, S., Fei-Fei, L.: Causal induction from visual observations for goal directed tasks. arXiv preprint arXiv:1910.01751 (2019)
Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge (2009)
Rezende, D.J., et al.: Causally correct partial models for reinforcement learning. CoRR abs/2002.02836 (2020). https://arxiv.org/abs/2002.02836
Saunders, W., Sastry, G., Stuhlmüller, A., Evans, O.: Trial without error: towards safe reinforcement learning via human intervention. In: André, E., Koenig, S., Dastani, M., Sukthankar, G. (eds.) Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pp. 2067–2069. ACM (2018)
Sen, R., Shanmugam, K., Dimakis, A.G., Shakkottai, S.: Identifying best interventions through online importance sampling. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 3057–3066. JMLR.org (2017)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. The MIT Press, Cambridge (2018)
Vinyals, O., et al.: Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019)
Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)
Woodward, J.: Making Things Happen: A Theory of Causal Explanation. Oxford University Press, Oxford (2005)
Woodward, J.: Causation and manipulability. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, winter 2016 edn. (2016)
Zhang, J., Bareinboim, E.: Transfer learning in multi-armed bandit: a causal approach. In: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pp. 1778–1780 (2017)
Acknowledgments
This work was partially supported by CONACYT, Project A1-S-43346, and scholarships 725976 (first author) and 754972 (second author).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Feliciano-Avelino, I., Méndez-Molina, A., Morales, E.F., Sucar, L.E. (2021). Causal Based Action Selection Policy for Reinforcement Learning. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Computational Intelligence. MICAI 2021. Lecture Notes in Computer Science(), vol 13067. Springer, Cham. https://doi.org/10.1007/978-3-030-89817-5_16
DOI: https://doi.org/10.1007/978-3-030-89817-5_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89816-8
Online ISBN: 978-3-030-89817-5