Abstract
We propose RePReL, a hybrid planner and (deep) reinforcement learning (RL) architecture that leverages a relational planner to efficiently provide useful state abstractions. State abstractions offer a tremendous advantage for generalization and transfer in RL, and our framework takes an important step toward constructing them. Specifically, the framework enables multi-level abstraction by having a high-level planner communicate with a low-level (deep) reinforcement learner. Our empirical results demonstrate the generalization and transfer capabilities of the framework in both discrete and continuous domains with rich structure (objects and relations between those objects). A key aspect of RePReL is that it can serve as a plug-and-play framework in which different planners can be combined with different (deep) RL agents.
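To make the architecture concrete, the following minimal Python sketch (our illustration, not the authors' implementation) shows the control loop the abstract describes: a high-level planner decomposes the goal into subtasks, a task-specific abstraction filters the raw state, and any RL learner exposing act/update can be plugged in per subtask. The names planner, abstraction, Learner, and goal_test are assumptions introduced here.

# Minimal sketch of a RePReL-style control loop (illustrative only,
# assuming the hypothetical interfaces named in the text above).
from typing import Callable, Dict, Protocol

class Learner(Protocol):
    # Any (deep) RL agent exposing act/update can be plugged in.
    def act(self, abs_state): ...
    def update(self, abs_state, action, reward, next_abs_state): ...

def reprel_episode(env, planner: Callable, abstraction: Callable,
                   learners: Dict[str, Learner], goal):
    # Plan once at the task level, then act and learn at the RL level.
    state = env.reset()
    for subtask in planner(state, goal):            # high-level relational plan
        policy = learners[subtask.name]             # plug-and-play RL agent
        while not subtask.goal_test(state):         # option-style termination
            s_abs = abstraction(state, subtask)     # task-specific abstraction
            action = policy.act(s_abs)
            state, reward, done, _ = env.step(action)
            policy.update(s_abs, action, reward, abstraction(state, subtask))
            if done:
                return

Because each learner only ever sees the abstracted state for its current subtask, policies trained on small instances can transfer to larger instances that share the same subtask structure.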
Code availability
Notes
Variables are uppercase; constants and predicates are lowercase. X, Y, D, and K are variables for locations, and P is a variable for the passenger.
With matching or null option.
An HTN planner [37] written in Python, https://bitbucket.org/dananau/pyhop; a toy usage example appears after these notes.
Boldface indicates a vector.
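To illustrate the HTN planner mentioned in the note above, here is a toy Pyhop domain. The API calls (pyhop.State, declare_operators, declare_methods, pyhop.pyhop) are Pyhop's own; the taxi-style operators and method are our hypothetical example, not taken from the paper.

# Toy HTN domain in Pyhop (illustrative only; not from the paper).
import pyhop  # https://bitbucket.org/dananau/pyhop

def drive(state, taxi, dest):                 # primitive operator
    state.loc[taxi] = dest
    return state

def pickup(state, taxi, p):                   # only valid if co-located
    if state.loc[taxi] == state.loc[p]:
        state.loc[p] = taxi
        return state
    return False

def dropoff(state, taxi, p):                  # only valid if p is in the taxi
    if state.loc[p] == taxi:
        state.loc[p] = state.loc[taxi]
        return state
    return False

pyhop.declare_operators(drive, pickup, dropoff)

def transport(state, taxi, p, dest):          # method: decompose the task
    return [('drive', taxi, state.loc[p]), ('pickup', taxi, p),
            ('drive', taxi, dest), ('dropoff', taxi, p)]

pyhop.declare_methods('transport', transport)

init = pyhop.State('init')
init.loc = {'t1': 'depot', 'p1': 'home'}
plan = pyhop.pyhop(init, [('transport', 't1', 'p1', 'office')], verbose=1)
# plan -> [('drive','t1','home'), ('pickup','t1','p1'),
#          ('drive','t1','office'), ('dropoff','t1','p1')]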
References
Andrychowicz M, Wolski F, Ray A, et al (2017) Hindsight experience replay. In: NeurIPS, pp 5048–5058
Ash JT, Adams RP (2020) On warm-starting neural network training. In: NeurIPS
Battaglia PW, Hamrick JB, Bapst V et al (2018) Relational inductive biases, deep learning, and graph networks. CoRR abs/1806.01261
Brafman RI, Tennenholtz M (2002) R-max: a general polynomial time algorithm for near-optimal reinforcement learning. JMLR 3:213–231
Das S, Natarajan S, Roy K et al (2020) Fitted Q-learning for relational domains. CoRR abs/2006.05595
Dietterich TG (1998) The MAXQ method for hierarchical reinforcement learning. In: ICML, pp 118–126
Dong H, Mao J, Lin T et al (2019) Neural logic machines. In: ICLR
Driessens K, Ramon J, Blockeel H (2001) Speeding up relational reinforcement learning through the use of an incremental first order decision tree learner. In: ECML, pp 97–108
Džeroski S, De Raedt L, Driessens K (2001) Relational reinforcement learning. Mach Learn 43(1/2):7–52
Eppe M, Nguyen PDH, Wermter S (2019) From semantics to execution: integrating action planning with reinforcement learning for robotic causal problem-solving. Front Robot AI 6:123
Evans R, Grefenstette E (2018) Learning explanatory rules from noisy data. JAIR 61:1–64
Fern A, Yoon S, Givan R (2006) Approximate policy iteration with a policy language bias: solving relational Markov decision processes. JAIR 25:75–118
Ghallab M, Nau D, Traverso P (2004) Automated Planning: theory and practice. Elsevier
Givan R, Dean T, Greig M (2003) Equivalence notions and model minimization in Markov decision processes. Artif Intell 147(1–2):163–223
Grounds M, Kudenko D (2005) Combining reinforcement learning with symbolic planning. AAMAS III:75–86
Guestrin C, Patrascu R, Schuurmans D (2002) Algorithm-directed exploration for model-based reinforcement learning in factored MDPs. In: ICML, pp 235–242
Guestrin C, Koller D et al (2003) Generalizing plans to new environments in relational MDPs. In: IJCAI, pp 1003–1010
Haarnoja T, Zhou A, Abbeel P et al (2018) Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: ICML, pp 1861–1870
van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double Q-learning. In: AAAI, pp 2094–2100
Igl M, Farquhar G, Luketina J et al (2021) Transient non-stationarity and generalisation in deep reinforcement learning. In: ICLR
Illanes L, Yan X, Icarte RT et al (2020) Symbolic plans as high-level instructions for reinforcement learning. In: ICAPS, pp 540–550
Janisch J, Pevný T, Lisý V (2021) Symbolic relational deep reinforcement learning based on graph neural networks. RL4RealLife @ ICML
Jiang Y, Yang F, Zhang S, et al (2019) Task-motion planning with reinforcement learning for adaptable mobile service robots. In: IROS, pp 7529–7534
Jiang Z, Luo S (2019) Neural logic reinforcement learning. In: ICML, vol 97. PMLR, pp 3110–3119
Jiang Z, Minervini P, Jiang M, et al (2021) Grid-to-graph: flexible spatial relational inductive biases for reinforcement learning. In: AAMAS. ACM, pp 674–682
Kimura D, Ono M, Chaudhury S, et al (2021) Neuro-symbolic reinforcement learning with first-order logic. In: EMNLP, pp 3505–3511
Kokel H, Manoharan A, Natarajan S et al (2021) RePReL: integrating relational planning and reinforcement learning for effective abstraction. In: ICAPS, pp 533–541
Kokel H, Manoharan A, Natarajan S, et al (2021b) Dynamic probabilistic logic models for effective abstractions in RL. CoRR abs/2110.08318
Konidaris G, Kaelbling LP, Lozano-Perez T (2018) From skills to symbols: learning symbolic representations for abstract high-level planning. JAIR 61:215–289
Li L, Walsh TJ, Littman ML (2006) Towards a unified theory of state abstraction for MDPs. In: ISAIM, p 5
Li R, Jabri A, Darrell T, et al (2020) Towards practical multi-object manipulation using relational reinforcement learning. In: ICRA. IEEE, pp 4051–4058
Lyle C, Rowland M, Dabney W (2022) Understanding and preventing capacity loss in reinforcement learning. In: ICLR
Lyu D, Yang F, Liu B, et al (2019) SDRL: interpretable and data-efficient deep reinforcement learning leveraging symbolic planning. In: AAAI, pp 2970–2977
Manfredotti CE (2009) Modeling and inference with relational dynamic Bayesian networks. In: CCAI, pp 287–290
Natarajan S, Tadepalli P, et al (2005) Learning first-order probabilistic models with combining rules. In: ICML, pp 609–616
Natarajan S, Tadepalli P et al (2008) Learning first-order probabilistic models with combining rules. Ann Math Artif Intell 54(1–3):223–256
Nau D, Cao Y, Lotem A, et al (1999) SHOP: simple hierarchical ordered planner. In: IJCAI, pp 968–975
Parr R, Russell SJ (1998) Reinforcement learning with hierarchies of machines. In: NeurIPS, pp 1043–1049
Plappert M, Andrychowicz M, Ray A, et al (2018) Multi-goal reinforcement learning: challenging robotics environments and request for research. CoRR abs/1802.09464
Ravindran B, Barto AG (2003) SMDP homomorphisms: an algebraic approach to abstraction in semi-Markov decision processes. In: IJCAI, pp 1011–1018
Riegel R, Gray AG, Luus FPS, et al (2020) Logical neural networks. CoRR abs/2006.13155
Silver D, Hubert T et al (2018) A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science 362(6419):1140–1144
Sutton RS, Precup D, Singh SP (1998) Intra-option learning about temporally abstract actions. In: ICML, pp 556–564
Sutton RS, Precup D, Singh SP (1999) Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif Intell 112(1–2):181–211
Vlasselaer J, Meert W, et al (2014) Efficient probabilistic inference for dynamic relational models. In: StarAI @ AAAI
Yang F, Lyu D, Liu B, et al (2018) PEORL: integrating symbolic planning and hierarchical reinforcement learning for robust decision-making. In: IJCAI, pp 4860–4866
Zambaldi V, Raposo D, et al (2019) Deep reinforcement learning with relational inductive biases. In: ICLR
Zhang L, Li X, Wang M, et al (2021) Off-policy differentiable logic reinforcement learning. In: ECML PKDD, pp 617–632
Acknowledgements
HK and SN gratefully acknowledge the support of ARO award W911NF2010224 and AFOSR award FA9550-18-1-0462. PT acknowledges the support of DARPA contract N66001-17-2-4030 and NSF grant IIS-1619433. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the view of the ARO, AFOSR, NSF, DARPA or the US government. We sincerely thank Illanes et al. (2020) for sharing the Taskable RL code for baselines. We also thank the Starling lab members for feedback on the manuscript.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The research leading to these results received funding from federal grants as mentioned in the acknowledgments. Specifically, HK and SN received the support of ARO (award W911NF2010224) and AFOSR (award FA9550-18-1-0462). PT received the support of DARPA (contract N66001-17-2-4030) and NSF (grant IIS-1619433). No conflict of interest exists with this work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kokel, H., Natarajan, S., Ravindran, B. et al. RePReL: a unified framework for integrating relational planning and reinforcement learning for effective abstraction in discrete and continuous domains. Neural Comput & Applic 35, 16877–16892 (2023). https://doi.org/10.1007/s00521-022-08119-y