Abstract
We propose RePReL, a hybrid planner and (deep) reinforcement learning (RL) architecture that leverages a relational planner to efficiently provide useful state abstractions. State abstractions offer a tremendous advantage for generalization and transfer in RL, and our framework takes an important step toward constructing them. Specifically, the framework enables multi-level abstraction by having a high-level planner communicate with a low-level (deep) reinforcement learner. Our empirical results demonstrate the generalization and transfer capabilities of the framework in both discrete and continuous domains with rich structure (objects and relations between those objects). A key aspect of RePReL is that it can serve as a plug-and-play framework in which different planners can be combined with different (deep) RL agents.
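To make the architecture concrete, the following minimal Python sketch (our illustration, not the authors' implementation) shows the control loop the abstract describes: a high-level planner decomposes the goal into subtasks, a task-specific abstraction filters the raw state, and any RL learner exposing act/update can be plugged in per subtask. The names planner, abstraction, Learner, and goal_test are assumptions introduced here.

# Minimal sketch of a RePReL-style control loop (illustrative only,
# assuming the hypothetical interfaces named in the text above).
from typing import Callable, Dict, Protocol

class Learner(Protocol):
    # Any (deep) RL agent exposing act/update can be plugged in.
    def act(self, abs_state): ...
    def update(self, abs_state, action, reward, next_abs_state): ...

def reprel_episode(env, planner: Callable, abstraction: Callable,
                   learners: Dict[str, Learner], goal):
    # Plan once at the task level, then act and learn at the RL level.
    state = env.reset()
    for subtask in planner(state, goal):            # high-level relational plan
        policy = learners[subtask.name]             # plug-and-play RL agent
        while not subtask.goal_test(state):         # option-style termination
            s_abs = abstraction(state, subtask)     # task-specific abstraction
            action = policy.act(s_abs)
            state, reward, done, _ = env.step(action)
            policy.update(s_abs, action, reward, abstraction(state, subtask))
            if done:
                return

Because each learner only ever sees the abstracted state for its current subtask, policies trained on small instances can transfer to larger instances that share the same subtask structure.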
Code availability
Notes
Variables are uppercase; constants and predicates are lowercase. X, Y, D, and K are variables for locations, and P is a variable for the passenger.
With matching or null option.
An HTN planner [37] written in Python, https://bitbucket.org/dananau/pyhop; a toy usage example appears after these notes.
Boldface indicates a vector.
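To illustrate the HTN planner mentioned in the note above, here is a toy Pyhop domain. The API calls (pyhop.State, declare_operators, declare_methods, pyhop.pyhop) are Pyhop's own; the taxi-style operators and method are our hypothetical example, not taken from the paper.

# Toy HTN domain in Pyhop (illustrative only; not from the paper).
import pyhop  # https://bitbucket.org/dananau/pyhop

def drive(state, taxi, dest):                 # primitive operator
    state.loc[taxi] = dest
    return state

def pickup(state, taxi, p):                   # only valid if co-located
    if state.loc[taxi] == state.loc[p]:
        state.loc[p] = taxi
        return state
    return False

def dropoff(state, taxi, p):                  # only valid if p is in the taxi
    if state.loc[p] == taxi:
        state.loc[p] = state.loc[taxi]
        return state
    return False

pyhop.declare_operators(drive, pickup, dropoff)

def transport(state, taxi, p, dest):          # method: decompose the task
    return [('drive', taxi, state.loc[p]), ('pickup', taxi, p),
            ('drive', taxi, dest), ('dropoff', taxi, p)]

pyhop.declare_methods('transport', transport)

init = pyhop.State('init')
init.loc = {'t1': 'depot', 'p1': 'home'}
plan = pyhop.pyhop(init, [('transport', 't1', 'p1', 'office')], verbose=1)
# plan -> [('drive','t1','home'), ('pickup','t1','p1'),
#          ('drive','t1','office'), ('dropoff','t1','p1')]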
References
Andrychowicz M, Wolski F, Ray A, et al (2017) Hindsight experience replay. In: NeurIPS, pp 5048–5058
Ash JT, Adams RP (2020) On warm-starting neural network training. In: NeurIPS
Battaglia PW, Hamrick JB, Bapst V et al (2018) Relational inductive biases, deep learning, and graph networks. CoRR abs/1806.01261
Brafman RI, Tennenholtz M (2002) R-max: a general polynomial time algorithm for near-optimal reinforcement learning. JMLR 3:213–231
Das S, Natarajan S, Roy K et al (2020) Fitted Q-learning for relational domains. CoRR abs/2006.05595
Dietterich TG (1998) The MAXQ method for hierarchical reinforcement learning. In: ICML, pp 118–126
Dong H, Mao J, Lin T et al (2019) Neural logic machines. In: ICLR
Driessens K, Ramon J, Blockeel H (2001) Speeding up relational reinforcement learning through the use of an incremental first order decision tree learner. In: ECML, pp 97–108
Džeroski S, De Raedt L, Driessens K (2001) Relational reinforcement learning. Mach Learn 43(1/2):7–52
Eppe M, Nguyen PDH, Wermter S (2019) From semantics to execution: integrating action planning with reinforcement learning for robotic causal problem-solving. Front Robot AI 6:123
Evans R, Grefenstette E (2018) Learning explanatory rules from noisy data. JAIR 61:1–64
Fern A, Yoon S, Givan R (2006) Approximate policy iteration with a policy language bias: solving relational Markov decision processes. JAIR 25:75–118
Ghallab M, Nau D, Traverso P (2004) Automated Planning: theory and practice. Elsevier
Givan R, Dean T, Greig M (2003) Equivalence notions and model minimization in Markov decision processes. Artif Intell 147(1–2):163–223
Grounds M, Kudenko D (2005) Combining reinforcement learning with symbolic planning. AAMAS III:75–86
Guestrin C, Patrascu R, Schuurmans D (2002) Algorithm-directed exploration for model-based reinforcement learning in factored MDPs. In: ICML, pp 235–242
Guestrin C, Koller D et al (2003) Generalizing plans to new environments in relational MDPs. In: IJCAI, pp 1003–1010
Haarnoja T, Zhou A, Abbeel P et al (2018) Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: ICML, pp 1861–1870
van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double Q-learning. In: AAAI, pp 2094–2100
Igl M, Farquhar G, Luketina J et al (2021) Transient non-stationarity and generalisation in deep reinforcement learning. In: ICLR
Illanes L, Yan X, Icarte RT et al (2020) Symbolic plans as high-level instructions for reinforcement learning. In: ICAPS, pp 540–550
Janisch J, Pevný T, Lisý V (2021) Symbolic relational deep reinforcement learning based on graph neural networks. RL4RealLife @ ICML
Jiang Y, Yang F, Zhang S, et al (2019) Task-motion planning with reinforcement learning for adaptable mobile service robots. In: IROS, pp 7529–7534
Jiang Z, Luo S (2019) Neural logic reinforcement learning. In: ICML, vol 97. PMLR, pp 3110–3119
Jiang Z, Minervini P, Jiang M, et al (2021) Grid-to-graph: flexible spatial relational inductive biases for reinforcement learning. In: AAMAS. ACM, pp 674–682
Kimura D, Ono M, Chaudhury S, et al (2021) Neuro-symbolic reinforcement learning with first-order logic. In: EMNLP, pp 3505–3511
Kokel H, Manoharan A, Natarajan S et al (2021) RePReL: integrating relational planning and reinforcement learning for effective abstraction. In: ICAPS, pp 533–541
Kokel H, Manoharan A, Natarajan S, et al (2021b) Dynamic probabilistic logic models for effective abstractions in RL. CoRR abs/2110.08318
Konidaris G, Kaelbling LP, Lozano-Perez T (2018) From skills to symbols: learning symbolic representations for abstract high-level planning. JAIR 61:215–289
Li L, Walsh TJ, Littman ML (2006) Towards a unified theory of state abstraction for MDPs. In: ISAIM, p 5
Li R, Jabri A, Darrell T, et al (2020) Towards practical multi-object manipulation using relational reinforcement learning. In: ICRA. IEEE, pp 4051–4058
Lyle C, Rowland M, Dabney W (2022) Understanding and preventing capacity loss in reinforcement learning. In: ICLR
Lyu D, Yang F, Liu B, et al (2019) SDRL: interpretable and data-efficient deep reinforcement learning leveraging symbolic planning. In: AAAI, pp 2970–2977
Manfredotti CE (2009) Modeling and inference with relational dynamic Bayesian networks. In: CCAI, pp 287–290
Natarajan S, Tadepalli P, et al (2005) Learning first-order probabilistic models with combining rules. In: ICML, pp 609–616
Natarajan S, Tadepalli P et al (2008) Learning first-order probabilistic models with combining rules. Ann Math Artif Intell 54(1–3):223–256
Nau D, Cao Y, Lotem A, et al (1999) SHOP: simple hierarchical ordered planner. In: IJCAI, pp 968–975
Parr R, Russell SJ (1998) Reinforcement learning with hierarchies of machines. In: NeurIPS, pp 1043–1049
Plappert M, Andrychowicz M, Ray A, et al (2018) Multi-goal reinforcement learning: challenging robotics environments and request for research. CoRR abs/1802.09464
Ravindran B, Barto AG (2003) SMDP homomorphisms: an algebraic approach to abstraction in semi-Markov decision processes. In: IJCAI, pp 1011–1018
Riegel R, Gray AG, Luus FPS, et al (2020) Logical neural networks. CoRR abs/2006.13155
Silver D, Hubert T et al (2018) A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science 362(6419):1140–1144
Sutton RS, Precup D, Singh SP (1998) Intra-option learning about temporally abstract actions. In: ICML, pp 556–564
Sutton RS, Precup D, Singh SP (1999) Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif Intell 112(1–2):181–211
Vlasselaer J, Meert W, et al (2014) Efficient probabilistic inference for dynamic relational models. In: StarAI @ AAAI
Yang F, Lyu D, Liu B, et al (2018) PEORL: integrating symbolic planning and hierarchical reinforcement learning for robust decision-making. In: IJCAI, pp 4860–4866
Zambaldi V, Raposo D, et al (2019) Deep reinforcement learning with relational inductive biases. In: ICLR
Zhang L, Li X, Wang M, et al (2021) Off-policy differentiable logic reinforcement learning. In: ECML PKDD, pp 617–632
Acknowledgements
HK and SN gratefully acknowledge the support of ARO award W911NF2010224 and AFOSR award FA9550-18-1-0462. PT acknowledges the support of DARPA contract N66001-17-2-4030 and NSF grant IIS-1619433. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the view of the ARO, AFOSR, NSF, DARPA or the US government. We sincerely thank Illanes et al. (2020) for sharing the Taskable RL code for baselines. We also thank the Starling lab members for feedback on the manuscript.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The research leading to these results received funding from federal grants as mentioned in the acknowledgments. Specifically, HK and SN received the support of ARO (award W911NF2010224) and AFOSR (award FA9550-18-1-0462). PT received the support of DARPA (contract N66001-17-2-4030) and NSF (grant IIS-1619433). No conflict of interest exists with this work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kokel, H., Natarajan, S., Ravindran, B. et al. RePReL: a unified framework for integrating relational planning and reinforcement learning for effective abstraction in discrete and continuous domains. Neural Comput & Applic 35, 16877–16892 (2023). https://doi.org/10.1007/s00521-022-08119-y