
RePReL: a unified framework for integrating relational planning and reinforcement learning for effective abstraction in discrete and continuous domains

Special Issue: Human-aligned Reinforcement Learning for Autonomous Agents and Robots
Published in Neural Computing and Applications

Abstract

We propose RePReL, a hybrid planner-(deep) reinforcement learning (RL) architecture that leverages a relational planner to efficiently provide useful state abstractions. State abstractions offer a substantial advantage for generalization and transfer in RL, and our framework takes an important step toward constructing them. Specifically, the framework enables multi-level abstractions by having a high-level planner communicate with a low-level (deep) reinforcement learner. Our empirical results demonstrate the generalization and transfer capabilities of the framework in both discrete and continuous domains with rich structure (objects and relations between those objects). A key aspect of RePReL is that it can be seen as a plug-and-play framework in which different planners can be combined with different (deep) RL agents.
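
To make the plug-and-play structure concrete, the sketch below shows one way the planner-learner loop can be organized. It is only an illustrative sketch under assumed interfaces (Planner, Abstraction, env, and agents are hypothetical names, not the released implementation, which is linked under Code availability below): the planner decomposes the goal into subtasks, each subtask induces its own state abstraction, and a separate (deep) RL agent acts and learns on that abstract state.

    # Illustrative sketch of a RePReL-style planner + RL loop. All interfaces
    # here (Planner, Abstraction, env, agents) are assumptions for exposition,
    # not the released implementation.

    class Planner:
        """High-level relational planner: decomposes a goal into subtasks."""
        def plan(self, state, goal):
            raise NotImplementedError  # e.g., an HTN planner such as pyhop

    class Abstraction:
        """Maps the full state to the abstract state relevant to one subtask."""
        def __call__(self, state, subtask):
            raise NotImplementedError

    def reprel_episode(env, planner, abstraction, agents):
        """Run one episode: plan at the high level, act and learn at the low level.

        `agents` maps each subtask type to its own (deep) RL learner; because the
        planner and the learners only meet through these interfaces, either side
        can be swapped out (plug-and-play).
        """
        state, goal = env.reset()
        for subtask in planner.plan(state, goal):
            agent = agents[subtask.name]
            while not subtask.achieved(state):
                abs_state = abstraction(state, subtask)        # task-specific view
                action = agent.act(abs_state)
                next_state, reward, done = env.step(action)
                agent.observe(abs_state, action, reward,
                              abstraction(next_state, subtask), done)
                state = next_state
                if done:                                       # episode ended early
                    return

Under these assumed interfaces, any RL algorithm exposing act/observe methods could be dropped into agents, and any planner producing a sequence of subtasks could stand in for Planner.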


Code availability

https://github.com/starling-lab/RePReL.

Notes

  1. Variables are uppercase; constants and predicates are lowercase. X, Y, D, and K are variables for locations, and P is a variable for the passenger.

  2. With matching or null option.

  3. An HTN planner [37] written in Python, https://bitbucket.org/dananau/pyhop; a brief usage sketch follows these notes.

  4. https://github.com/rail-berkeley/rlkit.

  5. Bold font indicates a vector.
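
Since note 3 above points to Pyhop, the brief sketch below illustrates its interface with a toy taxi-style decomposition; the operators, method, and state attributes are illustrative placeholders rather than the paper's actual domain definition. The returned operator sequence is the kind of high-level plan whose steps are handed to the low-level learners.

    # Toy illustration of the pyhop HTN planner interface; the taxi-style domain
    # below is a placeholder, not the domain definition used in the paper.
    import pyhop

    # Operators: primitive actions that transform the symbolic state.
    def move(state, loc):
        state.taxi_at = loc
        return state

    def pickup(state, p):
        if state.at[p] == state.taxi_at:
            state.in_taxi[p] = True
            return state
        return False

    def dropoff(state, p):
        if state.in_taxi.get(p):
            state.in_taxi[p] = False
            state.at[p] = state.taxi_at
            return state
        return False

    pyhop.declare_operators(move, pickup, dropoff)

    # Method: decomposes the high-level 'transport' task into primitive subtasks.
    def transport(state, p, dest):
        return [('move', state.at[p]), ('pickup', p), ('move', dest), ('dropoff', p)]

    pyhop.declare_methods('transport', transport)

    # Build an initial symbolic state and plan.
    s0 = pyhop.State('s0')
    s0.taxi_at = 'l0'
    s0.at = {'p1': 'l1'}
    s0.in_taxi = {}
    plan = pyhop.pyhop(s0, [('transport', 'p1', 'l2')], verbose=0)
    print(plan)  # [('move', 'l1'), ('pickup', 'p1'), ('move', 'l2'), ('dropoff', 'p1')]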

References

  1. Andrychowicz M, Wolski F, Ray A, et al (2017) Hindsight experience replay. In: NeurIPS, pp 5048–5058

  2. Ash JT, Adams RP (2020) On warm-starting neural network training. In: Advances in neural information processing systems 33: annual conference on neural information processing systems 2020, NeurIPS 2020, December 6–12, 2020, virtual

  3. Battaglia PW, Hamrick JB, Bapst V et al (2018) Relational inductive biases, deep learning, and graph networks. CoRR abs/1806.01261

  4. Brafman RI, Tennenholtz M (2002) R-max: a general polynomial time algorithm for near-optimal reinforcement learning. JMLR 3:213–231

  5. Das S, Natarajan S, Roy K et al (2020) Fitted Q-learning for relational domains. CoRR abs/2006.05595

  6. Dietterich TG (1998) The MAXQ method for hierarchical reinforcement learning. In: ICML, pp 118–126

  7. Dong H, Mao J, Lin T et al (2019) Neural logic machines. In: ICLR

  8. Driessens K, Ramon J, Blockeel H (2001) Speeding up relational reinforcement learning through the use of an incremental first order decision tree learner. In: ECML, pp 97–108

  9. Džeroski S, De Raedt L, Driessens K (2001) Relational reinforcement learning. Mach Learn 43(1/2):7–52

  10. Eppe M, Nguyen PDH, Wermter S (2019) From semantics to execution: integrating action planning with reinforcement learning for robotic causal problem-solving. Front Robot AI 6:123

  11. Evans R, Grefenstette E (2018) Learning explanatory rules from noisy data. JAIR 61:1–64

  12. Fern A, Yoon S, Givan R (2006) Approximate policy iteration with a policy language bias: solving relational Markov decision processes. JAIR 25:75–118

  13. Ghallab M, Nau D, Traverso P (2004) Automated planning: theory and practice. Elsevier

  14. Givan R, Dean T, Greig M (2003) Equivalence notions and model minimization in Markov decision processes. Artif Intell 147(1–2):163–223

  15. Grounds M, Kudenko D (2005) Combining reinforcement learning with symbolic planning. AAMAS III:75–86

  16. Guestrin C, Patrascu R, Schuurmans D (2002) Algorithm-directed exploration for model-based reinforcement learning in factored MDPs. In: ICML, pp 235–242

  17. Guestrin C, Koller D et al (2003) Generalizing plans to new environments in relational MDPs. In: IJCAI, pp 1003–1010

  18. Haarnoja T, Zhou A, Abbeel P et al (2018) Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: ICML, pp 1861–1870

  19. van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double Q-learning. In: AAAI, pp 2094–2100

  20. Igl M, Farquhar G, Luketina J et al (2021) Transient non-stationarity and generalisation in deep reinforcement learning. In: ICLR

  21. Illanes L, Yan X, Icarte RT et al (2020) Symbolic plans as high-level instructions for reinforcement learning. In: ICAPS, pp 540–550

  22. Janisch J, Pevný T, Lisý V (2021) Symbolic relational deep reinforcement learning based on graph neural networks. RL4RealLife @ ICML

  23. Jiang Y, Yang F, Zhang S, et al (2019) Task-motion planning with reinforcement learning for adaptable mobile service robots. In: IROS, pp 7529–7534

  24. Jiang Z, Luo S (2019) Neural logic reinforcement learning. In: ICML, vol 97. PMLR, pp 3110–3119

  25. Jiang Z, Minervini P, Jiang M, et al (2021) Grid-to-graph: flexible spatial relational inductive biases for reinforcement learning. In: AAMAS. ACM, pp 674–682

  26. Kimura D, Ono M, Chaudhury S, et al (2021) Neuro-symbolic reinforcement learning with first-order logic. In: EMNLP, pp 3505–3511

  27. Kokel H, Manoharan A, Natarajan S et al (2021) RePReL: integrating relational planning and reinforcement learning for effective abstraction. ICAPS 31(1):533–541

  28. Kokel H, Manoharan A, Natarajan S, et al (2021b) Dynamic probabilistic logic models for effective abstractions in RL. CoRR abs/2110.08318

  29. Konidaris G, Kaelbling LP, Lozano-Perez T (2018) From skills to symbols: learning symbolic representations for abstract high-level planning. JAIR

  30. Li L, Walsh TJ, Littman ML (2006) Towards a unified theory of state abstraction for MDPs. In: ISAIM, p 5

  31. Li R, Jabri A, Darrell T, et al (2020) Towards practical multi-object manipulation using relational reinforcement learning. In: ICRA. IEEE, pp 4051–4058

  32. Lyle C, Rowland M, Dabney W (2022) Understanding and preventing capacity loss in reinforcement learning. In: ICLR

  33. Lyu D, Yang F, Liu B, et al (2019) SDRL: interpretable and data-efficient deep reinforcement learning leveraging symbolic planning. In: AAAI, pp 2970–2977

  34. Manfredotti CE (2009) Modeling and inference with relational dynamic Bayesian networks. In: CCAI, pp 287–290

  35. Natarajan S, Tadepalli P, et al (2005) Learning first-order probabilistic models with combining rules. In: ICML, pp 609–616

  36. Natarajan S, Tadepalli P et al (2008) Learning first-order probabilistic models with combining rules. Ann Math Artif Intell 54(1–3):223–256

  37. Nau D, Cao Y, Lotem A, et al (1999) SHOP: simple hierarchical ordered planner. In: IJCAI, pp 968–975

  38. Parr R, Russell SJ (1998) Reinforcement learning with hierarchies of machines. In: NeurIPS, pp 1043–1049

  39. Plappert M, Andrychowicz M, Ray A, et al (2018) Multi-goal reinforcement learning: challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464

  40. Ravindran B, Barto AG (2003) SMDP homomorphisms: an algebraic approach to abstraction in semi-Markov decision processes. In: IJCAI, pp 1011–1018

  41. Ravindran B, Barto AG (2003) SMDP homomorphisms: an algebraic approach to abstraction in semi-markov decision processes. In: IJCAI. Morgan Kaufmann, pp 1011–1018

  42. Riegel R, Gray AG, Luus FPS, et al (2020) Logical neural networks. CoRR abs/2006.13155

  43. Silver D, Hubert T et al (2018) A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science 362(6419):1140–1144

  44. Sutton RS, Precup D, Singh SP (1998) Intra-option learning about temporally abstract actions. In: ICML, pp 556–564

  45. Sutton RS, Precup D, Singh SP (1999) Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif Intell 112(1–2):181–211

  46. Vlasselaer J, Meert W, et al (2014) Efficient probabilistic inference for dynamic relational models. In: StarAI @ AAAI

  47. Yang F, Lyu D, Liu B, et al (2018) PEORL: integrating symbolic planning and hierarchical reinforcement learning for robust decision-making. In: IJCAI, pp 4860–4866

  48. Zambaldi V, Raposo D, et al (2019) Deep reinforcement learning with relational inductive biases. In: ICLR

  49. Zhang L, Li X, Wang M, et al (2021) Off-policy differentiable logic reinforcement learning. In: ECML PKDD, pp 617–632

Acknowledgements

HK and SN gratefully acknowledge the support of ARO award W911NF2010224 and AFOSR award FA9550-18-1-0462. PT acknowledges the support of DARPA contract N66001-17-2-4030 and NSF grant IIS-1619433. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the view of the ARO, AFOSR, NSF, DARPA or the US government. We sincerely thank Illanes et al. (2020) for sharing the Taskable RL code for baselines. We also thank the Starling lab members for feedback on the manuscript.

Author information

Corresponding author

Correspondence to Harsha Kokel.

Ethics declarations

Conflict of interest

The research leading to these results received funding from federal grants, as mentioned in the acknowledgements. Specifically, HK and SN received support from ARO (award W911NF2010224) and AFOSR (award FA9550-18-1-0462). PT received support from DARPA (contract N66001-17-2-4030) and NSF (grant IIS-1619433). No conflict of interest exists with this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kokel, H., Natarajan, S., Ravindran, B. et al. RePReL: a unified framework for integrating relational planning and reinforcement learning for effective abstraction in discrete and continuous domains. Neural Comput & Applic 35, 16877–16892 (2023). https://doi.org/10.1007/s00521-022-08119-y
