
Guiding Task Learning by Hierarchical RL with an Experience Replay Mechanism Through Reward Machines

  • Conference paper

In: PRICAI 2023: Trends in Artificial Intelligence (PRICAI 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14325)

Included in the following conference series: PRICAI: Pacific Rim International Conference on Artificial Intelligence

Abstract

Recently, reinforcement learning (RL) has made great progress in both theory and application. However, challenges remain, such as low sample utilization and the difficulty of designing suitable reward functions. This paper therefore focuses on optimizing the structure of the reward function and improving sample utilization. We propose a hierarchical reinforcement learning (HRL) algorithm based on the options framework that incorporates a segmented reward mechanism and an experience replay mechanism. The reward mechanism helps the agent exploit the internal structure of the reward function. The experience replay mechanism uses two buffers: one for ordinary experiences and a dedicated one for the special experiences generated when the agent completes a subtask, which are particularly useful for training. We conducted single-task and multi-task tests in multiple environments, and the experimental results demonstrate that our algorithm outperforms the baseline algorithms.
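The abstract describes the approach only in prose; the Python sketch below is our own hedged illustration, not the authors' implementation, of the two mechanisms it names: a reward machine that exposes the reward function's internal structure as a finite-state automaton over high-level events, and a replay scheme that keeps subtask-completion experiences in a dedicated buffer so they are sampled more often. All names (RewardMachine, DualReplayBuffer, special_ratio), default sizes, and the toy task are assumptions made purely for illustration.

import random
from collections import deque

class RewardMachine:
    """A reward machine: a finite-state automaton over high-level events whose
    transitions emit rewards, exposing the reward function's internal structure."""

    def __init__(self, transitions, initial_state, terminal_states):
        # transitions: dict mapping (rm_state, event) -> (next_rm_state, reward)
        self.transitions = transitions
        self.initial_state = initial_state
        self.terminal_states = terminal_states

    def step(self, rm_state, event):
        # Unknown events leave the machine in the same state with zero reward.
        return self.transitions.get((rm_state, event), (rm_state, 0.0))

    def is_terminal(self, rm_state):
        return rm_state in self.terminal_states


class DualReplayBuffer:
    """Ordinary transitions go to a regular buffer; transitions that complete a
    subtask (i.e. move the reward machine to a new state) go to a special buffer
    so these rarer, informative samples are replayed more often."""

    def __init__(self, capacity=10_000, special_capacity=2_000, special_ratio=0.25):
        self.regular = deque(maxlen=capacity)
        self.special = deque(maxlen=special_capacity)
        self.special_ratio = special_ratio  # assumed mixing ratio, a tunable choice

    def add(self, transition, subtask_completed):
        (self.special if subtask_completed else self.regular).append(transition)

    def sample(self, batch_size):
        # Draw a fixed fraction from the special buffer, the rest from the regular one.
        n_special = min(len(self.special), int(batch_size * self.special_ratio))
        n_regular = min(len(self.regular), batch_size - n_special)
        return (random.sample(self.special, n_special)
                + random.sample(self.regular, n_regular))


# Toy two-step task "visit a, then visit b"; events stand in for labelled observations.
rm = RewardMachine(
    transitions={("u0", "a"): ("u1", 0.0), ("u1", "b"): ("u_acc", 1.0)},
    initial_state="u0",
    terminal_states={"u_acc"},
)
buffer = DualReplayBuffer()

rm_state = rm.initial_state
for event in ["c", "a", "c", "b"]:
    next_state, reward = rm.step(rm_state, event)
    buffer.add((rm_state, event, reward, next_state),
               subtask_completed=(next_state != rm_state))
    rm_state = next_state

print(buffer.sample(batch_size=4))

The mixing ratio between the two buffers is a tuning choice assumed here; the paper's actual sampling rule and buffer contents may differ.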



Acknowledgements

This study was supported by the National Natural Science Foundation of China (Grant No. 62172072).

Author information


Corresponding author

Correspondence to Chuanjuan Liu.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Cong, J., Liu, Y., Liu, C. (2024). Guiding Task Learning by Hierarchical RL with an Experience Replay Mechanism Through Reward Machines. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science (LNAI), vol 14325. Springer, Singapore. https://doi.org/10.1007/978-981-99-7019-3_17


  • DOI: https://doi.org/10.1007/978-981-99-7019-3_17

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7018-6

  • Online ISBN: 978-981-99-7019-3

  • eBook Packages: Computer Science, Computer Science (R0)
