Abstract
Reinforcement learning (RL) has recently made substantial progress in both theory and application. However, challenges remain, such as low sample efficiency and the difficulty of designing suitable reward functions. This paper therefore focuses on optimizing the structure of the reward function and improving sample efficiency. We propose a hierarchical reinforcement learning (HRL) algorithm based on the options framework that incorporates a segmented reward mechanism and an experience replay mechanism. The reward mechanism helps the agent grasp the reward function's internal structure. The experience replay mechanism comprises a buffer for typical experiences and a dedicated buffer for the special-state experiences collected when the agent reaches subtasks, both of which aid training. We conducted single-task and multitask tests in multiple environments, and the experimental results demonstrate that our algorithm outperforms the baseline algorithms.
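The dual-buffer replay described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `DualReplayBuffer`, the capacities, and the `special_ratio` mixing parameter are all assumptions made for the example; the paper's actual criterion for "special" transitions (reaching a subtask boundary) is likewise only approximated by a boolean flag here.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Hypothetical sketch of a two-buffer experience replay: one buffer
    for typical transitions and a special buffer for transitions in which
    the agent reaches a subtask, sampled together at a fixed ratio."""

    def __init__(self, capacity=10000, special_capacity=2000, special_ratio=0.3):
        self.buffer = deque(maxlen=capacity)           # typical experiences
        self.special = deque(maxlen=special_capacity)  # subtask-boundary experiences
        self.special_ratio = special_ratio  # fraction of each batch from the special buffer

    def add(self, transition, reached_subtask=False):
        # Every transition goes into the typical buffer; subtask-boundary
        # transitions are additionally stored in the special buffer.
        self.buffer.append(transition)
        if reached_subtask:
            self.special.append(transition)

    def sample(self, batch_size):
        # Draw up to special_ratio * batch_size from the special buffer,
        # then fill the remainder from the typical buffer.
        n_special = min(int(batch_size * self.special_ratio), len(self.special))
        batch = random.sample(list(self.special), n_special)
        batch += random.sample(list(self.buffer),
                               min(batch_size - n_special, len(self.buffer)))
        return batch
```

Oversampling subtask-boundary transitions in this way is one plausible reading of how a special buffer could guide learning toward the reward function's segmented structure.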
Acknowledgements
This study was supported by the National Natural Science Foundation of China (Grant Nos. 62172072).
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Cong, J., Liu, Y., Liu, C. (2024). Guiding Task Learning by Hierarchical RL with an Experience Replay Mechanism Through Reward Machines. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science(), vol 14325. Springer, Singapore. https://doi.org/10.1007/978-981-99-7019-3_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7018-6
Online ISBN: 978-981-99-7019-3
eBook Packages: Computer Science (R0)