Abstract
Hierarchical reinforcement learning (HRL) is a promising approach that decomposes complex tasks into a series of sub-tasks. However, most current HRL methods converge slowly, which makes them difficult to apply widely to real-world scenarios. In this paper, we propose an efficient hierarchical reinforcement learning algorithm with fuzzy rules (HFR), a novel framework that integrates human prior knowledge into a hierarchical policy network and thereby effectively accelerates policy optimization. The proposed model uses fuzzy rules to represent human prior knowledge; because fuzzy rules are differentiable, the rules themselves become trainable. In addition, we propose a switch module that adaptively adjusts the decision-making frequency of the upper-level policy, removing the need for manual tuning. Experimental results demonstrate that HFR converges faster than current state-of-the-art HRL algorithms, especially in complex scenarios such as robot-control tasks.
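The abstract's key mechanism is that fuzzy rules are differentiable, so their parameters can be trained by gradient descent alongside the policy network. The following is a minimal illustrative sketch of that idea, not the paper's actual implementation: it assumes Gaussian membership functions with trainable centers and widths and a product t-norm for the rule firing strength, then verifies the analytic gradient with a finite-difference check.

```python
import numpy as np

def gaussian_membership(x, c, s):
    """Degree to which input x belongs to a fuzzy set with center c, width s."""
    return np.exp(-((x - c) ** 2) / (2 * s ** 2))

def rule_firing_strength(x, centers, widths):
    """Product t-norm over per-dimension memberships: the rule's activation."""
    return np.prod(gaussian_membership(x, centers, widths))

def grad_wrt_centers(x, centers, widths):
    """Analytic gradient of the firing strength w.r.t. the rule centers.
    This differentiability is what makes the rule parameters trainable."""
    w = rule_firing_strength(x, centers, widths)
    return w * (x - centers) / widths ** 2

# Toy 2-D state and one rule with (hypothetical) trainable parameters.
x = np.array([0.5, -0.2])
centers = np.array([0.0, 0.0])
widths = np.array([1.0, 1.0])

analytic = grad_wrt_centers(x, centers, widths)

# Finite-difference check confirms the analytic gradient.
eps = 1e-6
numeric = np.zeros_like(centers)
for i in range(len(centers)):
    cp, cm = centers.copy(), centers.copy()
    cp[i] += eps
    cm[i] -= eps
    numeric[i] = (rule_firing_strength(x, cp, widths)
                  - rule_firing_strength(x, cm, widths)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-6)
```

In a full system, the firing strengths of several such rules would weight rule-suggested actions, and the centers and widths would receive gradients from the policy loss; the exact rule form and integration with the hierarchical policy follow the paper, not this sketch.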
Acknowledgements
Wei Shi, Yanghe Feng and Honglan Huang contributed equally to this work and should be considered co-first authors. The authors would like to thank Xingxing Liang for his contribution to the model design of this paper.
Funding
The work described in this paper was funded by the National Natural Science Foundation of P.R. China (71701205).
Cite this article
Shi, W., Feng, Y., Huang, H. et al. Efficient hierarchical policy network with fuzzy rules. Int. J. Mach. Learn. & Cyber. 13, 447–459 (2022). https://doi.org/10.1007/s13042-021-01417-2