
Efficient hierarchical policy network with fuzzy rules

Original Article | International Journal of Machine Learning and Cybernetics

Abstract

Hierarchical reinforcement learning (HRL) is a promising approach that decomposes complex tasks into a series of sub-tasks. However, most current HRL methods converge slowly, which makes them difficult to apply widely to real-life scenarios. In this paper, we propose an efficient hierarchical reinforcement learning algorithm with fuzzy rules (HFR), a novel framework that integrates human prior knowledge into a hierarchical policy network and can effectively accelerate policy optimization. The model presented in this paper uses fuzzy rules to represent human prior knowledge; because fuzzy rules are differentiable, the rules themselves are trainable. In addition, a switch module that adaptively adjusts the decision-making frequency of the upper-level policy is proposed, removing the need for manual tuning of that frequency. Experimental results demonstrate that HFR converges faster than current state-of-the-art HRL algorithms, especially in complex scenarios such as robot control tasks.
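To make the abstract's central idea concrete, the sketch below shows one way differentiable fuzzy rules can be embedded in a policy network so that both the rules and the network are trained jointly by gradient descent. This is a minimal illustration assuming Gaussian membership functions, a product T-norm, and weighted-average defuzzification (a Takagi–Sugeno-style rule layer); the class name `FuzzyRuleLayer` and all shapes are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch: a trainable fuzzy rule layer. Learnable centers/widths make
# the membership functions differentiable, so human-specified rules can be
# refined by backpropagation alongside the policy network. Assumptions: a
# Gaussian membership form and a product T-norm (not taken from the paper).
import torch
import torch.nn as nn

class FuzzyRuleLayer(nn.Module):
    def __init__(self, state_dim: int, n_rules: int, action_dim: int):
        super().__init__()
        # One learnable (center, width) pair per rule and per state feature.
        self.centers = nn.Parameter(torch.randn(n_rules, state_dim))
        self.log_widths = nn.Parameter(torch.zeros(n_rules, state_dim))
        # Consequent of each rule: a suggested action vector.
        self.consequents = nn.Parameter(torch.randn(n_rules, action_dim))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # state: (batch, state_dim). Membership of each feature in each
        # rule's antecedent: (batch, n_rules, state_dim).
        widths = self.log_widths.exp()
        member = torch.exp(-((state.unsqueeze(1) - self.centers) / widths) ** 2)
        # Firing strength per rule: product T-norm over features, normalized.
        strength = member.prod(dim=-1)
        weights = strength / (strength.sum(dim=-1, keepdim=True) + 1e-8)
        # Defuzzify: firing-strength-weighted average of rule consequents.
        return weights @ self.consequents

# Usage: an action prior for a batch of 32 states of dimension 4.
layer = FuzzyRuleLayer(state_dim=4, n_rules=8, action_dim=2)
prior_action = layer(torch.randn(32, 4))  # shape (32, 2)
```

In the setting the abstract describes, the output of such a rule layer could be fused with the hierarchical policy's own output and optimized end to end; the switch module could analogously be a small network emitting, at each step, a probability that the upper-level policy should make a new decision.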



Acknowledgements

Wei Shi, Yanghe Feng and Honglan Huang contributed equally to this work and should be considered co-first authors. The authors would like to thank Xingxing Liang for his contribution to the model design of this paper.

Funding

The work described in this paper was funded by the National Natural Science Foundation of P.R. China (71701205).

Author information

Corresponding author

Correspondence to Yanghe Feng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Wei Shi, Yanghe Feng and Honglan Huang are co-first authors.

About this article

Cite this article

Shi, W., Feng, Y., Huang, H. et al. Efficient hierarchical policy network with fuzzy rules. Int. J. Mach. Learn. & Cyber. 13, 447–459 (2022). https://doi.org/10.1007/s13042-021-01417-2

