Hierarchical Policies of Subgoals for Safe Deep Reinforcement Learning

Yu, Fumin; Gao, Feng; Yuan, Yao; Xing, Xiaofei; Dai, Yinglong

doi:10.1007/978-981-99-0272-9_15

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1768)

Included in the following conference series:

International Conference on Ubiquitous Security

Abstract

Reinforcement learning is a machine learning method in which an agent learns by trial and error to solve decision optimization problems. Agents based on deep reinforcement learning are well known to be difficult to train in complex environments. Moreover, when the environment provides too little reward feedback, the agent can produce unsafe and erratic actions. To help the agent converge to a better policy and to make its behavior safer and more controllable under sparse rewards, we propose a subgoal embedding method, based on prior knowledge and a hierarchical strategy, that makes training converge faster. The subgoal embedding method can be combined with existing reinforcement learning methods. In this paper, we combine it with the REINFORCE algorithm and the PPO (Proximal Policy Optimization) algorithm and test it in the MiniGrid-DoorKey game environment of the gym platform. The experiments demonstrate the effectiveness of the subgoal embedding method.
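The abstract does not spell out how subgoals enter training, so the following is only a rough illustration of the general idea, not the authors' subgoal embedding method. It uses a simpler stand-in technique, a one-time reward bonus per ordered subgoal, and feeds the result into the discounted returns that REINFORCE uses to weight log-probabilities. The subgoal names, the bonus value, and the per-step event flags are all hypothetical.

```python
from typing import List, Sequence, Set

# Hypothetical ordered subgoals for a DoorKey-style task (an assumption,
# not taken from the paper): get the key, open the door, reach the goal.
SUBGOALS = ("has_key", "door_open", "at_goal")


def subgoal_bonuses(events: Sequence[Set[str]], bonus: float = 0.1) -> List[float]:
    """Return a per-step bonus, paid once when the next subgoal in order is met.

    `events[t]` is the set of flags that hold at step t (hypothetical format).
    At most one subgoal is credited per step.
    """
    next_idx = 0
    out = []
    for flags in events:
        r = 0.0
        if next_idx < len(SUBGOALS) and SUBGOALS[next_idx] in flags:
            r = bonus
            next_idx += 1
        out.append(r)
    return out


def returns_to_go(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Discounted return G_t = r_t + gamma * G_{t+1}, as used by REINFORCE."""
    g = 0.0
    out = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        out[t] = g
    return out


if __name__ == "__main__":
    # Sparse environment reward: only the final step pays out.
    env_rewards = [0.0, 0.0, 0.0, 1.0]
    events = [
        set(),
        {"has_key"},
        {"has_key", "door_open"},
        {"has_key", "door_open", "at_goal"},
    ]
    shaped = [r + b for r, b in zip(env_rewards, subgoal_bonuses(events))]
    print(returns_to_go(shaped, gamma=0.5))
```

With the sparse reward alone, early steps receive only a heavily discounted signal; the bonuses give intermediate steps a nonzero return as soon as a subgoal is reached, which is one common motivation for subgoal-based approaches under sparse rewards.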




Acknowledgments

This work is supported in part by the China Postdoctoral Science Foundation under Grant Number 2021M693976, the Hunan Provincial Natural Science Foundation under Grant Number 2020JJ5367, the Key Project of Teaching Reform in Colleges and Universities of Hunan Province under Grant Number HNJG-2021-0251, and the Scientific Research Fund of Hunan Provincial Education Department under Grant Number 21A0599.

Author information

Authors and Affiliations

College of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
Fumin Yu, Feng Gao & Yao Yuan
School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, 510006, China
Xiaofei Xing
College of Liberal Arts and Sciences, National University of Defense Technology, Changsha, 410073, China
Yinglong Dai
Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Changsha, 410081, China
Yinglong Dai

Authors

Fumin Yu
Feng Gao
Yao Yuan
Xiaofei Xing
Yinglong Dai

Corresponding author

Correspondence to Yinglong Dai.

Editor information

Editors and Affiliations

Guangzhou University, Guangzhou, China
Guojun Wang
University of Texas at San Antonio, San Antonio, TX, USA
Kim-Kwang Raymond Choo
Temple University, Philadelphia, PA, USA
Jie Wu
Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
Ernesto Damiani


About this paper

Cite this paper

Yu, F., Gao, F., Yuan, Y., Xing, X., Dai, Y. (2023). Hierarchical Policies of Subgoals for Safe Deep Reinforcement Learning. In: Wang, G., Choo, KK.R., Wu, J., Damiani, E. (eds) Ubiquitous Security. UbiSec 2022. Communications in Computer and Information Science, vol 1768. Springer, Singapore. https://doi.org/10.1007/978-981-99-0272-9_15


DOI: https://doi.org/10.1007/978-981-99-0272-9_15
Published: 16 February 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-0271-2
Online ISBN: 978-981-99-0272-9
eBook Packages: Computer Science, Computer Science (R0)
