Abstract
Reinforcement learning is a machine learning method in which an agent learns by trial and error to solve decision optimization problems. Agents based on deep reinforcement learning are well known to be difficult to train in complex environments. Moreover, when the environment provides insufficient reward feedback, the agent may take unsafe or erratic actions. To help the agent converge to a better policy and to make its behavior safer and more controllable under sparse rewards, we propose a subgoal embedding method, based on prior knowledge and a hierarchical strategy, that also makes training converge faster. The subgoal embedding method can be combined with existing reinforcement learning methods. In this paper, we combine it with the REINFORCE algorithm and the PPO (Proximal Policy Optimization) algorithm and test it in the MiniGrid-DoorKey game environment of the Gym platform. The experiments demonstrate the effectiveness of the subgoal embedding method.
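The abstract does not spell out how subgoals are embedded, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical ordered subgoal list for a DoorKey-like task (pick up the key, open the door, reach the goal), a hypothetical dictionary state, an illustrative bonus of 0.1 per subgoal, and a one-hot vector marking the active subgoal that could be appended to the observation of a REINFORCE or PPO policy. None of these names or values come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical state format for a DoorKey-like task (an assumption,
# not the actual MiniGrid observation).
State = Dict[str, bool]


@dataclass
class SubgoalTracker:
    """Tracks ordered subgoals and pays a one-time bonus when the next one is met."""
    subgoals: List[Callable[[State], bool]]
    bonus: float = 0.1  # illustrative value, not taken from the paper
    index: int = 0      # position of the next unmet subgoal

    def reset(self) -> None:
        """Start a new episode with no subgoal achieved."""
        self.index = 0

    def shaped_reward(self, env_reward: float, state: State) -> float:
        """Return the sparse environment reward plus any subgoal bonus."""
        reward = env_reward
        if self.index < len(self.subgoals) and self.subgoals[self.index](state):
            reward += self.bonus
            self.index += 1
        return reward

    def embedding(self) -> List[float]:
        """One-hot vector of the active subgoal (last slot means all are done)."""
        vec = [0.0] * (len(self.subgoals) + 1)
        vec[self.index] = 1.0
        return vec


# Prior knowledge encoded as an ordered list of predicates (assumed for illustration).
tracker = SubgoalTracker(subgoals=[
    lambda s: s["has_key"],
    lambda s: s["door_open"],
    lambda s: s["at_goal"],
])

# A hand-written trajectory of (state, sparse environment reward) pairs.
trajectory = [
    ({"has_key": False, "door_open": False, "at_goal": False}, 0.0),
    ({"has_key": True, "door_open": False, "at_goal": False}, 0.0),
    ({"has_key": True, "door_open": False, "at_goal": False}, 0.0),
    ({"has_key": True, "door_open": True, "at_goal": False}, 0.0),
    ({"has_key": True, "door_open": True, "at_goal": True}, 1.0),
]

rewards = [tracker.shaped_reward(r, s) for s, r in trajectory]
final_embedding = tracker.embedding()
```

In this sketch the agent receives intermediate feedback at the steps where it picks up the key and opens the door, instead of only the single sparse reward at the goal. Paying each bonus once and in order is a design choice meant to keep the agent from collecting a bonus repeatedly or skipping ahead.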
Acknowledgments
This work is supported in part by the China Postdoctoral Science Foundation under Grant Number 2021M693976, the Hunan Provincial Natural Science Foundation under Grant Number 2020JJ5367, the Key Project of Teaching Reform in Colleges and Universities of Hunan Province under Grant Number HNJG-2021-0251, and the Scientific Research Fund of Hunan Provincial Education Department under Grant Number 21A0599.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yu, F., Gao, F., Yuan, Y., Xing, X., Dai, Y. (2023). Hierarchical Policies of Subgoals for Safe Deep Reinforcement Learning. In: Wang, G., Choo, KK.R., Wu, J., Damiani, E. (eds) Ubiquitous Security. UbiSec 2022. Communications in Computer and Information Science, vol 1768. Springer, Singapore. https://doi.org/10.1007/978-981-99-0272-9_15
DOI: https://doi.org/10.1007/978-981-99-0272-9_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-0271-2
Online ISBN: 978-981-99-0272-9
eBook Packages: Computer Science, Computer Science (R0)