Abstract
Reinforcement learning (RL) is gaining importance in automating penetration testing, as it reduces human effort and increases reliability. However, given the rapidly expanding scale of modern network infrastructure, the limited testing scale and monotonous strategies of existing RL-based automated penetration testing methods make them less effective in practice. In this paper, we present CLAP (Coverage-based reinforcement Learning to Automate Penetration testing), an RL penetration-testing agent that provides comprehensive network security assessments with diverse adversary testing behaviours on a massive scale. CLAP employs a novel neural network, namely the coverage mechanism, to address the enormous and growing action spaces in large networks. It also utilizes a Chebyshev decomposition critic to identify diverse adversary strategies and strike a balance among them. Experimental results across various scenarios demonstrate that CLAP outperforms state-of-the-art methods, further reducing attack operations by nearly 35%. CLAP also provides improved training efficiency and stability, and can effectively perform penetration testing over large-scale networks with up to 500 hosts. Additionally, the proposed agent is able to discover Pareto-dominant strategies that are both diverse and effective in achieving multiple objectives.
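The Chebyshev decomposition mentioned above scalarizes a vector of per-objective values by the weighted Chebyshev (max) distance to a utopian reference point; unlike linear weighting, it can recover balanced trade-offs in non-convex regions of the Pareto front. The sketch below is a minimal illustration of this scalarization idea only, not the authors' implementation: the function name chebyshev_scalarize, the two objectives, the weights, and the utopian point are assumptions made for the example.

```python
import numpy as np

def chebyshev_scalarize(q_values, weights, utopia):
    """Illustrative Chebyshev scalarization: collapse per-objective values
    into one scalar via the weighted max distance to a utopian point.
    Smaller distance is better, so we negate for maximization."""
    return -np.max(weights * np.abs(utopia - q_values))

# Hypothetical example: two objectives (e.g., progress vs. stealth)
# estimated for three candidate actions.
q = np.array([[0.9, 0.2],
              [0.6, 0.7],
              [0.3, 0.9]])
w = np.array([0.5, 0.5])       # preference weights (assumed)
z_star = np.array([1.0, 1.0])  # utopian reference point (assumed)

scores = np.array([chebyshev_scalarize(row, w, z_star) for row in q])
print(scores, int(np.argmax(scores)))  # picks the balanced action (index 1)
```

Note that a linear scalarization w @ q would score the first and third actions as highly as the second; the Chebyshev form instead favours the action that does reasonably well on every objective, which is how decomposition methods encourage diverse, balanced strategies.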
Acknowledgements
This research was partially supported by the Key Research Project of Zhejiang Lab (No. 2021PB0AV02). We would like to thank Wenli Liu, Mengxuan Chen, Yuejin Ye and Shuitao Gan for their useful comments and support.
Ethics declarations
Competing interests: The authors declare that they have no competing interests or financial conflicts to disclose.
Additional information
Yizhou Yang received his PhD degree from The Australian National University, Australia. He is currently an assistant researcher at Zhongguancun Laboratory, China. His research interests include deep learning applications, network security, and wireless communication systems.
Longde Chen is currently a researcher at the National Research Centre of Parallel Computer Engineering and Technology, China. His research interests include parallel computing, large-model software applications, and cybersecurity.
Sha Liu is a doctoral student at the University of Science and Technology of China. He is currently an associate researcher at the National Research Centre of Parallel Computer Engineering and Technology, China. His research interests include parallel computing and large-model software applications.
Lanning Wang is currently a professor at the Faculty of Geographical Science, Beijing Normal University, China. His research focuses on climate and earth system model development, specifically the dynamical core of atmospheric models, model coupling technology, and high-performance computing technology for earth system models. He led the development of the Beijing Normal University–Earth System Model (BNU-ESM) and actively participated in the Coupled Model Intercomparison Project Phase 5 (CMIP5). He received the ACM Gordon Bell Prize in 2016.
Haohuan Fu (Senior Member, IEEE) received his PhD degree in computing from Imperial College London, UK. He is currently a professor with the Ministry of Education Key Laboratory for Earth System Modeling and the Department of Earth System Science, Tsinghua University, China. He is also the deputy director of the National Supercomputing Center, China. His research interests include high-performance computing in earth and environmental sciences, computer architectures, performance optimization, and programming tools in parallel computing. He received the ACM Gordon Bell Prize in 2016, 2017, and 2021, and the Most Significant Paper Award from FPL in 2015.
Xin Liu received her PhD degree from PLA Information Engineering University, China in 2006. She is currently a research fellow at the National Research Centre of Parallel Computer Engineering and Technology, China. Her research interests include parallel algorithms and parallel application software.
Zuoning Chen is currently a research fellow at the National Research Centre of Parallel Computer Engineering and Technology, China. Her research interests include operating systems and system security.
About this article
Cite this article
Yang, Y., Chen, L., Liu, S. et al. Behaviour-diverse automatic penetration testing: a coverage-based deep reinforcement learning approach. Front. Comput. Sci. 19, 193309 (2025). https://doi.org/10.1007/s11704-024-3380-1