Behaviour-diverse automatic penetration testing: a coverage-based deep reinforcement learning approach

  • Research Article
  • Published in: Frontiers of Computer Science

Abstract

Reinforcement learning (RL) is gaining importance in automating penetration testing, as it reduces human effort and increases reliability. Nonetheless, given the rapidly expanding scale of modern network infrastructure, the limited testing scale and monotonous strategies of existing RL-based automated penetration testing methods make them less effective in practice. In this paper, we present CLAP (Coverage-Based Reinforcement Learning to Automate Penetration Testing), an RL penetration testing agent that provides comprehensive network security assessments with diverse adversary testing behaviours at massive scale. CLAP employs a novel neural network, the coverage mechanism, to address the enormous and growing action spaces of large networks. It also uses a Chebyshev decomposition critic to identify diverse adversary strategies and strike a balance among them. Experimental results across various scenarios demonstrate that CLAP outperforms state-of-the-art methods, further reducing attack operations by nearly 35%. CLAP also offers improved training efficiency and stability, and can effectively perform penetration testing over large-scale networks with up to 500 hosts. Additionally, the proposed agent is able to discover Pareto-dominant strategies that are both diverse and effective in achieving multiple objectives.
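
The Chebyshev decomposition critic builds on Chebyshev scalarisation, a standard technique in multi-objective optimisation and multi-objective RL. The sketch below illustrates only that general technique, not the paper's critic: the function name, the two objectives, and all numbers are hypothetical. It scores an action's vector of per-objective value estimates by its weighted worst-case distance to a utopian point; unlike a linear weighted sum, this scalarisation can recover Pareto-optimal trade-offs even on non-convex fronts.

```python
import numpy as np

def chebyshev_scalarise(q_values, weights, utopia):
    """Collapse per-objective value estimates into one score (lower is better).

    The score is the weighted Chebyshev (L-infinity) distance from the
    utopian point z*:  max_i  w_i * |z*_i - q_i|.
    """
    return np.max(weights * np.abs(utopia - q_values))

# Hypothetical two-objective example: maximise attack success while
# minimising the number of attack operations (both rescaled to [0, 1]).
q = np.array([0.7, 0.4])        # per-objective value estimates for one action
w = np.array([0.5, 0.5])        # equal preference over the two objectives
z_star = np.array([1.0, 1.0])   # utopian (ideal) point

print(chebyshev_scalarise(q, w, z_star))  # ~0.3, dominated by objective 2
```

Varying the weight vector traces out different trade-offs along the Pareto front, which is how decomposition-based critics in general can encourage behaviourally diverse policies.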



Acknowledgements

This research was partially supported by the Key Research Project of Zhejiang Lab (No. 2021PB0AV02). We would like to thank Wenli Liu, Mengxuan Chen, Yuejin Ye, and Shuitao Gan for their helpful comments and support.

Author information


Corresponding author

Correspondence to Xin Liu.

Ethics declarations

Competing interests: The authors declare that they have no competing interests or financial conflicts to disclose.

Additional information

Yizhou Yang received his PhD degree from The Australian National University, Australia. He is currently an assistant researcher at Zhongguancun Laboratory, China. His research interests include deep learning applications, network security, and wireless communication systems.

Longde Chen is currently a researcher at the National Research Centre of Parallel Computer Engineering and Technology, China. His research interests include parallel computing, large-model software applications, and cybersecurity.

Sha Liu is a doctoral student at the University of Science and Technology of China, China. He is currently an associate researcher at the National Research Centre of Parallel Computer Engineering and Technology, China. His research interests include parallel computing and large-model software applications.

Lanning Wang is currently a professor at the Faculty of Geographical Science, Beijing Normal University, China. His research focuses on climate and Earth system model development, specifically the dynamical core of atmospheric models, model coupling technology, and high-performance computing technology for Earth system models. He led the development of the Beijing Normal University–Earth System Model (BNU-ESM) and actively participated in the Coupled Model Intercomparison Project Phase 5 (CMIP5). He received the ACM Gordon Bell Prize in 2016.

Haohuan Fu (Senior Member, IEEE) received his PhD degree in computing from Imperial College London. He is currently a professor with the Ministry of Education Key Laboratory for Earth System Modeling and the Department of Earth System Science, Tsinghua University, China. He is also the deputy director of the National Supercomputing Center, China. His research interests include high-performance computing in earth and environmental sciences, computer architectures, performance optimization, and programming tools for parallel computing. He received the ACM Gordon Bell Prize in 2016, 2017, and 2021, and the FPL Most Significant Paper Award in 2015.

Xin Liu received her PhD degree from PLA Information Engineering University, China, in 2006. She is currently a research fellow at the National Research Centre of Parallel Computer Engineering and Technology, China. Her research interests include parallel algorithms and parallel application software.

Zuoning Chen is currently a research fellow at the National Research Centre of Parallel Computer Engineering and Technology, China. Her research interests include operating systems and system security.


About this article

Cite this article

Yang, Y., Chen, L., Liu, S. et al. Behaviour-diverse automatic penetration testing: a coverage-based deep reinforcement learning approach. Front. Comput. Sci. 19, 193309 (2025). https://doi.org/10.1007/s11704-024-3380-1
