
Defeating the Non-stationary Opponent Using Deep Reinforcement Learning and Opponent Modeling

  • Conference paper
Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2023)

Abstract

In the cyber attack and defense process, the opponent's strategy is often dynamic, random, and uncertain. Especially in an advanced persistent threat scenario, it is difficult to capture the behavior strategy of a long-term latent, highly dynamic, and unpredictable opponent. The FlipIt game can model the stealthy interaction of an advanced persistent threat, but traditional reinforcement learning approaches are insufficient for solving such a real-time, non-stationary game model. It is therefore essential to model a non-stationary opponent implicitly and to maintain the defense agent's advantage continuously. In this paper, we propose an extended FlipIt game model that incorporates opponent modeling, and we present an approach combining deep reinforcement learning, opponent modeling, and dropout to perceive the behavior of a non-stationary opponent and defeat it. Instead of explicitly identifying the opponent's intention, the defense agent observes the opponent's most recent actions from the game environment, stores this information in its knowledge base, perceives the opponent's strategy, and finally makes the decision that maximizes its benefits. Our approach performs well whether the opponent adopts traditional, random, or composite strategies, and the experimental results demonstrate that it can perceive the opponent quickly and maintain its superiority in suppressing the opponent.
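
The abstract outlines the defender's decision loop: observe the opponent's most recent action, fold it into the agent's state, and choose the action with the highest expected benefit. The sketch below is a minimal, illustrative reconstruction of that idea in PyTorch, not the authors' implementation; the class names, network sizes, and the simplified two-action (wait/flip) setting are assumptions made purely for illustration.

# Illustrative sketch only (assumed names and dimensions): a dropout-regularized
# Q-network whose input includes the opponent's last observed action, loosely
# following the abstract's observe -> store -> perceive -> act loop.
import random
import torch
import torch.nn as nn

class DefenderQNet(nn.Module):
    def __init__(self, state_dim=4, n_actions=2, p_drop=0.2):
        super().__init__()
        # Input = defender's own state features + one-hot of the opponent's last action.
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions, 64),
            nn.ReLU(),
            nn.Dropout(p_drop),        # dropout as regularization against a shifting opponent
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(64, n_actions),  # one Q-value per defender action (e.g. wait / flip)
        )

    def forward(self, state, opp_last_action_onehot):
        return self.net(torch.cat([state, opp_last_action_onehot], dim=-1))

def select_action(qnet, state, opp_last_action, n_actions=2, eps=0.1):
    """Epsilon-greedy defender action conditioned on the opponent's last observed move."""
    if random.random() < eps:
        return random.randrange(n_actions)
    onehot = torch.zeros(n_actions)
    onehot[opp_last_action] = 1.0
    with torch.no_grad():
        q = qnet(state.unsqueeze(0), onehot.unsqueeze(0))
    return int(q.argmax(dim=-1).item())

if __name__ == "__main__":
    qnet = DefenderQNet()
    state = torch.zeros(4)  # e.g. time since the defender's last flip, control flags, ...
    print("defender action:", select_action(qnet, state, opp_last_action=1))

In a full training loop, such a network would be updated with a standard DQN-style objective against the FlipIt environment; the dropout layers act as a regularizer that helps keep the learned policy from overfitting to the opponent's current strategy.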

Author information

Corresponding author

Correspondence to Yongjie Wang.

Copyright information

© 2024 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Yao, Q., Xiong, X., Wang, P., Wang, Y. (2024). Defeating the Non-stationary Opponent Using Deep Reinforcement Learning and Opponent Modeling. In: Gao, H., Wang, X., Voros, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 562. Springer, Cham. https://doi.org/10.1007/978-3-031-54528-3_4

  • DOI: https://doi.org/10.1007/978-3-031-54528-3_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-54527-6

  • Online ISBN: 978-3-031-54528-3

  • eBook Packages: Computer Science, Computer Science (R0)
