Abstract
In cyber attack and defense, the opponent's strategy is often dynamic, random, and uncertain. In an advanced persistent threat (APT) scenario in particular, it is difficult to capture the behavioral strategy of a long-latent, highly dynamic, and unpredictable opponent. The FlipIt game can model the stealthy interaction characteristic of advanced persistent threats, but traditional reinforcement learning approaches are insufficient for solving such a real-time, non-stationary game model. It is therefore essential to model a non-stationary opponent implicitly and to maintain the defense agent's advantage continuously. In this paper, we propose an extended FlipIt game model that incorporates opponent modeling, together with an approach combining deep reinforcement learning, opponent modeling, and dropout to perceive the behavior of a non-stationary opponent and defeat it. Instead of explicitly identifying the opponent's intention, the defense agent observes the opponent's most recent actions in the game environment, stores this information in its knowledge base, perceives the opponent's strategy, and finally makes decisions that maximize its own benefit. Our approach performs well whether the opponent adopts traditional, random, or composite strategies. Experimental results demonstrate that it perceives the opponent quickly and maintains its superiority in suppressing the opponent.
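To make the approach described above concrete, the following is a minimal PyTorch sketch of an opponent-aware defender agent in the spirit of the abstract: a Q-network that conditions on a sliding window of the opponent's last observed moves and applies dropout, as the abstract describes. The class name OpponentAwareQNet, the layer sizes, the dropout rate, and the binary wait/flip action space are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a defender agent for a FlipIt-style game: a Q-network over
# the defender's observation plus a window of the opponent's last
# observed moves, with dropout for robustness to a non-stationary
# opponent. All names and dimensions are hypothetical.
import random
from collections import deque

import torch
import torch.nn as nn


class OpponentAwareQNet(nn.Module):
    def __init__(self, obs_dim: int, history_len: int, n_actions: int):
        super().__init__()
        # Dropout keeps the value estimates from overfitting to any
        # single phase of the opponent's (possibly changing) strategy.
        self.net = nn.Sequential(
            nn.Linear(obs_dim + history_len, 64),
            nn.ReLU(),
            nn.Dropout(p=0.2),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Dropout(p=0.2),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor, opp_history: torch.Tensor) -> torch.Tensor:
        # Concatenate the game observation with the opponent-move window.
        return self.net(torch.cat([obs, opp_history], dim=-1))


def select_action(qnet, obs, opp_history, n_actions, epsilon=0.1):
    """Epsilon-greedy selection over opponent-conditioned Q-values."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    qnet.eval()  # disable dropout for greedy evaluation
    with torch.no_grad():
        q = qnet(obs.unsqueeze(0), opp_history.unsqueeze(0))
    return int(q.argmax(dim=-1).item())


# The defender keeps a sliding window of the opponent's last observed
# flips (1.0 = the opponent flipped at that step, 0.0 = it did not).
history_len, obs_dim, n_actions = 8, 4, 2  # actions: 0 = wait, 1 = flip
opp_moves = deque([0.0] * history_len, maxlen=history_len)
qnet = OpponentAwareQNet(obs_dim, history_len, n_actions)

obs = torch.zeros(obs_dim)  # placeholder observation of the game state
action = select_action(qnet, obs, torch.tensor(list(opp_moves)), n_actions)
```

Each time the defender observes a new opponent move, it would append 1.0 or 0.0 to opp_moves; the fixed-length deque implements the implicit, sliding-window opponent model rather than an explicit classification of the opponent's intention.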
Cite this paper
Yao, Q., Xiong, X., Wang, P., Wang, Y. (2024). Defeating the Non-stationary Opponent Using Deep Reinforcement Learning and Opponent Modeling. In: Gao, H., Wang, X., Voros, N. (eds.) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 562. Springer, Cham. https://doi.org/10.1007/978-3-031-54528-3_4