
Defeating the Non-stationary Opponent Using Deep Reinforcement Learning and Opponent Modeling

  • Conference paper
Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2023)

Abstract

In the cyber attack and defense process, the opponent's strategy is often dynamic, random, and uncertain. Especially in an advanced persistent threat scenario, it is difficult to capture the behavior strategy of a long-term latent, highly dynamic, and unpredictable opponent. The FlipIt game can model the stealthy interaction of an advanced persistent threat, but traditional reinforcement learning approaches are insufficient for solving such a real-time, non-stationary game model. It is therefore essential to model a non-stationary opponent implicitly and to maintain the defense agent's advantage continuously. In this paper, we propose an extended FlipIt game model that incorporates opponent modeling, and we present an approach combining deep reinforcement learning, opponent modeling, and dropout to perceive the behavior of a non-stationary opponent and defeat it. Instead of explicitly identifying the opponent's intention, the defense agent observes the opponent's most recent actions from the game environment, stores this information in its knowledge base, perceives the opponent's strategy, and finally makes the decision that maximizes its benefits. Our approach performs well whether the opponent adopts traditional, random, or composite strategies, and the experimental results demonstrate that it can perceive the opponent quickly and maintain its superiority in suppressing the opponent.
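
The abstract outlines the defender's decision loop: observe the opponent's most recent action, fold it into the agent's state, and choose the action with the highest expected benefit. The sketch below is a minimal, illustrative reconstruction of that idea in PyTorch, not the authors' implementation; the class names, network sizes, and the simplified two-action (wait/flip) setting are assumptions made purely for illustration.

# Illustrative sketch only (assumed names and dimensions): a dropout-regularized
# Q-network whose input includes the opponent's last observed action, loosely
# following the abstract's observe -> store -> perceive -> act loop.
import random
import torch
import torch.nn as nn

class DefenderQNet(nn.Module):
    def __init__(self, state_dim=4, n_actions=2, p_drop=0.2):
        super().__init__()
        # Input = defender's own state features + one-hot of the opponent's last action.
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions, 64),
            nn.ReLU(),
            nn.Dropout(p_drop),        # dropout as regularization against a shifting opponent
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(64, n_actions),  # one Q-value per defender action (e.g. wait / flip)
        )

    def forward(self, state, opp_last_action_onehot):
        return self.net(torch.cat([state, opp_last_action_onehot], dim=-1))

def select_action(qnet, state, opp_last_action, n_actions=2, eps=0.1):
    """Epsilon-greedy defender action conditioned on the opponent's last observed move."""
    if random.random() < eps:
        return random.randrange(n_actions)
    onehot = torch.zeros(n_actions)
    onehot[opp_last_action] = 1.0
    with torch.no_grad():
        q = qnet(state.unsqueeze(0), onehot.unsqueeze(0))
    return int(q.argmax(dim=-1).item())

if __name__ == "__main__":
    qnet = DefenderQNet()
    state = torch.zeros(4)  # e.g. time since the defender's last flip, control flags, ...
    print("defender action:", select_action(qnet, state, opp_last_action=1))

In a full training loop, such a network would be updated with a standard DQN-style objective against the FlipIt environment; the dropout layers act as a regularizer that helps keep the learned policy from overfitting to the opponent's current strategy.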

Author information

Corresponding author

Correspondence to Yongjie Wang.

Copyright information

© 2024 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Yao, Q., Xiong, X., Wang, P., Wang, Y. (2024). Defeating the Non-stationary Opponent Using Deep Reinforcement Learning and Opponent Modeling. In: Gao, H., Wang, X., Voros, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 562. Springer, Cham. https://doi.org/10.1007/978-3-031-54528-3_4

  • DOI: https://doi.org/10.1007/978-3-031-54528-3_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-54527-6

  • Online ISBN: 978-3-031-54528-3

  • eBook Packages: Computer Science, Computer Science (R0)
