Abstract
We propose an online learning model that efficiently teaches a defender's agent the attacker's behavior while the two interact in the cyber-world. This paper models the interaction between these two agents as a stochastic game with limited rationality. Under this limited rationality, the proposed model helps the defender's agent learn the unknown communicator's behavior from the feedback it obtains while interacting with it. Many existing models address this interaction by formulating a state-oriented stochastic Markov game. However, such models suffer from the state explosion problem, so this paper presents a model that solves the game by restricting it to a stateless stochastic game. We then compare the model's performance against other algorithms that solve stochastic games. The comparison shows that the proposed algorithm converges to an optimal strategy within a short simulation time. Finally, a comparison with an existing technique shows that the proposed algorithm chooses the correct strategy for around 91% of the simulation time, compared to 73% for the existing algorithm.






Notes
\(\text{New Utility} \leftarrow \text{Old Utility} + \text{Learning Rate} \times (\text{Current Reward} - \text{Old Utility})\)
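The update rule in the note above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `StatelessLearner`, the epsilon-greedy action choice, the action labels `"block"` and `"monitor"`, and the parameter values are all assumptions made for illustration. Only the `update_utility` step follows the formula in the note. Because the game is treated as stateless, the sketch keeps one utility per action rather than one per state-action pair.

```python
import random


def update_utility(old_utility, reward, learning_rate):
    """One step of the rule: new = old + rate * (reward - old)."""
    return old_utility + learning_rate * (reward - old_utility)


class StatelessLearner:
    """Hypothetical defender agent holding one utility per action (no states)."""

    def __init__(self, actions, learning_rate=0.1, epsilon=0.1, seed=0):
        self.utilities = {action: 0.0 for action in actions}
        self.learning_rate = learning_rate
        self.epsilon = epsilon  # exploration probability (an assumption)
        self.rng = random.Random(seed)

    def choose(self):
        # Explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.utilities))
        return max(self.utilities, key=self.utilities.get)

    def learn(self, action, reward):
        # Move the stored utility toward the reward just observed.
        self.utilities[action] = update_utility(
            self.utilities[action], reward, self.learning_rate
        )


# Usage: a single feedback step for a made-up action and reward.
learner = StatelessLearner(["block", "monitor"], learning_rate=0.1, epsilon=0.0)
learner.learn("block", 1.0)
print(learner.utilities)
```

With a learning rate between 0 and 1, each update moves the stored utility a fraction of the way toward the latest reward, so older feedback is discounted gradually rather than discarded.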
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Singh, M.T., Borkotokey, S., Lahcen, R.A.M. et al. A generic scheme for cyber security in resource constraint network using incomplete information game. Evol. Intel. 16, 819–832 (2023). https://doi.org/10.1007/s12065-021-00684-w
DOI: https://doi.org/10.1007/s12065-021-00684-w