Abstract
In this chapter, we use reinforcement learning as a unified framework for designing effective adaptive cyber defenses against zero-day attacks. Reinforcement learning integrates control theory and machine learning. A salient feature of reinforcement learning is that it does not require the defender to know critical information about zero-day attacks (e.g., their targets and the locations of the vulnerabilities), which is difficult, if not impossible, for the defender to gather in advance. The reinforcement-learning-based schemes are applied to defeat three classes of attacks: strategic attacks, where the interactions between an attacker and a defender are modeled as a non-cooperative game; non-strategic random attacks, where the attacker chooses its actions according to a predetermined probability distribution; and attacks depicted by Bayesian attack graphs, where the attacker exploits combinations of multiple known or zero-day vulnerabilities to compromise machines in a network.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Hu, Z., Chen, P., Zhu, M., Liu, P. (2019). Reinforcement Learning for Adaptive Cyber Defense Against Zero-Day Attacks. In: Jajodia, S., Cybenko, G., Liu, P., Wang, C., Wellman, M. (eds) Adversarial and Uncertain Reasoning for Adaptive Cyber Defense. Lecture Notes in Computer Science, vol 11830. Springer, Cham. https://doi.org/10.1007/978-3-030-30719-6_4
DOI: https://doi.org/10.1007/978-3-030-30719-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30718-9
Online ISBN: 978-3-030-30719-6
eBook Packages: Computer Science (R0)