Reinforcement Learning for Adaptive Cyber Defense Against Zero-Day Attacks

Hu, Zhisheng; Chen, Ping; Zhu, Minghui; Liu, Peng

doi:10.1007/978-3-030-30719-6_4

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11830)

Abstract

In this chapter, we use reinforcement learning as a unified framework for designing effective adaptive cyber defenses against zero-day attacks. Reinforcement learning integrates control theory and machine learning. A salient feature of reinforcement learning is that it does not require the defender to know critical information about zero-day attacks (e.g., their targets and the locations of the vulnerabilities). This information is difficult, if not impossible, for the defender to gather in advance. The reinforcement-learning-based schemes are applied to defeat three classes of attacks: strategic attacks, where the interactions between an attacker and a defender are modeled as a non-cooperative game; non-strategic random attacks, where the attacker chooses its actions by following a predetermined probability distribution; and attacks depicted by Bayesian attack graphs, where the attacker exploits combinations of multiple known or zero-day vulnerabilities to compromise machines in a network.

Author information

Authors and Affiliations

School of Electrical Engineering and Computer Science, Pennsylvania State University, University Park, PA, 16802, USA
Zhisheng Hu & Minghui Zhu
JD.com, Beijing, 10111, China
Ping Chen
College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
Peng Liu

Corresponding author

Correspondence to Minghui Zhu.

Editor information

Editors and Affiliations

George Mason University, Fairfax, VA, USA
Sushil Jajodia
Dartmouth College, Hanover, NH, USA
George Cybenko
Pennsylvania State University, University Park, PA, USA
Peng Liu
Army Research Laboratory, Triangle Park, NC, USA
Cliff Wang
University of Michigan, Ann Arbor, MI, USA
Michael Wellman

About this chapter

Cite this chapter

Hu, Z., Chen, P., Zhu, M., Liu, P. (2019). Reinforcement Learning for Adaptive Cyber Defense Against Zero-Day Attacks. In: Jajodia, S., Cybenko, G., Liu, P., Wang, C., Wellman, M. (eds) Adversarial and Uncertain Reasoning for Adaptive Cyber Defense. Lecture Notes in Computer Science, vol 11830. Springer, Cham. https://doi.org/10.1007/978-3-030-30719-6_4

DOI: https://doi.org/10.1007/978-3-030-30719-6_4
Published: 31 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30718-9
Online ISBN: 978-3-030-30719-6
eBook Packages: Computer Science, Computer Science (R0)
