Reinforcement Learning for Adaptive Cyber Defense Against Zero-Day Attacks

Adversarial and Uncertain Reasoning for Adaptive Cyber Defense

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11830)

Abstract

In this chapter, we use reinforcement learning as a unified framework for designing effective adaptive cyber defenses against zero-day attacks. Reinforcement learning integrates control theory and machine learning. A salient feature is that it does not require the defender to know critical information about zero-day attacks (e.g., their targets or the locations of the vulnerabilities they exploit); such information is difficult, if not impossible, for the defender to gather in advance. The reinforcement-learning-based schemes are applied to defeat three classes of attacks: strategic attacks, where the interactions between an attacker and a defender are modeled as a non-cooperative game; non-strategic random attacks, where the attacker chooses its actions according to a predetermined probability distribution; and attacks depicted by Bayesian attack graphs, where the attacker exploits combinations of multiple known or zero-day vulnerabilities to compromise machines in a network.
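
To make the non-strategic case concrete, the sketch below shows a defender that learns which configuration to deploy while the attacker's probability distribution stays hidden from it. This is an illustration only, not the chapter's algorithm: it assumes a standard UCB1 bandit rule, and the function name `ucb1_defender`, the reward model (1 if the deployed configuration is not the one targeted, 0 otherwise), and the example probabilities are all hypothetical.

```python
import math
import random


def ucb1_defender(attack_probs, rounds, seed=0):
    """Deploy one of K configurations per round and learn which is attacked least.

    attack_probs is used only to simulate the attacker; the defender's
    decision rule never reads it (assumed setting, for illustration).
    """
    rng = random.Random(seed)
    k = len(attack_probs)
    counts = [0] * k          # how often each configuration was deployed
    mean_reward = [0.0] * k   # running average reward per configuration

    for t in range(1, rounds + 1):
        if t <= k:
            choice = t - 1    # deploy every configuration once first
        else:
            # UCB1: empirical mean plus an exploration bonus.
            choice = max(
                range(k),
                key=lambda i: mean_reward[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )

        # Simulated attacker: samples the configuration its exploit targets.
        target = rng.choices(range(k), weights=attack_probs)[0]
        reward = 0.0 if target == choice else 1.0

        counts[choice] += 1
        mean_reward[choice] += (reward - mean_reward[choice]) / counts[choice]

    return counts, mean_reward


if __name__ == "__main__":
    counts, means = ucb1_defender([0.6, 0.3, 0.1], rounds=5000)
    print("deployments per configuration:", counts)
    print("estimated survival rate:", [round(m, 2) for m in means])
```

The defender only observes whether each deployment survived, which mirrors the abstract's point that the attack targets need not be known in advance. The strategic and attack-graph cases call for game-theoretic and partially observable models that this sketch does not cover.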

Corresponding author

Correspondence to Minghui Zhu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Hu, Z., Chen, P., Zhu, M., Liu, P. (2019). Reinforcement Learning for Adaptive Cyber Defense Against Zero-Day Attacks. In: Jajodia, S., Cybenko, G., Liu, P., Wang, C., Wellman, M. (eds) Adversarial and Uncertain Reasoning for Adaptive Cyber Defense. Lecture Notes in Computer Science, vol 11830. Springer, Cham. https://doi.org/10.1007/978-3-030-30719-6_4

  • DOI: https://doi.org/10.1007/978-3-030-30719-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30718-9

  • Online ISBN: 978-3-030-30719-6

  • eBook Packages: Computer Science, Computer Science (R0)
