A Machine Learning Framework for Studying Domain Generation Algorithm (DGA)-Based Malware

  • Conference paper

Abstract

Malware and threat actors use a Command and Control (C2) environment to proliferate and manage an attack. In a sophisticated attack, a threat actor often employs a Domain Generation Algorithm (DGA) to cycle the network location through which malware communicates with the C2. Network security controls such as blacklisting, DNS sinkholes, and firewall rules are vital assets to an organization’s security posture. However, all of them are typically ineffective against a DGA. In this paper, we propose a machine learning framework for identifying and clustering domain names to counter threats from a DGA. We collect a real-time threat intelligence feed over a six-month period, in which all domains posed live threats on the public Internet at the time of collection. We then apply the proposed machine learning framework to study DGA-based malware. The proposed framework contains a two-level model, consisting of classification and clustering, which first detects DGA domains and then identifies the DGA that generated those domains. Our extensive experimental results demonstrate the accuracy of the proposed framework: we achieve accuracies of 95.14% for the first-level classification and 92.45% for the second-level clustering.
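
The two-level idea described in the abstract (detect first, then group) can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the lexical features (label length, character entropy, digit ratio), the entropy threshold standing in for the trained first-level classifier, the greedy leader clustering standing in for the second-level clustering, and all function names and sample domains.

```python
import math
from collections import Counter


def shannon_entropy(s):
    """Shannon entropy, in bits per character, of the string s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def features(domain):
    """Hypothetical lexical features of the leftmost label:
    (length, character entropy, digit ratio)."""
    label = domain.lower().split(".")[0]
    digits = sum(ch.isdigit() for ch in label)
    ratio = digits / len(label) if label else 0.0
    return (len(label), shannon_entropy(label), ratio)


def looks_generated(domain, entropy_threshold=3.5):
    """Level 1 stand-in: flag labels with high character entropy.
    A trained classifier would replace this hand-picked rule."""
    return features(domain)[1] >= entropy_threshold


def leader_cluster(domains, radius=3.0):
    """Level 2 stand-in: greedy leader clustering on the feature vectors.
    Each domain joins the first cluster whose leader lies within `radius`
    (Euclidean distance); otherwise it starts a new cluster."""
    clusters = []  # list of (leader_vector, member_domains)
    for d in domains:
        v = features(d)
        for leader, members in clusters:
            if math.dist(v, leader) <= radius:
                members.append(d)
                break
        else:
            clusters.append((v, [d]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    # Made-up sample domains, not drawn from any real feed.
    sample = ["google.com", "x7k2q9zv4mw8pj3r.net", "a1b2c3d4e5f6g7h8.org"]
    flagged = [d for d in sample if looks_generated(d)]
    print(flagged)
    print(leader_cluster(flagged))
```

In practice the threshold rule would be replaced by a classifier trained on labeled benign and DGA domains, and the clustering step by an algorithm tuned on the collected feed; the sketch only shows how the output of the first level feeds the second.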

Acknowledgments

We thank the National Science Foundation (NSF) for partially sponsoring this research under grants #1633978, #1620871, #1636622, #1651280, and #1620862, and BBN/GPO project #1936 through an NSF/CNS grant. We also thank the Florida Center for Cybersecurity (FC2), located at the University of South Florida (USF), for supporting the research through its funding, which is open to all institutions in the State University System of Florida.

The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of NSF, FC2, or USF.

Corresponding author

Correspondence to Tommy Chin.

Copyright information

© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Chin, T., Xiong, K., Hu, C., Li, Y. (2018). A Machine Learning Framework for Studying Domain Generation Algorithm (DGA)-Based Malware. In: Beyah, R., Chang, B., Li, Y., Zhu, S. (eds) Security and Privacy in Communication Networks. SecureComm 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 254. Springer, Cham. https://doi.org/10.1007/978-3-030-01701-9_24

  • DOI: https://doi.org/10.1007/978-3-030-01701-9_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01700-2

  • Online ISBN: 978-3-030-01701-9

  • eBook Packages: Computer Science, Computer Science (R0)
