Abstract
Malware or threat actors use a Command and Control (C2) environment to proliferate and manage an attack. In a sophisticated attack, a threat actor often employs a Domain Generation Algorithm (DGA) to cycle the network location in which malware communicates with C2. Network security controls such as blacklisting, implementing a DNS sinkhole, or inserting a firewall rule is a vital asset to an organization’s security posture. However, all of them are typically ineffective against a DGA. In this paper, we propose a machine learning framework for identifying and clustering domain names to circumvent threats from a DGA. We collect a real-time threat intelligent feed over a six month period where all domains have threats on the public Internet at the time of collection. We then apply the proposed machine learning framework to study DGA-based malware. The proposed framework contains a two-level model, which consists of classification and clustering is used to first detect DGA domains and then identify the DGA of those domains. Our extensive experimental results demonstrate the accuracy of the proposed framework. To be precise, we achieve accuracies of 95.14% for the first-level classification and 92.45% for the second-level clustering, respectively.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Rieck, K., Holz, T., Willems, C., Düssel, P., Laskov, P.: Learning and classification of malware behavior. In: Zamboni, D. (ed.) DIMVA 2008. LNCS, vol. 5137, pp. 108–125. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-70542-0_6
Chin, T., Xiong, K., Rahouti, M.: SDN-based kernel modular countermeasure for intrusion detection. In: Lin, X., Ghorbani, A., Ren, K., Zhu, S., Zhang, A. (eds.) SecureComm 2017. LNICST, vol. 238, pp. 270–290. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78813-5_14
Ghosh, U., et al.: An SDN based framework for guaranteeing security and performance in information-centric cloud networks. In: Proceedings of the 11th IEEE International Conference on Cloud Computing (IEEE Cloud) (2017)
Khancome, C., Boonjing, V., Chanvarasuth, P.: A two-hashing table multiple string pattern matching algorithm. In: Tenth International Conference on Information Technology: New Generations (ITNG), pp. 696–701. IEEE (2013)
Schiavoni, S., Maggi, F., Cavallaro, L., Zanero, S.: Phoenix: DGA-based botnet tracking and intelligence. In: Dietrich, S. (ed.) DIMVA 2014. LNCS, vol. 8550, pp. 192–211. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08509-8_11
Sood, A.K., Zeadally, S.: A taxonomy of domain-generation algorithms. IEEE Secur. Priv. 14(4), 46–53 (2016)
Xiong, K.: Multiple priority customer service guarantees in cluster computing. In: Proceedings of the IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp. 1–12. IEEE (2009)
Xiong, K.: Resource optimization and security for cloud services. Wiley, Hoboken (2014)
Xiong, K.: Resource optimization and security for distributed computing (2008). https://repository.lib.ncsu.edu/handle/1840.16/3581
Mark, B., et al.: GENI: a federated testbed for innovative network experiments. Comput. Netw. 61, 5–23 (2014)
Xiong, K., Chen, X.: Ensuring cloud service guarantees via service level agreement (SLA)-based resource allocation. In: Proceedings of the IEEE 35th International Conference on Distributed Computing Systems Workshops, ICDCS Workshops, pp. 35–41. IEEE (2015)
Chin, T., Xiong, K.: Dynamic generation containment systems (DGCS): A moving target defense approach. In: Proceedings of the 3rd International Workshop on Emerging Ideas and Trends in Engineering of Cyber-Physical Systems (EITEC), vol. 00, pp. 11–16, April 2016
Sornalakshmi, K.: Detection of DoS attack and zero day threat with SIEM. In: International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 1–7. IEEE (2017)
Yadav, S., Reddy, A.L.N.: Winning with DNS failures: strategies for faster botnet detection. In: Rajarajan, M., Piper, F., Wang, H., Kesidis, G. (eds.) SecureComm 2011. LNICST, vol. 96, pp. 446–459. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31909-9_26
Yadav, S., Reddy, A.K.K., Reddy, A.N., Ranjan, S.: Detecting algorithmically generated domain-flux attacks with DNS traffic analysis. IEEE/ACM Trans. Netw. 20(5), 1663–1677 (2012)
Guo, F., Ferrie, P., Chiueh, T.: A study of the packer problem and its solutions. In: Lippmann, R., Kirda, E., Trachtenberg, A. (eds.) RAID 2008. LNCS, vol. 5230, pp. 98–115. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-87403-4_6
Holz, T., Steiner, M., Dahl, F., Biersack, E., Freiling, F.C., et al.: Measurements and mitigation of peer-to-peer-based botnets: a case study on storm worm. LEET 8(1), 1–9 (2008)
Zhang, L., Yu, S., Wu, D., Watters, P.: A survey on latest botnet attack and defense. In: IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 53–60. IEEE (2011)
Barabosch, T., Wichmann, A., Leder, F., Gerhards-Padilla, E.: Automatic extraction of domain name generation algorithms from current malware. In: Proceedings of NATO Symposium IST-111 on Information Assurance and Cyber Defense, Koblenz, Germany (2012)
Gardiner, J., Nagaraja, S.: On the security of machine learning in malware c&c detection: a survey. ACM Comput. Surv. (CSUR) 49(3), 59 (2016)
Ahluwalia, A., Traore, I., Ganame, K., Agarwal, N.: Detecting broad length algorithmically generated domains. In: Traore, I., Woungang, I., Awad, A. (eds.) ISDDC 2017. LNCS, vol. 10618, pp. 19–34. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69155-8_2
Ma, J., Saul, L.K., Savage, S., Voelker, G.M.: Beyond blacklists: learning to detect malicious web sites from suspicious URLs. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1245–1254. ACM (2009)
Antonakakis, M., et al.: From throw-away traffic to bots: detecting the rise of DGA-based malware. In: USENIX security symposium, vol. 12 (2012)
Wang, W., Shirley, K.: Breaking bad: detecting malicious domains using word segmentation. arXiv preprint arXiv:1506.04111 (2015)
McGrath, D.K., Gupta, M.: Behind phishing: an examination of phisher modi operandi. LEET 8, 4 (2008)
Mowbray, M., Hagen, J.: Finding domain-generation algorithms by looking at length distribution. In: IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pp. 395–400. IEEE (2014)
Shabtai, A., et al.: Detection of malicious code by applying machine learning classifiers on static features: a state-of-the-art survey. Information Security Technical Report (2009)
Sharifnya, R., Abadi, M.: A novel reputation system to detect DGA-based botnets. In: 3th International eConference on Computer and Knowledge Engineering (ICCKE), pp. 417–423. IEEE (2013)
Woodbridge, J., Anderson, H.S., Ahuja, A., Grant, D.: Predicting domain generation algorithms with long short-term memory networks. arXiv preprint arXiv:1611.00791 (2016)
Xu, W., Sanders, K., Zhang, Y.: We know it before you do: predicting malicious domains. In: Virus Bulletin Conference (2014)
Yu, B., Gray, D.L., Pan, J., De Cock, M., Nascimento, A.C.: Inline DGA detection with deep networks. In: IEEE International Conference on Data Mining Workshops (ICDMW), pp. 683–692. IEEE (2017)
Saxe, J., Berlin, K.: eXpose: A character-level convolutional neural network with embeddings for detecting malicious URLs, file paths and registry keys. arXiv preprint arXiv:1702.08568 (2017)
Bambenek: OSINT feeds from bambenek consulting. Bambenek Consulting
Yang, L., Karim, R., Ganapathy, V., Smith, R.: Fast, memory-efficient regular expression matching with NFA-OBDDs. Comput. Netw. 55(15), 3376–3393 (2011)
Kührer, M., Rossow, C., Holz, T.: Paint it black: evaluating the effectiveness of malware blacklists. In: Stavrou, A., Bos, H., Portokalidis, G. (eds.) RAID 2014. LNCS, vol. 8688, pp. 1–21. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11379-1_1
JBT Organization: Domain feed of known DGA domains (2017)
Jarvis, K.: Cryptolocker ransomware. Viitattu 20, 2014 (2013)
Chaignon, P.: A collection of known domain generation algorithms (2014)
Technologies: Top million websites & TLDs (2016)
Chin, T., Mountrouidou, X., Li, X., Xiong, K.: An SDN-supported collaborative approach for DDoS flooding detection and containment. In: 2015 IEEE Military Communications Conference, MILCOM 2015, pp. 659–664. IEEE (2015)
Lenkala, S.R., Shetty, S., Xiong, K.: Security risk assessment of cloud carrier. In: 2013 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp. 442–449. IEEE (2013)
Xiong, K., Perros, H.: SLA-based service composition in enterprise computing. In: 16th International Workshop on Quality of Service, IWQoS 2008, pp. 30–39. IEEE (2008)
Acknowledgments
We acknowledge National Science Foundation (NSF) to partially sponsor the research work under grants #1633978, #1620871, #1636622, #1651280, and #1620862, and BBN/GPO project #1936 through an NSF/CNS grant. We also thank the Florida Center for Cybersecurity (FC2) located at the University of South Florida (USF) to support the research through its funding that is open to all institutions in the State University System of Florida.
The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied of NSF, FC2, and USF.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Chin, T., Xiong, K., Hu, C., Li, Y. (2018). A Machine Learning Framework for Studying Domain Generation Algorithm (DGA)-Based Malware. In: Beyah, R., Chang, B., Li, Y., Zhu, S. (eds) Security and Privacy in Communication Networks. SecureComm 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 254. Springer, Cham. https://doi.org/10.1007/978-3-030-01701-9_24
Download citation
DOI: https://doi.org/10.1007/978-3-030-01701-9_24
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01700-2
Online ISBN: 978-3-030-01701-9
eBook Packages: Computer ScienceComputer Science (R0)