A Machine Learning Framework for Studying Domain Generation Algorithm (DGA)-Based Malware

Chin, Tommy; Xiong, Kaiqi; Hu, Chengbin; Li, Yi

doi:10.1007/978-3-030-01701-9_24

Conference paper
First Online: 29 December 2018

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 254)

Abstract

Malware and threat actors use a Command and Control (C2) environment to propagate and manage an attack. In a sophisticated attack, a threat actor often employs a Domain Generation Algorithm (DGA) to cycle the network location at which malware communicates with C2. Network security controls such as blacklisting, DNS sinkholing, and firewall rules are vital assets to an organization's security posture; however, they are typically ineffective against a DGA. In this paper, we propose a machine learning framework for identifying and clustering domain names to counter threats from DGAs. We collect a real-time threat intelligence feed over a six-month period, in which every domain was an active threat on the public Internet at the time of collection. We then apply the proposed machine learning framework to study DGA-based malware. The framework uses a two-level model: a classification level that first detects DGA domains, followed by a clustering level that identifies which DGA generated those domains. Our extensive experimental results demonstrate the accuracy of the proposed framework: we achieve accuracies of 95.14% for the first-level classification and 92.45% for the second-level clustering.
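The abstract names a two-level design (classification, then clustering) but not its features or algorithms. The following is a minimal sketch of that control flow under stated assumptions: the lexical features (label length, Shannon entropy, digit ratio, vowel ratio), the nearest-centroid classifier standing in for level 1, the k-means clusterer standing in for level 2, the helper names (`features`, `NearestCentroid`, `kmeans`, `two_level`), and the toy training domains are all hypothetical, not the authors' method or data. It uses only the Python standard library.

```python
import math
from collections import Counter

# HYPOTHETICAL features: the abstract does not state which features the paper uses.
def features(domain: str) -> list[float]:
    """Length, Shannon entropy, digit ratio and vowel ratio of the label left of the TLD."""
    labels = domain.lower().rstrip(".").split(".")
    name = labels[-2] if len(labels) >= 2 else labels[0]  # naive: ignores suffixes like .co.uk
    n = len(name)
    if n == 0:
        return [0.0, 0.0, 0.0, 0.0]
    counts = Counter(name)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    digits = sum(ch.isdigit() for ch in name) / n
    vowels = sum(ch in "aeiou" for ch in name) / n
    return [float(n), entropy, digits, vowels]


def fit_scaler(rows):
    """Per-feature mean and population standard deviation (a zero std becomes 1)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(row, means, stds):
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


class NearestCentroid:
    """Level 1 stand-in (hypothetical): label a domain 'dga' or 'benign' by the closer class centroid."""

    def fit(self, domains, labels):
        rows = [features(d) for d in domains]
        self.means, self.stds = fit_scaler(rows)
        scaled = [scale(r, self.means, self.stds) for r in rows]
        self.centroids = {
            lab: centroid([s for s, l in zip(scaled, labels) if l == lab])
            for lab in set(labels)
        }
        return self

    def predict(self, domain):
        x = scale(features(domain), self.means, self.stds)
        return min(self.centroids, key=lambda lab: dist2(x, self.centroids[lab]))


def kmeans(domains, k, iters=50):
    """Level 2 stand-in (hypothetical): k-means over standardized features of flagged domains."""
    rows = [features(d) for d in domains]
    means, stds = fit_scaler(rows)
    pts = [scale(r, means, stds) for r in rows]
    # Deterministic farthest-point initialization instead of random seeding.
    cents = [pts[0]]
    while len(cents) < k:
        cents.append(max(pts, key=lambda p: min(dist2(p, c) for c in cents)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, cents[j])) for p in pts]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            new.append(centroid(members) if members else cents[j])
        if new == cents:  # centroids stopped moving
            break
        cents = new
    return labels


def two_level(model, domains, k):
    """Classify every domain, then cluster only those flagged as 'dga'."""
    flagged = [d for d in domains if model.predict(d) == "dga"]
    if len(flagged) < k:
        return {}
    return dict(zip(flagged, kmeans(flagged, k)))


# Toy, made-up training data (NOT the paper's threat-intelligence feed).
BENIGN = ["google.com", "facebook.com", "wikipedia.org", "amazon.com", "youtube.com"]
DGA = ["xkq7zjv2pwl9m.net", "qzx8vbn3lkj4h.com", "r5t9w2y7u3i8o.info"]
model = NearestCentroid().fit(BENIGN + DGA, ["benign"] * len(BENIGN) + ["dga"] * len(DGA))
```

Features are standardized before computing distances because raw label length would otherwise dominate the ratio features; farthest-point initialization keeps the clustering deterministic for illustration. A realistic pipeline would likely use richer features, a stronger classifier, and a principled choice of `k` (the number of DGA families); this sketch only shows how the second level consumes the output of the first.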

Acknowledgments

We acknowledge the National Science Foundation (NSF) for partially sponsoring this research under grants #1633978, #1620871, #1636622, #1651280, and #1620862, and BBN/GPO project #1936 through an NSF/CNS grant. We also thank the Florida Center for Cybersecurity (FC2) at the University of South Florida (USF) for supporting this research through its funding program, which is open to all institutions in the State University System of Florida.

The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of NSF, FC2, or USF.

Author information

Authors and Affiliations

Department of Computing Security, Rochester Institute of Technology, Rochester, USA
Tommy Chin
Florida Center for Cybersecurity, University of South Florida, Tampa, USA
Kaiqi Xiong, Chengbin Hu & Yi Li

Corresponding author

Correspondence to Tommy Chin.

Editor information

Editors and Affiliations

Klaus Advanced Computing Building, Georgia Institute of Technology, Atlanta, GA, USA
Raheem Beyah
Singapore Management University, Singapore, Singapore
Bing Chang
School of Information Systems, Singapore Management University, Singapore, Singapore
Yingjiu Li
Pennsylvania State University, University Park, PA, USA
Sencun Zhu

About this paper

Cite this paper

Chin, T., Xiong, K., Hu, C., Li, Y. (2018). A Machine Learning Framework for Studying Domain Generation Algorithm (DGA)-Based Malware. In: Beyah, R., Chang, B., Li, Y., Zhu, S. (eds) Security and Privacy in Communication Networks. SecureComm 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 254. Springer, Cham. https://doi.org/10.1007/978-3-030-01701-9_24

DOI: https://doi.org/10.1007/978-3-030-01701-9_24
Published: 29 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01700-2
Online ISBN: 978-3-030-01701-9
eBook Packages: Computer Science, Computer Science (R0)