DOI: 10.1145/3155133.3155166
research-article

DGA Botnet Detection Using Supervised Learning Methods

Published: 07 December 2017

Abstract

Modern botnets rely on Domain Generation Algorithms (DGAs) to establish resilient communication between bots and the Command and Control (C&C) server. The basic aim is to avoid blacklisting and to evade Intrusion Prevention Systems (IPS). Given the prevalence of this mechanism, numerous detection solutions have been proposed in the literature. In particular, supervised learning has attracted increasing interest because it can operate directly on raw domain names and is amenable to real-time applications. Hidden Markov Models, C4.5 decision trees, Extreme Learning Machines, and Long Short-Term Memory (LSTM) networks have become the state of the art in DGA botnet detection. Several other advanced supervised learning methods, namely Support Vector Machines (SVM), Recurrent SVM, CNN+LSTM, and Bidirectional LSTM, have not yet been adequately applied in this domain. This paper presents a first attempt to thoroughly investigate all of the above methods and evaluate them on a real-world DGA dataset of 168,900 samples spanning 38 classes, and it should provide a valuable reference point for future research in this field.
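To make the mechanism concrete, the Python sketch below shows a hypothetical, date-seeded toy DGA (not one of the 38 families in the paper's dataset) together with one common way of turning a raw domain name into the fixed-length integer sequence that character-level models such as LSTMs consume. The seed, the character vocabulary, the padding scheme, and the names `toy_dga`, `encode_domain`, `VOCAB` and `OOV_INDEX` are illustrative assumptions, not the authors' generator or preprocessing.

```python
import hashlib
from datetime import date

# Hypothetical character vocabulary for domain names; index 0 is reserved for padding.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-._"
CHAR_TO_INDEX = {c: i + 1 for i, c in enumerate(VOCAB)}
OOV_INDEX = len(VOCAB) + 1  # assumed index for characters outside VOCAB


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12,
            tld: str = ".com") -> list[str]:
    """Toy DGA (hypothetical): derive `count` domains from a shared seed and a date.

    A bot and its C&C operator that call this with the same inputs obtain the
    same candidate list without ever exchanging it.
    """
    if not 1 <= length <= 32:  # a SHA-256 digest has only 32 bytes
        raise ValueError("length must be between 1 and 32")
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}|{day.isoformat()}|{i}".encode()).digest()
        # Map each of the first `length` digest bytes to a letter a-z.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(label + tld)
    return domains


def encode_domain(domain: str, max_len: int = 32) -> list[int]:
    """Encode a raw domain name as a fixed-length list of character indices.

    Characters beyond `max_len` are truncated; shorter names are left-padded
    with 0 so that every sample has the same length.
    """
    ids = [CHAR_TO_INDEX.get(c, OOV_INDEX) for c in domain.lower()[:max_len]]
    return [0] * (max_len - len(ids)) + ids


if __name__ == "__main__":
    day = date(2017, 12, 7)
    bot_side = toy_dga("shared-seed", day)
    cc_side = toy_dga("shared-seed", day)
    print(bot_side == cc_side)  # True: both sides compute the same rendezvous list
    print(bot_side[0], encode_domain(bot_side[0])[-16:])
```

Because both sides derive the same list from the shared seed and date, the operator needs to register only one of the generated names, while defenders face a fresh list every day. This is why static blacklists struggle and why classifiers that score the raw domain string directly are attractive.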

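Cited By

  • (2024) KDTM: Multi-Stage Knowledge Distillation Transfer Model for Long-Tailed DGA Detection. Mathematics 12:5 (626). DOI: 10.3390/math12050626. Online publication date: 20-Feb-2024
  • (2024) Modulating LSTMs of Data-Driven Domain Features for DGA Detection: A Semantic Context-Dependent Method. 2024 IEEE 2nd International Conference on Control, Electronics and Computer Technology (ICCECT), pp. 1508-1514. DOI: 10.1109/ICCECT60629.2024.10545717. Online publication date: 26-Apr-2024
  • (2024) BERT-Enhanced DGA Botnet Detection: A Comparative Analysis of Machine Learning and Deep Learning Models. 2024 13th International Conference on Control, Automation and Information Sciences (ICCAIS), pp. 1-6. DOI: 10.1109/ICCAIS63750.2024.10814364. Online publication date: 26-Nov-2024
  • (2024) A Threat Modeling Framework for IoT-Based Botnet Attacks. Heliyon (e39192). DOI: 10.1016/j.heliyon.2024.e39192. Online publication date: Oct-2024
  • (2024) On DGA Detection and Classification Using P4 Programmable Switches. Computers & Security 145 (104007). DOI: 10.1016/j.cose.2024.104007. Online publication date: Oct-2024
  • (2023) A Novel Phishing Website Detection Model Based on LightGBM and Domain Name Features. Symmetry 15:1 (180). DOI: 10.3390/sym15010180. Online publication date: 7-Jan-2023
  • (2023) Detection of Algorithmically Generated Malicious Domain Names with Feature Fusion of Meaningful Word Segmentation and N-Gram Sequences. Applied Sciences 13:7 (4406). DOI: 10.3390/app13074406. Online publication date: 30-Mar-2023
  • (2023) Use of subword tokenization for domain generation algorithm classification. Cybersecurity 6:1. DOI: 10.1186/s42400-023-00183-8. Online publication date: 7-Sep-2023
  • (2023) Adversarial Defense: DGA-Based Botnets and DNS Homographs Detection Through Integrated Deep Learning. IEEE Transactions on Engineering Management 70:1 (249-266). DOI: 10.1109/TEM.2021.3059664. Online publication date: Jan-2023
  • (2023) Towards DGA Domain Name Detection via Multi-feature Coordinated Representation and Random Forest. 2023 11th International Conference on Information Systems and Computing Technology (ISCTech), pp. 510-518. DOI: 10.1109/ISCTech60480.2023.00097. Online publication date: 30-Jul-2023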

Information

Published In

ACM Other conferences
SoICT '17: Proceedings of the 8th International Symposium on Information and Communication Technology
December 2017
486 pages
ISBN:9781450353281
DOI:10.1145/3155133
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

  • SOICT: School of Information and Communication Technology - HUST
  • NAFOSTED: The National Foundation for Science and Technology Development

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Bidirectional LSTM
  2. DGA Botnet
  3. Long Short-Term Memory networks
  4. Recurrent SVM
  5. Supervised Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SoICT 2017

Acceptance Rates

Overall Acceptance Rate 147 of 318 submissions, 46%

Article Metrics

  • Downloads (last 12 months): 23
  • Downloads (last 6 weeks): 4
Reflects downloads up to 03 Mar 2025
