DGA Botnet Detection Using Supervised Learning Methods

Authors: Linh Giang Nguyen, Hai Anh Tran

SoICT '17: Proceedings of the 8th International Symposium on Information and Communication Technology

Pages 211 - 218

https://doi.org/10.1145/3155133.3155166

Published: 07 December 2017

Abstract

Modern botnets rely on Domain Generation Algorithms (DGAs) to build resilient communication between bots and the Command and Control (C&C) server. The basic aim is to avoid blacklisting and evade Intrusion Prevention Systems (IPS). Given the prevalence of this mechanism, numerous solutions have been proposed in the literature. In particular, supervised learning has received increased interest because it can operate on raw domain names and is amenable to real-time applications. Hidden Markov Models, the C4.5 decision tree, Extreme Learning Machines, and Long Short-Term Memory (LSTM) networks have become the state of the art in DGA botnet detection. Several more advanced supervised learning methods also exist, namely Support Vector Machine (SVM), Recurrent SVM, CNN+LSTM, and Bidirectional LSTM, which have not yet been adequately applied in this domain. This paper presents a first attempt to thoroughly investigate all of the above methods and to evaluate them on a DGA dataset collected from the real world, comprising 38 classes and 168,900 samples. It is intended to provide a valuable reference point for future research in this field.
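To make the idea of a classifier that "operates on raw domains" concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a hypothetical hash-based generator (`toy_dga`) standing in for a real DGA, a hypothetical character vocabulary (`ALPHABET`), and a fixed `max_len`. It shows how a domain string could be turned into an integer sequence of the kind sequence models such as LSTMs typically consume, plus one simple character-level statistic (Shannon entropy) of the kind used with feature-based classifiers.

```python
import hashlib
import math
from collections import Counter

# Hypothetical vocabulary: characters allowed in the toy domains.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-."
# Index 0 is reserved for padding (and for characters outside ALPHABET).
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def toy_dga(seed, day, count=3, length=12):
    """Illustrative generator only: bots and C&C sharing `seed` and `day`
    derive the same list of pseudo-random domains."""
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}|{day}|{i}".encode()).hexdigest()
        # Map successive pairs of hex digits to lowercase letters.
        label = "".join(
            chr(ord("a") + int(digest[j:j + 2], 16) % 26)
            for j in range(0, 2 * length, 2)
        )
        domains.append(label + ".com")
    return domains


def encode_domain(domain, max_len=32):
    """Encode a raw domain as a fixed-length list of character ids,
    truncated or right-padded with 0 to `max_len`."""
    ids = [CHAR_TO_ID.get(ch, 0) for ch in domain.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def shannon_entropy(text):
    """Shannon entropy (bits per character) of the character distribution."""
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    for name in toy_dga("example-seed", "2017-12-07"):
        print(name, round(shannon_entropy(name), 3), encode_domain(name)[:8])
```

The encoded sequences could be fed to any sequence classifier, while entropy-style features suit models such as SVMs or decision trees; which representation works best for a given DGA family is an empirical question that this sketch does not answer.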

Published In

SoICT '17: Proceedings of the 8th International Symposium on Information and Communication Technology

December 2017

486 pages

ISBN:9781450353281

DOI:10.1145/3155133

General Chairs: Huynh Quyet Thang (HUST, Vietnam), Zhenjiang Hu (NII, Japan)

Program Chairs: Marc Bui (EPHE, France), Biplab Sikdar (NUS, Singapore), Ichiro IDE (Nagoya, Japan), Huynh Thi Thanh Binh (HUST, Vietnam)

Publications Chairs: Worrawat Engchuan (Canada), Dinh Viet Sang (HUST, Vietnam), Nguyen Thi Oanh (HUST, Vietnam)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

In-Cooperation

SOICT: School of Information and Communication Technology - HUST
NAFOSTED: The National Foundation for Science and Technology Development

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

SoICT 2017

SoICT 2017: The Eighth International Symposium on Information and Communication Technology

December 7 - 8, 2017

Nha Trang City, Viet Nam

Acceptance Rates

Overall Acceptance Rate 147 of 318 submissions, 46%

Bibliometrics

Total Citations: 42
Total Downloads: 458
Downloads (last 12 months): 23
Downloads (last 6 weeks): 4

Reflects downloads up to 03 Mar 2025

Cited By

Fan B, Ma H, Liu Y, Yuan X, Ke W (2024). KDTM: Multi-Stage Knowledge Distillation Transfer Model for Long-Tailed DGA Detection. Mathematics 12:5 (626). Online publication date: 20-Feb-2024.
https://doi.org/10.3390/math12050626
Zhao R, Chen C, Li R, Yan B, Liu S, Wang H (2024). Modulating LSTMs of Data-Driven Domain Features for DGA Detection: A Semantic Context-Dependent Method. 2024 IEEE 2nd International Conference on Control, Electronics and Computer Technology (ICCECT) (1508-1514). Online publication date: 26-Apr-2024.
https://doi.org/10.1109/ICCECT60629.2024.10545717
Cao Q, Dao-Hoang P, Nguyen D, Nguyen X, Le K (2024). BERT-Enhanced DGA Botnet Detection: A Comparative Analysis of Machine Learning and Deep Learning Models. 2024 13th International Conference on Control, Automation and Information Sciences (ICCAIS) (1-6). Online publication date: 26-Nov-2024.
https://doi.org/10.1109/ICCAIS63750.2024.10814364
Jin H, Jeon G, Aneka Choi H, Jeon S, Seo J (2024). A Threat Modeling Framework for IoT-Based Botnet Attacks. Heliyon (e39192). Online publication date: Oct-2024.
https://doi.org/10.1016/j.heliyon.2024.e39192
AlSabeh A, Friday K, Kfoury E, Crichigno J, Bou-Harb E (2024). On DGA Detection and Classification Using P4 Programmable Switches. Computers & Security 145 (104007). Online publication date: Oct-2024.
https://doi.org/10.1016/j.cose.2024.104007
Zhou J, Cui H, Li X, Yang W, Wu X (2023). A Novel Phishing Website Detection Model Based on LightGBM and Domain Name Features. Symmetry 15:1 (180). Online publication date: 7-Jan-2023.
https://doi.org/10.3390/sym15010180
Chen S, Lang B, Chen Y, Xie C (2023). Detection of Algorithmically Generated Malicious Domain Names with Feature Fusion of Meaningful Word Segmentation and N-Gram Sequences. Applied Sciences 13:7 (4406). Online publication date: 30-Mar-2023.
https://doi.org/10.3390/app13074406
Liew S, Law N (2023). Use of subword tokenization for domain generation algorithm classification. Cybersecurity 6:1. Online publication date: 7-Sep-2023.
https://doi.org/10.1186/s42400-023-00183-8
Ravi V, Alazab M, Srinivasan S, Arunachalam A, Soman K (2023). Adversarial Defense: DGA-Based Botnets and DNS Homographs Detection Through Integrated Deep Learning. IEEE Transactions on Engineering Management 70:1 (249-266). Online publication date: Jan-2023.
https://doi.org/10.1109/TEM.2021.3059664
Xu H, Wang X, Qiu Y, Xu Y (2023). Towards DGA Domain Name Detection via Multi-feature Coordinated Representation and Random Forest. 2023 11th International Conference on Information Systems and Computing Technology (ISCTech) (510-518). Online publication date: 30-Jul-2023.
https://doi.org/10.1109/ISCTech60480.2023.00097