DOI: 10.1145/3340531.3416022
research-article

Helix: DGA Domain Embeddings for Tracking and Exploring Botnets

Published: 19 October 2020

Abstract

Botnets have been using domain generation algorithms (DGA) for over a decade to covertly and robustly identify the domain name of their command and control servers (C&C). Recent advancements in DGA detection have motivated botnet owners to rapidly alter the C&C domain and use adversarial techniques to evade detection. As a result, it has become increasingly difficult to track botnets in DNS traffic. In this paper, we present Helix, a method for tracking and exploring botnets. Helix uses a spatio-temporal deep neural network autoencoder to convert domains into numerical vectors (embeddings) which capture the DGA and seed used to create the domain. This is made possible by leveraging both convolutional (spatial) and recurrent (temporal) layers, and by using techniques such as attention mechanisms and highways. Furthermore, by using an autoencoder architecture, the network can be trained in an unsupervised manner (no labeling of data), which makes the system practical for real-world deployments.
In our evaluation, we found that Helix can track botnet campaigns, distinguish between DGA families and seeds, and can identify domains generated using the latest adversarial machine learning techniques. Helix is currently being used to track botnets in one of the world's largest Internet Service Providers (ISP), and we include some of the ISP's analysis work using our method.
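
To make the phrase "the DGA and seed used to create the domain" concrete, the following is a minimal sketch, not the Helix implementation. It shows a hypothetical hash-based DGA (`toy_dga`) whose output depends on a seed, and a character-level encoder (`encode_domain`) that turns a domain into a fixed-length integer sequence of the kind a character-level neural network typically takes as input. The alphabet, the maximum length of 32, and both function names are assumptions made for illustration; the abstract does not specify them.

```python
import hashlib
import string
from typing import List

# Assumed alphabet: lowercase letters, digits, hyphen and dot.
ALPHABET = string.ascii_lowercase + string.digits + "-."
# Index 0 is reserved for padding, so characters start at 1.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 32  # hypothetical fixed input length


def toy_dga(seed: str, day: int, count: int = 3, length: int = 12,
            tld: str = ".com") -> List[str]:
    """Hypothetical DGA: hash (seed, day, counter) and map bytes to letters."""
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}:{day}:{i}".encode()).digest()
        label = "".join(string.ascii_lowercase[b % 26] for b in digest[:length])
        domains.append(label + tld)
    return domains


def encode_domain(domain: str, max_len: int = MAX_LEN) -> List[int]:
    """Map a domain to integer indices, truncated and zero-padded to max_len."""
    truncated = domain.lower()[:max_len]
    # Characters outside the assumed alphabet are dropped.
    indices = [CHAR_TO_INDEX[ch] for ch in truncated if ch in CHAR_TO_INDEX]
    return indices + [0] * (max_len - len(indices))


if __name__ == "__main__":
    # Same algorithm, two different seeds: two different domain lists.
    for seed in ("campaign-a", "campaign-b"):
        for domain in toy_dga(seed, day=1):
            print(seed, domain, encode_domain(domain)[:8])
```

In this sketch both seeds share one generation algorithm, so their domains look alike on the surface (same length, same character distribution). An embedding that separates them would have to capture more than such surface statistics, which is the property the abstract claims for Helix.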

Supplementary Material

MP4 File (3340531.3416022.mp4)
Botnets use domain generation algorithms (DGA) to establish a robust connection. Recent advancements in DGA detection have motivated botnet owners to use adversarial techniques to evade detection. As a result, it has become increasingly challenging to track botnets in DNS traffic. We present Helix, a method for tracking and exploring botnets. Helix uses a deep neural network autoencoder to convert domains into numerical vectors which capture the DGA and seed used to create the domain. Furthermore, by using an autoencoder architecture, the network can be trained in an unsupervised manner, which makes the system practical for real-world deployments. In our evaluation, we found that Helix can track botnet campaigns, distinguish between DGA families and seeds, and can identify domains generated using the latest adversarial machine learning techniques.

    Published In

    CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
    October 2020
    3619 pages
    ISBN:9781450368599
    DOI:10.1145/3340531
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. autoencoder
    2. botnet
    3. cnn
    4. dga
    5. dns
    6. embedding
    7. lstm

    Qualifiers

    • Research-article

    Conference

    CIKM '20
    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2024) Down to earth! Guidelines for DGA-based Malware Detection. Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses (147-165). DOI: 10.1145/3678890.3678913. Online publication date: 30-Sep-2024
    • (2024) Detecting Domain Names Generated by DGAs With Low False Positives in Chinese Domain Names. IEEE Access 12 (123716-123730). DOI: 10.1109/ACCESS.2024.3454242. Online publication date: 2024
    • (2023) WaterPurifier. Computer Communications 199:C (186-195). DOI: 10.1016/j.comcom.2022.12.019. Online publication date: 1-Feb-2023
    • (2022) A semantic element representation model for malicious domain name detection. Journal of Information Security and Applications 66:C. DOI: 10.1016/j.jisa.2022.103148. Online publication date: 1-May-2022
    • (2021) DeepAID: Interpreting and Improving Deep Learning-based Anomaly Detection in Security Applications. Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (3197-3217). DOI: 10.1145/3460120.3484589. Online publication date: 12-Nov-2021
    • (2021) MORTON: Detection of Malicious Routines in Large-Scale DNS Traffic. Computer Security – ESORICS 2021 (736-756). DOI: 10.1007/978-3-030-88418-5_35. Online publication date: 4-Oct-2021
