Dictionary Extraction and Detection of Algorithmically Generated Domain Names in Passive DNS Traffic

Pereira, Mayana; Coleman, Shaun; Yu, Bin; DeCock, Martine; Nascimento, Anderson

doi:10.1007/978-3-030-00470-5_14

Conference paper
First Online: 07 September 2018

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11050)

Abstract

Automatic detection of algorithmically generated domains (AGDs) is a crucial element in fighting botnets. Modern AGD detection systems combine powerful machine learning algorithms with the linguistic distinctions between legitimate domains and malicious AGDs. However, a more evolved class of AGDs misleads these detection systems by generating domains from wordlists (also called dictionaries). The resulting domains, Dictionary-AGDs, appear benign both to human analysts and to most AGD detection methods that receive only the domain itself as input. In this paper, we design and implement a method called WordGraph for extracting the dictionaries used by Domain Generation Algorithms (DGAs) solely from DNS traffic. This result immediately yields an efficient mechanism for detecting this elusive, new type of DGA, without any need for reverse engineering to extract dictionaries. Our experimental results on data from known Dictionary-AGDs show that our method can extract the dictionary information embedded in the malware code even when the number of DGA domains is much smaller than that of legitimate domains, or when multiple dictionaries are present in the data. This allows our approach to detect Dictionary-AGDs in real traffic more accurately than state-of-the-art methods based on human-defined features or featureless deep learning approaches.
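
The abstract does not spell out how WordGraph works, so the following Python sketch is only an illustration of the underlying intuition, not the authors' method. It assumes a toy Dictionary-AGD that concatenates two words from a small wordlist, and it assumes that recurring words can be approximated by the longest common substring of each pair of domain labels. The names `toy_dictionary_dga`, `longest_common_substring`, and `word_graph`, as well as the `min_len` threshold, are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def toy_dictionary_dga(wordlist, count, seed=0, tld="com"):
    """Hypothetical Dictionary-AGD: join two distinct words from a wordlist."""
    rng = random.Random(seed)
    return ["".join(rng.sample(wordlist, 2)) + "." + tld for _ in range(count)]


def longest_common_substring(a, b):
    """Return one longest common substring of a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def word_graph(domains, min_len=3):
    """Link candidate words that co-occur in the same domain label.

    Assumption: a candidate word is any longest common substring of at
    least min_len characters shared by two distinct labels.
    """
    labels = sorted({d.split(".")[0] for d in domains})
    words = set()
    for a, b in combinations(labels, 2):
        w = longest_common_substring(a, b)
        if len(w) >= min_len:
            words.add(w)
    graph = defaultdict(set)
    for label in labels:
        present = sorted(w for w in words if w in label)
        for u, v in combinations(present, 2):
            graph[u].add(v)
            graph[v].add(u)
    return dict(graph)


if __name__ == "__main__":
    domains = toy_dictionary_dga(["red", "apple", "pie", "stone"], 8)
    for word, neighbours in sorted(word_graph(domains).items()):
        print(word, sorted(neighbours))
```

Under these assumptions, words drawn from one small dictionary keep reappearing next to each other and so tend to form a densely connected group, whereas substrings shared by unrelated legitimate domains would be expected to connect far more sparsely. Pairwise substring matching can also yield spurious fragments that span a word boundary, which a real system would need to filter out.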

Author information

Authors and Affiliations

Infoblox Inc., Santa Clara, CA, USA
Mayana Pereira & Bin Yu
Institute of Technology, University of Washington Tacoma, Tacoma, WA, USA
Shaun Coleman, Martine DeCock & Anderson Nascimento

Corresponding author

Correspondence to Mayana Pereira.

Editor information

Editors and Affiliations

University of Illinois at Urbana-Champaign, Urbana, IL, USA
Michael Bailey
Ruhr-Universität Bochum, Bochum, Germany
Thorsten Holz
Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Manolis Stamatogiannakis
Foundation for Research & Technology – Hellas, Heraklion, Crete, Greece
Sotiris Ioannidis

About this paper

Cite this paper

Pereira, M., Coleman, S., Yu, B., DeCock, M., Nascimento, A. (2018). Dictionary Extraction and Detection of Algorithmically Generated Domain Names in Passive DNS Traffic. In: Bailey, M., Holz, T., Stamatogiannakis, M., Ioannidis, S. (eds) Research in Attacks, Intrusions, and Defenses. RAID 2018. Lecture Notes in Computer Science, vol 11050. Springer, Cham. https://doi.org/10.1007/978-3-030-00470-5_14

DOI: https://doi.org/10.1007/978-3-030-00470-5_14
Published: 07 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00469-9
Online ISBN: 978-3-030-00470-5
eBook Packages: Computer Science, Computer Science (R0)
