BotDetector: a system for identifying DGA-based botnet with CNN-LSTM

Telecommunication Systems

Abstract

Botnets are one of the major threats to network security today. To carry out malicious actions remotely, they rely heavily on Command and Control channels. DGA-based botnets use a domain generation algorithm (DGA) to generate a large number of domain names. Traditional machine learning schemes detect many of these names by analyzing the linguistic distinctions between legitimate and DGA-generated domain names. However, domain names that are built from wordlists or generated pseudo-randomly remain difficult to identify. Accordingly, this paper proposes an efficient CNN-LSTM-based detection model (BotDetector) that uses only a set of easy-to-compute character features. We evaluate our model on two open-source benchmark datasets (360 netlab, Bambenek) and on real DNS traffic from the China Education and Research Network. Experimental results demonstrate that our algorithm improves accuracy and F1-score by 1.6% and reduces computation time by 9.4% compared with other state-of-the-art alternatives. Notably, our work can identify botnets' covert communication channels that use domain names based on wordlists or pseudo-random generation, without any help from reverse engineering.
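To make the idea of easy-to-compute character features concrete, the following minimal sketch shows one common way to turn a domain name into a fixed-length sequence of character indices, the kind of input a character-level CNN-LSTM typically consumes. It is an illustration under stated assumptions, not the paper's actual preprocessing: the alphabet, the reserved padding index, the `MAX_LEN` value, and the function name `encode_domain` are all hypothetical choices.

```python
import string

# Assumed alphabet: lowercase letters, digits, hyphen, and dot.
# Index 0 is reserved for padding; characters outside the alphabet
# share one "unknown" index.
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
UNKNOWN_INDEX = len(ALPHABET) + 1
MAX_LEN = 64  # hypothetical fixed input length


def encode_domain(domain: str, max_len: int = MAX_LEN) -> list[int]:
    """Map a domain name to a fixed-length list of character indices."""
    # Normalize case and truncate names longer than max_len.
    chars = domain.strip().lower()[:max_len]
    indices = [CHAR_TO_INDEX.get(ch, UNKNOWN_INDEX) for ch in chars]
    # Right-pad with zeros so every domain has the same length.
    return indices + [0] * (max_len - len(indices))


if __name__ == "__main__":
    for name in ["example.com", "xjw3kq9zpl.net"]:
        print(name, encode_domain(name, max_len=16))
```

An encoding of this kind needs no hand-crafted linguistic statistics, which is one plausible reading of why character-level inputs suit wordlist-based and pseudo-random names alike. The sequence would normally be passed to an embedding layer before the convolutional and recurrent layers.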



Funding

This work was supported by the Key Laboratory of Computer Network and Information Integration (Ministry of Education) (No. K9392022), the Shandong Computer Society Provincial Key Laboratory Joint Open Fund (No. SDKLCN202203), the Natural Science Foundation of Shandong Province, China (No. ZR2021QF090), the Yangzhou Science and Technology Plan Project (YZ2023200), and the Self-Developing Experimental Instrument and Equipment Project of Yangzhou University (zzyq2023zy06).

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by XZ, JC, XZ, JG and GL. The first draft of the manuscript was written by JC and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xiaodong Zang.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zang, X., Cao, J., Zhang, X. et al. BotDetector: a system for identifying DGA-based botnet with CNN-LSTM. Telecommun Syst 85, 207–223 (2024). https://doi.org/10.1007/s11235-023-01073-7

  • DOI: https://doi.org/10.1007/s11235-023-01073-7
