Botnet Detection on TCP Traffic Using Supervised Machine Learning

  • Conference paper
Hybrid Artificial Intelligent Systems (HAIS 2019)

Abstract

The growing presence of botnets on the Internet has made it necessary to detect their activity in order to prevent them from attacking and spreading across the Internet. The main methods for detecting botnets are traffic classifiers and sinkhole servers, which are special servers designed as traps for botnets. However, sinkholes also receive non-malicious automatic online traffic, so they need traffic classifiers as well. For these reasons, we have created two new datasets to evaluate classifiers: the TCP-Int dataset, built from publicly available TCP Internet traces of normal traffic and of three botnets (Kelihos, Miuref and Sality); and the TCP-Sink dataset, based on traffic from a private sinkhole server with traces of the Conficker botnet and of automatic normal traffic. We used the two datasets to test four well-known Machine Learning classifiers: Decision Tree, k-Nearest Neighbours, Support Vector Machine and Naïve Bayes. On the TCP-Int dataset, we used the F1 score to measure the capability to identify the type of traffic, i.e., whether a trace is normal or comes from one of the three considered botnets. On the TCP-Sink dataset, which contains only two classes (non-malicious or botnet traffic), we used ROC curves and the corresponding AUC score. The best performance was achieved by Decision Tree, with a 0.99 F1 score on the TCP-Int dataset and a 0.99 AUC score on the TCP-Sink dataset.
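
To make the two evaluation metrics named in the abstract concrete, the following is a minimal, dependency-free sketch of how an F1 score for a multi-class task and an AUC score for a two-class task can be computed. It is not the authors' code: the function names, the toy labels and the toy scores are hypothetical, and macro-averaging of the per-class F1 scores is an assumption, since the abstract does not say how the F1 score was aggregated across classes.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: F1}, with F1 = 2 * P * R / (P + R) computed per class."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (averaging choice is an assumption)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def roc_auc(y_true, y_score):
    """Area under the ROC curve for binary labels (1 = botnet, 0 = normal).

    Uses the rank interpretation of AUC: the probability that a randomly
    chosen positive gets a higher score than a randomly chosen negative,
    with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical four-class example (normal traffic plus three botnets).
true_labels = ["normal", "normal", "kelihos", "kelihos", "miuref", "sality"]
pred_labels = ["normal", "kelihos", "kelihos", "kelihos", "miuref", "sality"]
print(round(macro_f1(true_labels, pred_labels), 4))

# Hypothetical two-class example with classifier scores for the botnet class.
true_binary = [0, 0, 1, 1]
botnet_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(true_binary, botnet_scores))
```

In the multi-class toy example, one normal trace is misclassified as Kelihos, which lowers the F1 score of both classes. In the two-class example, one normal trace receives a higher score than one botnet trace, so the AUC falls below 1.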

Notes

  1. https://www.uvic.ca/engineering/ece/isot/datasets/

  2. https://incibe.es/

  3. https://pypi.org/project/pyshark/

  4. https://www.wireshark.org/

  5. https://www.pytables.org/

  6. http://scikit-learn.org/stable/index.html

Acknowledgements

This work was supported by the framework agreement between the University of León and INCIBE (Spanish National Cybersecurity Institute) under Addendum 01.

Author information

Corresponding author

Correspondence to Javier Velasco-Mata.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Velasco-Mata, J., Fidalgo, E., González-Castro, V., Alegre, E., Blanco-Medina, P. (2019). Botnet Detection on TCP Traffic Using Supervised Machine Learning. In: Pérez García, H., Sánchez González, L., Castejón Limas, M., Quintián Pardo, H., Corchado Rodríguez, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2019. Lecture Notes in Computer Science, vol 11734. Springer, Cham. https://doi.org/10.1007/978-3-030-29859-3_38

  • DOI: https://doi.org/10.1007/978-3-030-29859-3_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29858-6

  • Online ISBN: 978-3-030-29859-3

  • eBook Packages: Computer Science (R0)
