Botnet Detection on TCP Traffic Using Supervised Machine Learning

Velasco-Mata, Javier; Fidalgo, Eduardo; González-Castro, Víctor; Alegre, Enrique; Blanco-Medina, Pablo

doi:10.1007/978-3-030-29859-3_38


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11734)

Included in the following conference series:

International Conference on Hybrid Artificial Intelligence Systems


Abstract

The growing presence of botnets on the Internet has made it necessary to detect their activity in order to prevent them from attacking and spreading. The main methods to detect botnets are traffic classifiers and sinkhole servers, which are special servers designed as traps for botnets. However, sinkholes also receive non-malicious automatic online traffic, so they need traffic classifiers as well. For these reasons, we have created two new datasets to evaluate classifiers: the TCP-Int dataset, built from publicly available TCP Internet traces of normal traffic and of three botnets, Kelihos, Miuref and Sality; and the TCP-Sink dataset, based on traffic from a private sinkhole server with traces of the Conficker botnet and of automatic normal traffic. We used the two datasets to test four well-known Machine Learning classifiers: Decision Tree, k-Nearest Neighbours, Support Vector Machine and Naïve Bayes. On the TCP-Int dataset, we used the F1 score to measure the capability to identify the type of traffic, i.e., whether the trace is normal or comes from one of the three considered botnets, while on the TCP-Sink dataset we used ROC curves and the corresponding AUC score, since it contains only two classes: non-malicious or botnet traffic. The best performance was achieved by Decision Tree, with a 0.99 F1 score on the TCP-Int dataset and a 0.99 AUC score on the TCP-Sink dataset.
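As an illustration of the two evaluation measures named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' code: the function names `macro_f1` and `roc_auc`, the toy labels, and the toy scores are all hypothetical, and the choice of macro averaging for the multi-class F1 score is an assumption, since the abstract does not state which averaging was used.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores.

    Macro averaging is an assumption; the abstract does not say how the
    per-class scores were combined.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def roc_auc(y_true, scores):
    """Area under the ROC curve for binary labels (1 = botnet, 0 = normal).

    Computed as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical four-class example in the style of TCP-Int.
true_classes = ["normal", "normal", "kelihos", "miuref", "sality", "sality"]
pred_classes = ["normal", "kelihos", "kelihos", "miuref", "sality", "normal"]
print(round(macro_f1(true_classes, pred_classes), 4))

# Hypothetical two-class example in the style of TCP-Sink.
true_binary = [0, 0, 1, 1]
botnet_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(true_binary, botnet_scores))
```

The pairwise formulation of AUC is quadratic in the number of samples, which is acceptable for a sketch; on traces of realistic size a rank-based computation or a library routine would be the more practical choice.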



Acknowledgements

This work was supported by the framework agreement between the University of León and INCIBE (Spanish National Cybersecurity Institute) under Addendum 01.

Author information

Authors and Affiliations

Department of Electrical, Systems and Automation Engineering, Universidad de León, León, Spain
Javier Velasco-Mata, Eduardo Fidalgo, Víctor González-Castro, Enrique Alegre & Pablo Blanco-Medina
Researcher at INCIBE (Spanish National Cybersecurity Institute), León, Spain
Javier Velasco-Mata, Eduardo Fidalgo, Víctor González-Castro, Enrique Alegre & Pablo Blanco-Medina


Corresponding author

Correspondence to Javier Velasco-Mata.

Editor information

Editors and Affiliations

University of León, León, Spain
Hilde Pérez García
University of León, León, Spain
Lidia Sánchez González
University of León, León, Spain
Manuel Castejón Limas
University of A Coruña, Ferrol, Spain
Héctor Quintián Pardo
University of Salamanca, Salamanca, Spain
Emilio Corchado Rodríguez


About this paper

Cite this paper

Velasco-Mata, J., Fidalgo, E., González-Castro, V., Alegre, E., Blanco-Medina, P. (2019). Botnet Detection on TCP Traffic Using Supervised Machine Learning. In: Pérez García, H., Sánchez González, L., Castejón Limas, M., Quintián Pardo, H., Corchado Rodríguez, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2019. Lecture Notes in Computer Science, vol 11734. Springer, Cham. https://doi.org/10.1007/978-3-030-29859-3_38


DOI: https://doi.org/10.1007/978-3-030-29859-3_38
Published: 26 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29858-6
Online ISBN: 978-3-030-29859-3
eBook Packages: Computer Science, Computer Science (R0)
