Abstract
Even though advanced Machine Learning (ML) techniques have been adopted for DDoS detection, the attack remains a major threat to the Internet. Most existing ML-based DDoS detection approaches fall into two categories: supervised and unsupervised. Supervised ML approaches for DDoS detection rely on the availability of labeled network traffic datasets, whereas unsupervised ML approaches detect attacks by analyzing the incoming network traffic. Both approaches are challenged by the large volume of network traffic data, low detection accuracy and high false positive rates. In this paper, we present an online sequential semi-supervised ML approach for DDoS detection based on network entropy estimation, co-clustering, Information Gain Ratio and the Extra-Trees algorithm. The unsupervised part of the approach reduces the amount of normal traffic data that is irrelevant to DDoS detection, which lowers false positive rates and increases accuracy. The supervised part, in turn, reduces the false positive rates of the unsupervised part and accurately classifies the DDoS traffic. Various experiments were performed to evaluate the proposed approach using three public datasets, namely NSL-KDD, UNB ISCX 12 and UNSW-NB15. Accuracies of 98.23%, 99.88% and 93.71% were achieved on the NSL-KDD, UNB ISCX 12 and UNSW-NB15 datasets, respectively, with false positive rates of 0.33%, 0.35% and 0.46%, respectively.
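The abstract names network entropy estimation and Information Gain Ratio as building blocks but does not give their formulas. The following is a minimal sketch, assuming the standard Shannon entropy over a window of traffic feature values (for example, destination IP addresses) and Quinlan's gain ratio (information gain divided by split information) over discrete features. The function names `shannon_entropy` and `information_gain_ratio` and the example addresses are hypothetical and are not taken from the authors' implementation.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence

# Hypothetical helpers: standard Shannon entropy and Quinlan's gain ratio,
# assumed here for illustration, not the paper's exact formulation.


def shannon_entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    # H = sum over distinct values of p * log2(1 / p), with p = count / n.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def information_gain_ratio(
    feature: Sequence[Hashable], labels: Sequence[Hashable]
) -> float:
    """Gain ratio of a discrete feature with respect to the class labels."""
    if len(feature) != len(labels):
        raise ValueError("feature and labels must have the same length")
    n = len(labels)
    # Group the labels by feature value to get the partitions of the split.
    partitions = defaultdict(list)
    for f, y in zip(feature, labels):
        partitions[f].append(y)
    # Conditional entropy H(labels | feature): size-weighted partition entropies.
    conditional = sum(len(p) / n * shannon_entropy(p) for p in partitions.values())
    gain = shannon_entropy(labels) - conditional
    # Split information is the entropy of the feature itself; dividing by it
    # penalizes features with many distinct values.
    split_info = shannon_entropy(feature)
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    normal = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
    flood = ["10.0.0.9"] * 4  # every packet in the window aimed at one victim
    print(shannon_entropy(normal))  # 2.0
    print(shannon_entropy(flood))   # 0.0
    print(information_gain_ratio(["a", "a", "b", "b"], [0, 0, 1, 1]))  # 1.0
```

A sharp drop in destination-address entropy within a window, as in the single-victim example, is the kind of signal an entropy-based detector can flag. Gain ratio, in turn, ranks features by how well they separate the classes while discounting features that merely take many distinct values.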
Cite this article
Idhammad, M., Afdel, K. & Belouch, M. Semi-supervised machine learning approach for DDoS detection. Appl Intell 48, 3193–3208 (2018). https://doi.org/10.1007/s10489-018-1141-2