Abstract
This paper proposes an anomaly-based intrusion detection system (IDS) with improved detection performance, built on a gradient boosted machine (GBM). The best GBM parameters are obtained by grid search. The performance of GBM is then compared with four well-known classifiers, i.e. random forest, deep neural network, support vector machine, and classification and regression tree, in terms of five performance measures, i.e. accuracy, specificity, sensitivity, false positive rate, and area under the receiver operating characteristic curve (AUC). The experimental results show that GBM significantly outperforms the most recent IDS techniques, i.e. fuzzy classifier, two-tier classifier, GAR-forest, and tree-based classifier ensemble. These are the best results reported so far on the complete feature sets of three different datasets, i.e. NSL-KDD, UNSW-NB15, and the GPRS dataset, using either tenfold cross-validation or the hold-out method. Moreover, we validate our results with two statistical significance tests, which have not yet been applied in existing IDS research.
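The five measures listed above can all be derived from a binary confusion matrix plus the classifier's scores. The following is a minimal plain-Python sketch, not the authors' evaluation code, assuming a hypothetical label convention of 1 = attack (positive class) and 0 = normal, and assuming both classes are present. AUC is computed with the rank-based (Mann-Whitney) formulation, which gives the same value as the area under the ROC curve when ties are counted as one half.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int],
                   scores: List[float]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, FPR and AUC for a binary IDS task.

    Assumption (hypothetical convention): 1 = attack (positive), 0 = normal.
    `scores` are attack scores from the classifier and are used only for AUC.
    Both classes must be present, otherwise the ratios divide by zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)   # true positive rate (detection rate)
    specificity = tn / (tn + fp)   # true negative rate
    fpr = fp / (fp + tn)           # false positive rate = 1 - specificity

    # Rank-based AUC: probability that a random attack outscores a random
    # normal record, with tied scores contributing one half.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "fpr": fpr, "auc": auc}


# Toy usage with illustrative labels and scores (not data from the paper).
print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                     [1, 1, 0, 0, 0, 1, 0, 1],
                     [0.9, 0.8, 0.4, 0.2, 0.1, 0.7, 0.3, 0.6]))
# {'accuracy': 0.75, 'sensitivity': 0.75, 'specificity': 0.75, 'fpr': 0.25, 'auc': 0.875}
```

Computing AUC from pairwise comparisons is quadratic in the number of records, which is fine for illustration; a sort-based rank sum would scale better on full IDS datasets.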
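The abstract states that GBM parameters were chosen by grid search and that evaluation used tenfold cross-validation. A minimal, library-free sketch of combining the two follows: every combination in a parameter grid is scored by its mean score over k shuffled (unstratified) folds, and the best combination is kept. The grid keys (`ntrees`, `max_depth`, `learn_rate`) are hypothetical placeholders modelled on common GBM hyperparameters, and `fit_and_score` is a hypothetical callable standing in for "train a GBM on the training indices, score it on the test indices"; the paper's actual grid and tooling are not given here.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence, Tuple


def kfold_indices(n: int, k: int = 10,
                  seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, test) pairs; assumes n >= k."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin: sizes differ by at most 1
    splits = []
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, folds[i]))
    return splits


def grid_search(param_grid: Dict[str, Sequence], n: int,
                fit_and_score: Callable[[Dict, List[int], List[int]], float],
                k: int = 10) -> Tuple[Dict, float]:
    """Return the parameter combination with the highest mean k-fold score."""
    names = sorted(param_grid)
    splits = kfold_indices(n, k)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = sum(fit_and_score(params, tr, te) for tr, te in splits) / k
        if score > best_score:  # strict '>' keeps the first of any tied combos
            best_params, best_score = params, score
    return best_params, best_score


# Toy usage: a stand-in scorer that trains nothing and simply prefers
# ntrees=100 and max_depth=5 (hypothetical values for illustration).
def toy_scorer(params: Dict, train: List[int], test: List[int]) -> float:
    return -abs(params["ntrees"] - 100) - abs(params["max_depth"] - 5)


grid = {"ntrees": [50, 100, 150], "max_depth": [3, 5, 7], "learn_rate": [0.05, 0.1]}
print(grid_search(grid, n=25, fit_and_score=toy_scorer))
# ({'learn_rate': 0.05, 'max_depth': 5, 'ntrees': 100}, 0.0)
```

Reusing the same splits for every combination keeps the comparison between parameter settings paired, so differences in mean score are not driven by different fold assignments.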
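The abstract does not name the two significance tests that were used. Purely as an illustration, and not as the authors' procedure, the sketch below computes the Friedman statistic, a standard nonparametric test for comparing several classifiers across multiple blocks (datasets or cross-validation folds). Rows are blocks, columns are classifiers, and a higher score is assumed to be better; tied scores receive their average rank. The statistic is

$$\chi^2_F = \frac{12N}{k(k+1)}\left[\sum_{j=1}^{k} R_j^2 - \frac{k(k+1)^2}{4}\right],$$

where $N$ is the number of blocks, $k$ the number of classifiers, and $R_j$ the average rank of classifier $j$.

```python
from typing import List, Sequence


def average_ranks(row: Sequence[float]) -> List[float]:
    """Rank one block's scores with 1 = best (highest); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j share this 1-based rank
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def friedman_statistic(scores: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square for N blocks (rows) x k classifiers (columns)."""
    n, k = len(scores), len(scores[0])
    rank_rows = [average_ranks(row) for row in scores]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    return 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)


# Toy usage with illustrative accuracies (not results from the paper):
# three blocks, three classifiers, the first column always best.
toy = [[0.99, 0.95, 0.90], [0.98, 0.96, 0.91], [0.97, 0.94, 0.92]]
print(friedman_statistic(toy))  # 6.0
```

The statistic is compared against a chi-square distribution with $k-1$ degrees of freedom; for very few blocks, exact critical values are preferable to the chi-square approximation.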



References
Aiello S, Eckstrand E, Fu A, Landry M, Aboyoun P (2016) Machine learning with R and H2O. https://h2o-release.s3.amazonaws.com/h2o/rel-turan/4/docs-website/h2o-docs/booklets/R_Vignette.pdf. Accessed July 2017
Arora A, Candel A, Lanford J, LeDell E, Parmar V (2016) Deep learning with H2O. http://h2o.ai/resources
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press, Boca Raton
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27
Chebrolu S, Abraham A, Thomas JP (2005) Feature deduction and ensemble design of intrusion detection systems. Comput Secur 24(4):295–307
Conover WJ (1999) Practical nonparametric statistics, 3rd edn. John Wiley and Sons, Michigan
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232
García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf Sci 180(10):2044–2064
Giacinto G, Perdisci R, Del Rio M, Roli F (2008) Intrusion detection in computer networks by a modular ensemble of one-class classifiers. Inf Fusion 9(1):69–82
Govindarajan M, Chandrasekaran R (2011) Intrusion detection using neural based hybrid classification methods. Comput Netw 55(8):1662–1671
Harb HM, Desuky AS (2011) Adaboost ensemble with genetic algorithm post optimization for intrusion detection. Int J Comput Sci Issues 8:5
Hsu CW, Chang CC, Lin CJ et al (2010) A practical guide to support vector classification. http://www.datascienceassn.org/sites/default/files/Practical%20Guide%20to%20Support%20Vector%20Classification.pdf. Accessed July 2017
Kanakarajan NK, Muniasamy K (2016) Improving the accuracy of intrusion detection using GAR-Forest with feature selection. In: Proceedings of the 4th international conference on frontiers in intelligent computing: theory and applications (FICTA) 2015, Springer, New York, pp 539–547
Kevric J, Jukic S, Subasi A (2016) An effective combining classifier approach using tree algorithms for network intrusion detection. Neural Comput Appl 1–8
Krömer P, Platoš J, Snášel V, Abraham A (2011) Fuzzy classification by evolutionary algorithms. In: 2011 IEEE international conference on systems, man, and cybernetics (SMC), IEEE, pp 313–318
Kuhn M (2008) Building predictive models in R using the caret package. J Stat Softw 28(5):1–26
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Lewis RJ (2000) An introduction to classification and regression tree (CART) analysis. In: Annual meeting of the society for academic emergency medicine in San Francisco, California, pp 1–14
Loh WY (2011) Classification and regression trees. Wiley Interdiscip Rev Data Min Knowl Discov 1(1):14–23
Mohammadi M, Raahemi B, Akbari A, Nassersharif B (2012) New class-dependent feature transformation for intrusion detection systems. Secur Commun Netw 5(12):1296–1311
Moustafa N, Slay J (2015) UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: Military communications and information systems conference (MilCIS), 2015, IEEE, pp 1–6
Moustafa N, Slay J (2016) The evaluation of network anomaly detection systems: statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set. Inf Secur J Glob Perspect 25(1–3):18–31
Mukkamala S, Sung AH, Abraham A (2005) Intrusion detection using an ensemble of intelligent paradigms. J Netw Comput Appl 28(2):167–182
Oza NC, Tumer K (2008) Classifier ensembles: select real-world applications. Inf Fusion 9(1):4–20
Pajouh HH, Dastghaibyfard G, Hashemi S (2017) Two-tier network anomaly detection model: a machine learning approach. J Intell Inf Syst 48(1):61–74
Panda M, Abraham A, Patra MR (2010) Discriminative multinomial naive Bayes for network intrusion detection. In: 2010 sixth international conference on information assurance and security (IAS), IEEE, pp 5–10
Rokach L (2010) Ensemble-based classifiers. Artif Intell Rev 33(1–2):1–39
Sindhu SSS, Geetha S, Kannan A (2012) Decision tree based light weight intrusion detection using a wrapper approach. Expert Syst Appl 39(1):129–141
Tama BA, Rhee KH (2015a) A combination of PSO-based feature selection and tree-based classifiers ensemble for intrusion detection systems. In: Advances in computer science and ubiquitous computing, Springer, New York, pp 489–495
Tama BA, Rhee KH (2015b) Performance analysis of multiple classifier system in DoS attack detection. In: International workshop on information security applications, Springer, New York, pp 339–347
Tama BA, Rhee KH (2016) Classifier ensemble design with rotation forest to enhance attack detection of IDS in wireless network. In: 2016 11th Asia joint conference on information security (AsiaJCIS), IEEE, pp 87–91
Tama BA, Rhee KH (2017) Performance evaluation of intrusion detection system using classifier ensembles. Int J Internet Protoc Technol 10(1):22–29
Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD Cup 99 data set. In: Proceedings of the second IEEE symposium on computational intelligence for security and defence applications (CISDA), 2009
Therneau TM, Atkinson B, Ripley B et al (2010) rpart: Recursive partitioning. R Package Version 3:1–46
Vilela DW, Ferreira E, Shinoda AA, de Souza Araujo NV, de Oliveira R, Nascimento VE (2014) A dataset for evaluating intrusion detection systems in IEEE 802.11 wireless networks. In: IEEE Colombian conference on communications and computing (COLCOM), IEEE, pp 1–5
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2014R1A2A1A11052981), and partially supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2015-0-00403) supervised by the IITP (Institute for Information & communications Technology Promotion). The first author acknowledges the Korean Government for providing a scholarship through the Korean Government Scholarship Program (KGSP) for graduate students, 2013–2018.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
About this article
Cite this article
Tama, B.A., Rhee, KH. An in-depth experimental study of anomaly detection using gradient boosted machine. Neural Comput & Applic 31, 955–965 (2019). https://doi.org/10.1007/s00521-017-3128-z
DOI: https://doi.org/10.1007/s00521-017-3128-z