Abstract
Autoencoders and conventional machine learning classifiers are widely used to design intrusion detection systems (IDS). However, noise and corruption in high-dimensional network traffic samples still affect the stability and performance of autoencoder-based and other conventional machine learning-based IDS models. Distortions in the input datasets cause deviations in the learnt patterns and result in a low detection rate. Moreover, IDS classifiers typically use every feature when training on the samples, which increases training time, computational cost and memory usage. The main aim of this proposal is to remove the distortions from the network traffic and to train the IDS model faster, so that it detects any category of intruder in the network traffic with a higher detection rate and a short training time. To achieve this, we propose an intrusion detection system that combines a denoising autoencoder with a LightGBM classifier. The denoising autoencoder removes noise and corruption from the network traffic, which may avoid the deviations and thereby enhance the feature-learning capacity required for classification. The LightGBM classifier is then used to classify the samples. The classifier uses the feature histogram bins with larger gradients and so avoids using each feature at every iteration, which accelerates training and boosts the predictive capacity of the model. Compared with other existing IDS, the proposed model shows improved detection performance on nine benchmark datasets, namely CIDDS-001, CIDDS-002, ISCX-URL2016, UNSW-NB15, CIC-IDS-2017, ISCX-Tor2016, BoT-IoT, IoTID20 and Kyoto 2006+, for both binary and multi-class classification tasks.
The model achieves maximum detection rates of over 99.60% for CIDDS-001, 99.90% for CIDDS-002, 97.00% for ISCX-Tor2016, 96.11% for UNSW-NB15, 99.86% for CIC-IDS-2017, 97.76% for ISCX-URL2016, 99.91% for BoT-IoT, and 97.43% for both IoTID20 and Kyoto 2006+, while the training time ranges from only 1.10 to 21.78 s. More importantly, the proposed model has a higher learning and predictive capacity, which boosts its generalization capacity. The model also performs well in detecting all classes, including the minority classes, on all of the aforementioned datasets without any oversampling technique. This efficiency suggests that the model can be deployed in real time on industrial network traffic, including IoT-based smart environments and fog-cloud computing networks.
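The two stages described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The masking-noise corruption, the single tanh hidden layer, the toy data, and all names (`mask_noise`, `TinyDenoisingAE`, `goss_sample`) are assumptions made for illustration. The second stage is represented only by gradient-based one-side sampling (GOSS), the sampling rule from the LightGBM paper that keeps the instances with the largest gradients and reweights a random subset of the rest. A real pipeline would pass the encoded features to the LightGBM library instead.

```python
import math
import random


def mask_noise(x, rate, rng):
    """Corrupt a sample by zeroing each feature with probability `rate` (assumed noise model)."""
    return [0.0 if rng.random() < rate else v for v in x]


class TinyDenoisingAE:
    """Hypothetical toy model: one tanh hidden layer, linear decoder, squared-error loss."""

    def __init__(self, n_in, n_hidden, rng):
        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
        self.w1, self.b1 = init(n_hidden, n_in), [0.0] * n_hidden
        self.w2, self.b2 = init(n_in, n_hidden), [0.0] * n_in

    def encode(self, x):
        return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def decode(self, h):
        return [sum(w * v for w, v in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]

    def train_step(self, noisy, clean, lr):
        """One SGD step: encode the corrupted input, reconstruct the clean one."""
        h = self.encode(noisy)
        out = self.decode(h)
        # Gradient of 0.5 * ||out - clean||^2 with respect to the output.
        d_out = [o - c for o, c in zip(out, clean)]
        # Back-propagate through the linear decoder and the tanh encoder.
        d_h = [(1.0 - h[j] ** 2) * sum(d_out[i] * self.w2[i][j] for i in range(len(d_out)))
               for j in range(len(h))]
        for i, g in enumerate(d_out):
            for j in range(len(h)):
                self.w2[i][j] -= lr * g * h[j]
            self.b2[i] -= lr * g
        for j, g in enumerate(d_h):
            for k in range(len(noisy)):
                self.w1[j][k] -= lr * g * noisy[k]
            self.b1[j] -= lr * g


def reconstruction_error(model, data):
    """Mean squared reconstruction error on clean samples."""
    total = 0.0
    for x in data:
        out = model.decode(model.encode(x))
        total += sum((o - v) ** 2 for o, v in zip(out, x))
    return total / len(data)


def goss_sample(gradients, top_rate, other_rate, rng):
    """Return (indices, weights) chosen by gradient-based one-side sampling."""
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top, n_other = round(top_rate * n), round(other_rate * n)
    top, rest = order[:n_top], order[n_top:]
    sampled = rng.sample(rest, n_other)
    # Small-gradient samples are amplified to keep the gradient sums roughly unbiased.
    amplify = (1.0 - top_rate) / other_rate
    return top + sampled, [1.0] * n_top + [amplify] * n_other


rng = random.Random(0)
# Toy "traffic" samples with four correlated features (illustrative data only).
data = [[t, t, 1.0 - t, 1.0 - t] for t in (i / 19 for i in range(20))]
model = TinyDenoisingAE(n_in=4, n_hidden=2, rng=rng)
error_before = reconstruction_error(model, data)
for _ in range(300):
    for x in data:
        model.train_step(mask_noise(x, 0.25, rng), x, lr=0.05)
error_after = reconstruction_error(model, data)

codes = [model.encode(x) for x in data]  # features a downstream classifier would consume
grads = [rng.gauss(0.0, 1.0) for _ in range(100)]  # stand-in for per-sample gradients
idx, weights = goss_sample(grads, top_rate=0.2, other_rate=0.1, rng=rng)
```

Training on corrupted inputs while scoring against the clean ones is what makes the autoencoder a denoising one. The rates 0.2 and 0.1 passed to `goss_sample` are illustrative values, not settings reported in the article.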
Funding
Funding was provided by the Ministry of Higher Education, Malaysia (grant no. TRGS/1/2016/UTAR/01/2/2).
About this article
Cite this article
Ayubkhan, S.A.H., Yap, WS., Morris, E. et al. A practical intrusion detection system based on denoising autoencoder and LightGBM classifier with improved detection performance. J Ambient Intell Human Comput 14, 7427–7452 (2023). https://doi.org/10.1007/s12652-022-04449-w
DOI: https://doi.org/10.1007/s12652-022-04449-w