A practical intrusion detection system based on denoising autoencoder and LightGBM classifier with improved detection performance

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Autoencoders and conventional machine learning classifiers are widely used to design intrusion detection systems (IDS). However, noise and corruption in high-dimensional network traffic samples still affect the stability and performance of autoencoder-based and other conventional machine learning-based IDS models. Distortions in the input data cause deviations in the learnt patterns and often result in a low detection rate. In addition, IDS classifiers typically train on every single feature, which increases training time, computational cost and memory usage. The main aim of this work is to remove distortions from the network traffic and to train the IDS model quickly, so that it detects any category of intruder with a higher detection rate in a short training time. To achieve this, we propose an intrusion detection system that combines a denoising autoencoder and a LightGBM classifier. The denoising autoencoder removes noise and corruption from the network traffic, which can reduce these deviations and enhance the feature learning capacity required for classification. The LightGBM classifier is then used to classify the samples. The classifier uses the feature histogram bins with larger gradients, so it avoids using every feature at every iteration, which accelerates training and boosts the predictive capacity of the model. The proposed model shows better detection performance than existing IDS on nine benchmark datasets (CIDDS-001, CIDDS-002, ISCX-URL2016, UNSW-NB15, CIC-IDS-2017, ISCX-Tor2016, BoT-IoT, IoTID20 and Kyoto 2006+) for both binary and multi-class classification tasks. The model achieves maximum detection rates of over 99.60% for CIDDS-001, 99.90% for CIDDS-002, 97.00% for ISCX-Tor2016, 96.11% for UNSW-NB15, 99.86% for CIC-IDS-2017, 97.76% for ISCX-URL2016, 99.91% for BoT-IoT, and 97.43% for both IoTID20 and Kyoto 2006+, while the training time ranges from only 1.10 to 21.78 s. More importantly, the proposed model has a higher learning and predictive capacity, which boosts its generalization capacity. The model also performs well in detecting all classes, including the minority classes, on all of the aforementioned datasets without any oversampling techniques. The efficiency of the model suggests that it can be deployed as a real-time model in industrial network traffic, including IoT-based smart environments and fog-cloud computing networks.
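As a rough illustration of the denoising step described in the abstract, the following is a minimal sketch of a one-hidden-layer denoising autoencoder written with NumPy only. It is not the authors' implementation: the layer size, Gaussian corruption, learning rate, synthetic data and the function names `train_denoising_autoencoder` and `encode` are all assumptions made for this example. The key idea it shows is that the network receives a corrupted input but is trained to reconstruct the clean one.

```python
import numpy as np


def train_denoising_autoencoder(X, hidden=8, noise_std=0.2, lr=0.01,
                                epochs=300, seed=0):
    """Train a one-hidden-layer denoising autoencoder by gradient descent.

    Hypothetical helper for illustration. Returns the parameters and the
    per-epoch reconstruction loss measured against the clean input.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, size=(hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Corrupt the input (assumption: additive Gaussian noise).
        X_noisy = X + rng.normal(0.0, noise_std, size=X.shape)
        H = np.tanh(X_noisy @ W1 + b1)      # encoder
        X_hat = H @ W2 + b2                 # linear decoder
        err = X_hat - X                     # target is the clean input
        losses.append(float(np.sum(err ** 2) / n))
        # Backpropagate the mean (over samples) squared reconstruction error.
        g_out = 2.0 * err / n
        gW2 = H.T @ g_out
        gb2 = g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (1.0 - H ** 2)
        gW1 = X_noisy.T @ g_h
        gb1 = g_h.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses


def encode(X, params):
    """Map samples to the hidden representation learned by the autoencoder."""
    W1, b1, _, _ = params
    return np.tanh(X @ W1 + b1)


# Synthetic stand-in for standardized traffic features (not a real dataset).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))
X_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

params_demo, losses_demo = train_denoising_autoencoder(X_demo)
Z_demo = encode(X_demo, params_demo)
print(Z_demo.shape, losses_demo[0], losses_demo[-1])
```

In a pipeline like the one proposed, the autoencoder's output (either the encoded features or the denoised reconstruction; the abstract does not say which) would then be passed to a gradient boosting classifier, for example through the `fit` and `predict` methods of `lightgbm.LGBMClassifier`. That step is left out here to keep the sketch free of extra dependencies.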

Funding

Funding was provided by Ministry of Higher Education, Malaysia (grant no. TRGS/1/2016/UTAR/01/2/2).

Author information

Corresponding author

Correspondence to Wun-She Yap.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ayubkhan, S.A.H., Yap, WS., Morris, E. et al. A practical intrusion detection system based on denoising autoencoder and LightGBM classifier with improved detection performance. J Ambient Intell Human Comput 14, 7427–7452 (2023). https://doi.org/10.1007/s12652-022-04449-w

  • DOI: https://doi.org/10.1007/s12652-022-04449-w
