Skip to main content
Log in

Attack classification using feature selection techniques: a comparative study

  • Original Research
  • Published:
Journal of Ambient Intelligence and Humanized Computing Aims and scope Submit manuscript

Abstract

The goal of securing a network is to protect the information flowing through the network and to ensure the security of intellectual as well as sensitive data for the underlying application. To accomplish this goal, security mechanism such as Intrusion Detection System (IDS) is used, that analyzes the network traffic and extract useful information for inspection. It identifies various patterns and signatures from the data and use them as features for attack detection and classification. Various Machine Learning (ML) techniques are used to design IDS for attack detection and classification. All the features captured from the network packets do not contribute in detecting or classifying attack. Therefore, the objective of our research work is to study the effect of various feature selection techniques on the performance of IDS. Feature selection techniques select relevant features and group them into subsets. This paper implements Chi-Square, Information Gain (IG), and Recursive Feature Elimination (RFE) feature selection techniques with ML classifiers namely Support Vector Machine, Naïve Bayes, Decision Tree Classifier, Random Forest Classifier, k-nearest neighbours, Logistic Regression, and Artificial Neural Networks. The methods are experimented on NSL-KDD dataset and comparative analysis of results is presented.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

References

  • Agarwal N, Hussain SZ (2018) A closer look at intrusion detection system for web applications. Secur Commun Netw 2018:1–27. https://doi.org/10.1155/2018/9601357

    Article  Google Scholar 

  • Aljawarneh S, Aldwairi M, Yassein MB (2018) Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model. J Comput Sci 25:152–160

    Article  Google Scholar 

  • Allahyari M, Pouriyeh S, Assefi M, Safaei S, Trippe ED, Gutierrez JB, Kochut K (2017) A brief survey of text mining: Classification, clustering and extraction techniques. arXiv preprint arXiv:170702919

  • Almseidin M, Alzubi M, Kovacs S, Alkasassbeh M (2017) Evaluation of machine learning algorithms for intrusion detection system. In: 2017 IEEE 15th International Symposium on Intelligent Systems and Informatics (SISY), IEEE, pp 000277–000282

  • Amiri F, Yousefi MR, Lucas C, Shakery A, Yazdani N (2011) Mutual information-based feature selection for intrusion detection systems. J Netw Comput Appl 34(4):1184–1199

    Article  Google Scholar 

  • Balasaraswathi VR, Sugumaran M, Hamid Y (2017) Feature selection techniques for intrusion detection using non-bio-inspired and bio-inspired optimization algorithms. J Commun Inform Netw 2(4):107–119

    Article  Google Scholar 

  • Benaddi H, Ibrahimi K, Benslimane A (2018) Improving the intrusion detection system for nsl-kdd dataset based on pca-fuzzy clustering-knn. In: 2018 6th International Conference on Wireless Networks and Mobile Communications (WINCOM), IEEE, pp 1–6

  • Besharati E, Naderan M, Namjoo E (2019) Lr-hids: logistic regression host-based intrusion detection system for cloud environments. J Ambient Intell Human Comput 10(9):3669–3692

    Article  Google Scholar 

  • Biswas SK (2018) Intrusion detection using machine learning: a comparison study. Int J Pure Appl Math 118(19):101–114

    Google Scholar 

  • Bitaab M, Hashemi S (2017) Hybrid intrusion detection: Combining decision tree and gaussian mixture model. In: 2017 14th International ISC (Iranian Society of Cryptology) Conference on Information Security and Cryptology (ISCISC), IEEE, pp 8–12

  • Breiman L (2017) Classification and regression trees. Routledge, Abingdon

    Book  Google Scholar 

  • Chomboon K, Chujai P, Teerarassamee P, Kerdprasop K, Kerdprasop N (2015) An empirical study of distance metrics for k-nearest neighbor algorithm. In: Proceedings of the 3rd international conference on industrial application engineering, pp 1–6

  • Chou TS, Yen KK, Luo J (2008) Network intrusion detection design using feature selection of soft computing paradigms. Int J Computat Intell 4(3):196–208

    Google Scholar 

  • Da Silva IN, Spatti DH, Flauzino RA, Liboni LHB, dos Reis Alves SF (2017) Artificial neural networks. Springer International Publishing, Cham

    Book  Google Scholar 

  • Dash M, Liu H (1997) Feature selection for classification. Intelligent data analysis 1(1–4):131–156

    Article  Google Scholar 

  • Denning DE (1987) An intrusion-detection model. IEEE Trans Softw Eng 2:222–232

    Article  Google Scholar 

  • Deshmukh DH, Ghorpade T, Padiya P (2015) Improving classification using preprocessing and machine learning algorithms on nsl-kdd dataset. In: 2015 International Conference on Communication, Information & Computing Technology (ICCICT), IEEE, pp 1–6

  • Dogan Ü, Glasmachers T, Igel C (2016) A unified view on multi-class support vector classification. J Mach Learn Res 17(45):1–32

    MathSciNet  MATH  Google Scholar 

  • Ektefa M, Memar S, Sidi F, Affendey LS (2010) Intrusion detection using data mining techniques. In: 2010 International Conference on Information Retrieval & Knowledge Management (CAMP), IEEE, pp 200–203

  • Fadlil A, Riadi I, Aji S (2017) Ddos attacks classification using numeric attributebased gaussian naive bayes. Int J Adv Comput Sci Appl (IJACSA) 8(8):42–50

    Google Scholar 

  • Hackeling G (2017) Mastering Machine Learning with scikit-learn. Packt Publishing Ltd, pp 1–254. https://www.packtpub.com/in/big-data-and-business-intelligence/mastering-machine-learning-scikit-learn-second-edition

  • Harrell FE Jr (2015) Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. Springer, Berlin

    Book  Google Scholar 

  • Heba FE, Darwish A, Hassanien AE, Abraham A (2010) Principle components analysis and support vector machine based intrusion detection system. In: Intelligent Systems Design and Applications (ISDA), 2010 10th International Conference on, IEEE, pp 363–367

  • Ingre B, Yadav A (2015) Performance analysis of nsl-kdd dataset using ann. In: 2015 International Conference on Signal Processing and Communication Engineering Systems, IEEE, pp 92–96

  • Jović A, Brkić K, Bogunović N (2015) A review of feature selection methods with applications. In: 2015 38th International Convention on Information and Communication Technology. Electronics and Microelectronics (MIPRO), IEEE, pp 1200–1205

  • Kloft M, Brefeld U, Düessel P, Gehl C, Laskov P (2008) Automatic feature selection for anomaly detection. In: Proceedings of the 1st ACM workshop on Workshop on AISec, ACM, pp 71–76

  • Kumar K, Batth JS (2016) Network intrusion detection with feature selection techniques using machine-learning algorithms. Int J Comput Appl 150(12):1–13. https://doi.org/10.5120/ijca2016910764

    Article  Google Scholar 

  • Kumari B, Swarnkar T (2011) Filter versus wrapper feature subset selection in large dimensionality micro array: a review. Int J Comput Sci Inf Technol 2(3):1048–1053

    Google Scholar 

  • Larson D (2016) Distributed denial of service attacks-holding back the flood. Netw Secur 2016(3):5–7

    Article  Google Scholar 

  • Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, Liu H (2017) Feature selection: A data perspective. ACM Comput Surv 50:94:1–94:45

    Google Scholar 

  • Maillo J, Ramírez S, Triguero I, Herrera F (2017) knn-is: an iterative spark-based design of the k-nearest neighbors classifier for big data. Knowl Based Syst 117:3–15

    Article  Google Scholar 

  • Mandal N, Jadhav S (2016) A survey on network security tools for open source. In: 2016 IEEE International Conference on Current Trends in Advanced Computing (ICCTAC), IEEE, pp 1–6

  • Mansournia MA, Geroldinger A, Greenland S, Heinze G (2017) Separation in logistic regression: causes, consequences, and control. Am J Epidemiol 187(4):864–870

    Article  Google Scholar 

  • Mayuranathan M, Murugan M, Dhanakoti V (2019) Best features based intrusion detection system by rbm model for detecting ddos in cloud environment. J Ambient Intell Human Comput: 1–11

  • McHugh J (2000) Testing intrusion detection systems: a critique of the 1998 and 1999 darpa intrusion detection system evaluations as performed by lincoln laboratory. ACM Trans Inform Syst Secu (TISSEC) 3(4):262–294

    Article  Google Scholar 

  • Meira J, Andrade R, Praça I, Carneiro J, Bolón-Canedo V, Alonso-Betanzos A, Marreiros G (2019) Performance evaluation of unsupervised techniques in cyber-attack anomaly detection. J Ambient Intell Human Comput. https://doi.org/10.1007/s12652-019-01417-9

    Article  Google Scholar 

  • Meyer D, Wien FT (2015) Support vector machines. Interf Libsvm Pack e1071:28

  • Mkuzangwe NN, Nelwamondo F (2017) Ensemble of classifiers based network intrusion detection system performance bound. In: 2017 4th International Conference on Systems and Informatics (ICSAI), IEEE, pp 970–974

  • Mousavi SM, Majidnezhad V, Naghipour A (2019) A new intelligent intrusion detector based on ensemble of decision trees. J Ambient Intell Human Comput. https://doi.org/10.1007/s12652-019-01596-5

    Article  Google Scholar 

  • Mukherjee S, Sharma N (2012) Intrusion detection using naive bayes classifier with feature reduction. Proc Technol 4:119–128

    Article  Google Scholar 

  • Nehinbe JO (2011) A critical evaluation of datasets for investigating idss and ipss researches. In: 2011 IEEE 10th International Conference on Cybernetic Intelligent Systems (CIS), IEEE, pp 92–97

  • Nguyen H, Franke K, Petrovic S (2010) Improving effectiveness of intrusion detection by correlation feature selection. In: Availability, Reliability, and Security, 2010. ARES’10 International Conference on, IEEE, pp 17–24

  • Olusola AA, Oladele AS, Abosede DO (2010) Analysis of kdd’99 intrusion detection dataset for selection of relevance features. Proc World Cong Eng Comput Sci Citeseer 1:20–22

    Google Scholar 

  • Phutane MT, Pathan A (2015) Intrusion detection system using decision tree and apriori algorithm. J Comput Eng Technol 6(7):09–18

    Google Scholar 

  • Puga JL, Krzywinski M, Altman N (2015) Points of significance: Bayes’ theorem. Nat Methods 12:277–278. https://doi.org/10.1038/nmeth.3335

    Article  Google Scholar 

  • Rajput D, Thakkar A (2019) A survey on different network intrusion detection systems and countermeasure. Emerging Research in Computing. Information, Communication and Applications, Springer, pp 497–506

  • Richhariya R, Manjhwar AK, Makwana RRS (2017) A hybrid approach for user to root and remote to local attack. Int J Comput Sci Eng 5(6):73–79

    Google Scholar 

  • Russell SJ, Norvig P (2016) Artificial intelligence: a modern approach. Pearson Education Limited, Malaysia

    MATH  Google Scholar 

  • Sahani R, Rout C, Badajena JC, Jena AK, Das H, et al. (2018) Classification of intrusion detection using data mining techniques. In: Progress in computing, analytics and networking, Springer, pp 753–764

  • Sharafaldin I, Lashkari AH, Ghorbani AA (2018) Toward generating a new intrusion detection dataset and intrusion traffic characterization. In: ICISSP, pp 108–116

  • Smaha SE (1988) Haystack: An intrusion detection system. In: [Proceedings 1988] Fourth Aerospace Computer Security Applications, IEEE, pp 37–44

  • Song YY, Ying L (2015) Decision tree methods: applications for classification and prediction. Shanghai Arch Psychiatry 27(2):130

    Google Scholar 

  • Subba B, Biswas S, Karmakar S (2016) Enhancing performance of anomaly based intrusion detection systems through dimensionality reduction using principal component analysis. In: 2016 IEEE International Conference on Advanced Networks and Telecommunications Systems (ANTS), IEEE, pp 1–6

  • Suthaharan S (2016) Support vector machine. In: Machine learning models and algorithms for big data classification, vol 36. Springer, pp 207–235

  • Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the kdd cup 99 data set. In: 2009 IEEE symposium on computational intelligence for security and defense applications, IEEE, pp 1–6

  • Thakkar A, Lohiya R (2020a) A review of the advancement in intrusion detection datasets. Procedia Comput Sci 167:636–645. https://doi.org/10.1016/j.procs.2020.03.330

  • Thakkar A, Lohiya R (2020b) Role of swarm and evolutionary algorithms for intrusion detection system: a survey. Swarm Evolut Comput 53:100631

    Article  Google Scholar 

  • Thaseen IS, Kumar CA (2017) Intrusion detection model using fusion of chi-square feature selection and multi class svm. J King Saud Univ Comput Inform Sci 29(4):462–472

    Article  Google Scholar 

  • van Gerven M, Bohte S (2018) Artificial neural networks as models of neural information processing. Frontiers Media SA. https://www.frontiersin.org/research-topics/4817/artificial-neural-networks-as-models-of-neural-information-processing

  • Wahba Y, ElSalamouny E, ElTaweel G (2015) Improving the performance of multi-class intrusion detection systems using feature reduction. arXiv preprint arXiv:150706692

  • Witten IH, Frank E, Hall MA, Pal CJ (2016) Data mining: practical machine learning tools and techniques, 3rd edn. Morgan Kaufmann, pp 1–629. ISBN 978-0-12-374856-0. https://doi.org/10.1016/B978-0-12-374856-0.00002-X

  • Zainal A, Maarof MA, Shamsuddin SM et al (2009) Ensemble classifiers for network intrusion detection system. J Inform Assur Secur 4(3):217–225

    Google Scholar 

  • Zaman S, Karray F (2009) Features selection for intrusion detection systems based on support vector machines. In: Consumer Communications and Networking Conference, 2009. CCNC 2009. 6th IEEE, IEEE, pp 1–8

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ritika Lohiya.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Thakkar, A., Lohiya, R. Attack classification using feature selection techniques: a comparative study. J Ambient Intell Human Comput 12, 1249–1266 (2021). https://doi.org/10.1007/s12652-020-02167-9

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s12652-020-02167-9

Keywords

Navigation