Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model

https://doi.org/10.1016/j.jocs.2017.03.006

Highlights

  • The NSL-KDD data set is used for both the binary and the multiclass problem, with 20% of the data held out for testing.

  • This paper develops a new hybrid model that can be used to estimate the intrusion scope threshold degree.

  • The experimental result revealed that the hybrid approach had a significant effect on the minimisation of the computational and time complexity.

  • The accuracy of the proposed model was measured as 99.81% and 98.56% for the binary class and multiclass NSL-KDD data sets, respectively.

Abstract

Efficiently detecting network intrusions requires gathering sensitive information, which means collecting large volumes of network transactions, including detailed records of recent ones. Assessments based on meta-heuristic anomaly detection are important in the exploratory analysis of intrusion-related network transaction data. These assessments are needed to predict the possibility of an intrusion from the attribute details available for each network transaction. We use the NSL-KDD data set for both the binary and the multiclass problem, with 20% of the data held out for testing. This paper develops a new hybrid model that estimates the intrusion scope threshold degree based on the optimal features of the network transaction data that were made available for training. The experimental results revealed that the hybrid approach significantly reduced the computational and time complexity involved in determining the feature association impact scale. The accuracy of the proposed model was measured as 99.81% and 98.56% for the binary class and multiclass NSL-KDD data sets, respectively.

However, there are issues with obtaining high false positive and low false negative rates. A hybrid approach with two main parts is proposed to address these issues. First, the data are filtered using the Vote algorithm with Information Gain, which combines the probability distributions of the base learners, in order to select the important features that positively affect the accuracy of the proposed model. Next, the hybrid algorithm consists of the following classifiers: J48, Meta Bagging, RandomTree, REPTree, AdaBoostM1, DecisionStump and NaiveBayes. Based on the results obtained using the proposed model, we observe improved accuracy, high false negative rate, and low false positive rate.

Introduction

Intrusion detection systems (IDS) are generally divided into two types (see Fig. 1): misuse and anomaly intrusion detection systems. In a misuse IDS, intrusions are identified based on parameters of system weaknesses and known attack signatures. However, it does not recognise attacks that are new or unfamiliar. An anomaly IDS, on the other hand, is based on parameters of normal behaviour and uses them to pinpoint any action that deviates significantly from that behaviour. In other words, the misuse mechanism identifies intrusions by matching the patterns under examination against previously identified intrusion patterns, whereas anomaly detection identifies patterns by examining data taken from normal usage [1].
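The contrast between the two detection styles can be illustrated with a minimal Python sketch. It is not the paper's method: the signature list, the single numeric feature (bytes per connection), and the three-standard-deviation threshold are all hypothetical choices made only for illustration.

```python
from statistics import mean, stdev
from typing import Iterable, Sequence

# Hypothetical signature list; real misuse IDSs use far richer rule sets.
KNOWN_SIGNATURES = ("/etc/passwd", "' OR 1=1")


def misuse_detect(payload: str, signatures: Iterable[str] = KNOWN_SIGNATURES) -> bool:
    """Flag a payload only if it contains a previously known attack pattern."""
    return any(signature in payload for signature in signatures)


def anomaly_detect(value: float, normal_samples: Sequence[float],
                   threshold: float = 3.0) -> bool:
    """Flag a value that deviates strongly from a baseline of normal behaviour."""
    mu = mean(normal_samples)
    sigma = stdev(normal_samples)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Toy baseline: bytes per connection observed during normal usage (made up).
normal_bytes = [480.0, 510.0, 495.0, 505.0, 500.0, 490.0, 520.0]

known_attack_flagged = misuse_detect("GET /../../etc/passwd")   # matches a signature
novel_attack_flagged = misuse_detect("GET /new-exploit?x=1")     # no signature matches
typical_flagged = anomaly_detect(505.0, normal_bytes)            # close to the baseline
outlier_flagged = anomaly_detect(900.0, normal_bytes)            # far from the baseline
```

In this toy setting the signature matcher misses the payload it has never seen, while the baseline comparison flags any sufficiently unusual value, including harmless ones, which is one source of the false positives discussed later in the text.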

Valuable information is always attractive to attackers and is therefore vulnerable to concentrated network attacks. An intrusion occurs when an attacker enters the system or system server and forwards malicious packets to the user system in order to steal, modify, or corrupt confidential or important information. An attack refers to the illegal sending of network packets through the network. An intrusion can take place over the server or system as a result of existing system vulnerabilities, such as user misuse, system misconfiguration, or program defects. An attacker can also mount an intelligent intrusion by combining multiple vulnerabilities. In a global network, a large number of online services and millions of big servers are running. Such networks are therefore attractive to more attackers and require intelligent intrusion detection models to defend their network systems [3], [4], [42].

The following steps are part of an intelligent intrusion or system attack [3]:

  • Collecting information: Gathering information about the target involves obtaining all the details and knowledge about the user who will be under attack. This is done by running queries with network commands such as “nslookup” and “whois” to obtain the domain name, IP addresses, server name, etc.

  • Probing and scanning: Involves scanning the target host and checking the system’s unguarded or unprotected areas in search of sensitive information.

  • Remote to local access: Refers to the process of gaining access to the user system through R2L (remote to local) attack types, such as password guessing, buffer overflow attacks, and network sniffing. In other words, in an R2L attack, an unknown person sends network packets in order to gain local access to the user machine and execute commands on the target. This type of attack can be performed by using open ports found on the target machine, exploiting system vulnerabilities, guessing passwords, etc.

  • User to root access: For this type of attack, system vulnerabilities are used by a normal system user to gain root access to the system. They are quite similar to R2L attacks. However, the attacker here is already a normal machine user and he/she will just try to gain root access to the machine.

  • Launch attacks: Finally, actual attacks are launched. Examples of these attacks are modifying web pages, stealing confidential information, creating backdoors for future attacks, or accessing another person’s accounts.

Efficient IDS are normally developed using data mining techniques because they detect intrusions well and generalise adeptly. However, implementing and installing such systems can be complex. The systems’ inherent complications can be categorised into distinct problem sets based on competence, accuracy, and usability parameters [1], [2], [42]. In addition, IDS designed using data mining techniques, mainly those based on anomaly detection, exhibit a higher percentage of false positives than earlier detection techniques based on handcrafted signatures. Hence, it is difficult for these techniques to process audit data and detect intrusions online. Furthermore, the system’s learning process requires large amounts of training data and is more complex than currently available methodologies.

Building an efficient intrusion detection system is therefore vital to the network system’s defense and helps in sensing attacks over the network. To this end, a hybrid classification-based intrusion detection model with feature selection is proposed. First, the NSL-KDD data set’s dimensions are reduced through feature selection. Afterwards, a machine learning approach is applied to build an intrusion detection model that can find system attacks and use the captured data to improve intrusion detection. The proposed model needs feature extraction, dimensionality reduction that can reduce the extracted features, and feature selection. Feature extraction produces transformed features, each of which is a mixture of the initial features. During feature selection, the classification criteria serve as the basis for selecting features.
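To make the feature selection step more concrete, the following Python sketch ranks discrete features by Information Gain, the criterion named in the abstract. It is only an illustration under stated assumptions: the four toy records and their values are invented, the function names are hypothetical, and the excerpt does not specify how the authors compute or threshold the scores.

```python
from collections import Counter, defaultdict
from math import log2
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - conditional


def rank_features(columns: Dict[str, Sequence[Hashable]],
                  labels: Sequence[Hashable]) -> List[str]:
    """Return feature names ordered from most to least informative."""
    return sorted(columns,
                  key=lambda name: information_gain(columns[name], labels),
                  reverse=True)


# Toy records (invented values) using two NSL-KDD-style feature names.
labels = ["normal", "normal", "attack", "attack"]
columns = {
    "flag": ["SF", "S0", "SF", "S0"],
    "protocol_type": ["tcp", "tcp", "udp", "udp"],
}
ranking = rank_features(columns, labels)
gain_protocol = information_gain(columns["protocol_type"], labels)
gain_flag = information_gain(columns["flag"], labels)
```

In this toy data "protocol_type" separates the two classes perfectly while "flag" carries no information about the label, so a selector that keeps only the top-ranked features would retain the former and discard the latter.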

Our work is organized as follows. Related work is discussed in Section 2. Section 3 gives an overview of the confusion matrix to indicate the main elements that should be considered when assessing the usability and accuracy of the proposed model. Section 4 describes the important classification techniques. Section 5 presents the proposed model and its prototype, with details of its phases such as pre-processing, normalization, classifier selection, feature selection, and post-processing. Section 6 discusses the results, and finally, Section 7 concludes the paper and indicates possible future work.

Section snippets

Related work

The first IDS ever recorded was based on research conducted by Dorothy E. Denning at SRI International [5]. It led to the solution known as the intrusion detection expert system. To detect known intrusion types, it implements a dual approach that uses a rule-based expert system. Additionally, it uses a statistical anomaly detection component based on host systems, user profiles, and target systems. Later on, a new version known as the next-generation intrusion

Confusion matrix

As seen in Table 1, a confusion matrix is used to represent the information related to the actual and predicted classifications performed by the classification system.

The accuracy (AC) is the proportion of all predictions that were correct: $AC = \frac{a + d}{a + b + c + d}$

The true positive rate (TP) is the proportion of positive cases that were correctly identified: $TP = \frac{d}{c + d}$

The false positive rate (FP) is the proportion of negative cases that were incorrectly classified as positive: $FP = \frac{b}{a + b}$

The true negative rate (TN) is the proportion of negative cases that were correctly classified: $TN = \frac{a}{a + b}$

The false
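The four rates defined above can be computed directly from the confusion-matrix cells. The short Python sketch below assumes the cell meanings implied by the formulas, since Table 1 is not reproduced in this excerpt: a is negatives predicted negative, b is negatives predicted positive, c is positives predicted negative, and d is positives predicted positive. The function name and the example counts are hypothetical.

```python
from typing import Dict


def confusion_metrics(a: int, b: int, c: int, d: int) -> Dict[str, float]:
    """Compute AC, TP, FP and TN rates from confusion-matrix cell counts.

    Assumed cell layout (inferred from the formulas, not taken from Table 1):
      a: actual negative, predicted negative
      b: actual negative, predicted positive
      c: actual positive, predicted negative
      d: actual positive, predicted positive
    """
    negatives = a + b
    positives = c + d
    if negatives == 0 or positives == 0:
        raise ValueError("both classes must be present to compute the rates")
    return {
        "AC": (a + d) / (negatives + positives),
        "TP": d / positives,
        "FP": b / negatives,
        "TN": a / negatives,
    }


# Invented counts for illustration only; these are not results from the paper.
metrics = confusion_metrics(a=90, b=10, c=5, d=95)
```

Because FP and TN share the denominator a + b, they always sum to 1, which is a quick sanity check on any reported pair of values.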

Classification techniques

Classification is a data mining method for which many algorithms are currently in use, such as decision trees and neural networks. To make their predictions, these techniques analyse the available data in several ways [33].

  • Decision tree: This technique involves the division of the classification problem into several sub-problems. It involves the creation of a decision tree, which can then be utilized

Functionality overview of proposed model

The following steps are involved in developing an effective intrusion detection hybrid model that has higher accuracy and performance:

1. Choosing a proper dataset that has quality data, such as NSL-KDD. Further details about the NSL-KDD dataset are found in Section 5.1.

2. Apportioning the dataset into 20% test and 80% train for the purpose of the experiment. Further details are found in Section 5.2.

3. The pre-processing phase. This phase allows the reduction or elimination of the noise forced on the
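The 80%/20% apportioning in step 2 can be sketched in plain Python as follows. Stratifying by class label and fixing a random seed are assumptions added here for reproducibility; the excerpt does not say how the authors drew the split, and the record format (a feature value paired with a label) is hypothetical.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

Record = Tuple[object, str]  # (features, class label); hypothetical record format


def stratified_split(records: Sequence[Record], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[Record], List[Record]]:
    """Split records into train and test sets, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for record in records:
        by_class[record[1]].append(record)

    train: List[Record] = []
    test: List[Record] = []
    for label in sorted(by_class):
        group = list(by_class[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy data set: 50 "normal" and 50 "attack" records with dummy features.
records = [(i, "normal") for i in range(50)] + [(i, "attack") for i in range(50, 100)]
train_set, test_set = stratified_split(records)
```

With the toy data above, each class contributes 10 records to the test set and 40 to the training set, so the class balance of the full data set is preserved in both parts.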

Experimental results and analysis

This section will present the experiment setup and the analysis of results. Subsection 6.1 explains the experimental setup and vote model while Subsection 6.2 presents the results and analysis.
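According to the abstract, the vote model combines the probability distributions of its base learners. One common way to do this is to average the per-class probabilities and predict the class with the highest mean. The Python sketch below assumes that averaging rule; the combination rule actually used in the paper is not stated in this excerpt, and the three base-learner outputs are invented.

```python
from typing import Dict, List

Distribution = Dict[str, float]  # class label -> predicted probability


def average_of_probabilities(distributions: List[Distribution]) -> Distribution:
    """Average the per-class probabilities produced by several base learners."""
    if not distributions:
        raise ValueError("at least one base-learner distribution is required")
    classes = sorted({label for dist in distributions for label in dist})
    count = len(distributions)
    return {label: sum(dist.get(label, 0.0) for dist in distributions) / count
            for label in classes}


def vote(distributions: List[Distribution]) -> str:
    """Return the class with the highest averaged probability."""
    combined = average_of_probabilities(distributions)
    return max(combined, key=combined.get)


# Hypothetical outputs of three base learners for a single network record.
base_outputs = [
    {"normal": 0.70, "attack": 0.30},
    {"normal": 0.40, "attack": 0.60},
    {"normal": 0.20, "attack": 0.80},
]
combined = average_of_probabilities(base_outputs)
predicted = vote(base_outputs)
```

Averaging lets a confident minority of learners outweigh a hesitant majority, which is the main practical difference from simple majority voting on hard class labels.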

Conclusion and future work

Results from the analysis of the NSL-KDD dataset revealed that it is the top candidate data set for testing and simulating IDS performance. The proposed hybrid model for dimensionality reduction improves the accuracy rate and reduces the detection time. The analysis performed on the NSL-KDD dataset with the help of tables and figures has provided a clearer understanding of the dataset. It also shows that the majority of attacks are done using the TCP protocol’s inherent

Acknowledgement

This research was supported, in part, by Zayed University Research Office, Research Incentive Grant, R15121.


References (42)

  • J. McHugh

    Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory

    ACM Trans. Inf. Syst. Secur.

    (2000)
  • Fangjun Kuang et al.

    A novel SVM by combining kernel principal component analysis and improved chaotic particle swarm optimization for intrusion detection

    Soft Comput.

    (2015)
  • M. Aldwairi et al.

    Application of artificial bee colony for intrusion detection systems

    Sec. Commun. Netw.

    (2015)
  • I. Ahmad et al.

    Enhancing SVM performance in intrusion detection using optimal feature subset selection based on genetic principal components

    Neural Comput.

    (2014)
  • Chun Guo et al.

    A distance sum-based hybrid method for intrusion detection

    Appl. Intell.

    (2014)
  • Saurabh Mukherjee et al.

    Intrusion detection using naive bayes classifier with feature reduction, in: proceedings in 2nd International Conference on Computer, Communication, Control and Information Technology, C3IT-2012

    Procedia Technol.

    (2012)
  • M. Crosbie et al.

    Defending a Computer System Using Autonomous Agents, Technical Report 95-022

    (1994)
  • F. Hosseinpour et al.

    Distributed agent based model for intrusion detection system based on artificial immune system

    Int. J. Digit. Content Technol. Appl.

    (2013)
  • N. Afzali et al.

    MAIS-IDS: a distributed intrusion detection system using multi-agent AIS approach

    Eng. Appl. Artif. Intell.

    (2014)
  • J. Singh et al.

    A survey on machine learning techniques for intrusion detection systems

    Int. J. Adv. Res. Comput. Commun. Eng.

    (2013)
  • S.K. Wagh

    Survey on intrusion detection system using machine learning techniques

    Int. J. Comput. Appl.

    (2013)
  • C. Qiu, J. Shan, B. Polytechnic, B. Shandong, Research on Intrusion Detection Algorithm Based on BP Neural Network,...

    Shadi Aljawarneh is an associate professor, Software Engineering, at the Jordan University of Science and Technology, Jordan. He holds a BSc degree in Computer Science from Jordan Yarmouk University, an MSc degree in Information Technology from Western Sydney University and a PhD in Software Engineering from Northumbria University-England. He has worked as an associate professor in the Faculty of IT at Isra University, Jordan, since 2008. His research is centered on software engineering, web and network security, e-learning, bioinformatics, Cloud Computing and ICT fields. Aljawarneh has presented at and been on the organizing committees for a number of international conferences and is a board member of the International Community for ACM, Jordan ACM Chapter, ACS, and IEEE. A number of his papers have been selected as “Best Papers” in conferences and journals.

    Monther Aldwairi has been an associate professor at the College of Technological Innovation at Zayed University since the fall of 2014. He received his B.S. in electrical engineering from Jordan University of Science and Technology (JUST) in 1998, and his M.S. and PhD in computer engineering from North Carolina State University (NCSU), Raleigh, NC, in 2001 and 2006, respectively. Prior to joining ZU, he was an Assistant and then Associate Professor of Computer Engineering at Jordan University of Science and Technology. He served as the Vice Dean of the Faculty of Computer and Information Technology from 2010 to 2012 and was the Assistant Dean for Student Affairs in 2009. In addition, he was an Adjunct Professor at New York Institute of Technology (NYiT) from 2009 to 2012. He worked at NCSU as a Post-Doctoral Research Associate in 2007 and as a research assistant from 2001 to 2006. He worked as a system integration engineer for ARAMEX from 1998 to 2000. Dr. Aldwairi’s research interests are in information, network and web security, intrusion detection, digital forensics, cloud computing, reconfigurable architectures, artificial intelligence and pattern matching.

    Dr. Muneer Masadeh Bani Yassein received his B.Sc. degree in Computing Science and Mathematics from Yarmouk University, Jordan, in 1985, his M.Sc. in Computer Science from Al Al-bayt University, Jordan, in 2001, and his PhD in Computer Science from the University of Glasgow, U.K., in 2007. He is currently an associate professor in the Department of Computer Science at Jordan University of Science and Technology (JUST). He served as Chairman of the Department of Computer Science from 2008 to 2010, and as Vice Dean of the Faculty of Computer and Information Technology from 2010 to 2012 and from 2013 to 2014. He is currently conducting research in Mobile Ad hoc Networks, Wireless Sensor Networks, Cloud Computing, simulation and modelling, development and analysis of the performance of probabilistic flooding behaviours in MANETs, and the optimization and refinement of service discovery and routing algorithms for mobile device communications in heterogeneous network environments. Bani Yassein has published over 90 technical papers in well-reputed international journals and conferences. During his career, he has supervised more than 50 graduate and undergraduate students. Dr. Bani Yassein is a member of IEEE and a member of the technical programs of several journals and conferences.
