A novel architecture combined with optimal parameters for back propagation neural networks applied to anomaly network intrusion detection
Introduction
Network security is currently a major topic in computer security and defense. Intrusions, attacks or anomalies in network infrastructures mostly lead to great financial losses and massive leaks of sensitive data, thereby decreasing the efficiency and the quality of productivity of an organization. The well-known internet security corporations Symantec and McAfee state, in the annual Internet Security Threat Report (ISTR) (Symantec Enterprise, 2016) and the McAfee Labs 2017 Threat Predictions Report (MacAfee Enterprise, 2017) respectively, that cybercrime remains widespread and that damaging threats from cybercriminals continue to loom over consumers and businesses. SonicWall, originally a private company headquartered in San Jose, California, and a Dell subsidiary from 2012 to 2016, highlights in its 2017 SonicWall Annual Threat Report (SonicWall Entreprise, 2017) that 2016 was a highly active attack year. Ransomware alone increased by well over 100×. From the Internet of Things to mobile devices and even virtual worlds, cybercriminals are increasingly aggressive in their stealth strategies. These reports indicate that network security should not be ignored and that effective security measures are much needed in order to prevent unauthorized access to, and destruction, theft or damage of, the information system of an organization.
Among the important ways to solve security problems, the network intrusion detection system (NIDS) is an effective and high-profile countermeasure. It is widely deployed in network architectures to watch for abnormalities in traffic, to raise alarms and eventually to respond to any malicious or suspicious activity in the network.
Network-based intrusion detection is generally implemented using two techniques: misuse-based (also called rule-based) and anomaly-based. Misuse-based detection looks for specific patterns (or intrusion signatures provided in terms of rules) in the data to effectively detect previously known intrusions. Snort is a broadly used rule-based NIDS that can detect intrusions based on previously known intrusion signature patterns. The misuse-based approach usually does not generate a high number of false alarms, since it is based on rules that identify recognized attacks (Gogoi et al., 2014), but it fails to detect new attacks (zero-day attacks) whose signatures have not previously been saved in the database. Further, this approach is burdensome because the signatures of intrusions need to be updated frequently to maintain the performance of misuse detection. Anomaly detection, on the other hand, builds a model from normal behaviors, and any deviation from the normal model is deemed to be an outlier/attack. This technique usually relies on statistical analysis and data mining, which are able to detect novel attacks without prior knowledge, since the classification model has the generalization ability to extract intrusion patterns and knowledge during the training phase (Aminanto et al., 2017).
In this paper, we focus on anomaly detection because, theoretically, it is capable of detecting both known and new unseen attacks, and because, in the current complicated network environment, anomaly detection is much more in demand and has better application prospects. On the negative side, anomaly detection can flag new normal packets as attacks and vice versa, which raises the false alarm rate.
Hence our goal is to build an effective anomaly network intrusion detection system using a Back Propagation Neural Network (BPNN) classifier, based on a novel architecture and an optimal set of values for the parameters that are involved in the construction of that classifier or that impact its performance, in order to minimize the false positive rate and to obtain higher accuracy and a higher detection rate for anomaly-based detection.
This paper is organized as follows. Section 2 introduces previous research works. Section 3 explains the background of this paper, such as IDS, BPNN, activation functions, the KDD CUP '99 Dataset, Feature Selection and Normalization. Section 4 describes our proposed approach. An overview of our implementation, the experimental results and their analysis are given in Section 5. Finally, we conclude in Section 6.
To achieve our goal of building an optimal Anomaly Network Intrusion Detection System (ANIDS) based on a Back Propagation Neural Network (BPNN), we have adopted an approach consisting of five main stages. In the first two stages, we have studied in depth several closely related works (Chandrashekhar and Raghuveer, 2014; Gaidhane et al., 2014; Ganeshkumar and Pandeeswari, 2016; Ghosh et al., 2015; Kumar and Yadav, 2014; Lokeswari and Rao, 2016; Modi and Patel, 2013; Mukhopadhyay et al., 2011; Sen et al., 2014; Sen et al., 2015; Shah and Trivedi, 2012).
The first stage was focused on the determination of the most relevant parameters employed to construct that type of classifier or that impact its performance. At the end of our study, we have concluded that the most important parameters are:
- The number of selected features/attributes;
- Normalization of data;
- Architecture of NN, specifically the number of nodes in the hidden layer;
- Activation function;
- Learning rate;
- Momentum term.
The second stage consists of a comparison of those works in order to select for each parameter cited above between two and four relevant values which have given the best results in terms of intrusion detection.
For the number of features or attributes of the KDD dataset, the work of Sathya et al. (2011) demonstrates that classification on all 41 features of the KDD dataset decreases detection accuracy and speed. Sathya et al. (2011) also show that only a subset of the features of the KDD dataset is relevant to each type of attack. If non-relevant features are used in classification, they affect the overall detection accuracy (Modi and Patel, 2013). Therefore, we have selected only three sets of features that have already been used and have provided good results. The first set consists of 12 features selected with a modified Kolmogorov–Smirnov Correlation Based Filter Algorithm (Lokeswari and Rao, 2016). The second set consists of 17 features chosen with the Information Gain Feature Selection Algorithm (Modi and Patel, 2013), and the third set consists of the 34 numeric attributes among the 41 features of the KDD dataset (Ganeshkumar and Pandeeswari, 2016). In our approach we have chosen not to explore combinatorial mixes of attributes as candidate feature sets; that would be a different approach.
Regarding the architecture of the NN of an IDS, and especially the rule used to calculate the number of nodes in its hidden layer: as discussed in section 1.2, if two IDSs have the same number of nodes in both the input layer and the output layer, the number of nodes in the hidden layer is what distinguishes their feed-forward architectures. Here too, we have applied the principle adopted throughout our approach, namely the selection of a few of the best values for each parameter. To calculate the number of nodes in the hidden layer of the NN of our IDS, we have chosen three popular and widely used methods: the Arithmetic Mean method (Gaidhane et al., 2014) (H = (Input + Output)/2) and the Rules of Thumb (Karsoliya, 2012) (H = Input*70% or H = Input*90%). In addition, we have applied a novel rule to calculate the number of nodes of the hidden layer of an ANN, defined as H = 0.75*Input + Output (H < 2*Input). The origin of this rule is detailed in section 1.2. This rule was used in Shahamiri and Salim (2014) to build a neural network for an Automatic Speech Recognition (ASR) system but, to our knowledge, it has not previously been exploited in an anomaly-based intrusion detection system. We therefore found it interesting to apply this rule in building a novel neural network architecture for our IDSs and to compare its performance with the most relevant rules for determining the number of nodes of the hidden layer of an NN.
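The four hidden-layer rules named above can be compared side by side with a short sketch. It is only an illustration: the function name `hidden_nodes`, the rule labels, the reading of the novel rule's coefficient as 0.75, and rounding up to the next integer are all assumptions, since the text does not say how fractional sizes are handled.

```python
import math
from fractions import Fraction

RULES = ("arithmetic_mean", "thumb_70", "thumb_90", "novel")  # hypothetical labels


def hidden_nodes(rule: str, n_in: int, n_out: int = 1) -> int:
    """Hidden-layer size for a given rule; exact fractions avoid float rounding noise."""
    if rule == "arithmetic_mean":
        h = Fraction(n_in + n_out, 2)        # H = (Input + Output) / 2
    elif rule == "thumb_70":
        h = Fraction(70, 100) * n_in         # H = Input * 70%
    elif rule == "thumb_90":
        h = Fraction(90, 100) * n_in         # H = Input * 90%
    elif rule == "novel":
        h = Fraction(3, 4) * n_in + n_out    # H = 0.75 * Input + Output (assumed reading)
        if h >= 2 * n_in:                    # side condition H < 2 * Input
            raise ValueError("novel rule requires H < 2 * Input")
    else:
        raise ValueError(f"unknown rule: {rule}")
    return math.ceil(h)                      # rounding up is an assumption


# Hidden-layer sizes for the three feature sets (12, 17 and 34 inputs, one output).
for n_in in (12, 17, 34):
    print(n_in, {rule: hidden_nodes(rule, n_in) for rule in RULES})
```

With one output node, the input size alone fixes the hidden-layer size under each rule, which is why the choice of rule is what separates one architecture from another.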
For the methods of normalization of data, in Wang et al. (2009), it is demonstrated that the mean range [0,1] (Min max normalization) and statistical normalization (Z-Score normalization) are the best schemes of attribute normalization to preprocess the data for anomaly intrusion detection. Thus, we have selected those two techniques.
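As a minimal sketch of the two selected schemes, the functions below normalize a single numeric attribute column. The function names and the handling of constant columns (mapped to zeros) are assumptions made for the illustration, not details taken from the paper.

```python
import statistics


def min_max(values):
    """Min-max normalization: rescale a column linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant column: assumed to map to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Statistical (Z-score) normalization: zero mean, unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)    # population standard deviation
    if std == 0:                       # constant column: assumed to map to 0.0
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


column = [0, 5, 10]                    # toy attribute values
print(min_max(column))
print(z_score(column))
```

In practice the minimum, maximum, mean and standard deviation would be estimated on the training data and reused unchanged on the test data.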
Concerning the activation function parameter, as mentioned in section 3.3, there are several activation functions. In our survey of ANIDS, however, we noticed that the activation functions most widely employed by researchers to build IDSs are the Sigmoid and Hyperbolic tangent functions, which give better performance than the other functions in terms of intrusion detection. This is the reason for our choice of these two functions.
Finally, for the learning rate and the momentum term, we have chosen to set both values to 0.01. Our future work will focus on the search for the best values of those parameters.
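The two functions and the derivatives that back propagation needs can be written in a few lines. This is a generic sketch using the standard textbook definitions; the function names are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, output in (0, 1); branches keep exp() from overflowing."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_prime(x: float) -> float:
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh(x: float) -> float:
    """Hyperbolic tangent, output in (-1, 1)."""
    return math.tanh(x)


def tanh_prime(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2."""
    t = math.tanh(x)
    return 1.0 - t * t
```

The different output ranges matter when a single output node encodes the class, since a decision threshold has to be chosen to match the range of the activation used.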
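To show where these two values enter training, the sketch below applies the standard gradient-descent-with-momentum update to a single weight. The update form is the usual textbook one and is assumed here; the function name `momentum_step` and the toy gradient are hypothetical.

```python
def momentum_step(weight, gradient, prev_delta, lr=0.01, momentum=0.01):
    """One weight update: delta = -lr * gradient + momentum * previous delta."""
    delta = -lr * gradient + momentum * prev_delta
    return weight + delta, delta


# Two consecutive updates of one weight with a constant toy gradient of 2.0.
w, d = 1.0, 0.0
for _ in range(2):
    w, d = momentum_step(w, gradient=2.0, prev_delta=d)
    print(w, d)
```

The momentum term reuses a fraction of the previous step, so with both values at 0.01 each update is dominated by the current gradient.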
The third stage of our approach is to generate all possible combinations of the different values of those parameters. Table 1 shows those parameters and their different values. The total number of those combinations is computed as follows:
The number of combinations is equal to the number of different values of sets of attributes (=3) multiplied by the number of different methods for normalization of attributes (=2) multiplied by the number of various activation functions selected (=2) and multiplied by the number of different rules for calculating the number of nodes in the hidden layer (=4).
The number of combinations = 3*2*2*4 = 48 combinations
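The enumeration can be reproduced with a Cartesian product. The value labels below are illustrative placeholders for the entries of Table 1, which is not reproduced in this text.

```python
from itertools import product

# Hypothetical labels for the parameter values described in the text.
FEATURE_SETS = (12, 17, 34)
NORMALIZATIONS = ("min_max", "z_score")
ACTIVATIONS = ("sigmoid", "tanh")
HIDDEN_RULES = ("arithmetic_mean", "thumb_70", "thumb_90", "novel")

# One configuration per combination; each would parameterize one IDS instance.
configs = [
    {"features": f, "normalization": n, "activation": a, "hidden_rule": h}
    for f, n, a, h in product(FEATURE_SETS, NORMALIZATIONS, ACTIVATIONS, HIDDEN_RULES)
]

print(len(configs))   # 3 * 2 * 2 * 4
print(configs[0])
```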
The fourth stage consists of constructing the 48 IDSs corresponding to the 48 combinations generated previously. Each combination serves as the configuration of one IDS instance.
Finally, in the fifth stage, we have compared the performance of the IDSs built and selected the two best IDSs. To do that, we have used the performance measurements which are described in the section 5.1, namely True Positive Rate (TPR), True Negative Rate (TNR), False Positive Rate (FPR), False Negative Rate (FNR), Accuracy (ACC), Precision, F-score, AUC (ability to avoid false classification) and the average time to classify a connection instance.
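Most of these measurements follow directly from the four confusion-matrix counts. The sketch below uses the usual definitions with "attack" as the positive class; this convention, the function name and the toy counts are assumptions, and the paper's own definitions are those of its section 5.1. AUC and classification time are not covered here.

```python
def _ratio(num, den):
    """Safe division: return 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def detection_metrics(tp, tn, fp, fn):
    """Rates derived from confusion-matrix counts (positive class = attack)."""
    tpr = _ratio(tp, tp + fn)              # detection rate / recall
    precision = _ratio(tp, tp + fp)
    return {
        "TPR": tpr,
        "TNR": _ratio(tn, tn + fp),
        "FPR": _ratio(fp, fp + tn),        # false alarm rate
        "FNR": _ratio(fn, fn + tp),
        "ACC": _ratio(tp + tn, tp + tn + fp + fn),
        "Precision": precision,
        "F-score": _ratio(2 * precision * tpr, precision + tpr),
    }


# Toy counts, not results from the paper.
print(detection_metrics(tp=90, tn=80, fp=20, fn=10))
```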
Automatic Speech Recognition (ASR) is a technology for identifying uttered word(s) represented as an acoustic signal. An ASR system relies on a given lexicon and prior knowledge of a problem domain to recognize spoken word(s). Such a system usually consists of two major components: a feature extractor and a classifier. The feature extractor is used to convert raw input into a form that is easily classifiable; this is a common place to incorporate classifiers such as Artificial Neural Networks (ANNs). ASR has several applications in voice-enabled control systems such as those implemented in health care, military, telephony and other domains. Nonetheless, speech recognizers are generally unable to match human-level performance under realistic (i.e. noisy) conditions. Although most recent speech recognizers achieve high recognition rates in the lab, their performance in real-life applications in noisy environments remains unsatisfactory. Shahamiri and Salim (2014) studied the application of Multi-Nets Artificial Neural Networks (M-NANNs), a realization of the multiple-views multiple-learners (MVML) approach, as Multi-Networks Speech Recognizers (M-NSRs), in order to provide a real-time, frequency-based, noise-robust ASR model that requires no noise preprocessing or post-processing. M-NSRs define the speech features associated with each word as a different view and apply a standalone ANN as one of the learners to approximate that view; by contrast, multiple-views single-learner (MVSL) ANN-based speech recognizers employ only one ANN to memorize the features of the entire vocabulary. In their research, the size of the input and output layers of all the M-NSR ANNs is fixed at 390 input neurons and only one output neuron. The output layer of these ANNs has only one neuron because each ANN should learn only one of the views. In their experiments, all of the networks were MLPs with one hidden layer.
The following rule was considered to select the number of hidden neurons:
$$H = 0.75 \times I + O, \quad H < 2I \tag{1}$$
where H is the number of hidden neurons, I is the number of input neurons and O is the number of output neurons.
The results indicate that the M-NSR delivered better performance than the MVSL ASR system. In particular, in the experiments conducted by the authors with noisy data, the M-NSR improved the average recognition rate by 13.43% and reduced the average value of NRMSE by 2.65%. (Fig. 1)
To our knowledge, the rule in Equation (1) has never been used in the field of Intrusion Detection Systems to build a neural network for an IDS. In our approach, we have adopted this rule to construct new neural network architectures for our IDSs, and we have compared their performance with that of other IDSs whose architectures are based on popular rules such as the Arithmetic Mean approach and the Rules of Thumb. In a neural network (NN), the number of nodes in the input layer depends on the number of selected features of an instance of a network packet, and the number of nodes in the output layer is one (the value 0 corresponds to an attack, while the value 1 corresponds to legitimate traffic). Consequently, the rule used to calculate the number of nodes of the hidden layer determines the architecture of the neural network. In conclusion, different rules for calculating the number of nodes of the hidden layer yield different neural network architectures.
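The resulting architecture (one input node per selected feature, one hidden layer, a single output node) can be sketched as a forward pass. The sigmoid activation, the 0.5 decision threshold, the function names and the toy weights are assumptions made for this illustration; only the 0 = attack, 1 = legitimate encoding comes from the text.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, computed so that exp() cannot overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer network with a single output node."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def classify(output: float, threshold: float = 0.5) -> str:
    """Map the output to a label: near 0 is an attack, near 1 is legitimate traffic."""
    return "normal" if output >= threshold else "attack"


# Toy network: 2 inputs, 2 hidden nodes, 1 output (weights are arbitrary).
x = [1.0, 0.0]
w_hidden, b_hidden = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
print(classify(forward(x, w_hidden, b_hidden, w_out=[2.0, 2.0], b_out=0.0)))
print(classify(forward(x, w_hidden, b_hidden, w_out=[-2.0, -2.0], b_out=0.0)))
```

Changing the hidden-layer rule only changes the number of rows in `w_hidden` (and the length of `w_out`), which is the sense in which the rule defines the architecture.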
Section snippets
Related works
Research on intrusion detection began with Anderson's work (Anderson, 1980). In Anderson (1980), the author developed a model built from statistics of users' normal behaviors, so as to find the “masquerader” that deviates from the generated normal model; this laid the foundation of intrusion detection systems based on the anomaly approach. Since then, many research efforts on anomaly detection have been carried out using various techniques.
Rinku Sen et al. (2015) have developed a
Preliminaries
In this section we give a general overview of related terms such as Intrusion Detection System, Back Propagation Neural Network, Activation functions, KDD CUP '99 Dataset, Feature Selection, Data Preprocessing and Normalization.
Proposed work
This section describes the details of our approach and gives the model of our proposed IDS.
Experiments and results
This section is divided into three subsections. The first describes the performance measurements used to evaluate and compare the IDSs we generated, the second gives an overview of our implementation, and the third presents the experimental results and analyzes them in order to select the two best IDSs. At the end of the last subsection, we compare the performance of our two best IDSs with other related works.
Conclusion and future work
The aim of this research was to build an effective Anomaly Network Intrusion Detection System (ANIDS) based on a Back Propagation Neural Network (BPNN) using the Back Propagation Learning Algorithm, yielding higher accuracy, a higher detection rate and a low false positive rate. The comparative results with other works show that we have succeeded in achieving this objective. The two keys to our success are: first, we have adopted a novel architecture of neural network, which is given by a new
Acknowledgement
We would like to thank all members of LIMSAD Labs for their help and support.
Zouhair Chiba is a Ph.D. student at LIMSAD Labs within the Faculty of Sciences, Hassan II University of Casablanca (Morocco). He received a Master's degree in Computer and Internet Engineering in 2013, and holds a Bachelor of Mathematical Sciences. His research interests are in the area of Security, Big Data on Cloud Infrastructures, Computer Networks, Mobile Computing and Distributed Systems.
References (58)
- et al. An activation function adapting training algorithm for sigmoidal feedforward networks. Neurocomputing (2004)
- et al. Swarm intelligence in intrusion detection: a survey. Comput Secur (2011)
- et al. Real-time frequency-based noise-robust automatic speech recognition using multi-nets artificial neural networks: a multi-views multi-learners approach. Neurocomputing (2014)
- et al. A class+1 sigmoidal activation functions for FFANNs. J Econ Dyn Control (2003)
- et al. A systematic analysis of performance measures for classification tasks. Inform Process Manage (2009)
- et al. The use of computational intelligence in intrusion detection systems: a review. Appl Soft Comput (2010)
- et al. False positive reduction in intrusion detection system: a survey
- et al. Feature selection for intrusion detection system using ant colony optimization. IJ Netw Secur (2016)
- et al. Applying Hopfield artificial network and simulating annealing for cloud intrusion detection. J Inform Secur Res (2015)
- et al. Intrusion detection system based on modified k-means and multi-level support vector machines
- Another fuzzy anomaly detection system based on ant clustering algorithm
- Computer security threat monitoring and surveillance (Vol. 17)
- A hybrid method consisting of GA and SVM for intrusion detection system. Neural Comput Appl
- A new classification process for network anomaly detection based on negative selection mechanism
- False positives reduction in intrusion detection systems using alert correlation and data mining techniques. Int J Adv Res Comput Sci Softw Eng
- Improvising an intrusion detection precision of ANN based hybrid NIDS by incorporating various data normalization techniques–a performance appraisal. IJREAT Int J Res Eng Adv Technol
- A survey of intrusion detection systems for cloud computing environment
- Performance analysis of intrusion detection system using various neural network classifiers
- Artificial intelligence (ai): intrusion analysis. Encyclopedia of information assurance
- Intrusion detection and attack classification using back-propagation neural network. Int J Eng Res Technol
- Adaptive neuro-fuzzy-based anomaly detection system in cloud. Int J Fuzzy Syst
- An efficient cloud network intrusion detection system
- MLH-IDS: a multi-level hybrid intrusion detection method. Comput J
- Network intrusion detection using genetic algorithm and neural network
- Design of a snort-based hybrid intrusion detection system
- Intrusion detection model based on the improved neural network and expert system
- An immediate system call sequence based approach for detecting malicious program executions in cloud environment. Wireless Person Commun
- Neural networks: a comprehensive foundation
- The application of genetic neural network in network intrusion detection. JCP
Noureddine Abghour is currently an associate professor in the Faculty of Science of Hassan II University, Morocco. He received his Ph.D. degree from the National Polytechnic Institute of Toulouse (France) in 2004. His research mainly deals with Security in Distributed Computing Systems.
Khalid Moussaid was recently appointed director of the Computer Science, Modeling Systems and Decision Support Laboratory of the Hassan II University of Casablanca. He has a Ph.D. in Object-Oriented Databases, a Master in Computer Science and a Bachelor of Science in Applied Mathematics. He is interested in Optimization and Algorithmics, especially in the field of Big Data and Cloud Computing.
Amina El Omri is a professor of Higher Education in Computer Science at the Faculty of Sciences, University Hassan II Casablanca, Morocco. Her main scientific interests concern Algorithms, Optimization, Transport and Logistics problems. She has contributed many research papers to workshops and conferences, and has published several journal articles.
Mohamed Rida is a professor in Computer Science at the Faculty of Sciences, University Hassan II Casablanca (Morocco) and a member of LIMSAD Labs within the same Faculty. He received his Ph.D. degree from University Hassan II Mohammadia in 2005, and his thesis subject was “Virtual Container Terminal: Design and Development of an object platform for the simulation of the operations of a container terminal”. His research areas include Transport, Geographic Information Systems and Big Data.