Automatic network intrusion detection: Current techniques and open issues

doi:10.1016/j.compeleceng.2012.05.013

Computers & Electrical Engineering

Volume 38, Issue 5, September 2012, Pages 1062-1072

Abstract

Automatic network intrusion detection has been an important research topic for the last 20 years. In that time, approaches based on signatures describing intrusive behavior have become the de facto industry standard. Meanwhile, other novel techniques have been used to improve the automation of the intrusion detection process. In particular, statistical methods, machine learning and data mining techniques have been proposed on the grounds that they offer higher automation capabilities than signature-based approaches. However, the majority of these novel techniques have never been deployed in real-life scenarios, and signature-based detection is still the most widely used strategy for automatic intrusion detection. In the present article we survey the most relevant works in the field of automatic network intrusion detection. In contrast to previous surveys, our analysis considers several features required for actually deploying each of the reviewed approaches. This wider perspective can help identify the possible causes behind the lack of acceptance of novel techniques by network security experts.

Highlights

► This document reviews the most relevant techniques applied to intrusion detection.
► These techniques aim at providing better detection capabilities in a more automatic way.
► The techniques that claim high accuracy are not easily deployable in real life.
► The assumptions on which these techniques rely still require a lot of expert work.
► Efforts should be directed at reducing the need for human interaction in the process.

Introduction

A network intrusion detection system (NIDS) is a software tool that automates the network intrusion detection process. From an architectural point of view, a NIDS can be analyzed from several angles (e.g. the traffic capture process, the system location and the selection of appropriate measures, among others). From a more simplified point of view, however, intrusion detection can be seen as a classification problem in which a given network traffic event is labeled as either normal or intrusive.
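To make the classification view concrete, the following is a minimal sketch rather than a method taken from the surveyed works. The `TrafficEvent` fields, the `classify` function and the failed-login threshold are all hypothetical names and values chosen only for illustration; a real NIDS would derive many more features from captured traffic.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    NORMAL = "normal"
    INTRUSIVE = "intrusive"


@dataclass(frozen=True)
class TrafficEvent:
    # Hypothetical features, assumed here only for illustration.
    src_ip: str
    dst_port: int
    failed_logins: int


def classify(event: TrafficEvent, max_failed_logins: int = 5) -> Label:
    """Toy decision rule: flag an event whose failed logins exceed a threshold."""
    if event.failed_logins > max_failed_logins:
        return Label.INTRUSIVE
    return Label.NORMAL


events = [
    TrafficEvent("10.0.0.7", 22, failed_logins=0),
    TrafficEvent("10.0.0.9", 22, failed_logins=12),
]
# Every event receives exactly one of the two labels.
labels = [classify(e) for e in events]
```

Whatever technique is used internally, it can be thought of as a replacement for the body of `classify`: the input is a traffic event and the output is one of two labels.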

In the past 20 years, several techniques have been proposed to address the classification problem embedded in NIDS. Perhaps the most successful approach has been the one based on pattern signatures describing the behavior of known attacks [1]. Under this approach, a malicious event is detected when a monitored event matches a signature pattern. Although signature-based NIDS are considered the de facto standard, they require a new set of signature patterns each time a new attack emerges. In addition, the signatures describing such attacks have to be written by experts, who are not always available. In other words, the signature-based approach has failed to provide the level of automation required by security staff members.
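The matching step described above can be sketched as follows. This is an illustrative toy, not the rule language of any real NIDS: the `Signature` class, the two regular expressions and the sample payloads are assumptions made up for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signature:
    name: str
    pattern: re.Pattern


# Hypothetical, hand-written signatures; real rule sets are far larger
# and far more precise than these two patterns.
SIGNATURES = [
    Signature("path-traversal", re.compile(r"\.\./\.\./")),
    Signature("sql-injection", re.compile(r"(?i)union\s+select")),
]


def match_signature(payload: str) -> Optional[str]:
    """Return the name of the first signature matching the payload, else None."""
    for sig in SIGNATURES:
        if sig.pattern.search(payload):
            return sig.name
    return None


# An attack covered by an existing signature is detected.
known = match_signature("GET /index.php?id=1 UNION SELECT password FROM users")
# An attack with no corresponding signature goes unnoticed.
novel = match_signature("GET /index.php?cmd=;cat%20/etc/passwd")
```

Here `known` is reported as `"sql-injection"`, while `novel` is `None`. The second payload is missed until an expert writes a new pattern for it, which is the maintenance burden discussed in the text.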

Alternatively, statistical methods, machine learning and data mining techniques have been proposed as a way of dealing with some of the issues of signature-based approaches. Such techniques aim at facilitating the work of the network security staff by providing higher automation in the intrusion detection process along with good detection capabilities. Despite their success in obtaining high accuracy levels, most of these techniques have not actually been deployed in real-life scenarios. This situation suggests that accuracy is not the only goal in the pursuit of automatic intrusion detection.

The present work reviews the most relevant network intrusion detection techniques for wired networks, putting special emphasis on the embedded classification problem. In contrast to previous surveys in this field, however, the analysis considers not only accuracy results but also other features required for implementing the discussed techniques in real-life scenarios.

The rest of this work is organized as follows: Section 2 provides background information about the intrusion detection problem, including attack definitions, a taxonomy and a simplified NIDS architecture. Then, in Section 3, the most relevant approaches applied to intrusion detection are reviewed and compared based on the taxonomy along with common measures related to NIDS. Section 4 discusses the remaining open issues, which help explain why no approach other than the signature-based one is being deployed on current networks. Finally, concluding remarks are provided in Section 5.

Background

Before discussing the most relevant approaches to NIDS, we proceed to describe the fundamental elements inside the intrusion detection problem.

Intrusion detection approaches

Because of the large number of works presented in recent years for both misuse and anomaly detection, it is convenient to group them according to the techniques they use. To this end, we rely on the categorization proposed by Patcha and Park [5] and Lazarevic et al. [3].

Remaining open issues

The majority of the previously discussed works focus on the classification problem behind intrusion detection. If we considered only the extremely precise results obtained by some approaches, we would say that the detection problem is close to being solved. We should then ask why no approach other than the pattern signature-based one is currently being used by network administrators. The fact is that the previously analyzed works only cope with a subset of the problems that are essential to truly achieving

Conclusions

Several approaches have been proposed during the last 20 years of research in the intrusion detection field. All of these approaches aimed to facilitate the work of the network security staff by providing some level of automation in the intrusion detection process. Certainly, such a task cannot be considered easy, given the non-stationary behavior of network traffic and the permanent growth of network throughput.

Nowadays, the most successful NIDS approaches are those based on pattern signatures

Carlos Catania received the BS degree in information systems from Universidad Champagnat, Argentina, in 2004 and the M.Sc. degree in networking from Universidad de Mendoza, Argentina in 2007. He is presently pursuing the PhD degree in computer sciences at UNICEN, Tandil, Argentina. His research interests include Internet security and distributed computing systems.

References (72)

A. Patcha et al. Network anomaly detection with incomplete audit data. Comput Netw (2007)
G. Liu et al. A hierarchical intrusion detection model based on the PCA neural networks. Neurocomputing (2007)
Z. Bankovic et al. Improving network security using genetic algorithm approach. Comput Electr Eng (2007)
M.S. Abadeh et al. Design and analysis of genetic fuzzy systems for intrusion detection in computer networks. Expert Syst Appl (2011)
G. Giacinto et al. Fusion of multiple classifiers for intrusion detection in computer networks. Pattern Recognit Lett (2003)
W. Cohen. Fast effective rule induction
M.-Y. Su. Using clustering to improve the knn-based classifiers for online anomaly network traffic identification. J Netw Comput Appl (2011)
G. Giacinto et al. Intrusion detection in computer networks by a modular ensemble of one-class classifiers. Inform Fusion (2008)
V. Paxson. Bro: a system for detecting network intruders in real-time. Comput Netw (1999)
M.A. Aydın et al. A hybrid intrusion detection system design for computer network security. Comput Electr Eng (2009)
P. Garcia-Teodoro et al. Anomaly-based network intrusion detection: techniques, systems and challenges. Comput Secur (2009)
R.P. Lippmann et al. Improving intrusion detection performance using keyword selection and neural networks. Comput Netw (2000)
R. Lippmann et al. The 1999 DARPA off-line intrusion detection evaluation. Comput Netw (2000)
D. Zagar et al. Security aspects in IPv6 networks – implementation and testing. Comput Electr Eng (2007)
A. Shiravi et al. Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput Secur (2012)
M. Roesch. SNORT – lightweight intrusion detection for networks
Kendall K. A database of computer attacks for the evaluation of intrusion detection systems. Master’s thesis,...
A. Lazarevic et al. Intrusion detection: a survey
B. Mukherjee et al. Network intrusion detection. Netw IEEE (1994)
Lindqvist U, Porras P. Detecting computer and network misuse through the production-based expert system toolset...
Porras PA, Neumann PG. EMERALD: event monitoring enabling responses to anomalous live disturbances. In: Proceedings of...
W. Lee et al. Data mining approaches for intrusion detection
Cannady J. Artificial neural networks for misuse detection. In: National information systems security conference,...
P.A.R. Kumar et al. Distributed denial of service attack detection using an ensemble of neural classifier. Comput Commun (2011)
I. Ahmad et al. Artificial neural network approaches to intrusion detection: a review
Q. Xu et al. An intrusion detection approach based on understandable neural network trees. Int J Comput Sci Netw Secur (2006)
A. Abraham et al. Evolving intrusion detection systems
Li W. Using genetic algorithm for network intrusion detection. In: Proceedings of the United States department of...
Gong RH, Zulkernine M, Abolmaesumi P. A software implementation of a genetic algorithm based approach to network...
Vollmer T, Alves-Foss J, Manic M. Autonomous rule creation for intrusion detection. In: IEEE Symposium on computational...
J. Gomez et al. Evolving fuzzy classifiers for intrusion detection
Bridges S, Vaughn R. Fuzzy data mining and genetic algorithms applied to intrusion detection. In: Proceedings of the...
Chen C, Mabu S, Yue C, Shimada K, Hirasawa K. Analysis of fuzzy class association rule mining based on genetic network...
Luo J. Integrating fuzzy logic with data mining methods for intrusion detection. Master’s thesis, Department of...
Florez G, Bridges S, Vaughn R. An improved algorithm for fuzzy data mining for intrusion detection. In: Fuzzy...
Ye N, Li X, Emran S. Decision tree for signature recognition and state classification. In: Proceedings of IEEE systems,...

Carlos Garcia Garino graduated in engineering at University of Buenos Aires in 1978 and received a Ph.D. degree from UPC, Barcelona, Spain in 1993. Currently he is Full Professor at the School of Engineering and Head of the ITIC Research Institute of UNCuyo, Argentina. His research interests include Computational Mechanics, Computer Networks, and Distributed Computing. He has more than 50 papers published in scientific journals and proceedings of international conferences.

^☆: Reviews processed and proposed for publication to Editor-in-Chief by Guest Editor Dr. Gregorio Martinez.
