Abstract
Machine learning techniques are frequently applied to intrusion detection problems in various ways, such as to classify normal and intrusive activities or to mine interesting intrusion patterns. Self-learning rule-based systems can relieve domain experts of the difficult task of hand-crafting signatures, in addition to providing intrusion classification capabilities. To this end, a genetic-based signature learning system has been developed that can adaptively and dynamically learn signatures of both normal and intrusive activities from network traffic. In this paper, we extend the evaluation of our system to real-time network traffic captured from a university departmental server. A methodology is developed to build a fully labelled intrusion detection data set by mixing real background traffic with attacks simulated in a controlled environment. Tools are developed to pre-process the raw network data into a feature vector format suitable for a supervised learning classifier system and other related machine learning systems. The signature extraction system is then applied to this data set and the results are discussed. We show that even simple feature sets can help detect payload-based attacks.
Notes
The original Mucus code was written in 2004 and did not support most new Snort keywords. We used an updated version hosted under the Bleeding Threat project [9].
The data set will be made available online later.
Note that UCSSE was run on a much faster machine in comparison to the preprocessing tool. The preprocessing time would be reduced further on a faster machine.
References
Almgren M, Jonsson E (2004) Using active learning in intrusion detection. In: Proceedings of the 17th IEEE computer security foundations workshop (CSFW’04). IEEE Computer Society, New Jersey, pp 88–98
Antonatos S, Anagnostakis KG, Markatos EP (2004) Generating realistic workloads for network intrusion detection systems. ACM SIGSOFT Softw Eng Notes 29(1):207–215
Barisani A (2003) Testing firewalls and IDS with FTester. TISC Insight Newslett 5(6):2–4
Bernadó-Mansilla E, Garrell JM (2003) Accuracy-based learning classifier systems: models, analysis and applications to classification tasks. Evol Comput 11(3):209–238
Dixon PW, Corne DW, Oates MJ (2003) A ruleset reduction algorithm for the XCS learning classifier system. In: Proceedings of the 5th international workshop on learning classifier systems, Revised Papers. Springer, Berlin, pp 20–29
Filippone M, Camastra F, Masulli F, Rovetta S (2008) A survey of kernel and spectral methods for clustering. Pattern Recogn 41(1):176–190
Geschke D (2004) FLoP—Fast logging project for Snort. http://www.geschke-online.de/FLoP/
Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley Publishing Company, Inc., Boston
Gregory J (2005) Mucus—traffic generator for IDS simulation. http://www.bleedingthreats.net/
Hettich S, Bay SD (1999) The UCI KDD archive. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
Holland JH, Booker LB, Colombetti M, Dorigo M, Goldberg DE, Forrest S, Riolo RL, Smith RE, Lanzi PL, Stolzmann W et al (2000) What is a learning classifier system. Learn Classif Syst Found Appl 1813:3–32
Hwang K, Cai M, Chen Y, Qin M (2007) Hybrid intrusion detection with weighted signature generation over anomalous internet episodes. IEEE Trans Dependable Secure Comput 4(1):41–55
Jin S, Yeung DS, Wang X (2007) Network intrusion detection in covariance feature space. Pattern Recogn 40(8):2185–2197
Jung J, Paxson V, Berger AW, Balakrishnan H (2004) Fast portscan detection using sequential hypothesis testing. In: Proceedings of the 2004 IEEE symposium on security and privacy, pp 211–225
Lee W, Stolfo SJ, Mok KW (1999) A data mining framework for building intrusion detection models. IEEE Symp Secur Priv 7:120–132
Lippmann RP, Zissman MA (1998) 1998 DARPA/AFRL off-line intrusion detection evaluation. http://www.ll.mit.edu/IST/ideval/data/data_index.html
Liu Y, Chen K, Liao X, Zhang W (2004) A genetic clustering method for intrusion detection. Pattern Recogn 37(5):927–942
Luo S, Marin GA (2004) Generating realistic network traffic for security experiments. In: Proceedings of the IEEE SoutheastCon, pp 200–207
Mahoney MV, Chan PK (2003) Learning rules for anomaly detection of hostile network traffic. In: Proceedings of the third IEEE international conference on data mining (ICDM 2003), pp 601–604
Mahoney MV (2003) A machine learning approach to detecting attacks by identifying anomalies in network traffic. PhD thesis, Florida Institute of Technology
Mahoney MV, Chan PK (2003) An analysis of the 1999 DARPA/Lincoln laboratory evaluation data for network anomaly detection. In: Proceedings of recent advances in intrusion detection (RAID) 2003. Springer, Berlin, pp 220–237
Massicotte F, Gagnon F, Labiche Y, Briand L, Couture M (2006) Automatic evaluation of intrusion detection systems. In: 22nd annual computer security applications conference, 2006, pp 361–370
McHugh J (2000) Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory. ACM Trans Inf Syst Secur 3(4):262–294
Mutz D, Vigna G, Kemmerer R (2003) An experience developing an IDS stimulator for the black-box testing of network intrusion detection systems. In: Proceedings of the 19th annual computer security applications conference, pp 374–383
Agarwal R, Joshi MV (2001) PNrule: a new framework for learning classifier models in data mining (a case-study in network intrusion detection). In: Proceedings of the first SIAM international conference on data mining, Chicago, IL, USA, 5–7 April, 2001
Roesch M (1999) Snort-lightweight intrusion detection for networks. In: Proceedings of USENIX LISA, pp 229–238. http://www.snort.org/
Sabhnani M, Serpen G (2003) Application of machine learning algorithms to KDD intrusion detection dataset within misuse detection context. In: Proceedings of international conference on machine learning: models, technologies, and applications, pp 23–26
Sabhnani M, Serpen G (2004) Why machine learning algorithms fail in misuse detection on KDD intrusion detection data set. Intell Data Anal 8(4):403–415
Shafi K (2008) An online and adaptive signature-based approach for intrusion detection using learning classifier systems. PhD thesis, University of New South Wales, Australian Defence Force Academy, School of Information Technology and Electrical Engineering
Shafi K, Abbass HA (2009) An adaptive genetic-based signature learning system for intrusion detection. Expert Syst Appl 36(10):12036–12043
Shafi K, Abbass HA, Zhu W (2007) Real time signature extraction from a supervised classifier system. In: Proceedings of the IEEE congress on evolutionary computation, CEC 2007, 25–28 September, 2007, pp 2509–2516
Snort. The open source network intrusion detection system. http://www.snort.org/
Sommers J, Yegneswaran V, Barford P (2005) Toward comprehensive traffic generation for online IDS evaluation. Technical report, Department of Computer Science, University of Wisconsin
Stolfo SJ, Fan W, Lee W, Prodromidis A, Chan PK (2000) Cost-based modeling and evaluation for data mining with application to fraud and intrusion detection: results from the JAM Project. In: Proceedings of DARPA information survivability conference, pp 130–144
Team MD (2006) The Metasploit Project. http://www.metasploit.com/
TeleGeography (2008) TeleGeography’s global internet geography. http://www.telegeography.com/products/gig/index.php
Turner A, Bing M (2005) TCPReplay: PCAP editing and replay tools for *nix. http://tcpreplay.sourceforge.net
Wang K, Stolfo SJ (2004) Anomalous payload-based network intrusion detection. Proc Recent Adv Intrusion Detect 7:201–222
Wilson SW (1995) Classifier fitness based on accuracy. Evol Comput 3(2):149–175
Wilson SW (2001) Compact rulesets from XCSI. In: Proceedings of the 4th international workshop on advances in learning classifier systems: Revised Papers. Springer, Berlin, pp 197–210
Witten IH, Frank E (2000) Data mining: practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, San Francisco
Acknowledgments
This work is funded by a University College Postgraduate Research Scholarship (UCPRS). Most of these experiments were run on the Australian Center for Advanced Computing (AC3) supercomputing facilities.
Cite this article
Shafi, K., Abbass, H.A. Evaluation of an adaptive genetic-based signature extraction system for network intrusion detection. Pattern Anal Applic 16, 549–566 (2013). https://doi.org/10.1007/s10044-011-0255-5
DOI: https://doi.org/10.1007/s10044-011-0255-5