A Survey and Taxonomy on Data and Pre-processing Techniques of Intrusion Detection Systems

Hamed, Tarfa; Ernst, Jason B.; Kremer, Stefan C.

doi:10.1007/978-3-319-58424-9_7



Abstract

This chapter presents a new review and taxonomy of the input data and pre-processing techniques used by intrusion detection systems (IDS). It surveys the literature of the last two decades on IDS data and presents a framework for understanding the components described in that literature, so that readers can study existing works systematically and envision future hybrid approaches. The chapter describes how to collect data and how to prepare it for different types of processing. We organize the chapter component by component rather than paper by paper, since we believe this gives the reader a wider perspective on the process of constructing an intrusion detection system and on its evaluation mechanisms. The organization of the chapter mirrors an ideal intrusion detection system in that it contains most IDS components, so existing approaches can be accommodated neatly within the framework. This allows the reader to construct and explore new systems by assembling the described components in novel arrangements. After each IDS component we also provide comparisons, supported by tables, to give the reader a better perspective on that component. In this sense, the chapter provides insights that a reader would not gain by simply reading the original source papers. The classifiers used with IDS are beyond the scope of this chapter.


Author information

Authors and Affiliations

School of Computer Science, University of Guelph, Guelph, ON, Canada
Tarfa Hamed & Stefan C. Kremer
Left Inc., Vancouver, BC, Canada
Jason B. Ernst

Corresponding author

Correspondence to Tarfa Hamed.

Editor information

Editors and Affiliations

University of Detroit Mercy, Detroit, Michigan, USA
Kevin Daimi

About this chapter

Cite this chapter

Hamed, T., Ernst, J.B., Kremer, S.C. (2018). A Survey and Taxonomy on Data and Pre-processing Techniques of Intrusion Detection Systems. In: Daimi, K. (eds) Computer and Network Security Essentials. Springer, Cham. https://doi.org/10.1007/978-3-319-58424-9_7

DOI: https://doi.org/10.1007/978-3-319-58424-9_7
Published: 13 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58423-2
Online ISBN: 978-3-319-58424-9
eBook Packages: Engineering, Engineering (R0)
