A Survey and Taxonomy on Data and Pre-processing Techniques of Intrusion Detection Systems

Computer and Network Security Essentials

Abstract

This chapter presents a new review and taxonomy of the input data and pre-processing techniques of intrusion detection systems (IDS). It surveys the literature of the last two decades on the data used by intrusion detection systems. We also present a framework for understanding the different components described in the literature, which allows readers to understand existing works systematically and to envision future hybrid approaches. The chapter describes how to collect the data and how to prepare it for different types of processing. We organize the chapter component by component rather than paper by paper, since we believe this gives the reader a wider perspective on the process of constructing an intrusion detection system and on its evaluation mechanisms. The organization of the chapter represents an ideal intrusion detection system, since it contains most of the components of an IDS, so existing approaches can be neatly accommodated within this framework. This allows the reader to construct and explore new systems by assembling the described components in novel arrangements. After each IDS component, we provide comparisons, supported by tables, to give the reader a better perspective on that particular component. In this sense, the chapter provides insights that a reader would not gain by simply reading the original source papers. The classifiers used with IDS are beyond the scope of this chapter.

Author information

Corresponding author

Correspondence to Tarfa Hamed.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Hamed, T., Ernst, J.B., Kremer, S.C. (2018). A Survey and Taxonomy on Data and Pre-processing Techniques of Intrusion Detection Systems. In: Daimi, K. (eds) Computer and Network Security Essentials. Springer, Cham. https://doi.org/10.1007/978-3-319-58424-9_7

  • DOI: https://doi.org/10.1007/978-3-319-58424-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58423-2

  • Online ISBN: 978-3-319-58424-9

  • eBook Packages: Engineering, Engineering (R0)
