How to Boost Machine Learning Network Intrusion Detection Performance with Encoding Schemes

  • Conference paper
Computer Information Systems and Industrial Management (CISIM 2023)

Abstract

Network intrusion detection is a major component of maintaining cybersecurity. It is especially crucial for soft targets: important places that are easily accessible and thus more vulnerable. Real-time machine-learning-based network intrusion detection is an increasingly relevant field of study that offers important practical benefits for securing against cyberthreats. This paper contributes to this growing body of research by evaluating a problem common to all machine-learning-based detectors: the encoding of categorical values. Different encoding schemes are thoroughly evaluated using three different classifier types, the results are analysed statistically, and the best-performing solution is proposed.
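To illustrate what encoding categorical values means in practice, the minimal sketch below shows three commonly used schemes (label, one-hot, and frequency encoding) in plain Python. The abstract does not name the schemes the paper evaluates, so the choice of these three, the function names, and the sample `protocols` column are assumptions made for illustration only.

```python
from collections import Counter


def label_encode(values):
    """Map each category to an integer index (categories sorted alphabetically)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values]


def one_hot_encode(values):
    """Map each category to a binary vector with a single 1 in its own column."""
    categories = sorted(set(values))
    return [[1 if v == category else 0 for category in categories] for v in values]


def frequency_encode(values):
    """Map each category to its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


# Hypothetical categorical feature from network flow records (not from the paper).
protocols = ["tcp", "udp", "tcp", "icmp", "tcp", "udp"]

label_encoded = label_encode(protocols)
one_hot_encoded = one_hot_encode(protocols)
frequency_encoded = frequency_encode(protocols)
```

Each scheme gives the classifier a different view of the same column. Label encoding imposes an arbitrary order, one-hot encoding adds one column per category, and frequency encoding collapses categories that occur equally often into the same value. Trade-offs of this kind are what a comparison of encoding schemes would need to measure.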



Acknowledgements

This research is funded under the Horizon 2020 APPRAISE Project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 101021981.

Author information

Corresponding author

Correspondence to Marek Pawlicki.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Pawlicki, M., Pawlicka, A., Kozik, R., Choraś, M. (2023). How to Boost Machine Learning Network Intrusion Detection Performance with Encoding Schemes. In: Saeed, K., Dvorský, J., Nishiuchi, N., Fukumoto, M. (eds) Computer Information Systems and Industrial Management. CISIM 2023. Lecture Notes in Computer Science, vol 14164. Springer, Cham. https://doi.org/10.1007/978-3-031-42823-4_21

  • DOI: https://doi.org/10.1007/978-3-031-42823-4_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42822-7

  • Online ISBN: 978-3-031-42823-4

  • eBook Packages: Computer Science, Computer Science (R0)
