Abstract
Network intrusion detection is one of the major components of maintaining cybersecurity. It is especially crucial for Soft Targets: important places that are easily accessible and thus more vulnerable. Real-time, machine-learning-based network intrusion detection is an increasingly relevant field of study that offers important benefits for securing systems against cyberthreats. This paper contributes to this growing body of research by evaluating a problem common to all machine-learning-based detectors: the encoding of categorical values. Different encoding schemes are thoroughly evaluated with three different classifier types, and a statistical analysis of the results is performed. The best-performing solution is proposed.
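To make the notion of encoding categorical values concrete, the following is a minimal sketch of three commonly used schemes (label, one-hot, and frequency encoding) applied to a hypothetical protocol feature. The abstract does not state which schemes or which feature the paper evaluates, so the data, the function names, and the choice of schemes are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical categorical feature: protocol names taken from network flow records.
protocols = ["tcp", "udp", "tcp", "icmp", "tcp", "udp"]


def label_encode(values):
    """Map each category to an integer index, using sorted category order."""
    mapping = {cat: idx for idx, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]


def one_hot_encode(values):
    """Map each category to a binary indicator vector, one column per category."""
    categories = sorted(set(values))
    return [[1 if v == cat else 0 for cat in categories] for v in values]


def frequency_encode(values):
    """Replace each category with its relative frequency in the data."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


label_encoded = label_encode(protocols)
one_hot_encoded = one_hot_encode(protocols)
frequency_encoded = frequency_encode(protocols)
```

Label encoding keeps a single column but imposes an arbitrary order on the categories, one-hot encoding avoids that order at the cost of one column per category, and frequency encoding keeps a single column while discarding category identity when two categories are equally common. Trade-offs of this kind are what a comparison across classifier types would need to weigh.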
Acknowledgements
This research is funded under the Horizon 2020 APPRAISE Project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 101021981.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pawlicki, M., Pawlicka, A., Kozik, R., Choraś, M. (2023). How to Boost Machine Learning Network Intrusion Detection Performance with Encoding Schemes. In: Saeed, K., Dvorský, J., Nishiuchi, N., Fukumoto, M. (eds) Computer Information Systems and Industrial Management. CISIM 2023. Lecture Notes in Computer Science, vol 14164. Springer, Cham. https://doi.org/10.1007/978-3-031-42823-4_21
DOI: https://doi.org/10.1007/978-3-031-42823-4_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42822-7
Online ISBN: 978-3-031-42823-4
eBook Packages: Computer Science, Computer Science (R0)