How to Boost Machine Learning Network Intrusion Detection Performance with Encoding Schemes

Pawlicki, Marek; Pawlicka, Aleksandra; Kozik, Rafał; Choraś, Michał

doi:10.1007/978-3-031-42823-4_21

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14164)

Included in the following conference series:

International Conference on Computer Information Systems and Industrial Management

Abstract

Network intrusion detection is one of the major components of maintaining cybersecurity. It is especially crucial for soft targets: important places that are easily accessible and thus more vulnerable. Real-time, machine-learning-based network intrusion detection is an increasingly relevant field of study that offers important benefits for the practice of securing systems against cyberthreats. This paper contributes to this growing body of research by examining a problem common to all machine-learning-based detectors: the encoding of categorical values. Different encoding schemes are evaluated with three different classifier types, and a statistical analysis of the results is performed. The best-performing solution is proposed.
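
To make the idea of encoding categorical values concrete, here is a minimal sketch of three commonly used schemes: ordinal, one-hot, and frequency encoding. The abstract does not name the schemes the paper compares, so this selection is an assumption, and the function names and the toy `protocols` column are hypothetical illustrations rather than the authors' method or data.

```python
from collections import Counter


def ordinal_encode(values):
    """Map each category to an integer index, in order of first appearance."""
    index = {}
    for v in values:
        index.setdefault(v, len(index))
    return [index[v] for v in values]


def one_hot_encode(values):
    """Map each category to a binary indicator vector.

    Columns follow the alphabetically sorted list of distinct categories.
    """
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


# Hypothetical categorical flow feature (values are illustrative only).
protocols = ["tcp", "udp", "tcp", "icmp"]

ordinal = ordinal_encode(protocols)      # one integer per record
one_hot = one_hot_encode(protocols)      # one column per distinct category
frequency = frequency_encode(protocols)  # one float per record

print(ordinal)
print(one_hot)
print(frequency)
```

The schemes trade off differently: ordinal encoding keeps a single column but imposes an arbitrary order, one-hot encoding avoids that order at the cost of one column per category, and frequency encoding stays compact but maps categories with equal counts to the same value. Differences like these are why the choice of encoding can affect different classifier types in different ways.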

Acknowledgements

This research is funded under the Horizon 2020 APPRAISE Project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 101021981.

Author information

Authors and Affiliations

ITTI, Poznań, Poland
Marek Pawlicki, Aleksandra Pawlicka, Rafał Kozik & Michał Choraś
University of Warsaw, Warsaw, Poland
Aleksandra Pawlicka
Bydgoszcz University of Science and Technology, Bydgoszcz, Poland
Marek Pawlicki, Rafał Kozik & Michał Choraś

Corresponding author

Correspondence to Marek Pawlicki.

Editor information

Editors and Affiliations

Bialystok University of Technology, Białystok, Poland
Khalid Saeed
VSB - Technical University of Ostrava, Ostrava, Czech Republic
Jiří Dvorský
Tokyo Metropolitan University, Tokyo, Japan
Nobuyuki Nishiuchi
Fukuoka Institute of Technology, Fukuoka, Japan
Makoto Fukumoto

About this paper

Cite this paper

Pawlicki, M., Pawlicka, A., Kozik, R., Choraś, M. (2023). How to Boost Machine Learning Network Intrusion Detection Performance with Encoding Schemes. In: Saeed, K., Dvorský, J., Nishiuchi, N., Fukumoto, M. (eds) Computer Information Systems and Industrial Management. CISIM 2023. Lecture Notes in Computer Science, vol 14164. Springer, Cham. https://doi.org/10.1007/978-3-031-42823-4_21

DOI: https://doi.org/10.1007/978-3-031-42823-4_21
Published: 15 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42822-7
Online ISBN: 978-3-031-42823-4
eBook Packages: Computer Science (R0)
