Abstract
The new attacks that emerged during the COVID-19 pandemic have created both important opportunities and challenges in the current computing context, and cybersecurity has undergone, and is still undergoing, significant changes in its technology and operation. Many computer security incident response teams (CSIRTs) and cybersecurity centers have reported significant attack behaviors and raised multiple warning signs. Some of these warnings were ignored by third parties, while others were taken into consideration, and new frameworks began to be translated into research directions through collaboration between researchers and professionals. As the CSIRTs concluded, data science leads and sets the tone of this change. Properly identifying security incident patterns and other insights within cybersecurity data, and implementing the right data-driven model, is the main task in achieving an automated and intelligent security system. In this paper, we propose a machine learning framework for cybersecurity, focusing on data science for cybersecurity, where the data collected from trusted sources are relevant to cybersecurity. Our work aims to kickstart discussion of various open research challenges and points out the most challenging future research directions. Altogether, our purpose is not only to discuss data science in the cybersecurity context and the relevant methods and algorithms, but also to focus on how data can be applied to make the most intelligent decisions for protecting systems against cyberattacks.
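The abstract frames the core task as finding incident patterns in cybersecurity data and turning them into a data-driven model. The abstract does not specify the authors' framework, so the sketch below is not it; it is one minimal, hypothetical illustration of the idea: a per-feature z-score anomaly detector fitted on benign observation windows. The feature columns (for example, failed logins and outbound connections per window), the 3.0 threshold, and every function name here are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Baseline:
    """Per-feature mean and standard deviation learned from benign data."""
    means: list[float]
    stds: list[float]


def fit_baseline(benign: list[list[float]]) -> Baseline:
    # Each row is one observation window; each column is one numeric feature.
    # The features themselves are hypothetical (e.g. failed logins, connections).
    columns = list(zip(*benign))
    means = [mean(col) for col in columns]
    # A constant feature has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return Baseline(means, stds)


def anomaly_score(model: Baseline, row: list[float]) -> float:
    # Score a window by its largest absolute z-score across all features.
    return max(abs(x - m) / s for x, m, s in zip(row, model.means, model.stds))


def flag_incidents(model: Baseline, rows: list[list[float]],
                   threshold: float = 3.0) -> list[int]:
    # Return the indices of windows whose score exceeds the (assumed) threshold.
    return [i for i, row in enumerate(rows) if anomaly_score(model, row) > threshold]


if __name__ == "__main__":
    benign = [[2, 10], [3, 12], [1, 11], [2, 9], [2, 13]]
    model = fit_baseline(benign)
    print(flag_incidents(model, [[2, 11], [25, 12]]))
```

Fitted on the five benign windows above, the second new window (25 failed logins) lies far outside the learned baseline and is flagged, while the first is not. A real system would need richer features, labeled incidents for validation, and a tuned threshold; the z-score rule is used here only because it is the simplest data-driven model that learns "normal" behavior from data.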
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mihailescu, M.I., Nita, S.L. (2023). Towards Data Science for Cybersecurity: Machine Learning Advances as Glowing Perspective. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2022. Lecture Notes in Networks and Systems, vol 543. Springer, Cham. https://doi.org/10.1007/978-3-031-16078-3_2
DOI: https://doi.org/10.1007/978-3-031-16078-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16077-6
Online ISBN: 978-3-031-16078-3
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)