ABSTRACT
Recent decades have seen the development of a plethora of approaches that aim to use artificial intelligence to detect anomalies and potential signs of compromise in a computer network. These approaches have commonly been trained and evaluated using only a small number of datasets, which have often been criticised in the literature. Developing new datasets for this purpose tends to be very resource-intensive, as they usually rely on testbeds and network emulation. While this level of detail is important for anomaly detection over network traffic, which inspects the contents of network packets, it is superfluous when such algorithms work with logs of security controls, as in SIEM systems and alert correlation approaches. Moreover, evaluation over a testbed-generated dataset may not be relevant for the target IT system. In this paper, we propose a light-weight method to enrich existing security control logs with carefully crafted synthetic records of the kind that would be produced in the event of cyber attacks. This method does not require running a dedicated testbed or comparable specialized equipment. We prepare a set of attack records with emphasis on network scans, and perform experiments with real-world firewall logs and several common anomaly detection algorithms to demonstrate that the injected records are appropriately integrated into the original logs. Finally, we propose future experiments to properly validate the quality of the datasets produced using the proposed method.
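The core idea of enriching an existing log with synthetic attack records can be illustrated with a minimal sketch. This is not the authors' implementation: the record schema (`time`, `src`, `dst`, `dport`, `action`, `synthetic`), the helper names `make_scan_records` and `inject`, the addresses, and the toy "original" log are all assumptions made for illustration. It generates one denied-connection record per probed port, as a simple port scan might leave in a firewall log, and merges those records into the original log by timestamp.

```python
from datetime import datetime, timedelta


def make_scan_records(src_ip, dst_ip, ports, start, gap_ms=50):
    """Build one synthetic 'deny' record per probed port (hypothetical schema)."""
    return [
        {
            "time": start + timedelta(milliseconds=i * gap_ms),
            "src": src_ip,
            "dst": dst_ip,
            "dport": port,
            "action": "deny",
            "synthetic": True,  # ground-truth label kept for later evaluation
        }
        for i, port in enumerate(ports)
    ]


def inject(original, synthetic):
    """Merge synthetic records into the original log, ordered by timestamp."""
    return sorted(original + synthetic, key=lambda r: r["time"])


# Toy stand-in for a real firewall log: ten benign HTTPS connections, one per second.
base = datetime(2023, 1, 1, 12, 0, 0)
original_log = [
    {
        "time": base + timedelta(seconds=s),
        "src": "10.0.0.5",
        "dst": "10.0.0.1",
        "dport": 443,
        "action": "allow",
        "synthetic": False,
    }
    for s in range(10)
]

# Assumed attacker address scanning ports 20-29, starting three seconds into the log.
scan = make_scan_records("10.0.0.99", "10.0.0.1", range(20, 30), base + timedelta(seconds=3))
enriched_log = inject(original_log, scan)
```

Keeping a `synthetic` flag on every record gives a ground-truth label, so the enriched log could then be passed to an anomaly detector and scored against the injected records.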
Index Terms
- Light-Weight Synthesis of Security Logs for Evaluation of Anomaly Detection and Security Related Experiments