Abstract
This paper presents a new data model for monitoring Internet of Things (IoT) devices and detecting anomalies in their behavior. The model is based mainly on a behavioral description of the device's current state, but it also contains some additional information. Raw input data, produced by external simulation software, are aggregated on two levels of detail: preprocessing of raw variable values and time-based aggregation. Sample data that follow this model and contain anomalies can be analyzed with standard anomaly detection methods, and the results of this analysis are very satisfactory. The data used in the paper are publicly available.
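To make the idea of time-based aggregation followed by a standard detector more concrete, a minimal Python sketch is given here. It is not the model or the method described in the paper: the event layout `(timestamp, value)`, the function names `aggregate_by_window` and `flag_anomalies`, the 60-second window, and the z-score rule are all assumptions chosen for illustration only.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate_by_window(events, window_seconds=60):
    """Group (timestamp, value) events into fixed-length time windows.

    Hypothetical helper: the real model aggregates richer behavioral data.
    """
    buckets = defaultdict(list)
    for timestamp, value in events:
        # Align each event to the start of the window it falls into.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    # One feature row per window: number of events and their mean value.
    return {
        start: {"count": len(values), "mean": mean(values)}
        for start, values in sorted(buckets.items())
    }


def flag_anomalies(windows, feature="count", threshold=2.0):
    """Return window starts whose feature lies more than `threshold`
    population standard deviations from the mean (a simple z-score rule,
    used here only as a stand-in for a standard anomaly detector)."""
    values = [row[feature] for row in windows.values()]
    centre, spread = mean(values), pstdev(values)
    if spread == 0:
        return []  # all windows identical, nothing stands out
    return [
        start
        for start, row in windows.items()
        if abs(row[feature] - centre) / spread > threshold
    ]
```

Calling `flag_anomalies(aggregate_by_window(events))` on a list of events returns the start times of the windows whose event count deviates strongly from the rest. Any other detector that works on per-window feature rows could be substituted for the z-score rule.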
Acknowledgements
The project is financed by the Polish National Centre for Research and Development as part of the fourth CyberSecIdent-Cybersecurity and e-Identity competition (agreement number: CYBERSECIDENT/489240/IV/NCBR/2021).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Michalak, M. et al. (2023). A New Data Model for Behavioral Based Anomaly Detection in IoT Device Monitoring. In: Campagner, A., Urs Lenz, O., Xia, S., Ślęzak, D., Wąs, J., Yao, J. (eds) Rough Sets. IJCRS 2023. Lecture Notes in Computer Science, vol 14481. Springer, Cham. https://doi.org/10.1007/978-3-031-50959-9_41
DOI: https://doi.org/10.1007/978-3-031-50959-9_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50958-2
Online ISBN: 978-3-031-50959-9
eBook Packages: Computer Science, Computer Science (R0)