Navigating IoT Complexity: Developing Datasets for Smart-Home Device Interactions

  • Conference paper
Complex, Intelligent and Software Intensive Systems (CISIS 2024)

Abstract

In the dynamic realm of modern technology, the rapid growth of Internet of Things (IoT) devices introduces new challenges for network security and reliability. The diverse nature of IoT environments complicates the task of network operators and security experts, who must face increasingly sophisticated threats. Additionally, relying only on network traffic to detect user actions is problematic: the complexity of IoT environments and the variability of user actions make the distinction between legitimate activities and threats difficult to track. Recently, Machine Learning techniques have emerged as a way to identify threats in networked systems. Although such techniques are very powerful, they rely on reliable datasets that collect examples of both licit and malicious traffic. However, datasets are often limited in the number of examples collected and in the documentation of how the traffic was monitored; moreover, labelling is not always reliable. Accordingly, this paper describes the development of a procedure to generate datasets using a dedicated test bed to capture user actions associated with smart-home IoT devices. In contrast to most datasets in the literature, the proposed procedure aims to offer a way to easily collect and label continuously produced data, generating datasets enriched with detailed descriptions of each device involved in traffic generation. We believe that this paper offers a first step towards the systematic production of datasets that are more suitable for the efficient use of machine learning techniques.
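To make the idea of labelling continuously captured traffic with user actions and device descriptions more concrete, the following is a minimal Python sketch. It is not the authors' procedure: the `Device`, `Action`, and `Packet` records, the time-window matching rule, and the `background` default label are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Device:
    """Hypothetical device description attached to every dataset row."""
    mac: str
    name: str
    category: str


@dataclass(frozen=True)
class Action:
    """Hypothetical log entry: a user action on one device in a time window."""
    device_mac: str
    label: str
    start: float  # seconds since the start of the capture
    end: float


@dataclass(frozen=True)
class Packet:
    """Hypothetical minimal view of a captured packet."""
    timestamp: float
    src_mac: str
    dst_mac: str
    length: int


def label_packet(packet: Packet, actions: List[Action],
                 devices: Dict[str, Device]) -> dict:
    """Build one dataset row: packet fields, device description, action label.

    Assumption: a packet belongs to an action when it involves the action's
    device and its timestamp falls inside the action's time window.
    """
    device = devices.get(packet.src_mac) or devices.get(packet.dst_mac)
    label = "background"  # assumed default for traffic outside any action
    if device is not None:
        for action in actions:
            in_window = action.start <= packet.timestamp <= action.end
            if action.device_mac == device.mac and in_window:
                label = action.label
                break
    return {
        "timestamp": packet.timestamp,
        "length": packet.length,
        "device": device.name if device else "unknown",
        "category": device.category if device else "unknown",
        "label": label,
    }


# Usage with made-up values: one smart plug, one logged action, two packets.
devices = {"aa:bb:cc:00:00:01": Device("aa:bb:cc:00:00:01", "smart-plug", "plug")}
actions = [Action("aa:bb:cc:00:00:01", "turn_on", start=10.0, end=12.0)]
packets = [
    Packet(11.0, "aa:bb:cc:00:00:01", "ff:ff:ff:00:00:09", 120),
    Packet(20.0, "ff:ff:ff:00:00:09", "aa:bb:cc:00:00:01", 60),
]
rows = [label_packet(p, actions, devices) for p in packets]
```

In this sketch the first packet falls inside the logged action window and receives the action label, while the second falls outside it and keeps the default label. A real test bed would likely need to account for clock skew between the action log and the capture point.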

Notes

  1. https://github.com/VSecLab/V-IoT-Dataset

Acknowledgment

This work was partially supported by the UPSIDE project (B63D23000820004) and the project DEFEDGE (E53D23016380001) under the PRIN program funded by the Italian MUR.

Author information

Corresponding author

Correspondence to Daniele Granata.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Rak, M., Granata, D., Esposito, A., Ferretti, A. (2024). Navigating IoT Complexity: Developing Datasets for Smart-Home Device Interactions. In: Barolli, L. (eds) Complex, Intelligent and Software Intensive Systems. CISIS 2024. Lecture Notes on Data Engineering and Communications Technologies, vol 87. Springer, Cham. https://doi.org/10.1007/978-3-031-70011-8_41
