Abstract
Although deep learning algorithms can achieve high performance, deep models may not learn the right concepts and can easily overfit their training datasets. In the context of IoT devices, three factors exacerbate the problem. First, traffic may be encrypted, providing little visibility into the activity of the endpoints. Second, devices of different models and from different manufacturers may exhibit very different behaviors. Finally, unlike domains such as computer vision or natural language processing, there is no well-accepted representation for the network data that characterizes IoT devices. In this work, we capture real network traffic from different environments and demonstrate that using state-of-the-art techniques to train models that detect specific classes of IoT devices (e.g., cameras) can lead to overfitting and very poor performance on independent datasets. However, we then show that by applying domain knowledge, one can manually define engineered features and train simple models (e.g., a decision tree) that achieve an F1 score of 0.956 on an independent dataset. These results show that training generalizable models is feasible, but they also raise questions about how best to transform and represent raw network data to train classifiers for other classes of IoT devices (e.g., hubs, motion sensors) while minimizing manual feature engineering. We elaborate on these challenges, drawing analogies with other fields such as natural language processing.
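The F1 score reported for the decision tree is the harmonic mean of precision and recall for the target class (e.g., "camera" versus every other device type). The sketch below shows that computation on hypothetical labels for devices in an independent test environment; the function name `f1_score` and the toy labels are illustrative assumptions, not the paper's code or data.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 score: harmonic mean of precision and recall for `positive`."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count true positives, false positives, and false negatives for the target class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No correct detections: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels from an independent environment (toy values):
# 1 = camera, 0 = any other device type.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(f1_score(y_true, y_pred))  # precision = recall = 3/4, so this prints 0.75
```

F1 is a natural choice here because the target class is typically a small fraction of all devices on a network: a classifier that labels everything "not a camera" can score high accuracy yet has an F1 of 0 for the camera class.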
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Le, F., Calo, S., Verma, D. (2022). Risks and Challenges of Training Classifiers for IoT. In: Tekinerdogan, B., Wang, Y., Zhang, LJ. (eds) Internet of Things – ICIOT 2021. ICIOT 2021. Lecture Notes in Computer Science(), vol 12993. Springer, Cham. https://doi.org/10.1007/978-3-030-96068-1_2
DOI: https://doi.org/10.1007/978-3-030-96068-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96067-4
Online ISBN: 978-3-030-96068-1
eBook Packages: Computer Science, Computer Science (R0)