Risks and Challenges of Training Classifiers for IoT

Le, Franck; Calo, Seraphin; Verma, Dinesh

doi:10.1007/978-3-030-96068-1_2

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12993)

Included in the following conference series:

International Conference on Internet of Things

Abstract

Although deep learning algorithms can achieve high performance, deep models may not learn the right concepts and can easily overfit their training datasets. In the context of IoT devices, the problem is further exacerbated by three factors. First, traffic may be encrypted, allowing very little visibility into the activity of the endpoints. Second, devices of different models and from different manufacturers may exhibit very different behaviors. Finally, contrary to domains like computer vision or natural language processing, there is no well-accepted representation for the network data that characterizes IoT devices. In this work, we capture real network traffic from different environments, and we demonstrate that training models to detect specific classes of IoT devices (e.g., cameras) using state-of-the-art techniques can lead to overfitting and very poor performance on independent datasets. However, we then show that by applying domain knowledge, one can manually define engineered features and train simple models (e.g., a decision tree) that achieve an F1 score of 0.956 on an independent dataset. These results show the feasibility of training generalizable models, but at the same time raise questions about how best to transform and represent the raw network data to train classifiers for other classes of IoT devices (e.g., hubs, motion sensors) while minimizing manual feature engineering. We elaborate on the challenges, drawing analogies with other fields such as natural language processing.
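
The gap between in-environment and independent-dataset performance described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the flows, the server addresses, the upload-to-download ratio feature, and the threshold of 1.0 are all hypothetical. The sketch compares a rule that memorizes environment-specific addresses with a rule based on a behavioral feature, and scores both with the F1 measure (the harmonic mean of precision and recall).

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision/recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical flows: (server_ip, upload_to_download_ratio, is_camera).
train_env = [("10.0.0.5", 9.0, 1), ("10.0.0.5", 7.5, 1),
             ("10.0.0.9", 0.2, 0), ("10.0.0.9", 0.4, 0)]
independent_env = [("172.16.1.7", 8.0, 1), ("172.16.1.7", 6.0, 1),
                   ("172.16.1.8", 0.3, 0), ("10.0.0.5", 0.1, 0)]

# "Overfit" rule: remembers which addresses cameras used in training.
camera_ips = {ip for ip, _, label in train_env if label == 1}


def memorizing_model(flow):
    return 1 if flow[0] in camera_ips else 0


def engineered_model(flow):
    # Assumed domain rule: cameras mostly upload (threshold is illustrative).
    return 1 if flow[1] > 1.0 else 0


def evaluate(model, dataset):
    y_true = [flow[2] for flow in dataset]
    y_pred = [model(flow) for flow in dataset]
    return f1_score(y_true, y_pred)


for name, model in [("memorizing", memorizing_model),
                    ("engineered", engineered_model)]:
    print(name, evaluate(model, train_env), evaluate(model, independent_env))
```

On this toy data both rules score perfectly in the training environment, but only the behavioral rule carries over to the second environment. This is why a score on an independent dataset is the more informative one.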

Author information

Authors and Affiliations

IBM T.J. Watson Research Center, New York, US
Franck Le, Seraphin Calo & Dinesh Verma

Corresponding author

Correspondence to Franck Le.

Editor information

Editors and Affiliations

Wageningen University Maatschappijw, Wageningen, The Netherlands
Bedir Tekinerdogan
University of Prince Edward Island, Charlottetown, Canada
Yingwei Wang
Kingdee International Software Group Co., Ltd., Shenzhen, China
Liang-Jie Zhang

Cite this paper

Le, F., Calo, S., Verma, D. (2022). Risks and Challenges of Training Classifiers for IoT. In: Tekinerdogan, B., Wang, Y., Zhang, LJ. (eds) Internet of Things – ICIOT 2021. ICIOT 2021. Lecture Notes in Computer Science, vol 12993. Springer, Cham. https://doi.org/10.1007/978-3-030-96068-1_2

DOI: https://doi.org/10.1007/978-3-030-96068-1_2
Published: 18 February 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96067-4
Online ISBN: 978-3-030-96068-1
eBook Packages: Computer Science, Computer Science (R0)
