Risks and Challenges of Training Classifiers for IoT

  • Conference paper
Internet of Things – ICIOT 2021 (ICIOT 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12993)

Abstract

Although deep learning algorithms can achieve high performance, deep models may not learn the right concepts and can easily overfit their training datasets. In the context of IoT devices, the problem is further exacerbated by three factors. First, traffic may be encrypted, allowing very little visibility into the activity of the endpoints. Second, devices of different models and from different manufacturers may exhibit very different behaviors. Finally, contrary to domains like computer vision or natural language processing, there is no well-accepted representation for the network data that characterizes IoT devices. In this work, we capture real network traffic from different environments, and we demonstrate that training models to detect specific classes of IoT devices (e.g., cameras) using state-of-the-art techniques can lead to overfitting and very poor performance on independent datasets. However, we then show that by applying domain knowledge, one can manually define engineered features and train simple models (e.g., a decision tree) that achieve an F1 score of 0.956 on an independent dataset. These results show the feasibility of training generalizable models, but at the same time raise questions about how best to transform and represent the raw network data to train classifiers for other classes of IoT devices (e.g., hubs, motion sensors) while minimizing manual feature engineering. We elaborate on the challenges, drawing analogies with other fields such as natural language processing.
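The gap the abstract describes, between performance on the training environment and on an independent dataset, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the flow records are synthetic, the field names (`dst_ip`, `up`, `down`, `cam`) are hypothetical, and the upload/download byte ratio is only a plausible stand-in for an engineered feature, not necessarily one the authors used. The "memorizer" mimics a model that overfits environment-specific identifiers, and the one-threshold rule is a depth-1 decision tree.

```python
# Synthetic, hypothetical flow records; NOT data from the paper.
# ENV_A plays the role of the training environment, ENV_B the independent one.
ENV_A = [
    {"dst_ip": "10.0.0.5", "up": 9000, "down": 300, "cam": 1},
    {"dst_ip": "10.0.0.5", "up": 7000, "down": 500, "cam": 1},
    {"dst_ip": "10.0.0.9", "up": 200, "down": 4000, "cam": 0},
    {"dst_ip": "10.0.0.7", "up": 400, "down": 600, "cam": 0},
]
ENV_B = [
    {"dst_ip": "172.16.1.2", "up": 8000, "down": 400, "cam": 1},
    {"dst_ip": "172.16.1.3", "up": 6000, "down": 600, "cam": 1},
    {"dst_ip": "172.16.1.4", "up": 300, "down": 5000, "cam": 0},
    {"dst_ip": "10.0.0.5", "up": 500, "down": 900, "cam": 0},
]


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def labels(records):
    return [r["cam"] for r in records]


# Overfitting stand-in: remember which destination IPs cameras contacted.
def fit_memorizer(records):
    return {r["dst_ip"] for r in records if r["cam"] == 1}


def predict_memorizer(camera_ips, records):
    return [1 if r["dst_ip"] in camera_ips else 0 for r in records]


# Engineered feature (assumed): ratio of uploaded to downloaded bytes.
def ratio(record):
    return record["up"] / record["down"]


# Depth-1 decision tree: pick the threshold with the best training accuracy.
def fit_stump(records):
    values = sorted(ratio(r) for r in records)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def correct(threshold):
        return sum((ratio(r) > threshold) == (r["cam"] == 1) for r in records)

    return max(candidates, key=correct)


def predict_stump(threshold, records):
    return [1 if ratio(r) > threshold else 0 for r in records]


camera_ips = fit_memorizer(ENV_A)
threshold = fit_stump(ENV_A)

f1_memo_a = f1_score(labels(ENV_A), predict_memorizer(camera_ips, ENV_A))
f1_memo_b = f1_score(labels(ENV_B), predict_memorizer(camera_ips, ENV_B))
f1_stump_b = f1_score(labels(ENV_B), predict_stump(threshold, ENV_B))

print("memorizer, training environment:", f1_memo_a)
print("memorizer, independent environment:", f1_memo_b)
print("engineered feature, independent environment:", f1_stump_b)
```

On this toy data the memorizer scores perfectly on the environment it was fitted on and fails on the independent one, while the ratio-based rule transfers. The point is only that evaluation on data from a different environment is what exposes the difference; the numbers themselves carry no empirical weight.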

Author information

Corresponding author

Correspondence to Franck Le.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Le, F., Calo, S., Verma, D. (2022). Risks and Challenges of Training Classifiers for IoT. In: Tekinerdogan, B., Wang, Y., Zhang, LJ. (eds) Internet of Things – ICIOT 2021. ICIOT 2021. Lecture Notes in Computer Science, vol 12993. Springer, Cham. https://doi.org/10.1007/978-3-030-96068-1_2

  • DOI: https://doi.org/10.1007/978-3-030-96068-1_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-96067-4

  • Online ISBN: 978-3-030-96068-1

  • eBook Packages: Computer Science, Computer Science (R0)
