Abstract
Machine learning-based techniques have proven effective for inferring the network behavior of Internet-of-Things (IoT) devices. Existing works develop data-driven models based on features from network packets and/or flows, but mainly in a static and ad hoc manner, without adequately quantifying their gains versus their costs. In this article, we develop a generic architecture comprising two distinct inference modules in tandem: IoT network behavior classification, followed by continuous monitoring. In contrast to prior relevant works, our architecture flexibly accounts for various traffic features, modeling algorithms, and inference strategies. We argue that quantitative metrics are required to systematically compare, and efficiently select among, the traffic features available for IoT traffic inference.
This article makes three contributions: (1) For IoT behavior classification, we identify four metrics, namely, cost, accuracy, availability, and frequency, that allow us to characterize and quantify the efficacy of seven sets of packet-based and flow-based traffic features, each resulting in a specialized model. By experimenting with traffic traces of 25 IoT devices collected from our testbed, we demonstrate that specialized-view models can be superior, in both accuracy and cost, to a single combined-view model trained on a plurality of features. We also formulate an optimization problem that selects the best set of specialized models for multi-view classification. (2) For monitoring the expected IoT behaviors, we develop a progressive system consisting of one-class clustering models (per IoT class) at three levels of granularity. We develop an outlier detection technique on top of the convex hull algorithm to form custom-shape boundaries for the one-class models. We show how progression reduces computing costs and improves the explainability of detected anomalies. (3) We evaluate the efficacy of our optimally selected classifiers versus the superset of specialized classifiers by applying them to our IoT traffic traces. We demonstrate that the optimal set can reduce the processing cost by a factor of six with insignificant impact on classification accuracy. We also apply our monitoring models to a public IoT dataset of benign and attack traces and show that they yield an average true-positive rate of 94% and a false-positive rate of 5%. Finally, we publicly release our data (training and testing instances of the classification and monitoring tasks) and our code for convex hull-based one-class models.
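To make the idea behind contribution (2) concrete, the following is a minimal sketch of a convex hull-based one-class boundary. It is not the released code: it assumes only two traffic features per instance (the feature names and values below are hypothetical), builds the hull with Andrew's monotone chain algorithm in pure Python, and flags any instance falling outside the hull of benign training points as an outlier.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop the last vertex while it makes a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def is_inlier(hull: List[Point], p: Point, tol: float = 1e-9) -> bool:
    """True if p lies inside or on the boundary of a counter-clockwise hull."""
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear training points")
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= -tol for i in range(n))


# Hypothetical benign training instances: (packets per minute, mean packet size / 100).
benign = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 1.0), (1.0, 2.0)]
hull = convex_hull(benign)

print(is_inlier(hull, (2.0, 1.5)))  # inside the benign region
print(is_inlier(hull, (5.0, 1.0)))  # outside the benign region
```

A hull boundary follows the shape of the training data rather than imposing a fixed shape such as a sphere, which is one plausible reading of "custom-shape boundaries" above. For more than two features, a library routine such as SciPy's `scipy.spatial.ConvexHull` would be the more practical choice.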
Index Terms
- Efficient IoT Traffic Inference: From Multi-view Classification to Progressive Monitoring