Elsevier

Computers & Security

Volume 114, March 2022, 102588
A comprehensive deep learning benchmark for IoT IDS

https://doi.org/10.1016/j.cose.2021.102588

Abstract

The significance of an intrusion detection system (IDS) in network security cannot be overstated in detecting and responding to malicious attacks. Failure to detect large-scale attacks like DDoS not only leaves networks vulnerable; the resulting failure of critical lifesaving medical and industrial equipment can also put human lives at risk. The lack of comprehensive, high-quality network datasets and the narrow scope of building an IDS on a single machine learning classifier add further limitations. Such issues risk producing inaccurate or biased results in the solutions proposed by various researchers. To this end, this paper analyzes several datasets (old, recent, non-IoT, and IoT-specific) using several individual and hybrid deep learning classifiers. Our goal is to establish a benchmark that compares several classification models on several datasets to limit (1) dataset quality issues and (2) possible bias in the produced results. We report our empirical results, which reveal that some classifiers took hours to converge yet could not successfully detect attacks, whereas others converged quickly and produced the best results in terms of accuracy and other performance metrics. We believe that this paper's findings will help build a comprehensive IDS by recognizing that classification or prediction models should be trained beyond the limited scope of one dataset or application.

Introduction

The Internet of Things (IoT) has been making a significant impact on people's lives. Governments and industries such as healthcare and transportation have generated immense value by widely adopting these small devices to make critical, better, more effective, and timely decisions. Healthcare is one of the most critical and fastest-growing industrial sectors, and it produces a large amount of data through continuous patient monitoring, infrastructure management, surveillance cameras, record keeping, insurance records, and compliance requirements. The digital transformation of connected devices in the healthcare industry is expected to grow the market to $534.3 billion globally by 2025 (De Michele and Furini, 2019). It is critical to analyze the massive data generated by IoT devices effectively to solve complex problems, make critical decisions, address new challenges, and prevent large-scale cyber-attacks. An estimated $300 to $450 billion can be saved in the healthcare industry by applying proper tools and technologies to process massive data, generate insights, and enhance security (Kayyali et al., 2013). Similarly, IoT devices have revolutionized the transportation industry by enabling autonomous vehicles, preventing accidents, controlling traffic patterns and congestion, and optimizing the supply chain process (Humayun et al., 2020). The market value of autonomous vehicles was $54.23 billion in 2019 and is expected to grow about tenfold to $556.67 billion by 2026 (Narla and Stowell, 2019).

Today's technology advances are driven by the massive amount of heterogeneous data generated by billions of devices connected through the internet. Managing such large volumes of data without proper tools and techniques is a big challenge, and data may become useless if handled incorrectly. Despite the benefits and projected growth of the IoT market, serious security concerns need attention to protect networks from vulnerabilities and from known and unknown cyber-attacks. IoT devices and networks are vulnerable to many security threats. Without a comprehensive solution to protect against known and zero-day attacks, critical issues such as data breaches, false diagnostics, fatal accidents due to equipment failure, and distributed denial of service (DDoS) attacks can occur. A DDoS attack would either shut down critical services or make them unresponsive (Hady et al., 2020). Governments and organizations have been struggling to implement a comprehensive, secure, and efficient technology-based solution to deliver essential products and services; thus, critical national infrastructure and networks remain at risk from potential cyber-attacks.

The massive data generated by IoT devices has attracted researchers to propose data-driven intrusion detection systems (IDS). Researchers mostly gather publicly available network datasets or generate their own dataset in a small, controlled, and simulated environment on which to base their proposed IDS solution. Once the dataset is available, various traditional machine learning (ML) and deep learning (DL) algorithms are trained to build models that predict traffic types such as benign or malicious. One of the difficulties in creating an effective intrusion detection system for IoT environments is the availability of a comprehensive network dataset that mirrors current traffic patterns and includes various attacks and their variants. Over the years, institutions, organizations, and researchers have generated several datasets. Unfortunately, not all of them have been made available for public use due to privacy and security concerns. Because of the shortage of reliable datasets, models suffer from inconsistent and inaccurate performance results. The dataset shortage forces researchers to (a) collect new datasets of their own in a small simulated environment by launching a handful of known attack types in a controlled manner, (b) propose a solution by limiting their investigations to a single publicly available dataset, or (c) propose a single machine learning classifier for an IDS.

The literature review presented in Section 2 shows that researchers generally pick a single model and a single dataset to propose an IDS solution and report performance metrics. One of the problems with a single-dataset approach is that it produces biased results by focusing only on limited network traffic patterns and minimal attack classes. The same is the case when researchers base their analysis only on a single machine learning classifier. Such an analysis lacks diversity; a classifier may perform well on a single dataset, but switching to a new network dataset yields a biased result, and the performance drops considerably. Other problems in single-dataset or single-classifier approaches include the following: (a) researchers spend time tuning their proposed classifier for a specific dataset, which may be an old dataset that lacks modern-day traffic and attack patterns; (b) the proposed IoT IDS model is trained on a non-IoT dataset such as NSL-KDD or KDD CUP 99, and such models are prone to failure in a live IoT environment with different traffic patterns and attack information; and (c) the models are built using traditional machine learning techniques instead of deep learning algorithms. A common challenge in the traditional machine learning approach is the complex feature selection process, which requires in-depth understanding of network traffic and is labor- and time-intensive (Ahmad and Alsmadi, 2021). In that approach, feature selection and model training are independent processes and cannot be performed jointly for optimized classifier performance (Liang and Znati, 2019). Our approach in this study is to analyze several datasets (old, recent, non-IoT, and IoT-specific) using several individual and hybrid deep learning models. We captured several pieces of valuable information, such as training parameters, training time, model settings, and performance metrics, to characterize several combinations of datasets and deep learning algorithms. This approach helps to narrow down the strengths and weaknesses of datasets and classifiers. We hope that our analysis will give future authors insight and guidance on which algorithms perform well on a specific dataset and which do not. It will also help future researchers build a comprehensive IDS based on deep learning models for IoT.
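The comparison described above can be pictured as a grid of dataset and classifier pairs, with training time and a performance metric recorded for each cell. The following is a minimal sketch of such a harness, not the authors' implementation. The two toy classifiers (`MajorityClass`, `NearestCentroid`), the synthetic datasets, and all function names are hypothetical stand-ins chosen so that the example runs with the Python standard library alone; the study itself uses deep learning models on real network datasets.

```python
import random
import time


class MajorityClass:
    """Toy baseline (hypothetical): always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid:
    """Toy classifier (hypothetical): predicts the label whose feature mean is closest."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c])) for x in X]


def make_synthetic(n, shift, seed):
    """Synthetic two-feature data: label 1 (attack) is shifted away from label 0 (benign)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(label * shift, 1.0), rng.gauss(label * shift, 1.0)])
        y.append(label)
    return X, y


def build_datasets():
    """Two assumed datasets: one well separated ("easy"), one heavily overlapping ("hard")."""
    datasets = {}
    for name, shift, seed in [("easy", 6.0, 1), ("hard", 0.5, 2)]:
        X_train, y_train = make_synthetic(200, shift, seed)
        X_test, y_test = make_synthetic(100, shift, seed + 100)
        datasets[name] = (X_train, y_train, X_test, y_test)
    return datasets


def run_benchmark(datasets, models):
    """Train every model on every dataset; record training time and test accuracy."""
    results = []
    for ds_name, (X_train, y_train, X_test, y_test) in datasets.items():
        for model_name, factory in models.items():
            start = time.perf_counter()
            model = factory().fit(X_train, y_train)
            train_seconds = time.perf_counter() - start
            predictions = model.predict(X_test)
            correct = sum(p == t for p, t in zip(predictions, y_test))
            results.append({
                "dataset": ds_name,
                "model": model_name,
                "train_seconds": train_seconds,
                "accuracy": correct / len(y_test),
            })
    return results


if __name__ == "__main__":
    models = {"majority": MajorityClass, "centroid": NearestCentroid}
    for row in run_benchmark(build_datasets(), models):
        print(row)
```

Keeping one result row per (dataset, model) pair makes it straightforward to see whether a classifier that does well on one dataset holds up on another, which is the kind of bias the single-dataset approach can hide.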

The rest of the paper is organized as follows. Section 2 presents several publicly available network intrusion benchmark datasets, the background of several deep learning models commonly adopted by researchers, and different intrusion detection techniques for network traffic. In Section 3, we present our adopted methodology and provide details of the deep learning algorithms used. Section 4 presents a comprehensive analysis of our experiments and their results. Section 5 discusses the results, trends, and future assessments. Finally, Section 6 concludes the paper and outlines our approach going forward.

Literature review

The growth of IoT devices is unprecedented. The projected number of IoT devices in 2025 is 38.6 billion, reaching 50 billion by 2030 (Karie et al., 2020). The ability of IoT devices to gather and communicate data makes them very powerful but also vulnerable to cyber threats. Securing network traffic across such a large number of IoT devices has been a challenge for manufacturers and researchers. Unfortunately, there is no single comprehensive solution or model available to protect these devices worldwide.

Proposed methodology

The research work above reflects recent studies on IDS. Researchers typically trained only a single or hybrid classifier on a single dataset while trying to find the optimal IDS solution. Such models are not only prone to return biased results in a live environment but may also return a high rate of false positives if trained on very old datasets such as NSL-KDD or KDD CUP 99 (Das et al., 2019). Our methodology in this study analyzes several benchmark datasets using several

Experiments and results analysis

We divided our analysis into two groups of DL classifiers. First, we implemented individual classifiers on the datasets mentioned in Section 2.1 and captured several pieces of valuable information such as training time, parameters, model settings, and performance metrics. Second, we captured the same information on several hybrid classifiers. Two separate experiments were performed on the NSL-KDD and UNSW_NB15 datasets; in [*a], models were trained and predicted on the given training and testing
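The performance metrics mentioned above are typically derived from the confusion matrix of a binary benign/malicious classifier. This excerpt does not list which metrics the study reports, so the set below (accuracy, precision, recall, F1, and false-positive rate) is an assumption based on common IDS practice, and the function names and the convention that label 1 means "malicious" are hypothetical. A minimal standard-library sketch:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives, treating `positive` as the attack label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a metric's denominator is zero."""
    return numerator / denominator if denominator else 0.0


def ids_metrics(y_true, y_pred, positive=1):
    """Compute common binary-classification metrics from true and predicted labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        # Share of benign traffic that was wrongly flagged as an attack.
        "false_positive_rate": safe_div(fp, fp + tn),
    }


if __name__ == "__main__":
    # Illustrative labels only: 1 = malicious, 0 = benign.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    print(ids_metrics(y_true, y_pred))
```

Reporting the false-positive rate alongside accuracy matters for an IDS because a model can score well on accuracy while still flagging a large share of benign traffic.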

Discussion

The empirical results and analysis presented in Section 4 reveal exciting findings that need further discussion. This section elaborates on those points from both an empirical perspective and the scholarly literature.

Summary

This study presents a comparative analysis of various benchmark datasets and deep learning models commonly proposed by researchers for IDSs in an IoT environment. Our empirical results reveal numerous valuable findings which highlight various strengths and weaknesses in adopting a certain deep learning model.

  • IoT devices operate in a resource-constrained environment (Liang et al., 2020), whereas certain algorithms proposed by researchers require excessive computational power and resources to execute

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

CRediT authorship contribution statement

Rasheed Ahmad: Conceptualization, Methodology, Software, Visualization, Writing – original draft. Izzat Alsmadi: Conceptualization, Methodology, Validation, Writing – review & editing. Wasim Alhamdani: Supervision, Writing – review & editing. Lo'ai Tawalbeh: Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Rasheed Ahmad received his master's degree in computer science from the City University of New York (CUNY), New York, in 2006 and an MBA with a specialization in Information Technology from Capella University, Minneapolis, in 2016. He is currently pursuing a Ph.D. degree with a cybersecurity specialization at the University of the Cumberlands, USA. His research interests include the Internet of Things (IoT), intrusion detection systems, cybersecurity, and large-scale attacks.

References (100)

  • Bai, S., Kolter, J. Z., & Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent networks...
  • D.S. Berman et al.

    A survey of deep learning methods for cyber security

    Information

    (2019)
  • A. Binbusayyis et al.

    Identifying and benchmarking key features for cyber intrusion detection: an ensemble approach

    IEEE Access

    (2019)
  • N. Chaabouni et al.

    Network intrusion detection for IoT security based on learning techniques

    IEEE Commun. Surv. Tutor.

    (2019)
  • S. Chang et al.

    Dilated recurrent neural networks

  • B. Charyyev et al.

    Detecting anomalous IoT traffic flow with locality sensitive hashes

  • P. Chaudhary et al.

    DDoS detection framework in resource constrained internet of things domain

  • Chen, C., Ghassami, A., Mohan, S., Kiyavash, N., Bobba, R.B., Pellizzoni, R., & Yoon, M. (2017). A Reconnaissance...
  • Z. Chen et al.

    Seq2Img: a sequence-to-image based approach towards IP traffic classification using convolutional neural networks

  • Cisco Annual Internet Report (2018–2023) white paper [WWW Document], 2020. Cisco. URL...
  • E. Cuervo et al.

    MAUI: making smartphones last longer with code offload

  • S. Das et al.

    DDoS intrusion detection through machine learning ensemble

  • R. De Michele et al.

    IoT healthcare: benefits, issues, and challenges

  • C. DeBeck et al.

    I can not believe Mirais: tracking the infamous IoT malware [WWW Document]

    Secur. Intell.

    (2019)
  • A. Derhab et al.

    Intrusion detection system for internet of things based on temporal convolution neural network and efficient feature engineering

    Wireless Commun. Mob. Comput.

    (2020)
  • B.A. Desai et al.

    A feature-ranking framework for IoT device classification

  • Cui, Z., Ke, R., Pu, Z., Wang, Y., 2019. Deep bidirectional and unidirectional LSTM recurrent neural network for...
  • Dhamija, A. R., Günther, M., & Boult, T. E. (2018). Reducing network agnostophobia. Proceedings of the 32nd...
  • A. Divekar et al.

    Benchmarking datasets for Anomaly-based Network Intrusion Detection

  • A. Dushimimana et al.

    Bi-directional recurrent neural network for intrusion detection system (IDS) in the internet of things (IoT)

    IJAERS

    (2020)
  • O.E. Elejla et al.

    Flow-based IDS for ICMPv6-based DDoS attacks detection

    Arab. J. Sci. Eng.

    (2018)
  • A. Fadele et al.

    A novel countermeasure technique for reactive jamming attack in internet of things

    Multimed. Tools Appl.

    (2019)
  • Z. Feng et al.

    Self-supervised representation learning from multi-domain data

  • M.A. Ferrag et al.

    RDTIDS: rules and decision tree-based intrusion detection system for internet-of-things networks

    Future Internet

    (2020)
  • N. Fu et al.

    A novel deep intrusion detection model based on a convolutional neural network

    Aust. J. Intell. Inf. Process. Syst.

    (2019)
  • M. Ge et al.

    Deep learning-based intrusion detection for IoT networks

  • J. Gehring et al.

    Convolutional sequence to sequence learning

    Proceedings of the 34th International Conference on Machine Learning

    (2017)
  • A.A. Hady et al.

    Intrusion detection system for healthcare systems using medical and network data: a comparison study

    IEEE Access

    (2020)
  • S. Haider et al.

    A deep CNN ensemble framework for efficient DDoS attack detection in software defined networks

    IEEE Access

    (2020)
  • S. Han et al.

    MCDNN: an approximation-based execution framework for deep stream processing under resource constraints

  • M. Hassen et al.

    Unsupervised open set recognition using adversarial autoencoders

  • Hayashi, T., Watanabe, S., Toda, T., Hori, T., Le Roux, J., & Takeda, K. (2016, September). Bidirectional LSTM-HMM...
  • M. Humayun et al.

    Emerging smart logistics and transportation using IoT and blockchain

    IEEE Internet Things Mag.

    (2020)
  • R.H. Hwang et al.

    An unsupervised deep learning model for early network traffic anomaly detection

    IEEE Access

    (2020)
  • R.H. Hwang et al.

    An LSTM-based deep learning approach for classifying malicious traffic at the packet level

    Appl. Sci.

    (2019)
  • O. Ibitoye et al.

    Analyzing adversarial attacks against deep learning for intrusion detection in IoT networks

  • T.M. Ingolfsson et al.

    EEG-TCNet: An Accurate Temporal Convolutional Network for Embedded Motor-Imagery Brain–Machine Interfaces

  • Ingre, B., Yadav, A., 2015. Performance analysis of NSL-KDD dataset using ANN....
  • Jaidka, H., Sharma, N., Singh, R., 2020. Evolution of IoT to IIoT: applications & challenges (SSRN Scholarly Paper No....
  • N.P. Jouppi et al.

    In-datacenter performance analysis of a tensor processing unit

Izzat Alsmadi is an Associate Professor in the department of computing and cyber security at Texas A&M University-San Antonio. He received his master's and PhD in Software Engineering from North Dakota State University in 2006 and 2008. He has more than 100 conference and journal publications. His research interests include cyber intelligence, cyber security, software security, software engineering, software testing, social networks, and software defined networking. He is lead author or editor of several books, including The NICE Cyber Security Framework: Cyber Security Intelligence and Analytics (Springer, 2019), Practical Information Security: A Competency-Based Education Course (2018), and Information Fusion for Cyber-Security Analytics (Studies in Computational Intelligence, 2016). He is also a member of the National Initiative for Cybersecurity Education (NICE) group, which meets frequently to discuss enhancements to cyber security education at the national level.

Wasim Alhamdani is a Professor in the department of computer science at the University of the Cumberlands, Kentucky, USA. He received a PhD in Computer Science from the University of East Anglia, Norwich, UK, in 1985 and an M.Sc. in Computer Science from Loughborough University of Technology, Loughborough, UK, in 1981. His general research interests are in Cyber Security, Cryptography, and Cyber Security Management. His current research areas include Information Security Mathematical Modeling, Cyber Security Resilient Architecture Design, Ontologies with Cybersecurity, and Ethics with Cryptography use. He is currently a Cyber Security Curriculum Adviser for two international universities.

Dr. Lo'ai Tawalbeh (IEEE Senior Member) completed his PhD degree in Electrical & Computer Engineering at Oregon State University in 2004 and his MSc in 2002 at the same university with a GPA of 4/4. Dr. Tawalbeh is currently an Associate Professor in the department of Computing and Cyber Security at Texas A&M University-San Antonio. Before that, he was a visiting researcher at the University of California, Santa Barbara. Since 2005, he has taught and developed more than 25 courses in different disciplines of computer engineering and science, with a focus on cyber security, for undergraduate and graduate programs at New York Institute of Technology (NYIT), DePaul University, and Jordan University of Science and Technology. Dr. Tawalbeh has successfully supervised more than 40 graduate students and has won many research grants and awards totaling more than 3 million USD. He has over 120 research publications in refereed international journals and conferences.
