Abstract
In this research, we study data poisoning attacks against Bayesian network structure learning algorithms. We propose to use the distance between Bayesian network models and the value of data conflict to detect data poisoning attacks. We propose a 2-layered framework that detects both one-step and long-duration data poisoning attacks. Layer 1 enforces “reject on negative impacts” detection; i.e., input that changes the Bayesian network model is labeled potentially malicious. Layer 2 aims to detect long-duration attacks; i.e., observations in the incoming data that conflict with the original Bayesian model. We show that for a typical small Bayesian network, only a few contaminated cases are needed to corrupt the learned structure. Our detection methods are effective against not only one-step attacks but also sophisticated long-duration attacks. We also present our empirical results.
Keywords
- Adversarial machine learning
- Bayesian networks
- Data poisoning attacks
- The PC algorithm
- Long-duration attacks
- Detection methods
1 Introduction
Over the last decade, several researchers have addressed the problem of cyber attacks against machine learning systems (see [24] for an overview). Machine learning techniques are widely used; however, machine learning methods were not designed to function correctly in adversarial settings [16, 18]. Data poisoning attacks are considered one of the most important emerging security threats against machine learning systems [33, 35]. Data poisoning attacks aim to corrupt the machine learning model by contaminating the data in the training phase [11]. Data poisoning has been studied for several machine learning algorithms, such as Support Vector Machines (SVMs) [11, 21, 28], Principal Component Analysis (PCA) [9, 10], Clustering [8, 12], and Neural Networks (NNs) [36]. However, these efforts are not directly applicable to Bayesian structure learning algorithms.
There are two main methods used in defending against a poisoning attack: (1) robust learning and (2) data sanitization [14]. Robust learning aims to increase learning algorithm robustness, thereby reducing the overall influence that contaminated data samples have on the algorithm. Data sanitization eliminates contaminated data samples from the training data set prior to training a classifier. While data sanitization shows promise to defend against data poisoning, it is often impossible to validate every data source [14].
In our earlier work [3, 4], we studied the robustness of Bayesian network structure learning algorithms against traditional (a.k.a. one-step) data poisoning attacks. We proposed two subclasses of data poisoning attacks against Bayesian network algorithms: (i) model invalidation attacks and (ii) targeted change attacks. We defined a novel link strength measure that can be used to perform a security analysis of Bayesian network models [5].
In this paper, we further investigate the robustness of Bayesian network structure learning algorithms against long-duration (a.k.a. multi-step) data poisoning attacks (described in Sect. 3). We use the causative model proposed by Barreno et al. [6] to contextualize Bayesian network vulnerabilities. We propose a 2-layered framework to detect poisoning attacks from untrusted data sources. Layer 1 enforces “reject on negative impacts” detection [30]; i.e., input that changes the model is labeled malicious. Layer 2 aims to detect long-duration attacks; i.e., it looks for cases in the incoming data that conflict with the original Bayesian model.
The main contributions of this paper are the following:
- We define long-duration data poisoning attacks, in which an attacker may spread the malicious workload over several datasets.
- We study model invalidation attacks, which aim to arbitrarily corrupt the Bayesian network structure.
- Our 2-layered framework detects both one-step and long-duration data poisoning attacks. We use the distance between Bayesian network models \(B_1\) and \(B_2\), denoted as \({\mathbf {ds}}(B_1, B_2)\), to detect malicious data input (Eq. 3) for one-step attacks. For long-duration attacks, we use the value of data conflict (Eq. 4) to detect potentially poisoned data. Our framework relies on offline analysis to validate the potentially malicious datasets.
- We present our empirical results, showing the effectiveness of our framework in detecting both one-step and long-duration attacks. Our results indicate that the distance measure \({\mathbf {ds}}(B_1, B_2)\) (Eq. 3) and the conflict measure \(Conf(c,B_1)\) (Eq. 4) are sensitive to poisoned data.
The rest of the paper is structured as follows. In Sect. 2, we present the problem setting. In Sect. 3, we present long-duration data poisoning attacks against Bayesian network structure learning algorithms. In Sect. 4, we present our 2-layered detection framework and our algorithms. In Sect. 5 we present our empirical results. In Sect. 6, we give an overview of related work. In Sect. 7, we conclude and briefly discuss ongoing work.
2 Problem Setting
We focus on structure learning algorithms in Bayesian networks. Let the validated dataset contain N cases. Each case c is over attributes \(x_1, \dots , x_n\) and of the form \(c \,= \,{<}x_1 = v_1, \dots , x_n = v_n{>}\), where \(v_i\) is the value of attribute \(x_i\). A Bayesian network model \(B_1\) is learned by feeding the validated dataset into a Bayesian structure learning algorithm, BN_Algo, such as the PC algorithm, the most widely used algorithm for structure learning in Bayesian networks [34], as shown in Eq. 1.
The defender attempts to divide an incoming dataset, coming from an untrusted source, into clean and poisoned cases. The attacker aims to inject a contaminated dataset, with the same attributes as the validated dataset and \(N_1\) cases, into the validated training dataset. A learning error occurs if the combined dataset, obtained as the union of the validated and contaminated datasets, results in a Bayesian network learning model \(B_2\) (shown in Eq. 2) such that there is a missing link, a reversed link, or an additional link in \(B_2\) that is not in \(B_1\).
To estimate the impact of the poisoned dataset on the validated dataset, we define a distance function between two Bayesian network models \(B_1\) and \(B_2\), denoted as \({\mathbf {ds}}(B_1, B_2)\). Intuitively, \(B_1\) is the validated model and \(B_2\) is the potentially corrupted model.
Let \(B_1 = (V, E_1)\) and \(B_2 = (V, E_2)\) be two Bayesian network models, where \(V = \{x_1, x_2, \dots , x_n \}\) and \(E_1, E_2 \subseteq \{(x_u, x_v): x_u, x_v \in V\}\) are sets of directed edges. Let \(B_1\) be the validated model resulting from feeding the validated dataset to a Bayesian network structure learning algorithm, and \(B_2\) be the newly learned model resulting from feeding the combined dataset to the same algorithm. Let \(e_1 = (x_u, x_v)\) be a directed edge from vertex \(x_u\) to vertex \(x_v\), and \(e_2 = (x_v, x_u)\) be a directed edge from vertex \(x_v\) to vertex \(x_u\) (\(e_2\) is the reverse of \(e_1\)). The distance function, \({\mathbf {ds}}(B_1, B_2)\), is a non-negative function that measures the changes in the newly learned model \(B_2\) with respect to the original model \(B_1\). The distance function, \({\mathbf {ds}}(B_1, B_2)\), is defined as follows:
(Distance measure). Let Bayesian network models \(B_1 = (V, E_1)\) and \(B_2 = (V, E_2)\) be the results of feeding the validated dataset and the combined dataset, respectively, to a Bayesian network structure learning algorithm. \({\mathbf {ds}}(B_1, B_2)\) is defined as the sum of distances over pairs of vertices \((x_u, x_v) \in V \times V\):
$${\mathbf {ds}}(B_1, B_2) = \sum _{(x_u, x_v) \in V \times V} {\mathbf {ds_{x_u x_v}}}(B_1, B_2) \qquad (3)$$
where \({\mathbf {ds_{x_u x_v}}}(B_1, B_2)\) is the distance between the pair of vertices \((x_u, x_v) \in V \times V\).
We define \({\mathbf {ds_{x_u x_v}}}(B_1, B_2)\) as the cost of making a change to \(B_1\) that results in the newly learned model \(B_2\). The function \({\mathbf {ds_{x_u x_v}}}(B_1, B_2)\) between the two Bayesian network models \(B_1\) and \(B_2\) is defined case by case as follows [19] (a code sketch of these cases is given after the list):
- Status 1 (True Negative Edges): if ((\(e_1 \not \in E_1\) && \(e_2 \not \in E_1\)) && (\(e_1 \not \in E_2\) && \(e_2 \not \in E_2\))), then there is no edge (neither \(e_1\) nor \(e_2\)) between vertex \(x_u\) and vertex \(x_v\) in either model \(B_1\) or \(B_2\). Hence, \({\mathbf {ds_{x_u x_v}}}(B_1, B_2) = 0\).
- Status 2 (True Positive Edges): if ((\(e_1 \in E_1\) && \(e_1 \in E_2\)) \(\mid \mid \) (\(e_2 \in E_1\) && \(e_2 \in E_2\))), then the same edge (either \(e_1\) or \(e_2\)) appears between vertex \(x_u\) and vertex \(x_v\) in both models \(B_1\) and \(B_2\). Hence, \({\mathbf {ds_{x_u x_v}}}(B_1, B_2) = 0\).
- Status 3 (False Negative Edges): if ((\(e_1 \in E_1 \mid \mid e_2 \in E_1\)) && (\(e_1 \not \in E_2\) && \(e_2 \not \in E_2\))), then there is an edge (either \(e_1\) or \(e_2\)) between vertex \(x_u\) and vertex \(x_v\) in \(B_1\) that does not exist in \(B_2\). Without loss of generality, assume that the deleted edge from \(B_1\) is \(e_1\). If the indegree of vertex \(x_v\) in \(B_1\), denoted \(indegree(x_v)\), i.e., the number of edges incoming to vertex \(x_v\), is greater than 1, then \({\mathbf {ds_{x_u x_v}}}(B_1, B_2) = 8\) (deleting \(e_1\) breaks an existing v-structure and changes the Markov equivalence class); otherwise, \({\mathbf {ds_{x_u x_v}}}(B_1, B_2) = 4\) (deleting \(e_1\) does not break an existing v-structure, but it changes the Markov equivalence class).
- Status 4 (False Positive Edges): if ((\(e_1 \not \in E_1\) && \(e_2 \not \in E_1\)) && (\(e_1 \in E_2 \mid \mid e_2 \in E_2\))), then there is an edge (either \(e_1\) or \(e_2\)) between vertex \(x_u\) and vertex \(x_v\) in \(B_2\) but not in \(B_1\). Without loss of generality, assume that the added edge to \(B_2\) is \(e_1\). If the indegree of vertex \(x_v\) in \(B_2\) is greater than 1, then \({\mathbf {ds_{x_u x_v}}}(B_1, B_2) = 8\) (adding \(e_1\) introduces a new v-structure and changes the Markov equivalence class); otherwise, \({\mathbf {ds_{x_u x_v}}}(B_1, B_2) = 4\) (adding \(e_1\) does not introduce a new v-structure, but it changes the Markov equivalence class).
- Status 5 (Reversed Edges): if ((\(e_1 \in E_1\) && \(e_2 \in E_2\)) \(\mid \mid \) (\(e_2 \in E_1\) && \(e_1 \in E_2\))), then the edge between vertex \(x_u\) and vertex \(x_v\) in \(B_1\) is the reverse of the edge between vertex \(x_u\) and vertex \(x_v\) in \(B_2\). Without loss of generality, assume that there is an edge, \(e_1\), from \(x_u\) to \(x_v\) in \(B_1\); then \(e_2\), the reverse of \(e_1\), is in \(B_2\). If the indegree of vertex \(x_u\) in \(B_2\) is greater than 1, then \({\mathbf {ds_{x_u x_v}}}(B_1, B_2) = 8\) (reversing \(e_1\) introduces a new v-structure and changes the Markov equivalence class); otherwise, \({\mathbf {ds_{x_u x_v}}}(B_1, B_2) = 2\) (reversing \(e_1\) does not introduce a new v-structure, but it changes the Markov equivalence class).
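The case analysis above maps directly to code. Below is a minimal Python sketch (the helper names are ours, not part of the paper): each model is given as a set of directed edges over a common vertex set, and we sum the per-pair contributions over unordered pairs, treating \({\mathbf {ds_{x_u x_v}}}\) as symmetric in the pair.

```python
from itertools import combinations

def pair_distance(x_u, x_v, E1, E2):
    """Contribution of the unordered pair (x_u, x_v), following Statuses 1-5."""
    e1, e2 = (x_u, x_v), (x_v, x_u)
    in1 = e1 in E1 or e2 in E1            # some edge between the pair in B1?
    in2 = e1 in E2 or e2 in E2            # some edge between the pair in B2?
    indeg1 = lambda v: sum(1 for (_, head) in E1 if head == v)   # indegree in B1
    indeg2 = lambda v: sum(1 for (_, head) in E2 if head == v)   # indegree in B2

    if not in1 and not in2:                                      # Status 1: true negative
        return 0
    if (e1 in E1 and e1 in E2) or (e2 in E1 and e2 in E2):       # Status 2: true positive
        return 0
    if in1 and not in2:                                          # Status 3: deleted edge
        head = x_v if e1 in E1 else x_u                          # head of the deleted edge in B1
        return 8 if indeg1(head) > 1 else 4
    if not in1 and in2:                                          # Status 4: added edge
        head = x_v if e1 in E2 else x_u                          # head of the added edge in B2
        return 8 if indeg2(head) > 1 else 4
    new_head = x_u if e1 in E1 else x_v                          # Status 5: reversed edge, its head in B2
    return 8 if indeg2(new_head) > 1 else 2

def structural_distance(V, E1, E2):
    """ds(B1, B2) of Eq. 3: sum of pair distances over unordered pairs of vertices."""
    return sum(pair_distance(x_u, x_v, E1, E2) for x_u, x_v in combinations(sorted(V), 2))

# Example: reversing a single edge that creates no new collider costs 2.
print(structural_distance({"A", "B", "C"}, {("A", "B")}, {("B", "A")}))  # 2
```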
To investigate the coherence of an instance case, \(c \,=\, {<}x_1 = v_1, \dots , x_n = v_n{>}\) (or simply \({<}v_1, \dots ,v_n{>}\)), in the incoming dataset with the validated model \(B_1\), we use the conflict measure, denoted as \(Conf(c, B_1)\). The conflict measure, \(Conf(c, B_1)\), is defined as follows:
(Conflict measure). Let \(B_1\) be a Bayesian network model and let c be a case from an incoming dataset. \(Conf(c, B_1)\) measures how well the given case \({<}v_1, \dots , v_n{>}\) fits the model \(B_1\):
$$Conf(c, B_1) = \log _2 \frac{P(v_1) \times \dots \times P(v_n)}{P(v)} \qquad (4)$$
where \(c \,=\, {<}v_1, \dots ,v_n{>}\), \(P(v_i)\) is the prior probability of the observation \(v_i\), and \(P(v) = P(v_1, \dots , v_n)\) is the prior probability of the evidence v under \(B_1\) [31].
If \(P(v) = 0\), then we conclude that there is inconsistency among the observations \({<}v_1, \dots ,v_n{>}\). If the value of \(Conf(c, B_1)\) is positive, then we can conclude that \({<}v_1, \dots ,v_n{>}\) are negatively correlated (i.e., unlikely to be correlated as the model requires; \(P(v_1, \dots , v_n) < P(v_1) \times \dots \times P(v_n)\)) and thus are conflicting with the model \(B_1\). The higher the value of \(Conf(c, B_1)\) is, the more incompatibility we have between \(B_1\) and \({<}v_1, \dots ,v_n{>}\).
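As a concrete illustration, the sketch below evaluates \(Conf(c, B_1)\) on a toy two-variable model encoded as plain probability tables; the tables, variable names, and helper functions are ours, chosen only to make Eq. 4 executable.

```python
import math

# Toy model B1: P(S) and P(B | S) for binary variables S (smoker) and B (bronchitis).
P_S = {"yes": 0.5, "no": 0.5}
P_B_given_S = {("yes", "yes"): 0.6, ("no", "yes"): 0.4,   # P(B = b | S = yes)
               ("yes", "no"): 0.3, ("no", "no"): 0.7}     # P(B = b | S = no)

def joint(s, b):
    """P(S = s, B = b) via the chain rule of the network."""
    return P_S[s] * P_B_given_S[(b, s)]

def marginal_B(b):
    """P(B = b), obtained by summing S out of the joint."""
    return sum(joint(s, b) for s in P_S)

def conf(case):
    """Conflict measure of Eq. 4 for a full case {'S': s, 'B': b}."""
    s, b = case["S"], case["B"]
    p_case = joint(s, b)
    if p_case == 0:
        return math.inf                       # inconsistent observations: P(v) = 0
    return math.log2((P_S[s] * marginal_B(b)) / p_case)

print(conf({"S": "yes", "B": "yes"}))         # about -0.42: compatible with B1
print(conf({"S": "no", "B": "yes"}))          # about +0.58: conflicts with B1
```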
In this paper, we adopt the causative model proposed by Barreno et al. [6]. Attacks on machine learning systems are modeled as a game between malicious attackers and defenders. In our setting, defenders aim to learn a validated Bayesian network model \(B_1\) from the validated dataset with the fewest number of errors (minimum \({\mathbf {ds}}\) value). Malicious attackers aim to mislead the defender into learning a contaminated model \(B_2\) from the combined dataset, obtained by polluting the validated dataset with contaminated cases. We assume that malicious attackers have full knowledge of how Bayesian network structure learning algorithms work. Also, we assume that attackers have knowledge of the validated dataset. In addition, we assume that the poisoning rate \(\beta \), the fraction of new “contaminated” cases that attackers are allowed to add to the validated dataset, is less than or equal to 0.05. The game between malicious attackers and defenders can be modeled as follows:
1. The defender: The defender uses the validated dataset to produce a validated Bayesian network model \(B_1\).
2. The malicious attacker: The attacker injects a contaminated dataset, to be unioned with the original dataset, with the goal of changing the Markov equivalence class of the original validated model, \(B_1\).
3. Evaluation by the defender:
- The defender feeds the new combined dataset (the union of the validated and contaminated datasets) to a Bayesian network structure learning algorithm, resulting in \(B_2\).
- The defender calculates the distance function \({\mathbf {ds}}(B_1, B_2)\).
- If \({\mathbf {ds}}(B_1, B_2) = 0\), then the Bayesian models \(B_1\) and \(B_2\) are identical. Otherwise, i.e., if \({\mathbf {ds}}(B_1, B_2) > 0\), the newly learned Bayesian model \(B_2\) is different from the original validated model \(B_1\).
- For each case c, the defender calculates the value of the conflict measure \(Conf(c, B_1)\).
- If \(Conf(c, B_1)\) is positive, then the case c conflicts with the Bayesian model \(B_1\). Otherwise, the newly incoming case is validated and added to the validated dataset.
Note that the goal of malicious attackers is to maximize the quantity \({\mathbf {ds}}(B_1, B_2)\).
3 Long-Duration Data Poisoning Attacks
In our earlier work [3], we studied data poisoning attacks against Bayesian structure learning algorithms. For a Bayesian structure learning algorithm, given the validated dataset and the corresponding model, \(B_1\) (Eq. 1), a malicious attacker attempts to craft an input dataset such that this contaminated dataset has an immediate impact on the training data and thereby on \(B_1\). The defender periodically retrains the machine learning system to recover the structure of the new model, \(B_2\), using the combination of the original dataset and the attacker-supplied data. We call such an attack a “one-step” data poisoning attack, as the malicious attacker sends all contaminated cases at once.
In this section, we introduce long-duration data poisoning attacks against structure learning algorithms. Long-duration poisoning attacks are adversarial multi-step attacks in which a malicious attacker attempts to send contaminated cases over a period of time, \(t =\{1, 2, \dots , w\}\). That is, at every time point i, a malicious attacker sends in a new dataset containing \(N_i\) cases, \(\lambda _i N_i\) of which are corrupted, for some \(0< \lambda _i < 1\) (\(\lambda _i\) is the data poisoning rate at which the attacker is allowed to add contaminated cases at iteration i). Even though the defender periodically retrains the model, \(B_2^{'}\), at time i using the union of the validated dataset and all datasets received so far, it is not easy to detect the long-duration attack, since such an attack is not instantaneous.
By the end of the long-duration poisoning attack, i.e., at time point w, the attacker will have injected all of the contaminated cases into the validated dataset, resulting in a new combined dataset. We assume that attackers cannot add more than \(\beta \)N cases in total (i.e., \(0< \sum _{t=1}^{w} \lambda _t N_{t} \le \beta N\)). When the defender retrains the model, \(B_2^{'}\), using the combined dataset, the attack will dramatically affect the resulting model. Note that this attack is sophisticated, since the attacker may not need to send any contaminated cases with the last dataset (the \(w^\mathrm{th}\) dataset) in the long-duration attack; i.e., the attack may be triggered with no newly poisoned cases, as our experiments show.
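As a concrete illustration of the budget (with numbers of our own choosing): for \(N = 12{,}000\) validated cases and \(\beta = 0.05\), the attacker can inject at most \(\beta N = 600\) contaminated cases in total, e.g., at most 150 per round if the budget is split evenly over \(w = 4\) rounds. The constraint itself is a one-line check:

```python
def within_budget(poisoned_counts, N, beta=0.05):
    """True iff the total number of poisoned cases over all rounds stays within beta * N."""
    return 0 <= sum(poisoned_counts) <= beta * N

print(within_budget([150, 150, 150, 0], N=12_000))    # True: 450 <= 600
print(within_budget([200, 200, 200, 200], N=12_000))  # False: 800 > 600
```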
We propose causative, long-duration model invalidation attacks against Bayesian network structure learning algorithms. Such attacks are defined as malicious active attacks in which adversarial opponents attempt to arbitrarily corrupt the structure of the original Bayesian network model in any way. The goal of adversaries in these attacks is to poison the validated training dataset over a period of time \(t = \{1, \dots , w\}\) using contaminated datasets, so that the learned model is no longer valid. We categorize causative long-duration model invalidation attacks against Bayesian network structure learning algorithms into two types: (1) Model invalidation attacks based on the notion of d-separation and (2) Model invalidation attacks based on marginal independence tests.
Causative, long-duration model invalidation attacks which are based on the notion of d-separation are adversarial attacks in which adversaries attempt to introduce a new link in any triple \((A - B - C)\) in the original Bayesian network model, \(B_1\). The goal of the introduced malicious link, \((A - C)\), is to change the independence relations and the Markov equivalence class of \(B_1\). Within such attacks, we can identify two subtypes: (i) Creating a New Converging Connection (V-structure), and (ii) Breaking an Existing Converging Connection (V-structure). See Appendix A for more details.
Causative, long-duration model invalidation attacks which are based on marginal independence tests are adversarial attacks in which adversaries attempt to use marginal independence tests in order to change the conditional independence statements between variables in the original model, \(B_1\). Such attacks can be divided into two main subtypes: (i) Removing the Weakest Edge, and (ii) Adding the Most Believable yet Incorrect Edge. See Appendix A for more details.
Due to space limitations, we only provide a brief description of long-duration data poisoning attacks, which achieve a given attack goal by sending in contaminated cases over a period of time t. We refer the reader to our technical report [2] for the full algorithmic details.
4 Framework for Detecting Data Poisoning Attacks
In this section, we present our detection framework for data poisoning attacks. Our techniques build on the data sanitization approach proposed by Nelson et al. [30]. We extend Nelson et al.'s approach so that it is applicable to detecting both one-step and long-duration causative attacks.
The main components of our framework are: (1) the structure learning algorithm: the PC learning algorithm, (2) FLoD: the first layer of detection, and (3) SLoD: the second layer of detection.
First Layer of Detection: In the FLoD, our framework uses the “Reject On Negative Impact” defense [30] to examine the full dataset (the union of the validated and incoming datasets) to detect the impact of the incoming dataset on the validated data. The attacker aims to use the incoming dataset to change the Markov equivalence class of the validated model, \(B_1\). The first layer of detection detects the impact of adversarial attacks that aim to corrupt the model \(B_1\) using one-step data poisoning attacks. FLoD is useful for efficiently filtering obvious data poisoning attacks.
In the FLoD, we use the distance function \({\mathbf {ds}}\) described in Sect. 2 to detect the negative impact of the incoming dataset on the validated model \(B_1\). If \({\mathbf {ds}}(B_1, B_2)\) is greater than zero, then the new incoming dataset is potentially malicious; in this case, we send it to be checked offline. Otherwise, we proceed with the second layer of detection, SLoD, looking for long-duration data poisoning attacks.
Algorithm 1 provides the algorithmic details of how FLoD detects one-step data poisoning attacks.
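Algorithm 1 is not reproduced here; the following is a minimal sketch of the FLoD decision under our own naming, assuming the datasets are lists of cases, learn_structure wraps whatever PC implementation is in use, and distance computes Eq. 3 (e.g., the structural_distance sketch in Sect. 2).

```python
def flod(validated_cases, incoming_cases, learn_structure, distance):
    """First layer of detection: 'reject on negative impact' for one-step attacks.

    Returns True if the incoming dataset is potentially malicious (the learned model
    changed), in which case it should be sent for offline checking; False otherwise.
    """
    B1 = learn_structure(validated_cases)                    # validated model
    B2 = learn_structure(validated_cases + incoming_cases)   # model learned with the new data
    return distance(B1, B2) > 0
```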
Second Layer of Detection: In the SLoD, our framework uses “Data Conflict Analysis” [31] to examine the newly incoming dataset and detect whether it contains cases that conflict with the original model \(B_1\). The second layer of detection detects sophisticated adversarial attacks that aim to corrupt the model \(B_1\), such as long-duration data poisoning attacks.
In the SLoD, we use the value of the conflict measure \(Conf(c, B_1)\) described in Sect. 2 to detect whether a case, c, in the newly incoming dataset conflicts with the original model \(B_1\). If \(P(v)\) is equal to zero, then the case c is inconsistent with the validated model \(B_1\). If \(Conf(c, B_1)\) is positive, then the case c is incompatible with the validated model \(B_1\). In these two situations, we add the inconsistent and incompatible cases to a set of conflicting cases, which is then sent to be checked offline. The model \(B_1\) is then retrained using the validated dataset together with the remaining, non-conflicting cases.
Algorithm 2 provides the algorithmic details of how SLoD detects long-duration data poisoning attacks.
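Likewise, a minimal sketch of the SLoD step, assuming a conflict function conf(case, B1) implementing Eq. 4 (the names are again ours, not Algorithm 2's):

```python
def slod(incoming_cases, B1, conf):
    """Second layer of detection: split incoming cases into conflicting and clean ones.

    A case conflicts with B1 if it is inconsistent (P(v) = 0, conf returns +inf)
    or incompatible (conf > 0); conflicting cases are sent for offline checking.
    """
    conflicting, clean = [], []
    for case in incoming_cases:
        (conflicting if conf(case, B1) > 0 else clean).append(case)
    return conflicting, clean
```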
The process of applying our framework is summarized in Fig. 1. The workflow of our framework is as follows: (1) A validated dataset, i.e., a clean training dataset, is used to recover a validated machine learning model \(B_1\). (2) A new incoming dataset, which comes from an untrusted source and is therefore potentially malicious, is used along with the validated dataset to learn \(B_2\). (3) FLoD checks for one-step data poisoning attacks: if a model change occurs (i.e., \({\mathbf {ds}}(B_1, B_2) > 0\)), the incoming dataset is sent for offline evaluation. (4) Otherwise, SLoD checks for long-duration data poisoning attacks: if the value of the conflict measure is positive (i.e., \(Conf(c,B_1) > 0\)), the conflicting cases are sent for offline evaluation; otherwise, the validated dataset is updated.
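Putting the two layers together, the workflow of Fig. 1 can be sketched as follows (reusing the hypothetical flod and slod helpers from the sketches above; this is an illustration, not the exact Algorithms 1 and 2):

```python
def two_layer_detection(validated_cases, incoming_cases, learn_structure, distance, conf):
    """2-layered detection workflow of Fig. 1."""
    if flod(validated_cases, incoming_cases, learn_structure, distance):
        # FLoD: the model changed, so the whole incoming dataset is checked offline.
        return {"offline": incoming_cases, "validated": validated_cases}
    B1 = learn_structure(validated_cases)
    conflicting, clean = slod(incoming_cases, B1, conf)
    # SLoD: only conflicting cases go offline; clean cases extend the validated data.
    return {"offline": conflicting, "validated": validated_cases + clean}
```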
5 Empirical Results
We implemented our prototype system using the Chest Clinic Network [23]. The Chest Clinic Network was created by Lauritzen and Spiegelhalter [23] and is widely used in Bayesian network experiments. As shown in Fig. 2, Visit to Asia is a simple, fictitious network that could be used at a clinic to diagnose arriving patients. It consists of eight nodes and eight edges. The nodes are as follows: (1) node A shows whether the patient recently visited Asia; (2) node S shows whether the patient is a smoker; (3) node T shows whether the patient has Tuberculosis; (4) node L shows whether the patient has lung cancer; (5) node B shows whether the patient has Bronchitis; (6) node E shows whether the patient has either Tuberculosis or lung cancer; (7) node X shows whether the patient's X-ray is abnormal; and (8) node D shows whether the patient has Dyspnea. The edges indicate the causal relations between the nodes. A simple example of a causal relation is that visiting Asia may cause Tuberculosis. We refer the readers to [23] for a full description of this network.
We used the Chest Clinic Network to demonstrate the data poisoning attacks and our detection capabilities. In each experiment, we manually generated poisoned datasets. Consider the contingency table of two random variables A and B in a Bayesian network model with i and j states, respectively. To introduce a malicious link between A and B, we add corrupt cases to the cell with the highest test statistic value in the contingency table. To remove the link between A and B, we transfer cases from the cell with the highest test statistic value to the one with the lowest value.
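A minimal sketch of this crafting step is given below, where the contingency table is a 2-D list of observed counts and the "test statistic value" of a cell is interpreted as its chi-square contribution (observed minus expected, squared, over expected); the function names are ours and the table margins are assumed to be nonzero.

```python
def chi_square_contributions(table):
    """Per-cell (observed - expected)^2 / expected for an i x j contingency table."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    return [[(table[a][b] - row[a] * col[b] / total) ** 2 / (row[a] * col[b] / total)
             for b in range(len(col))] for a in range(len(row))]

def cells_by_contribution(table):
    """All cell indices, sorted from lowest to highest chi-square contribution."""
    contrib = chi_square_contributions(table)
    cells = [(a, b) for a in range(len(table)) for b in range(len(table[0]))]
    return sorted(cells, key=lambda ab: contrib[ab[0]][ab[1]])

def poison_to_add_link(table, n_cases):
    """Add n_cases corrupt cases to the highest-contribution cell (to introduce a link)."""
    a, b = cells_by_contribution(table)[-1]
    table[a][b] += n_cases
    return table

def poison_to_remove_link(table, n_cases):
    """Move n_cases from the highest-contribution cell to the lowest one (to remove a link)."""
    ranked = cells_by_contribution(table)
    (lo_a, lo_b), (hi_a, hi_b) = ranked[0], ranked[-1]
    moved = min(n_cases, table[hi_a][hi_b])
    table[hi_a][hi_b] -= moved
    table[lo_a][lo_b] += moved
    return table
```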
5.1 One-Step Data Poisoning Attacks
To set up the experiment, we implemented the Chest Clinic Network using \(\textit{Hugin}^{TM}\) Research 8.1. We then used the \(\textit{Hugin}^{TM}\) case generator [26, 32] to generate a simulated dataset of 20,000 cases, which we use as the validated dataset. Using the PC algorithm on this dataset with a 0.05 significance setting [26], the resulting validated structure, \(B_1\), is given in Fig. 3. While the two networks in Figs. 2 and 3 belong to different Markov equivalence classes, we will use the validated network \(B_1\) as the starting point of our experiment.
We evaluated the effectiveness of one-step data poisoning attacks against the validated dataset (i.e., against the validated model \(B_1\)). An attacker aims to use a one-step data poisoning attack to inject a contaminated dataset into the validated dataset. The defender retrains the machine learning model by feeding the resulting combined dataset to the PC learning algorithm, resulting in the model \(B_2\).
We aim to study the attacker’s goals, i.e., study the feasibility of one-step data poisoning attacks, which might be as follows: (i) introduce new v-structures: that is, (1) add the links \(D - S\) and \(S - E\) to the serial connections \(D \rightarrow B \rightarrow S\) and \(S \rightarrow L \rightarrow E\), respectively, and (2) add the link \(A - E\) to the diverging connection \(A \leftarrow T \rightarrow E\); (ii) break an existing v-structure \(T \rightarrow E \leftarrow L\), i.e., shield the collider E; (iii) remove the weakest edge, i.e., remove the edge \(T \rightarrow A\); and (iv) add the most believable edge, i.e., add the edge \(B \rightarrow L\). (Note that, for finding the weakest link in a given causal model or the most believable link to be added to a causal model, we refer the readers to our previous works [3, 5] for technical details on how to measure link strength of causal models).
In all of the scenarios, the attacker succeeded in corrupting the new model that was going to be learned by the defender, the model \(B_2\). The attacker had to introduce a dataset with 67 corrupt cases (data items) to introduce the link \(D - S\) in the newly learned model \(B_2\). Introducing the links \(S - E\) and \(A - E\) required 21 and 7 corrupt cases, respectively. To shield the collider E, the attacker only needed 4 poisoned data items. The attacker had to modify only 3 cases to break the weakest link \(A - T\). Adding the most believable link \(B - L\) required only 7 corrupt data items.
5.2 Long-Duration Data Poisoning Attacks
To set up the implementation of long-duration attacks, let the validated training dataset have attributes \(x_1, \dots , x_n\) and N cases, and let \(\beta \) be the data poisoning rate at which attackers are allowed to add new “contaminated” cases to it. Let each newly crafted dataset also have attributes \(x_1, \dots , x_n\) and \(N_i\) cases, and let \(\lambda _i\) be the data poisoning rate at which attackers are allowed to add newly crafted cases at iteration i (by default, \(0 \le \sum _{t=1}^{w} \lambda _t N_t \le \beta N\)).
We start by calculating \(\tau \), the maximum number of poisoned cases that can be added to the validated dataset over a period of time \(t = \{1, \dots , w\}\). We then learn the structure of the validated model \(B_1\) from the validated dataset using the PC algorithm.
We then iterate w times. In each iteration t, we generate a clean dataset and a poisoned dataset and combine them into the \(t^{th}\) incoming dataset (note that this incoming dataset has \(N_t\) cases, \(\lambda _t N_t\) of which are poisoned). After that, we take the union of the current training data and the incoming dataset, which is used to learn the structure of model \(B_2^{'}\). Note that, in each iteration, the number of poisoned cases should be between 0 (i.e., no poisoned cases) and \(\frac{\tau }{w}\), the maximum number of poisoned cases that can be added in the \(t^{th}\) iteration.
We terminate after iteration w. If \(\sum _{t=1}^{w}\lambda _t N_t \le \beta N\), we return the sequence of contaminated datasets; otherwise, we report failure, since implementing the long-duration attack within the poisoning budget is not feasible.
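The procedure just described can be sketched as follows; generate_clean, generate_poisoned, and learn_structure are hypothetical placeholders for the case generator and the PC learner, not functions from the paper or from any specific library.

```python
def long_duration_attack(validated_cases, w, beta, N_t, lambdas,
                         generate_clean, generate_poisoned, learn_structure):
    """Simulate a long-duration attack over w rounds (a sketch of the procedure in Sect. 5.2)."""
    N = len(validated_cases)
    tau = beta * N                                    # total poisoning budget
    training = list(validated_cases)
    B1 = learn_structure(training)                    # validated model
    total_poisoned = 0
    for t in range(w):
        n_poisoned = int(lambdas[t] * N_t)            # poisoned cases in round t
        assert n_poisoned <= tau / w                  # per-round cap of tau / w
        incoming = generate_clean(N_t - n_poisoned) + generate_poisoned(n_poisoned)
        training += incoming                          # defender unions the data ...
        B2_prime = learn_structure(training)          # ... and retrains after every round
        total_poisoned += n_poisoned
    if total_poisoned > tau:
        raise RuntimeError("long-duration attack infeasible within the poisoning budget")
    return B1, B2_prime, training
```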
We assumed that \(w = 4\), which means that the attacker is allowed to send in four contaminated datasets to carry out the long-duration data poisoning attack. We divided the 20,000-case dataset that was generated for the one-step data poisoning attacks in Sect. 5.1 into five datasets as follows: 12,000 cases are used as the validated dataset, and the rest is divided into four incoming datasets of 2,000 cases each. Using the PC algorithm on the validated dataset with a 0.05 significance setting [26], the resulting validated structure, \(B_1\), is given in Fig. 3, which is the starting point of this experiment.
We evaluated the effectiveness of long-duration data poisoning attacks against the validated dataset (i.e., against the validated model \(B_1\)). At every time point \(t = \{1, \dots , w\}\), the attacker injects contaminated cases into the \(t^{th}\) incoming dataset, which is then sent in as a new source of information. The defender receives it and retrains the validated model, \(B_1\), by taking the union of the current training data and the new incoming dataset and feeding it to the PC algorithm, resulting in the model \(B_2^{'}\).
The results of our experiments are presented in Table 1. In all of the scenarios, the attacker succeeded in achieving the desired modification. In our experiments, we assumed that \(t = \{1, \dots , 4\}\). For every one of the studied long-duration attacks on the validated dataset (Tables 1a, b, c, d, e, and f), the adversary had to spread the attack over 4 datasets. That is, at every time point t (for \(t = 1, \dots , 4\)), the attacker combined the clean and poisoned cases for that time point into an incoming dataset, which was then sent to the targeted machine learning system as a new source of information. The defender, on the other hand, retrained the machine learning model every time a new incoming dataset arrived.
Note that, in our experiments, long-duration attacks require the same number of contaminated cases as one-step data poisoning attacks. An important observation is that the malicious attacker does not always have to include poisoned cases in the last dataset, the one that triggers the attack. For instance, in our experiments, when introducing the link \(A \rightarrow E\) (Table 1a), shielding the collider E (Table 1b), and removing the weakest edge (Table 1f), the last contaminated dataset had no contaminated cases, which makes it impossible for the defender to identify what caused the change in the newly learned model.
5.3 Discussion: Detecting Data Poisoning Attacks
The results of using our framework to detect one-step data poisoning attacks are presented in Table 2. Algorithm 1 succeeded in detecting the negative impact (i.e., the change in the Markov equivalence class) of the new incoming dataset on the validated model \(B_1\).
The results of using our framework to detect long-duration data poisoning attacks are summarized in Table 3. Algorithm 2 succeeded in detecting the long-duration impact of the contaminated datasets on the validated dataset. Note that FLoD, using traditional reject on negative impact, was not able to detect long-duration attacks. However, when using the SLoD, we were able to detect the conflicting cases, which are either inconsistent or incompatible with the original validated model \(B_1\) (a detailed experiment is presented in Fig. 4). Such cases might be exploited by a malicious adversary to trigger the long-duration attack at a later time. Also, in some attacks, no poisoned cases even need to be sent with the final dataset to trigger the long-duration attack, which is very hard to detect.
In summary, our 2-layered approach was able to detect both one-step and long-duration attacks. Moreover, our solution does not discard entire incoming datasets; only the conflicting cases are sent to be checked offline. We have carried out over 200 experiments for long-duration attacks; a comprehensive description of these experiments is given in [2].
6 Related Work
In this section, we give a brief overview of adversarial machine learning research, focusing on data poisoning. Recent surveys on adversarial machine learning can be found in [6, 16, 24].
Data Poisoning Attacks: As machine learning algorithms have been widely used in security-critical settings such as spam filtering and intrusion detection, adversarial machine learning has become an emerging field of study. Attacks against machine learning systems have been organized by [6, 7, 18] according to three features: Influence, Security Violation, and Specificity. Influence of the attacks on machine learning models can be either causative or exploratory. Causative attacks aim to corrupt the training data whereas exploratory attacks aim to corrupt the classifier at test time. Security violation of machine learning models can be a violation of integrity, availability, or privacy. Specificity of the attacks can be either targeted or indiscriminate. Targeted attacks aim to corrupt machine learning models to misclassify a particular class of false positives whereas indiscriminate attacks have the goal of misclassifying all false positives.
Evasion attacks and Data poisoning attacks are two of the most common attacks on machine learning systems [18]. Evasion attacks [17, 20, 22] are exploratory attacks at the testing phase. In an evasion attack, an adversary attempts to pollute the data for testing the machine learning classifier; thus causing the classifier to misclassify adversarial examples as legitimate ones. Data poisoning attacks [1, 11, 21, 27, 28, 36] are causative attacks, in which adversaries attempt to corrupt the machine learning classifier itself by contaminating the data in the training phase.
Data poisoning attacks have been studied extensively during the last decade [3, 8,9,10,11,12, 21, 28, 29, 36]. However, attacks against Bayesian network algorithms have received limited attention. In our previous work, we addressed data poisoning attacks against Bayesian network algorithms [3,4,5]. We studied how an adversary could corrupt Bayesian network structure learning algorithms by inserting contaminated data into the training phase. We showed how our novel measure of link strength for Bayesian networks [5] can be used to perform a security analysis of attacks against Bayesian network structure learning algorithms. However, our approach did not consider long-duration attacks.
Defenses and Countermeasures: Detecting adversarial input is a challenging problem; recent research [13, 15, 25] illustrates these challenges. Our work addresses these issues in the specific context of Bayesian network structure learning algorithms. Data sanitization is a best practice for security optimization in the adversarial machine learning context [14]; however, it is often impossible to validate every data source. In the event of a poisoning attack, data sanitization adds a layer of protection for training data by removing contaminated samples from the targeted training data set prior to training a classifier. Reject on Negative Impact is one of the most widely used methods for data sanitization [6, 14, 24]. The Reject on Negative Impact defense assesses the impact of newly added training samples, opting to remove or discard samples that have significant negative effects on the observed learning outcomes or classification accuracy [6, 14]. The base training set is used to train a classifier, after which the new training instance is added and a second classifier is trained [6]. Classification performance is then evaluated by comparing error rates (accuracy) between the original classifier and the new classifier retrained after the sample integration [24]. If the new classification errors are substantially higher than those of the original or baseline classifier, it is assumed that the newly added samples are malicious or contaminated, and they are therefore removed in order to protect classification accuracy [6].
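For a generic classifier, the Reject on Negative Impact rule described above amounts to the check sketched below (an illustration using scikit-learn; the tolerance threshold tol is a free parameter of our choosing, not prescribed by [6, 14, 24]):

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score

def reject_on_negative_impact(clf, X_base, y_base, X_new, y_new, X_val, y_val, tol=0.02):
    """Accept new samples only if retraining with them does not cost more than tol accuracy."""
    base = clone(clf).fit(X_base, y_base)
    retrained = clone(clf).fit(np.vstack([X_base, X_new]), np.concatenate([y_base, y_new]))
    drop = accuracy_score(y_val, base.predict(X_val)) - accuracy_score(y_val, retrained.predict(X_val))
    return drop <= tol   # False: treat the new samples as contaminated and discard them
```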
7 Conclusion and Future Work
Data integrity is vital for effective machine learning. In this paper, we studied data poisoning attacks against Bayesian network structure learning algorithms. We demonstrated the vulnerability of the PC algorithm against one-step and long-duration data poisoning attacks. We proposed a 2-layered framework for detecting data poisoning attacks. We implemented our prototype system using the Chest Clinic Network which is a widely used network in Bayesian networks. Our results indicate that Bayesian network structure learning algorithms are vulnerable to one-step and long-duration data poisoning attacks. Our framework is effective in detecting both one-step and long-duration data poisoning attacks, as it thoroughly validates and verifies training data before such data is being incorporated into the model.
Our ongoing work focuses on offline validation of potentially malicious datasets. Currently, our approach detects datasets that either change the Bayesian network structure (distance measure) or in conflict with the validated model (conflict measure). We are investigating methods for (1) distinguishing actual model shift from model enrichment, i.e., our initial model was based on data that was not fully representative of the “true” distribution, and (2) determining if cases are truly conflicting or again if the initial model poorly approximates the “true” distribution. We are also investigating the applicability of Wisdom of the Crowd (WoC) [37]. Rather than human experts, we plan to use an ensemble of classifiers, i.e., take the votes of competing algorithms instead of the votes of humans. In the case of an ensemble of classifiers, one could investigate the likelihood of unexpected cases and adjust the sensitivity to anomalies by how much perturbation it causes in the model.
References
Alfeld, S., Zhu, X., Barford, P.: Data poisoning attacks against autoregressive models. In: AAAI, pp. 1452–1458 (2016)
Alsuwat, E., Alsuwat, H., Rose, J., Valtorta, M., Farkas, C.: Long duration data poisoning attacks on Bayesian networks. Technical report, University of South Carolina, SC, USA (2019)
Alsuwat, E., Alsuwat, H., Valtorta, M., Farkas, C.: Cyber attacks against the PC learning algorithm. In: Alzate, C., et al. (eds.) ECML PKDD 2018. LNCS (LNAI), vol. 11329, pp. 159–176. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-13453-2_13
Alsuwat, E., Valtorta, M., Farkas, C.: Bayesian structure learning attacks. Technical report, University of South Carolina, SC, USA (2018)
Alsuwat, E., Valtorta, M., Farkas, C.: How to generate the network you want with the PC learning algorithm. In: Proceedings of the 11th Workshop on Uncertainty Processing (WUPES 2018), pp. 1–12 (2018)
Barreno, M., Nelson, B., Joseph, A.D., Tygar, J.D.: The security of machine learning. Mach. Learn. 81(2), 121–148 (2010)
Barreno, M., Nelson, B., Sears, R., Joseph, A.D., Tygar, J.D.: Can machine learning be secure? In: Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security, pp. 16–25. ACM (2006)
Biggio, B., et al.: Poisoning complete-linkage hierarchical clustering. In: Fränti, P., Brown, G., Loog, M., Escolano, F., Pelillo, M. (eds.) S+SSPR 2014. LNCS, vol. 8621, pp. 42–52. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44415-3_5
Biggio, B., Didaci, L., Fumera, G., Roli, F.: Poisoning attacks to compromise face templates. In: 2013 International Conference on Biometrics (ICB), pp. 1–7. IEEE (2013)
Biggio, B., Fumera, G., Roli, F., Didaci, L.: Poisoning adaptive biometric systems. In: Gimel’farb, G., et al. (eds.) SSPR /SPR 2012. LNCS, vol. 7626, pp. 417–425. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34166-3_46
Biggio, B., Nelson, B., Laskov, P.: Poisoning attacks against support vector machines. In: Proceedings of the 29th International Conference on International Conference on Machine Learning, pp. 1467–1474. Omnipress (2012)
Biggio, B., Pillai, I., Rota Bulò, S., Ariu, D., Pelillo, M., Roli, F.: Is data clustering in adversarial settings secure? In: Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security, pp. 87–98. ACM (2013)
Carlini, N., Wagner, D.: Adversarial examples are not easily detected: bypassing ten detection methods. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14. ACM (2017)
Chan, P.P., He, Z.M., Li, H., Hsu, C.C.: Data sanitization against adversarial label contamination based on data complexity. Int. J. Mach. Learn. Cybern. 9(6), 1039–1052 (2018)
Feinman, R., Curtin, R.R., Shintre, S., Gardner, A.B.: Detecting adversarial samples from artifacts. CoRR abs/1703.00410 (2017)
Gardiner, J., Nagaraja, S.: On the security of machine learning in malware C&C detection: a survey. ACM Comput. Surv. (CSUR) 49(3), 59 (2016)
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
Huang, L., Joseph, A.D., Nelson, B., Rubinstein, B.I., Tygar, J.: Adversarial machine learning. In: Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, pp. 43–58. ACM (2011)
de Jongh, M., Druzdzel, M.J.: A comparison of structural distance measures for causal Bayesian network models. In: Recent Advances in Intelligent Information Systems, Challenging Problems of Science, Computer Science Series, pp. 443–456 (2009)
Kantchelian, A., Tygar, J., Joseph, A.: Evasion and hardening of tree ensemble classifiers. In: International Conference on Machine Learning, pp. 2387–2396 (2016)
Koh, P.W., Liang, P.: Understanding black-box predictions via influence functions. In: International Conference on Machine Learning, pp. 1885–1894 (2017)
Laskov, P., et al.: Practical evasion of a learning-based classifier: a case study. In: 2014 IEEE Symposium on Security and Privacy (SP), pp. 197–211. IEEE (2014)
Lauritzen, S.L., Spiegelhalter, D.J.: Local computations with probabilities on graphical structures and their application to expert systems. J. Roy. Stat. Soc. Ser. B (Methodol.) 50, 157–224 (1988)
Liu, Q., Li, P., Zhao, W., Cai, W., Yu, S., Leung, V.C.: A survey on security threats and defensive techniques of machine learning: a data driven view. IEEE Access 6, 12103–12117 (2018)
Lu, J., Issaranon, T., Forsyth, D.: Safetynet: detecting and rejecting adversarial examples robustly. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 446–454, October 2017. https://doi.org/10.1109/ICCV.2017.56
Madsen, A.L., Jensen, F., Kjaerulff, U.B., Lang, M.: The Hugin tool for probabilistic graphical models. Int. J. Artif. Intell. Tools 14(03), 507–543 (2005)
Mei, S., Zhu, X.: The security of latent Dirichlet allocation. In: Artificial Intelligence and Statistics, pp. 681–689 (2015)
Mei, S., Zhu, X.: Using machine teaching to identify optimal training-set attacks on machine learners. In: AAAI, pp. 2871–2877 (2015)
Muñoz-González, L., et al.: Towards poisoning of deep learning algorithms with back-gradient optimization. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 27–38. ACM (2017)
Nelson, B., et al.: Misleading learners: co-opting your spam filter. In: Yu, P.S., Tsai, J.J.P. (eds.) Machine Learning in Cyber Trust, pp. 17–51. Springer, Heidelberg (2009). https://doi.org/10.1007/978-0-387-88735-7_2
Nielsen, T.D., Jensen, F.V.: Bayesian Networks and Decision Graphs. Springer, Heidelberg (2009)
Olesen, K.G., Lauritzen, S.L., Jensen, F.V.: aHUGIN: a system creating adaptive causal probabilistic networks. In: Uncertainty in Artificial Intelligence, pp. 223–229. Elsevier (1992)
Paudice, A., Muñoz-González, L., Gyorgy, A., Lupu, E.C.: Detection of adversarial training examples in poisoning attacks through anomaly detection. arXiv preprint arXiv:1802.03041 (2018)
Spirtes, P., Glymour, C.N., Scheines, R.: Causation, Prediction, and Search. MIT Press, Cambridge (2000)
Wang, Y., Chaudhuri, K.: Data poisoning attacks against online learning. arXiv preprint arXiv:1808.08994 (2018)
Yang, C., Wu, Q., Li, H., Chen, Y.: Generative poisoning attack method against neural networks. arXiv preprint arXiv:1703.01340 (2017)
Yi, S.K.M., Steyvers, M., Lee, M.D., Dry, M.J.: The wisdom of the crowd in combinatorial problems. Cogn. Sci. 36(3), 452–470 (2012)
A Causative, Long-duration Model Invalidation Attacks
In this Appendix, we explain the two subtypes of each of the causative long-duration attacks which are based on the notion of d-separation and marginal independence tests.
The causative long-duration attacks which are based on the notion of d-separation are divided into two main subtypes as follows:
- (i) Creating a new converging connection (v-structure) attacks, in which adversaries attempt to corrupt the original Bayesian network model, \(B_1\), by poisoning the validated dataset over time using contaminated datasets. Attackers aim to introduce a new v-structure by adding the link \(A \rightarrow C\) to the serial connection \(A \rightarrow B \rightarrow C\), the link \(C \rightarrow A\) to the serial connection \(A \leftarrow B \leftarrow C\), or either one of the links \(A \rightarrow C\) or \(C \rightarrow A\) to the diverging connection \(A \leftarrow B \rightarrow C\) in \(B_1\).
- (ii) Breaking an existing converging connection (v-structure) attacks, in which malicious attackers attempt to corrupt the original model, \(B_1\), by shielding existing colliders (v-structures). Such adversarial attacks can be performed by poisoning the validated dataset over time using poisoned datasets such that new links are introduced to marry the parents of unshielded colliders in \(B_1\) (i.e., add the link \(A \rightarrow C\) to the converging connection \(A \rightarrow B \leftarrow C\)).
We divide the causative long-duration attacks which are based on marginal independence tests into two main subtypes:
- (i) Removing the weakest edge attacks, in which adversarial opponents attempt to poison the validated learning dataset using contaminated datasets over a period of time t, with the ultimate goal of removing weak edges. Note that a weak edge in a Bayesian model, \(B_1\), is the edge that is easiest to remove from \(B_1\). We use our previously defined link strength measure to determine such edges [5].
- (ii) Adding the most believable yet incorrect edge attacks, in which adversaries can cleverly craft their input datasets over a period of time t to poison the validated dataset so that adding the most believable yet incorrect edge is viable. The most believable yet incorrect edge is a newly added edge to the model, \(B_1\), with the maximum amount of belief. We use our link strength measure defined in [5] to determine such edges.