Elsevier

Computer Communications

Volume 173, 1 May 2021, Pages 206-213
Attack sample generation algorithm based on data association group by GAN in industrial control dataset

https://doi.org/10.1016/j.comcom.2021.04.014

Abstract

Security of industrial control networks is increasingly important, but research on intrusion detection for these networks is seriously restricted by the quantity and quality of the attack samples available in existing business datasets. To address the scarcity of attack data in industrial control datasets, this paper proposes an attack sample generation algorithm. First, the membership distance between dimensions is calculated from the weights and the distribution of membership degrees; the smaller the membership distance between two dimensions, the stronger their data association. Dimensions separated by a small distance are then placed in the same group, which yields an association grouping of the original data. The more frequently an association group appears, the stronger the data association among its dimensions, so the association groups are divided by frequency into strong association groups and weak association groups. Attack samples are generated by applying a false data injection attack to all dimensions of one strong association group in the original data. Finally, a Generative Adversarial Network expands these attack samples into a large attack sample dataset for industrial control. In this paper, attack samples are generated from the BATADAL dataset and from the business dataset of an oil depot, and the algorithm expands the data 100-fold. Compared with the attack samples provided by the BATADAL dataset, the coincidence degree and fitting degree of the generated data are improved by 38.20%–42.94% and 98.22%–98.36%, respectively. The classification results of XGBoost and SVM are 100% and 98.01%, which are close to the classification results for the attack samples provided by the BATADAL dataset.

Introduction

Industrial control system products are widely used in key information infrastructure, especially in the fields of energy, electricity, and transportation, and the stable operation of the economy and society depends on the security of these products [1], [2], [3], [4], [5]. In June 2010, Stuxnet caused extensive damage to centrifuges in Iran’s nuclear facilities. The malware works by hijacking business data, so it is very important to analyze the underlying business data, such as sensor readings. At present, research on industrial control network security focuses on anomaly detection and situational awareness. Intrusion detection for traditional industrial control systems is based on analysis of network layer data [6], and only a few research results are based on business datasets. A great deal of research has been done on intrusion detection, where traditional methods include misuse detection, intrusion detection and mixed detection [7], [8], [9]. After investigating 30 public datasets, we found that only one (the BATADAL datasets [10]) consists purely of business data, while the others, such as the Mississippi SCADA dataset (Mississippi State University gas pipeline dataset), are network layer datasets. In the Mississippi SCADA dataset, each sample contains 27 dimensions, of which only one is related to business data. The scarcity of attack samples in industrial control datasets therefore seriously limits the study of anomaly detection in industrial control networks. To address this scarcity, this paper proposes an attack sample generation algorithm for industrial control datasets, based on data association groups and a Generative Adversarial Network (GAN).

To group the different dimensions of the original datasets, a membership function is used to relate the data distribution to the degree of association within the dataset [11]. Fuzzy set theory is also used for grouping [12], [13], [14]. Therefore, the membership function can be used to calculate the degree of association between different dimensions of strongly coupled datasets.
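As a rough illustration of the grouping idea, the sketch below assumes a Gaussian membership function, a weighted mean absolute difference between membership vectors as the "membership distance", and a fixed grouping threshold. None of these specifics are given in this section: the function names, the weights, the threshold, and the sensor names are hypothetical choices made only for illustration.

```python
import math
from statistics import mean, pstdev


def gaussian_membership(values):
    """Map each reading to a membership degree in (0, 1] (assumed Gaussian form)."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 for a constant series
    return [math.exp(-((v - mu) ** 2) / (2 * sigma ** 2)) for v in values]


def membership_distance(a, b, weight=1.0):
    """Weighted mean absolute difference of two membership vectors (assumed)."""
    return weight * mean(abs(x - y) for x, y in zip(a, b))


def group_dimensions(data, threshold=0.1, weights=None):
    """Group dimensions whose pairwise membership distance is below threshold.

    data: dict mapping dimension name -> list of readings of equal length.
    A union-find structure merges dimensions that are linked by small distances.
    """
    names = list(data)
    memberships = {n: gaussian_membership(data[n]) for n in names}
    weights = weights or {}
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            w = weights.get((a, b), 1.0)
            if membership_distance(memberships[a], memberships[b], w) < threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy readings with hypothetical sensor names.
readings = {
    "tank_level": [1.0, 2.0, 3.0, 4.0, 5.0],
    "pump_flow": [10.0, 20.0, 30.0, 40.0, 50.0],  # scaled copy of tank_level
    "valve_state": [0.0, 0.0, 0.0, 0.0, 9.0],
}
groups = group_dimensions(readings, threshold=0.1)
```

With these toy readings, `tank_level` and `pump_flow` have the same membership profile and fall into one group, while `valve_state` stays on its own.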

Attack samples are commonly constructed by false data injection attacks [15], [16]. Common false data injection attacks are of three types: surge attacks, bias attacks, and geometric attacks [17]. Sinusoidal attacks were proposed to address the limitations and poor concealment of common false data injection attacks [18]. In studies of anomaly detection for business data, attack data is generated by applying false data injection attacks to the original data.
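The four attack types named above can be sketched on a single sensor series as follows. The definitions are one common reading of each attack (surge: replace with an extreme value; bias: add a constant offset; geometric: add an offset that grows geometrically; sinusoidal: add a sine-shaped perturbation) and are assumptions, since this section does not give the formulas. All parameter names and values are hypothetical.

```python
import math


def surge_attack(series, start, end, extreme):
    """Replace readings in [start, end) with an extreme value."""
    return [extreme if start <= t < end else v for t, v in enumerate(series)]


def bias_attack(series, start, end, offset):
    """Add a constant offset to readings in [start, end)."""
    return [v + offset if start <= t < end else v for t, v in enumerate(series)]


def geometric_attack(series, start, end, beta, alpha):
    """Add an offset beta * alpha**k, with k = t - start, inside [start, end)."""
    return [v + beta * alpha ** (t - start) if start <= t < end else v
            for t, v in enumerate(series)]


def sinusoidal_attack(series, start, end, amplitude, period):
    """Add a sine-shaped perturbation that is zero at t = start."""
    return [v + amplitude * math.sin(2 * math.pi * (t - start) / period)
            if start <= t < end else v
            for t, v in enumerate(series)]


normal = [5.0] * 8  # a flat, attack-free series for illustration
surged = surge_attack(normal, 2, 5, extreme=9.0)
biased = bias_attack(normal, 2, 5, offset=0.5)
growing = geometric_attack(normal, 2, 5, beta=0.1, alpha=2.0)
wavy = sinusoidal_attack(normal, 2, 6, amplitude=1.0, period=4)
```

To attack a whole association group, the same function would be applied over the same time window to every dimension in the group.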

A GAN can expand a small sample set into a large one [19], [20], and is commonly used to expand small-sample datasets.
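For reference, the standard GAN objective from the original GAN formulation is the minimax game below, where the generator G maps noise z to a synthetic sample and the discriminator D estimates the probability that a sample is real. This section does not state which GAN variant or loss the authors use, so the formula is given only as the usual starting point.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

In the setting described here, x would be a real attack sample and G(z) a generated one; after training, drawing many values of z yields the expanded sample set.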

This paper proposes an attack sample generation algorithm for industrial control datasets, based on data association groups and a GAN. The degree of association in the original industrial control dataset is calculated by the membership function, and the association groups are formed according to the degree of association and the weight coefficients given by expert experience. Strong and weak association groups are then obtained according to the frequency of each association group. Attack samples are generated by a false data injection attack based on the association grouping result. Finally, the negative samples are expanded by the GAN, which completes the generation of the negative sample dataset.
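The split into strong and weak association groups can be sketched as a frequency count. How frequency is measured is not specified in this section; the sketch assumes that grouping is repeated over several data windows and that a group's frequency is the fraction of windows in which it appears. The function name, the threshold `min_frequency`, and the sensor names are hypothetical.

```python
from collections import Counter


def split_by_frequency(groupings, min_frequency=0.5):
    """Split association groups into strong and weak by how often they recur.

    groupings: one grouping result per data window (assumed setup); each
               result is a list of groups, each group a list of dimension names.
    """
    counts = Counter(
        frozenset(group) for result in groupings for group in result
    )
    total = len(groupings)
    strong, weak = [], []
    for group, count in counts.items():
        target = strong if count / total >= min_frequency else weak
        target.append(sorted(group))
    return sorted(strong), sorted(weak)


windows = [
    [["pump_flow", "tank_level"], ["valve_state"]],
    [["pump_flow", "tank_level"], ["valve_state"]],
    [["pump_flow", "tank_level", "valve_state"]],
    [["pump_flow", "tank_level"], ["valve_state"]],
]
strong, weak = split_by_frequency(windows, min_frequency=0.5)
```

Here the pair `pump_flow`/`tank_level` recurs in three of four windows and is classed as strong, while the three-sensor group appears once and is classed as weak. A strong group would then be the target of the false data injection attack.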

Section snippets

Industrial control system

To solve the problem of attack samples in industrial control networks, it is necessary to understand the framework of an industrial control network and the intrusion attacks it may face. Fig. 1 describes the spatially distributed industrial control system model.

The operation of the controlled process is controlled by the controller, which can receive the measured values of sensors distributed in different regions and transmit the control signals to the spatially distributed actuators using the

Dataset processing

The business dataset in the BATADAL datasets (hereinafter referred to as the BATADAL dataset) and an oil depot business dataset (hereinafter referred to as the oil depot dataset) are selected for the experiment. Details of the two datasets are shown in Table 2.

In each dataset, the readings of all sensors at one time point form the sample for that moment, and the readings of one sensor over all time points form the corresponding dimension. Before the experiment, the datasets need several preprocessing steps:

1. Remove
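The sample and dimension layout described above can be sketched with a small table. The sensor names and values are hypothetical, and the sketch shows only the layout, not the preprocessing steps.

```python
# Each row is one sample: all sensor readings at one time point.
header = ["tank_level", "pump_flow", "valve_state"]  # hypothetical sensor names
rows = [
    [1.0, 10.0, 0.0],
    [2.0, 20.0, 0.0],
    [3.0, 30.0, 1.0],
]

# Each dimension is one sensor's readings over all time points (a column).
dimensions = {name: list(column) for name, column in zip(header, zip(*rows))}

# The sample at the second time point, keyed by sensor name.
sample_at_t1 = dict(zip(header, rows[1]))
```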

Experimental environment

Operating system: Windows 10

CPU: Intel(R) Core(TM) i7-9750U CPU @ 2.60 GHz

Memory: 16 GB

Development environment: Python 3.5.6, PyTorch 1.3.0

Experimental result

Attack samples were generated from the 50,000-record BATADAL dataset, the 100,000-record BATADAL dataset, and the oil depot dataset. The number of attack samples generated by the algorithm in this paper is shown in Table 4.

The generated samples were compared with the original data using DED, TFD, SVM, and XGBoost. The results are shown in Table 5:

It can be seen from

Conclusion

This paper studies the lack of business data in industrial control systems and proposes an attack sample generation algorithm. First, association grouping results are obtained from the weights and the membership distribution, and the strong association groups are then attacked to obtain attack samples. Finally, the GAN is used for sample expansion. This paper uses an open dataset and an oil depot dataset to generate attack samples; the coincident degree and trend fitting

CRediT authorship contribution statement

Wen Zhou: Supervision, Funding acquisition. Xiang-min Kong: Methodology, Software, Validation, Writing - original draft. Kai-li Li: Formal analysis, Investigation, Writing - original draft. Xiao-ming Li: Writing - review & editing. Lin-lin Ren: Writing - review & editing. Yong Yan: Writing - review & editing. Yun Sha: Resources. Xue-ying Cao: Methodology, Data curation, Visualization. Xue-jun Liu: Project administration, Conceptualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (32)

  • Liu Xuejun et al. Research on bidirectional matching algorithm of variable threshold SIFT based on DBSCAN. J. Chem. Eng. Japan (2020)

  • Taormina Riccardo et al. Battle of the attack detection algorithms: Disclosing cyber attacks on water distribution networks. J. Water Resour. Plann. Manag. (2018)

  • Kasabov Nikola. Evolving fuzzy neural networks for supervised/unsupervised online knowledge-based learning. IEEE Trans. Syst. Man Cybern. B (2001)

  • Rasool Pelalak et al. Influence of machine learning membership functions and degree of membership function on each input parameter for simulation of reactors. Sci. Rep. (2021)

  • Wu Tao et al. Distributivity of implication operator on overlap and grouping functions in interval-valued fuzzy set. J. Jilin Univ. (Sci. Ed.) (2019)

  • Morikawa Kazuya et al. Tuning membership functions of kernel fuzzy classifiers by maximizing margins. Memetic Comput. (2009)

This work was supported by the National Key Research and Development Program of China (Grant No. 2018YFC0824801) and CNAF KJ2019003.