Elsevier

Computers & Security

Volume 46, October 2014, Pages 94-110
An unsupervised anomaly-based detection approach for integrity attacks on SCADA systems

https://doi.org/10.1016/j.cose.2014.07.005

Abstract

Supervisory Control and Data Acquisition (SCADA) systems are a core part of industrial systems, such as smart grid power and water distribution systems. In recent years, such systems have become highly vulnerable to cyber attacks. The design of efficient and accurate data-driven anomaly detection models has become an important topic of interest in the development of SCADA-specific Intrusion Detection Systems (IDSs) to counter cyber attacks. This paper proposes two novel techniques: (i) an automatic identification of consistent and inconsistent states of SCADA data for any given system, and (ii) an automatic extraction of proximity detection rules from identified states. During the identification phase, the density factor for the k-nearest neighbours of an observation is adapted to compute its inconsistency score. Then, an optimal inconsistency threshold is calculated to separate inconsistent from consistent observations. During the extraction phase, the well-known fixed-width clustering technique is extended to extract proximity-detection rules, which form a small and most-representative data set for both inconsistent and consistent behaviours in the training data set. Extensive experiments were carried out on both real and simulated data sets, and we show that the proposed techniques provide significant accuracy and efficiency in detecting cyber attacks, compared to three well-known anomaly detection approaches.

Introduction

SCADA systems control and monitor industrial and infrastructure processes such as transportation, oil and gas refining, and energy and water distribution networks (Yu et al., 2011, Fahad et al., 2013). In recent years, Commercial-Off-The-Shelf (COTS) products, such as standard hardware and software platforms, have begun to be used in SCADA systems. This incorporation has allowed products from different vendors to be integrated with each other to build a SCADA system at low cost. In addition, the integration of standard protocols (e.g. TCP/IP) into COTS products has increased their connectivity, thereby increasing productivity and profitability. However, this shift from proprietary and customized products to standard ones exposes these systems to cyber threats (Oman et al., 2000). Undoubtedly, any attack targeting SCADA systems could lead to high financial losses and serious impacts on public safety and the environment. The attack on the sewage treatment system in Maroochy Shire (Australia) is an example of such an attack on critical infrastructure (Slay and Miller, 2007), where the attacker took over the control devices of a SCADA system. The Stuxnet worm (Falliere et al., 2011), which was designed to damage nuclear power plants in Iran, is a recent example of threats targeting control systems. Both of the aforementioned attacks are classified as man-in-the-middle (MITM) attacks, where control devices are compromised to perform malicious actions while false information is sent to the Master Terminal Unit (MTU) to avoid detection. Such cyber threats allow attackers to perform high-level control actions (Wei et al., 2011, Queiroz et al., 2011, Nicholson et al., 2012), and pose potential threats to SCADA systems.

An awareness of the potential threats to SCADA systems, as well as the need to reduce their various vulnerabilities, has recently become an important research focus in the area of security. A number of security measures have been used in traditional IT systems, including management, filtering, encryption and intrusion detection. However, such measures cannot be directly applied to SCADA systems without considering their specific characteristics. Additionally, none of these traditional IT security solutions can completely protect SCADA systems from potential cyber attacks. However, properly adapting or extending such IT solutions can create robust protection of SCADA systems against cyber attacks. An Intrusion Detection System (IDS) is one of the security solutions that has shown promising results in detecting malicious activities in traditional IT systems, and this is one of the reasons for using and adapting it to SCADA environments.

To illustrate the intrusion detection problem, two well-known scenarios (Verba and Milvich, 2008) are considered. Fig. 1 illustrates an attacker compromising the front end processor (FEP) by carrying out three actions: (i) initialising a connection with a remote terminal unit (RTU1.1) and sending a command without receiving a corresponding command from the application server; (ii) dropping the command sent from the application server to RTU1.1, and forging the feedback information sent back to the application server to suit the attack; and (iii) forging the command sent from the application server to RTU1.1, as well as forging the feedback information sent back from RTU1.1 to the application server. All commands sent to RTU1.1 will be trusted, as they are syntactically valid and sent from an FEP.

Two kinds of inconsistent data can be identified in this scenario: (i) an inconsistent network traffic pattern and (ii) inconsistent SCADA data. The former relates to the following: (i) an FEP is not an intelligent device that can make a decision and send a command to RTU1.1 without receiving a corresponding command; and (ii) a command dropped at the FEP will show up in the network stream from the application server to the FEP, but not in the network stream from the FEP to RTU1.1. The forged commands between the application server and RTU1.1, in turn, can be identified by the inconsistent SCADA data. For example, the command in the network stream from the application server to the FEP shows that the status of pump1 is ON, while in the network stream from the FEP to RTU1.1, it is OFF. Clearly, the inconsistencies in this scenario show that the aforementioned MITM attacks are performed by the FEP. In what follows, however, we show a scenario where the monitoring of inconsistencies fails to detect MITM attacks.
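The stream comparison described above can be illustrated with a minimal sketch. It assumes each observed command is available as a `(command_id, target, value)` tuple and that command identifiers are preserved across the FEP; both the record layout and the function name `find_stream_inconsistencies` are hypothetical and are not taken from the paper.

```python
def find_stream_inconsistencies(server_to_fep, fep_to_rtu):
    """Compare commands seen upstream and downstream of the FEP.

    Each command is a (command_id, target, value) tuple. This layout is an
    assumption made for illustration only.
    """
    upstream = {cmd_id: (target, value) for cmd_id, target, value in server_to_fep}
    downstream = {cmd_id: (target, value) for cmd_id, target, value in fep_to_rtu}

    findings = []
    for cmd_id, payload in upstream.items():
        if cmd_id not in downstream:
            # Seen leaving the application server but never reaching the RTU.
            findings.append((cmd_id, "dropped"))
        elif downstream[cmd_id] != payload:
            # Same command, different content on the two sides of the FEP.
            findings.append((cmd_id, "forged"))
    for cmd_id in downstream:
        if cmd_id not in upstream:
            # The FEP sent a command that no upstream command corresponds to.
            findings.append((cmd_id, "unsolicited"))
    return findings


server_to_fep = [(1, "pump1", "ON"), (2, "valve3", "OPEN")]
fep_to_rtu = [(1, "pump1", "OFF"), (3, "pump2", "ON")]
print(find_stream_inconsistencies(server_to_fep, fep_to_rtu))
# prints [(1, 'forged'), (2, 'dropped'), (3, 'unsolicited')]
```

A check of this kind only covers the first scenario; it would report nothing when the compromised device is itself allowed to originate commands.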

Let us consider the example shown in Fig. 2. This example illustrates an attacker compromising an intelligent application server that can initiate independent actions. It drops commands sent from the operator, and therefore an unsafe situation could be created. An attacker initialises a command from the application server to turn off pump1, and it can be seen that both the network traffic stream and the SCADA data between RTU2.1 and the application server are consistent for this command. However, the SCADA data, such as the speed and the status of pump1, could be inconsistent with the sensory node of the water level in RTU2.2, as they are set to values that violate the specifications of the system from the operational perspective.

The evolution of SCADA data can reflect the system's state: consistent or inconsistent. Therefore, the monitoring of SCADA data has been proposed as an efficient, tailored IDS for SCADA environments. Detection methods are broadly categorized into two types: signature-based and anomaly-based. The former can detect only attacks whose signatures are already known, while the latter can detect unknown attacks by looking for activities that deviate from expected patterns (or behaviours). Anomaly-based detection models can be learned in three modes, namely supervised, semi-supervised and unsupervised. Class labels must be available for the first mode; however, this type of learning is costly and time-consuming because domain experts are required to label hundreds of thousands of data observations. The second mode is based on the assumption that the training data set represents only one behaviour, either normal or abnormal. There are a number of issues pertaining to this mode. The system has to operate for a long time under normal conditions in order to obtain purely normal data that comprehensively represent normal behaviours. However, there is no guarantee that no anomalous activity will occur during the data collection period. On the other hand, it is difficult to obtain a training data set that covers all possible anomalous behaviours that could occur in the future. Alternatively, the unsupervised mode can be an appropriate solution to address the aforementioned issues, as the anomaly detection models can be learned from unlabelled data without prior knowledge about normal/abnormal behaviours. However, the poor efficiency and low accuracy of this type of learning remain challenging.

This paper proposes a novel unsupervised SCADA data-driven anomaly detection approach intended to be used as a passive SCADA IDS. That is, it only raises alarms when suspicious activities are detected, and the appropriate responses will be left for a system administrator. The SCADA data, which are generated by sensors/actuators, are used as valuable information in the proposed approach. Fig. 3 shows the two main steps of the proposed approach: the identification of consistent/inconsistent states from unlabelled SCADA data, and the extraction of proximity-based detection rules for each behaviour.
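The identification step, which the abstract describes as scoring each observation from its k-nearest neighbours and then thresholding the scores, might be sketched as follows. This is an illustration under stated assumptions rather than the authors' method: the mean distance to the k nearest neighbours stands in for the paper's density factor, and the threshold (mean plus one standard deviation of the scores) is a placeholder heuristic, not the optimal threshold the paper computes. All function names are hypothetical.

```python
import math
import statistics


def knn_inconsistency_scores(observations, k):
    """Score each observation by its mean distance to its k nearest neighbours.

    Assumption: this is a simple stand-in for the paper's density factor.
    A sparse neighbourhood yields a high score.
    """
    scores = []
    for i, point in enumerate(observations):
        distances = sorted(
            math.dist(point, other)
            for j, other in enumerate(observations)
            if j != i
        )
        scores.append(sum(distances[:k]) / k)
    return scores


def split_by_threshold(observations, scores, threshold):
    """Separate observations into (consistent, inconsistent) lists."""
    consistent = [o for o, s in zip(observations, scores) if s <= threshold]
    inconsistent = [o for o, s in zip(observations, scores) if s > threshold]
    return consistent, inconsistent


# Four tightly grouped readings and one isolated reading (made-up values).
observations = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_inconsistency_scores(observations, k=2)

# Placeholder threshold, assumed for illustration only.
threshold = statistics.mean(scores) + statistics.stdev(scores)
consistent, inconsistent = split_by_threshold(observations, scores, threshold)
print(inconsistent)
```

The brute-force neighbour search is quadratic in the number of observations, which is acceptable for a sketch but would need an index structure for data sets of realistic size.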

The use of control data has attracted the attention of many researchers studying SCADA data-driven anomaly detection models that are able to learn the mechanistic behaviour of SCADA systems without knowledge of the physical behaviour of such systems (Rrushi, 2009, Marton et al., 2013, Gao et al., 2010, Zaher et al., 2009). Such studies, however, can operate only in two learning modes: supervised and semi-supervised. Despite the promising results of these learning modes, there are a number of issues that restrict their use (see Section 1.1). This paper proposes an unsupervised learning approach, which consists of two novel techniques. The first one is used to identify consistent/inconsistent states from unlabelled data. This is performed by giving an inconsistency score to each observation using the density factor for the k-nearest neighbours of the observation. An optimal inconsistency threshold is later computed to separate inconsistent from consistent observations. The second proposed technique extracts proximity-based detection rules for each behaviour, whether inconsistent or consistent. During this phase, the fixed-width clustering technique (Eskin et al., 2002) is used to cluster each behaviour individually into micro-clusters with a constant fixed width, which is statistically determined. The centroids of all the created micro-clusters are used as the proximity-detection rules that are assumed to form a small and most representative data set for both inconsistent and consistent behaviours in the training data set.
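The extraction phase can be illustrated with a small sketch of single-pass fixed-width clustering, applied separately to each behaviour, with the resulting centroids kept as labelled rules. Several details here are assumptions: the width is passed in as a parameter rather than statistically determined, centroids are updated as running means (one common variant of the technique), and labelling a new observation by its nearest centroid is one plausible reading of "proximity-based detection". All function names are hypothetical.

```python
import math


def fixed_width_clusters(points, width):
    """Single-pass fixed-width clustering; returns the cluster centroids.

    A point joins the nearest existing cluster if that centroid lies within
    `width`; otherwise it seeds a new cluster.
    """
    centroids, counts = [], []
    for point in points:
        best, best_dist = None, float("inf")
        for idx, centroid in enumerate(centroids):
            d = math.dist(point, centroid)
            if d < best_dist:
                best, best_dist = idx, d
        if best is not None and best_dist <= width:
            counts[best] += 1
            n = counts[best]
            # Running-mean update of the centroid.
            centroids[best] = tuple(
                c + (p - c) / n for c, p in zip(centroids[best], point)
            )
        else:
            centroids.append(tuple(point))
            counts.append(1)
    return centroids


def extract_rules(consistent, inconsistent, width):
    """Cluster each behaviour on its own and label the centroids."""
    rules = [(c, "consistent") for c in fixed_width_clusters(consistent, width)]
    rules += [(c, "inconsistent") for c in fixed_width_clusters(inconsistent, width)]
    return rules


def classify(observation, rules):
    """Label an observation with the behaviour of its nearest centroid."""
    _, label = min(rules, key=lambda rule: math.dist(observation, rule[0]))
    return label


# Made-up two-dimensional observations; the width of 2.0 is arbitrary.
consistent = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
inconsistent = [(10.0, 10.0), (10.0, 11.0)]
rules = extract_rules(consistent, inconsistent, width=2.0)
print(classify((9.0, 9.0), rules))
```

Because only the centroids are retained, a new observation is compared against a handful of rules rather than the whole training set, which is the efficiency argument the paragraph above makes.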

The proposed approach is evaluated on both real and simulated data sets; two are generated by a simulation of a SCADA system that uses well-known models as discussed in Section 4.1, while the third is real and consists of consistent/inconsistent observations. In particular, we compared the effectiveness of our unsupervised approach with existing unsupervised and semi-supervised anomaly detection approaches.

This paper is organised as follows. Section 3 provides a characterisation of consistent/inconsistent observation states for SCADA data, as well as the details of the proposed approach. Section 4 presents the experimental setup, followed by results and analysis in Section 5. Finally, we conclude the work in Section 6.

Section snippets

Related work

In the design of an IDS, two main processes are often considered. First is the selection of the information source (e.g. network-based, application-based) to be used, through which anomalies can be detected. Second is the development of a learning (or analysis) method that is used to efficiently build the detection model using the specified information source. SCADA-specific IDSs can be broadly grouped into three categories in terms of the latter process: misuse (signature-based) detection (

The proposed intrusion detection approach

This section describes consistent/inconsistent states of SCADA data, as well as the techniques that contribute to the development of an unsupervised intrusion detection method to detect SCADA-based integrity attacks. Specifically, the proposed approach consists of (i) a technique that identifies consistent and inconsistent multivariate SCADA data, and (ii) a technique that extracts proximity-based detection rules used to perform a near-real-time monitoring of integrity attacks. Fig. 3

Experimental setup

The main focus of this section is to set up an experimental environment to evaluate the robustness of the proposed approach. In what follows, we describe the simulation system used and two integrity attacks. We also describe the data sets used and the experimental parameters chosen for this evaluation.

Results and analysis

This section evaluates the accuracy of anomaly detection of the proposed unsupervised approach, and in addition, a comparison between this approach and two existing unsupervised and semi-supervised anomaly detection approaches is carried out. The detection accuracy for each approach is separately evaluated because the existing approaches that have been chosen as a basis for comparison with the proposed approach are inherently different in terms of the required parameters for learning anomaly

Conclusion

In this paper, we proposed an innovative unsupervised SCADA data-driven anomaly detection approach to detect integrity attacks tailored to SCADA systems. This has been done by initially identifying the consistent and inconsistent states of SCADA data automatically, and then also automatically extracting proximity-based detection rules from the identified states to detect inconsistent states. Experimental results show the ability of the proposed approach to automatically identify consistent and

Abdulmohsen Almalawi received his B.S. degree in Computer Science from King Abdul Aziz University, Jeddah, Saudi Arabia, in 2003. He received his M.S. degree in 2008 from RMIT University, Melbourne, Australia, and he is currently a Ph.D. candidate in the Department of Computer Science and Information Technology at RMIT University. His research interests are in the areas of machine learning and SCADA security.

References (56)

  • M.M. Breunig et al. LOF: identifying density-based local outliers.
  • A. Carcano et al. A multidimensional critical state analysis for detecting intrusions in SCADA systems. IEEE Trans Ind Inform (2011).
  • S. Cheung et al. Using model-based intrusion detection for SCADA networks.
  • Digitalbond. IDS-signatures of Modbus/TCP (2013).
  • E. Eskin et al. A geometric framework for unsupervised anomaly detection.
  • A. Fahad et al. Toward an efficient and scalable feature selection approach for internet traffic classification. Comput Netw (2013).
  • N. Falliere et al. W32.Stuxnet dossier: version 1.4 (2011).
  • E.B. Fernandez et al. Designing secure SCADA systems using security patterns.
  • I.N. Fovino et al. Modbus/DNP3 state-based intrusion detection system.
  • I.N. Fovino et al. Critical state-based filtering system for securing SCADA network protocols. IEEE Trans Ind Electron (2012).
  • A. Frank et al. UCI machine learning repository (2013).
  • K. Fukunaga et al. A branch and bound algorithm for computing k-nearest neighbors. IEEE Trans Comput (1975).
  • W. Gao et al. On SCADA control system command and response injection and intrusion detection.
  • P. Gross et al. Secure selecticast for collaborative intrusion detection systems.
  • M. Hall et al. The WEKA data mining software: an update. ACM SIGKDD Explor Newsl (2009).
  • K. Hempstalk et al. One-class classification by combining density and class probability estimation.
  • M. IDA. Modbus messaging on TCP/IP implementation guide v1.0a (2013).
  • M. Jianliang et al. The application on intrusion detection based on k-means cluster algorithm.

Xinghuo Yu is currently with RMIT University, Melbourne, Australia, where he is the Director of the RMIT Platform Technologies Research Institute. He has published over 350 refereed papers in technical journals, books, and conference proceedings. His research interests include variable structure and nonlinear control, complex and intelligent systems, and industrial applications. Prof. Yu is a Fellow of the Institution of Engineers Australia and the Australian Computer Society. He is currently serving as an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS PART I, IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, and IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS.

Zahir Tari is a full professor at RMIT University. He is also the director of the DSN (Distributed Systems and Networking) discipline at the School of Computer Science and IT, RMIT (Australia). His main research areas are performance and security in various areas of application (e.g. Web servers, Content Delivery Networks, SCADA systems). Prof. Tari regularly publishes in reputable journals and conferences. He has acted as program committee chair as well as general chair of over fifteen international conferences (e.g. DOA, CoopIS, ODBASE, GADA, IFIP DS 11.3 on Database Security). He is also the co-author of a few books. He has also been General Chair of more than 12 conferences. He is the recipient of 14 Australian Research Council (ARC) grants. More details about Zahir and his team can be found at http://www.cs.rmit.edu.au/dsn.

Adil Fahad received his B.S. degree in Computer Science from King Abdul Aziz University, Jeddah, Saudi Arabia, in 2003. He received his M.S. degree (with high distinction) in 2008 from RMIT University, Melbourne, Australia, and he is currently a Ph.D. candidate in the Department of Computer Science and Information Technology at RMIT University. He joined the University of Albaha as a lecturer in 2009 and took a leave of absence in 2010 for his Ph.D. studies. His research interests are in the areas of wireless sensor networks, mobile networks, SCADA security and ad-hoc networks, with emphasis on data mining, statistical analysis/modelling and machine learning.

Ibrahim Khalil received the Ph.D. degree from the University of Berne, Berne, Switzerland, in 2003. He is a Senior Lecturer in the School of Computer Science and IT, RMIT University, Melbourne, Australia. He has several years of experience in Silicon Valley-based companies working on Large Network Provisioning and Management software. He has also worked as an academic in several research universities. Before joining RMIT, he worked for EPFL and the University of Berne in Switzerland and Osaka University in Japan. His research interests are quality of service, wireless sensor networks, and remote healthcare.
