Elsevier

Computers & Security

Volume 87, November 2019, 101582

Using memory propagation tree to improve performance of protocol fuzzer when testing ICS

https://doi.org/10.1016/j.cose.2019.101582

Abstract

Protocol fuzzers are widely used to find vulnerabilities and security bugs in programs. The main techniques used by protocol fuzzers fall into two categories: generation-based and mutation-based fuzzing. Generation-based fuzzing generates data messages from an official specification (i.e., a grammar), while mutation-based fuzzing performs random transformations on a prepared message. However, both techniques are ineffective or inefficient when testing industrial control systems (ICS), because many ICS protocols are unknown, undocumented, or proprietary. Generation-based fuzzing cannot work well without specifications, while mutation-based fuzzing cannot achieve high branch coverage. In this paper, we present Miff (short for a system that uses the “M”P tree to “i”mprove the per“f”ormance of “f”uzzers), which aims to automatically abstract data models from ICS messages. The data model generated by Miff can be used to direct protocol fuzzers when testing ICS. Miff has three processing stages: (1) by instrumenting and monitoring program execution, Miff obtains execution context information, builds a memory propagation (MP) tree for every byte in the message, and identifies protocol field boundaries based on the similarity between MP trees; (2) using information-theoretic measures, Miff infers the type of every field; (3) based on the results of the first two stages, Miff decides the mutation strategy for every field, which combines with the field boundary and type information to form the data model. We have implemented a prototype of Miff and applied it to four open-source protocol fuzzers. Our experimental results show that Miff enables generation-based fuzzing to test ICS even when the specification is absent, and improves the performance of mutation-based fuzzing, achieving higher branch coverage with fewer test cases.

Introduction

A vulnerability is an error, failure, or flaw in computer software or hardware. Vulnerabilities, which cause the software or hardware to produce an unintended result or to behave in an unexpected way, usually stem from the source code or the design. Attackers can exploit vulnerabilities to steal important data, to execute arbitrary code, or even to take complete control of the system. Many testing methods, such as unit testing and code auditing, help eliminate vulnerabilities. Among them, fuzzing is a highly effective technique.

The first fuzzer was proposed by Miller et al. (1990), and was developed to test the reliability of Unix tools. In the past 30 years, many types of fuzzers have been presented, including protocol fuzzers, file fuzzers, OS kernel fuzzers, and so on. The main techniques used by protocol fuzzers can be divided into generation-based and mutation-based fuzzing according to the input generation strategy (Kim et al., 2013). Generation-based fuzzing generates data messages from an official specification, e.g., a grammar or a model. Mutation-based fuzzing performs random transformations on a prepared message.
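The difference between the two input generation strategies can be illustrated with a minimal sketch. It is not taken from any of the fuzzers discussed in this paper: the function names are hypothetical, and the field layout assumes the publicly documented Modbus/TCP framing (a 7-byte MBAP header followed by a function code and its data), chosen only because the layout is known.

```python
import random
import struct


def generate_message(rng, function_code=3):
    """Generation-based: build a message from a known field layout."""
    transaction_id = rng.randrange(0x10000)
    start_addr = rng.randrange(0x10000)
    quantity = rng.randrange(1, 126)
    # PDU: function code (1 byte), start address (2 bytes), quantity (2 bytes).
    pdu = struct.pack(">BHH", function_code, start_addr, quantity)
    # MBAP header: transaction id, protocol id (0), length (unit id + PDU), unit id.
    header = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, 1)
    return header + pdu


def mutate_message(rng, seed, n_writes=2):
    """Mutation-based: overwrite randomly chosen bytes of a prepared message."""
    data = bytearray(seed)
    for _ in range(n_writes):
        pos = rng.randrange(len(data))
        data[pos] = rng.randrange(256)
    return bytes(data)


if __name__ == "__main__":
    rng = random.Random(0)
    seed = generate_message(rng)
    print(seed.hex())
    print(mutate_message(rng, seed).hex())
```

The generated message always has a consistent length field and protocol identifier, whereas the mutated one may overwrite either of them, which is one way a mutated message can end up being rejected early by the parser.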

However, these two types of protocol fuzzing techniques are ineffective or inefficient when testing industrial control systems (ICS). Many ICS protocols, such as Siemens S7, are unknown, undocumented, or proprietary. For a closed ICS protocol, manually creating the input data specification for generation-based fuzzing is often time-consuming and error-prone, and without a good specification generation-based fuzzing cannot work well. Mutation-based fuzzing, on the other hand, does not depend on the protocol specification, but the mutated data often deviates too much from the expected format and is therefore rejected too early in processing. As a result, mutation-based fuzzing cannot achieve high branch coverage.

In this paper, we present Miff, which aims to automatically abstract data models from ICS messages; these models can be used to direct protocol fuzzers when testing ICS. Miff has three processing stages: (1) identifying the field boundaries of an ICS message; (2) inferring field types; (3) deciding the mutation strategy for each field. The first stage is based on the key observation that bytes belonging to the same protocol field of a message have the same propagation traces in memory, because they are typically handled together. By dynamically analyzing program execution, Miff records the address of a message byte each time it propagates from one place to another. All address records of a message byte then compose a memory propagation tree, so an n-byte message results in n memory propagation trees. By comparing memory propagation trees, Miff can decide whether two message bytes belong to the same protocol field. Based on the similarity between memory propagation trees, Miff can then identify the field boundaries of a protocol message. In the second stage, Miff uses information-theoretic measures to infer the type of every field. The field types include message length, block length, function identifier, flag, transaction identifier, limited variable, and random variable. In the third stage, Miff decides the mutation strategy for every field according to the results of the first two stages. For example, the transaction identifier field should not be changed within a conversation. Finally, the three-tuple (field boundary, field type, mutation strategy) forms the data model, which can direct protocol fuzzers to generate ICS messages. We have implemented a prototype of Miff and applied it to four open-source protocol fuzzers. Our experimental results show that generation-based fuzzing directed by Miff, without any specification, achieves almost the same branch coverage (99.5%) as generation-based fuzzing that builds its input from the protocol specification, while Miff improves the performance of mutation-based fuzzing, which achieves higher branch coverage (increased by 24%) with fewer test cases (decreased by 53%) than before.

The contributions of this paper are the following:

  1. We present a novel approach to analyzing the movement traces of message bytes in memory. We use the memory propagation (MP) tree as the storage structure that records all movement traces of a message byte. We then describe in detail how MP trees are compared; the comparison result expresses the similarity between MP trees.

  2. We present Miff, an MP-tree-based approach to abstracting data models from ICS messages, which can direct protocol fuzzers to test ICS. Miff automatically identifies field boundaries, infers field types, and decides field mutation strategies.

  3. We applied our techniques to a set of open-source protocol fuzzers, including Kitty, Peach, Dizzy, and Sulley, and used these fuzzers to test three ICS protocols (i.e., Modbus/TCP, IEC 60870-5-104, and Siemens S7). Our results show that Miff enables generation-based fuzzing to test ICS even when the specification is absent, and improves the performance of mutation-based fuzzing, achieving higher branch coverage with fewer test cases.

Section snippets

Goal and motivation

In this paper, we focus on improving the performance of existing protocol fuzzers when testing ICS. Our goal is to design an approach that, given enough messages of an ICS protocol and an application that can process these messages, automatically generates data models that improve the performance of existing protocol fuzzers.

As mentioned above, we assume that an application that can parse the protocol messages can be obtained. Though these appliances that run on very “special” hardware seem

System overview

Miff automatically abstracts data models from ICS messages. Fig. 2 shows an architectural overview of Miff, which has three processing stages:

  • Stage 1: Field boundary identification.

    For each incoming ICS protocol message, Miff marks the received bytes as tainted data and keeps track of their propagation at byte granularity. Miff saves the propagation traces as memory propagation records. From these records, Miff constructs MP trees. The root

Field boundary identification

In this section, we describe the first stage of Miff in detail. The field boundary identification stage parses one message at a time. It includes three main steps: (1) execution monitoring; (2) MP tree generation; and (3) boundary identification.
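The last two steps can be sketched as follows. This is an illustration under stated assumptions rather than Miff's actual algorithm: the record format `(byte_index, src_addr, dst_addr)`, the offset normalisation, the Jaccard similarity, and the threshold are all hypothetical choices, since the tree comparison used by Miff is not reproduced in this section. The sketch assumes that bytes of one field are copied as a contiguous block, so subtracting the byte index from each address makes their edges identical.

```python
from collections import defaultdict


def build_mp_trees(records, n_bytes):
    """Build one tree per message byte, stored as {parent_addr: {child_addrs}}."""
    trees = [defaultdict(set) for _ in range(n_bytes)]
    for byte_index, src, dst in records:
        trees[byte_index][src].add(dst)
    return trees


def normalised_edges(tree, byte_index):
    """Edges with the byte's offset removed (hypothetical normalisation)."""
    return {
        (src - byte_index, dst - byte_index)
        for src, children in tree.items()
        for dst in children
    }


def similarity(tree_a, idx_a, tree_b, idx_b):
    """Jaccard similarity of the normalised edge sets of two trees."""
    a = normalised_edges(tree_a, idx_a)
    b = normalised_edges(tree_b, idx_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def field_boundaries(trees, threshold=0.5):
    """Group adjacent bytes whose trees are similar into (start, end) fields."""
    if not trees:
        return []
    fields, start = [], 0
    for i in range(1, len(trees)):
        if similarity(trees[i - 1], i - 1, trees[i], i) < threshold:
            fields.append((start, i))
            start = i
    fields.append((start, len(trees)))
    return fields


if __name__ == "__main__":
    # Bytes 0-1 are copied together twice; bytes 2-3 are copied together once.
    records = [
        (0, 0x1000, 0x2000), (1, 0x1001, 0x2001),
        (0, 0x2000, 0x3000), (1, 0x2001, 0x3001),
        (2, 0x1002, 0x4000), (3, 0x1003, 0x4001),
    ]
    print(field_boundaries(build_mp_trees(records, 4)))
```

A byte-order conversion that swaps the bytes of a field would defeat this simple normalisation, so a practical comparison would need to be more tolerant than the one sketched here.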

Field type inference

In the field type inference stage, Miff uses different statistical tests to infer the type of each field obtained in the previous stage. Miff infers the field types in the following order: (1) message and block length; (2) function code; (3) flag; (4) sequence number; (5) limited variable; and (6) random variable. An ICS message may not contain all of these field types. For Miff, a missing field type does not affect the inference of the others. In this section, we describe the scheme of type
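As a rough illustration of what an information-theoretic measure over a field can look like, the sketch below computes the Shannon entropy of the values a field takes across a set of messages. It is not Miff's inference scheme: the helper `rough_label`, its tolerance, and the "constant" label are hypothetical, and only the "limited variable" and "random variable" names come from the field types listed in this paper.

```python
import math
from collections import Counter


def field_entropy(messages, start, end):
    """Shannon entropy (in bits) of the values seen in bytes [start, end)."""
    counts = Counter(msg[start:end] for msg in messages)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def rough_label(entropy, n_messages, tol=0.1):
    """Hypothetical coarse labelling based on entropy alone."""
    max_entropy = math.log2(n_messages) if n_messages > 1 else 0.0
    if entropy == 0.0:
        return "constant"
    if max_entropy - entropy <= tol:
        return "random variable"
    return "limited variable"


if __name__ == "__main__":
    # Four 4-byte messages: bytes 0-1 fixed, byte 2 takes two values, byte 3 is unique.
    messages = [bytes([0, 1, v, i]) for i, v in enumerate([0, 0, 0, 1])]
    for start, end in [(0, 2), (2, 3), (3, 4)]:
        h = field_entropy(messages, start, end)
        print((start, end), round(h, 3), rough_label(h, len(messages)))
```

A field whose value never changes has zero entropy, while a field that takes a different value in every message reaches the maximum for the sample size; fields such as function codes or flags would be expected to fall in between.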

Mutation strategy decision

Based on the field boundary and type information, Miff decides a mutation strategy for each field. The mutation strategy, together with the field boundary and type, constitutes the data model, which can be used to direct existing protocol fuzzers to generate test cases. In this section, we introduce how mutation strategies are chosen for different fields.
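The three-tuple data model and its use by a fuzzer can be sketched as below. This is an assumption-laden illustration, not Miff's data model format: the `Field` class, the strategy names "keep" and "randomize", and `mutate_with_model` are hypothetical. The only rule taken from the paper is that a transaction identifier should not be changed within a conversation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    start: int        # field boundary: first byte (inclusive)
    end: int          # field boundary: last byte (exclusive)
    field_type: str   # e.g. "transaction identifier", "random variable"
    strategy: str     # hypothetical strategy name: "keep" or "randomize"


def mutate_with_model(rng, seed, model):
    """Mutate only the fields whose strategy allows it; leave the rest intact."""
    data = bytearray(seed)
    for field in model:
        if field.strategy == "randomize":
            for pos in range(field.start, field.end):
                data[pos] = rng.randrange(256)
    return bytes(data)


if __name__ == "__main__":
    model = [
        Field(0, 2, "transaction identifier", "keep"),
        Field(2, 4, "random variable", "randomize"),
    ]
    seed = bytes.fromhex("00010000")
    print(mutate_with_model(random.Random(1), seed, model).hex())
```

Restricting mutation to selected fields keeps the rest of the message well formed, which is the intuition behind directing a mutation-based fuzzer with a data model instead of mutating bytes blindly.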

Evaluation

We have implemented a Miff prototype on Linux 3.16 (Debian 8.5.0). The execution monitor module of Miff extends the instrumentation tool Pin (Luk et al., 2005) (version 2.14-71313). However, we note that our design is not tightly coupled with Pin, and can be implemented using other instrumentation tools, e.g., Valgrind (Nethercote and Seward, 2007).

To evaluate Miff, we have applied it to four open-source protocol fuzzers. These fuzzers include Kitty (2018), Peach community (2018), Dizzy (2018),

Related works

The first fuzzer was proposed by Miller et al. (1990), and was developed to test the reliability of Unix tools. In the past 30 years, many types of fuzzers have been presented, including protocol fuzzers, file fuzzers, OS kernel fuzzers, and so on.

Limitations and future work

The first limitation of Miff is the granularity of its dynamic taint analysis. To balance costs and benefits, we chose one byte as the minimum unit when tracing tainted data. However, some ICS protocol fields are not byte-aligned, and for such fields Miff cannot infer the field boundary accurately. The second limitation is Miff's dependence on dynamic traces. If the implementation of an ICS protocol ignores some message fields, Miff cannot discover the boundaries of these

Conclusion

We have presented Miff, which aims to automatically abstract data models from ICS messages. The data model generated by Miff can be used to direct protocol fuzzers when testing ICS. By instrumenting and monitoring program execution, Miff obtains execution context information, builds a memory propagation (MP) tree for every byte in the message, and identifies protocol field boundaries based on the similarity between MP trees. By using information-theoretic measures, Miff infers the type of every

Declaration of Competing Interest

None.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (2017YFB1010000). We would like to thank all anonymous reviewers for helping us make this paper better.

Kai Chen received the Ph.D. degree from the University of Chinese Academy of Sciences (UCAS), Beijing, in 2015. He is a research associate at the State Key Laboratory of Information Security, Institute of Information Engineering (IIE), Chinese Academy of Sciences. He is also a CISSP and PMP. His research interests include industrial control system security, network security, and authentication. He focuses on using dynamic analysis and machine learning to improve the security of industrial control systems.

References (44)

  • libmodbus.org, 2013. libmodbus v3.0.6. [Online]. Available:...
  • J. Narayan et al. A survey of automatic protocol reverse engineering tools. ACM Comput. Surv. (2016)
  • 2018. Peach community. [Online]. Available:...
  • AFL. 2018. [Online]. Available:...
  • J. Antunes et al. Reverse engineering of protocols from network traces. Working Conference on Reverse Engineering, WCRE 2011, Limerick, Ireland (2011)
  • G. Banks et al. Snooze: toward a stateful network protocol fuzzer
  • Beddoe, M. A., 2012. Network protocol analysis using bioinformatics...
  • M. Böhme et al. Regression tests to expose change interaction errors. Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering (2013)
  • M. Böhme et al. Coverage-based greybox fuzzing as Markov chain. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (2016)
  • D. Brumley et al. Towards automatic discovery of deviations in binary implementations with applications to error detection and fingerprint generation (2007)
  • J. Caballero et al. Dispatcher: enabling active botnet infiltration using automatic protocol reverse-engineering. ACM Conference on Computer and Communications Security, CCS 2009, Chicago, Illinois, USA (2009)
  • J. Caballero et al. Polyglot: automatic extraction of protocol message format using dynamic binary analysis. ACM Conference on Computer and Communications Security, CCS 2007, Alexandria, Virginia, USA (2007)
  • C. Cadar et al. KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (2008)
  • S.K. Cha et al. Unleashing Mayhem on binary code. 2012 IEEE Symposium on Security and Privacy (2012)
  • K. Chen et al. Automatic identification of industrial control network protocol field boundary using memory propagation tree. 20th International Conference on Information and Communications Security (ICICS 2018), Lille, France (2018)
  • V. Chipounov et al. S2E: a platform for in-vivo multi-path analysis of software systems. SIGPLAN Not. (2011)
  • W. Cui et al. Discoverer: automatic protocol reverse engineering from network traces. 16th USENIX Security Symposium, Berkeley, CA, USA (2007)
  • W. Cui et al. Protocol-independent adaptive replay of application dialog. Network and Distributed System Security Symposium, NDSS 2006, San Diego, California, USA (2006)
  • W. Cui et al. Tupni: automatic reverse engineering of input formats. ACM Conference on Computer and Communications Security, CCS 2008, Alexandria, VA, USA (2008)
  • Dizzy. 2018. [Online]. Available:...
  • M. Egele et al. Dynamic spyware analysis. Proceedings of the 2007 USENIX Annual Technical Conference (2007)
  • V. Ganesh et al. Taint-based directed whitebox fuzzing. 2009 IEEE 31st International Conference on Software Engineering (2009)
    Chen Song received the M.S. degree from Harbin Institute of Technology, Harbin, China. She is currently an Associate Research Assistant at the Institute of Information Engineering, Chinese Academy of Sciences. Her current research interests include network security and cloud security.

    Liming Wang received the Ph.D. degree from the Institute of Software, Chinese Academy of Sciences, Beijing, China. He is an associate professor at the Institute of Information Engineering, Chinese Academy of Sciences, where his research covers cloud security, big data security, and network security.

    Zhen Xu received the M.S. and Ph.D. degrees from the Institute of Software, Chinese Academy of Sciences, Beijing, China. He is currently a research professor at the Institute of Information Engineering, Chinese Academy of Sciences. His current research interests include network security, trusted computing, and cloud security.