Using memory propagation tree to improve performance of protocol fuzzer when testing ICS
Introduction
A vulnerability is an error, failure, or flaw in computer software or hardware. Vulnerabilities, which cause the software or hardware to produce an unintended result or to behave in an unexpected way, usually stem from the source code or design. Attackers can exploit vulnerabilities to steal important data, to execute arbitrary code, or even to take complete control of the system. Many testing methods, such as unit testing and code auditing, help find and eliminate vulnerabilities. Among them, fuzzing is a highly effective technique.
The first fuzzer was proposed by Miller et al. (1990), and was developed to test the reliability of Unix tools. In the past 30 years, many types of fuzzers have been presented, including protocol fuzzers, file fuzzers, OS kernel fuzzers, and so on. The main techniques used by protocol fuzzers can be divided into generation-based and mutation-based fuzzing according to the input generation strategy (Kim et al., 2013). Generation-based fuzzing generates data messages from an official specification, e.g., a grammar or model. Mutation-based fuzzing performs random transformations on a prepared message.
However, these two types of protocol fuzzing techniques are ineffective or inefficient when testing industrial control systems (ICS). Many ICS protocols are unknown, undocumented, or proprietary, such as Siemens S7. For a closed ICS protocol, manually creating the input data specification for generation-based fuzzing is often time-consuming and error-prone, and without a good specification, generation-based fuzzing cannot work well. On the other hand, although mutation-based fuzzing does not depend on the protocol specification, the mutated data often deviates too much from the expected format and is therefore rejected early in processing. As a result, mutation-based fuzzing cannot achieve high branch coverage.
In this paper, we present Miff, which aims to automatically abstract data models from ICS messages; these models can be used to direct protocol fuzzers to test ICS. Miff has three processing stages: (1) identifying the field boundaries of an ICS message; (2) inferring field types; (3) deciding the mutation strategy for each field. The first stage is based on the key observation that bytes belonging to the same protocol field of a message have the same propagation traces in memory, because they are typically handled together. By dynamically analyzing program execution, Miff records the address of a message byte each time it propagates from one place to another. All address records of a message byte then compose a memory propagation tree, so an n-byte message results in n memory propagation trees. By comparing memory propagation trees, Miff can decide whether two message bytes belong to the same protocol field. Based on the similarity between memory propagation trees, Miff can then identify the field boundaries of a protocol message. In the second stage, Miff uses information-theoretic measures to infer the type of every field. The field types include message length, block length, function identifier, flag, transaction identifier, limited variable, and random variable. In the third stage, Miff decides the mutation strategy for every field according to the analysis results of the first two stages. For example, the transaction identifier field should not be changed within a conversation. Finally, the three-tuple (field boundary, field type, mutation strategy) forms the data model, which can direct protocol fuzzers to generate ICS messages. We have implemented a prototype of Miff and applied it to 4 open-source protocol fuzzers.
Our experimental results show that generation-based fuzzing directed by Miff, without any specification, achieves almost the same branch coverage (99.5%) as fuzzing that generates input data according to the protocol specification. Miff also improves the performance of mutation-based fuzzing, which achieves higher branch coverage (increased by 24%) with fewer test cases (decreased by 53%) than before.
The contributions of this paper are the following:
- 1. We present a novel approach to analyzing the movement trace of message bytes in memory. We use the memory propagation (MP) tree as the storage structure that records all movement traces of a message byte. We then describe in detail how to compare MP trees. The comparison result expresses the similarity between MP trees.
- 2. We present Miff, an MP-tree-based approach to abstracting data models from ICS messages, which can direct protocol fuzzers to test ICS. Miff automatically identifies field boundaries, infers field types, and decides field mutation strategies.
- 3. We applied our techniques to a set of open-source protocol fuzzers, including Kitty, Peach, Dizzy, and Sulley, and used these fuzzers to test 3 ICS protocols (i.e., Modbus/TCP, IEC 60870-5-104, and Siemens S7). Our results show that Miff enables generation-based fuzzing to test ICS even when the specification is absent, and improves the performance of mutation-based fuzzing, which achieves higher branch coverage with fewer test cases.
Section snippets
Goal and motivation
In this paper, we focus on improving the performance of existing protocol fuzzers when testing ICS. Our goal is to design an approach that, given enough messages of an ICS protocol and an application that can process these messages, automatically generates data models that improve the performance of existing protocol fuzzers.
As mentioned above, we assume that the application, which can parse the protocol message, could be obtained. Though these appliances that run on very “special” hardware seem
System overview
Miff automatically abstracts data models from ICS messages. Fig. 2 shows an architectural overview of Miff, which has three processing stages:
Stage 1: Field boundary identification.
For each incoming ICS protocol message, Miff marks the received bytes as tainted data and keeps track of their propagation at byte granularity. Miff saves these propagation traces as memory propagation records. From the memory propagation records, Miff constructs MP trees. The root
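As a rough illustration of how propagation records could be turned into per-byte MP trees, the following Python sketch groups records by message byte. The record layout `(byte_index, src, dst)`, the location labels, and the helper names `build_mp_trees` and `edge_set` are assumptions made for this example, not Miff's actual data structures.

```python
from collections import defaultdict


def build_mp_trees(records):
    """Group propagation records into one parent -> children map per message byte.

    `records` is an iterable of (byte_index, src, dst) tuples. This record
    layout and the location labels are assumptions made for illustration.
    """
    trees = defaultdict(lambda: defaultdict(list))
    for byte_index, src, dst in records:
        children = trees[byte_index][src]
        if dst not in children:  # ignore repeated copies along the same edge
            children.append(dst)
    return {index: dict(tree) for index, tree in trees.items()}


def edge_set(tree):
    """Flatten a tree into a set of (parent, child) edges for later comparison."""
    return {(parent, child) for parent, kids in tree.items() for child in kids}


# Toy trace: bytes 0 and 1 are handled together, byte 2 takes another path.
RECORDS = [
    (0, "recv_buf", "reg_a"), (0, "reg_a", "hdr.tid"),
    (1, "recv_buf", "reg_a"), (1, "reg_a", "hdr.tid"),
    (2, "recv_buf", "reg_b"), (2, "reg_b", "hdr.len"),
]

MP_TREES = build_mp_trees(RECORDS)
```

In this toy trace the trees for bytes 0 and 1 have identical edges, which is the property the boundary identification stage relies on.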
Field boundary identification
In this section, we describe the first stage of Miff in detail. The field boundary identification stage parses one message at a time. It includes three main steps: (1) execution monitor; (2) MP tree generation; and (3) boundary identification.
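The boundary identification step can be sketched as follows, under the assumption that each byte's MP tree has been flattened into a set of (parent, child) edges. The Jaccard measure and the 0.8 threshold are stand-ins chosen for this example; the similarity measure Miff actually uses is not given in this excerpt.

```python
def jaccard(a, b):
    """Jaccard similarity of two edge sets (an assumed stand-in measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def field_boundaries(edge_sets, threshold=0.8):
    """Split a message into (start, end) byte ranges, with end exclusive.

    Adjacent bytes stay in the same field while the similarity of their
    edge sets is at least `threshold` (a hypothetical cut-off).
    """
    if not edge_sets:
        return []
    fields = []
    start = 0
    for i in range(1, len(edge_sets)):
        if jaccard(edge_sets[i - 1], edge_sets[i]) < threshold:
            fields.append((start, i))
            start = i
    fields.append((start, len(edge_sets)))
    return fields


# Toy input: one edge set per message byte.
EDGE_SETS = [
    {("buf", "r1"), ("r1", "tid")},
    {("buf", "r1"), ("r1", "tid")},
    {("buf", "r2"), ("r2", "len")},
    {("buf", "r2"), ("r2", "len")},
    {("buf", "r3")},
]

FIELDS = field_boundaries(EDGE_SETS)
```

For the toy input, the five bytes are split into two 2-byte fields followed by a 1-byte field.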
Field type inference
In the field type inference stage, Miff uses different statistical tests to infer the type of each field obtained in the previous stage. Miff infers the field types in the following order: (1) message and block length; (2) function code; (3) flag; (4) sequence number; (5) limited variable; and (6) random variable. An ICS message may not contain all of these field types. If a message lacks one field type, the inference of the other types is unaffected. In this section, we describe the scheme of type
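To make the idea of statistical type inference concrete, the sketch below shows two simple checks over a set of captured messages: a length test and a Shannon entropy measure. Both helpers, the big-endian assumption, and the toy messages are illustrative assumptions; the tests Miff actually applies are not specified in this excerpt.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits of the observed values of one field."""
    counts = Counter(values)
    total = len(values)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def looks_like_length(messages, start, end):
    """True if the field always equals the number of bytes that follow it.

    Big-endian encoding and "counts the remaining bytes" semantics are
    assumptions; real protocols may count differently.
    """
    return all(
        int.from_bytes(m[start:end], "big") == len(m) - end for m in messages
    )


# Toy messages: bytes 0-1 act as a counter, bytes 2-3 as a length, rest payload.
MESSAGES = [
    bytes([0x00, 0x01, 0x00, 0x02, 0x03, 0x10]),
    bytes([0x00, 0x02, 0x00, 0x03, 0x03, 0x10, 0x20]),
    bytes([0x00, 0x03, 0x00, 0x01, 0x04]),
]

IS_LENGTH = looks_like_length(MESSAGES, 2, 4)
COUNTER_ENTROPY = shannon_entropy([m[0:2] for m in MESSAGES])
```

A field with zero entropy across messages is constant, while a field whose entropy approaches the maximum for the sample size behaves more like a counter or random variable; where to draw those lines is a design choice left open here.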
Mutation strategy decision
Based on the field boundaries and types, Miff decides the mutation strategy for each field. The mutation strategy, together with the field boundary and type, constitutes the data model, which can be used to direct existing protocol fuzzers to generate test cases. In this section, we describe how Miff chooses mutation strategies for different fields.
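The three-tuple data model could be represented along the following lines. Only the rule that a transaction identifier is kept unchanged comes from the text; the other entries in `DEFAULT_STRATEGY`, the `Strategy` names, and the `FieldModel` class are hypothetical placeholders for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class FieldType(Enum):
    MESSAGE_LENGTH = "message length"
    BLOCK_LENGTH = "block length"
    FUNCTION_IDENTIFIER = "function identifier"
    FLAG = "flag"
    TRANSACTION_IDENTIFIER = "transaction identifier"
    LIMITED_VARIABLE = "limited variable"
    RANDOM_VARIABLE = "random variable"


class Strategy(Enum):
    KEEP = "keep"            # leave the field unchanged
    RECOMPUTE = "recompute"  # derive the value from the mutated message
    ENUMERATE = "enumerate"  # walk through a small set of values
    RANDOMIZE = "randomize"  # replace with random bytes


# Only the TRANSACTION_IDENTIFIER rule is stated in the text; the remaining
# mappings are assumptions for the sake of a complete example.
DEFAULT_STRATEGY = {
    FieldType.TRANSACTION_IDENTIFIER: Strategy.KEEP,
    FieldType.MESSAGE_LENGTH: Strategy.RECOMPUTE,
    FieldType.BLOCK_LENGTH: Strategy.RECOMPUTE,
    FieldType.FUNCTION_IDENTIFIER: Strategy.ENUMERATE,
    FieldType.FLAG: Strategy.ENUMERATE,
    FieldType.LIMITED_VARIABLE: Strategy.ENUMERATE,
    FieldType.RANDOM_VARIABLE: Strategy.RANDOMIZE,
}


@dataclass(frozen=True)
class FieldModel:
    """One entry of the data model: boundary, type, and mutation strategy."""
    start: int
    end: int
    field_type: FieldType
    strategy: Strategy


def build_data_model(typed_fields):
    """Turn ((start, end), FieldType) pairs into a list of FieldModel entries."""
    return [
        FieldModel(start, end, ftype, DEFAULT_STRATEGY[ftype])
        for (start, end), ftype in typed_fields
    ]


MODEL = build_data_model([
    ((0, 2), FieldType.TRANSACTION_IDENTIFIER),
    ((2, 4), FieldType.MESSAGE_LENGTH),
    ((4, 5), FieldType.FUNCTION_IDENTIFIER),
])
```

A fuzzer front end could then iterate over `MODEL` and apply the per-field strategy when generating each test case.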
Evaluation
We have implemented a Miff prototype on Linux 3.16 (Debian 8.5.0). The execution monitor module of Miff extends the instrumentation tool Pin (Luk et al., 2005) (version 2.14-71313). However, we note that our design is not tightly coupled with Pin, and can be implemented using other instrumentation tools, e.g., Valgrind (Nethercote and Seward, 2007).
We have applied Miff to 4 open-source protocol fuzzers to evaluate it. These fuzzers include Kitty (2018), Peach community (2018), Dizzy (2018),
Related works
The first fuzzer was proposed by Miller et al. (1990), and was developed to test the reliability of Unix tools. In the past 30 years, many types of fuzzers have been presented, including protocol fuzzers, file fuzzers, OS kernel fuzzers, and so on.
Limitations and future work
The first limitation of Miff is the granularity of dynamic taint analysis. To balance costs and benefits, we chose 1 byte as the minimum unit when tracing tainted data. However, some ICS protocol fields are not byte-aligned, and for such fields Miff cannot infer the boundary accurately. The second limitation is Miff's dependence on dynamic traces. If the implementation of an ICS protocol ignores some message fields, Miff cannot discover the boundaries of these
Conclusion
We have presented Miff, which aims to automatically abstract data models from ICS messages. The data model generated by Miff can be used to direct protocol fuzzers to test ICS. By instrumenting and monitoring program execution, Miff obtains the execution context information, builds a memory propagation (MP) tree for every byte in the message, and identifies protocol field boundaries based on the similarity between MP trees. By using information-theoretic measures, Miff infers the type of every
Declaration of Competing Interest
None.
Acknowledgments
This work was supported by the National Key Research and Development Program of China (2017YFB1010000). We would like to thank all anonymous reviewers for helping us make this paper better.
Kai Chen received the Ph.D. degree from the University of Chinese Academy of Sciences (UCAS), Beijing, in 2015. He is a research associate at the State Key Laboratory of Information Security, Institute of Information Engineering (IIE), Chinese Academy of Sciences. He is also a CISSP and PMP. His research interests include industrial control system security, network security, and authentication. He focuses on using dynamic analysis and machine learning to improve the security of industrial control systems.
References (44)
- libmodbus.org, 2013. libmodbus v3.0.6. [Online]. Available:...
- et al. A survey of automatic protocol reverse engineering tools. ACM Comput. Surv. (2016)
- 2018. Peach community. [Online]. Available:...
- Afl. 2018. [Online] Available:...
- et al. Reverse engineering of protocols from network traces. Working Conference on Reverse Engineering, WCRE 2011, Limerick, Ireland (2011)
- et al. Snooze: toward a stateful network protocol fuzzer
- Beddoe, M. A., 2012. Network protocol analysis using bioinformatics...
- et al. Regression tests to expose change interaction errors. Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering (2013)
- et al. Coverage-based greybox fuzzing as markov chain. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (2016)
- et al. Towards automatic discovery of deviations in binary implementations with applications to error detection and fingerprint generation (2007)
- Dispatcher: enabling active botnet infiltration using automatic protocol reverse-engineering. ACM Conference on Computer and Communications Security, CCS 2009, Chicago, Illinois, USA
- Polyglot: automatic extraction of protocol message format using dynamic binary analysis. ACM Conference on Computer and Communications Security, CCS 2007, Alexandria, Virginia, USA
- Klee: unassisted and automatic generation of high-coverage tests for complex systems programs. Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation
- Unleashing mayhem on binary code. 2012 IEEE Symposium on Security and Privacy
- Automatic identification of industrial control network protocol field boundary using memory propagation tree. 20th International Conference on Information and Communications Security (ICICS 2018), Lille, France
- S2e: a platform for in-vivo multi-path analysis of software systems. SIGPLAN Not.
- Discoverer: automatic protocol reverse engineering from network traces. 16th USENIX Security Symposium, Berkeley, CA, USA
- Protocol-independent adaptive replay of application dialog. Network and Distributed System Security Symposium, NDSS 2006, San Diego, California, USA
- Tupni: automatic reverse engineering of input formats. ACM Conference on Computer and Communications Security, CCS 2008, Alexandria, VA, USA
- Dynamic spyware analysis. 2007 USENIX Annual Technical Conference on Proceedings of the USENIX Annual Technical Conference
- Taint-based directed whitebox fuzzing. 2009 IEEE 31st International Conference on Software Engineering
Chen Song received the M.S. degree from Harbin Institute of Technology, Harbin, China. She is currently an Associate Research Assistant at the Institute of Information Engineering, Chinese Academy of Sciences. Her current research interests include network security and cloud security.
Liming Wang received the Ph.D. degree from the Institute of Software, Chinese Academy of Sciences, Beijing, China. He is an associate professor at the Institute of Information Engineering, Chinese Academy of Sciences, where his research covers cloud security, big data security, and network security.
Zhen Xu received the M.S. and Ph.D. degrees from the Institute of Software, Chinese Academy of Sciences, Beijing, China. He is currently a research professor at the Institute of Information Engineering, Chinese Academy of Sciences. His current research interests include network security, trusted computing, and cloud security.