Abstract:
Reverse engineering is a key technique for performing security audits and malware analysis. However, existing tools are severely limited in their ability to recover a correct, semantics-preserving representation of program behavior. This work introduces rev.ng [1] [2], a reverse engineering framework based on QEMU [3] and LLVM [4]. QEMU is an emulator able to handle about 20 distinct architectures. We employ it to translate machine code into an Intermediate Representation (IR) that is independent of the input architecture. We then transform this representation into the IR employed by LLVM, a robust open source compiler framework. This allows rev.ng to easily recompile the obtained representation to any of the 10 architectures supported by LLVM. Currently, we can analyze large programs (e.g., GCC and Perl) and recompile them, obtaining programs that preserve their original behavior. This allows us to quickly validate the analyses that we designed. In fact, one of the key goals of rev.ng is to preserve the semantics of the analyzed program across all the transformations performed on the code. As a consequence, we can instrument programs to gain deeper insight into their behavior. In addition, we can identify functions and their inputs, and try to generate input values that lead the program into an invalid state. Such inputs can be generated through coverage-guided fuzzing, a technique known for being extremely effective at finding security vulnerabilities. In this paper we introduce the architecture of rev.ng, demonstrate the performance of the generated code, and discuss a case study showing how we can easily generate inputs that trigger invalid program states using the above-mentioned techniques.
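
To make the idea of coverage-guided fuzzing concrete, the following is a minimal sketch in Python. It is not rev.ng's implementation and does not use any rev.ng, QEMU, or LLVM API. The names `toy_target`, `mutate`, and `fuzz` are hypothetical, and the toy target merely stands in for a recompiled, instrumented program that reports which branches an input reached. The loop keeps any mutated input that reaches a new branch and stops when an input drives the target into an invalid state.

```python
import random


def toy_target(data: bytes, coverage: set) -> None:
    """Hypothetical stand-in for an instrumented program.

    Records the branches it reaches in `coverage` and raises
    RuntimeError when the input drives it into an invalid state.
    """
    coverage.add("entry")
    if len(data) > 0 and data[0] == ord("B"):
        coverage.add("B")
        if len(data) > 1 and data[1] == ord("U"):
            coverage.add("U")
            if len(data) > 2 and data[2] == ord("G"):
                raise RuntimeError("invalid state reached")


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte, or append one random byte."""
    buf = bytearray(data)
    if buf and rng.random() < 0.5:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        buf.append(rng.randrange(256))
    return bytes(buf)


def fuzz(target, seeds, iterations, rng_seed=0):
    """Return an input that makes `target` raise, or None if none is found.

    `seeds` must contain at least one input.
    """
    rng = random.Random(rng_seed)
    corpus = []   # inputs that reached previously unseen branches
    seen = set()  # union of all branches reached so far

    # Run the initial seeds once to establish the baseline coverage.
    for seed in seeds:
        cov = set()
        try:
            target(seed, cov)
        except RuntimeError:
            return seed
        seen |= cov
        corpus.append(seed)

    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        cov = set()
        try:
            target(candidate, cov)
        except RuntimeError:
            return candidate  # input triggering the invalid state
        if not cov <= seen:
            # The candidate reached a new branch: keep it for later mutation.
            seen |= cov
            corpus.append(candidate)
    return None


if __name__ == "__main__":
    print(fuzz(toy_target, seeds=[b""], iterations=200_000))
```

The coverage feedback is what makes the search tractable in this sketch: each byte of the triggering prefix is discovered separately, because an input that gets one byte further is kept and mutated again, whereas blind random generation would have to guess all three bytes at once.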
Date of Conference: 22-25 October 2018
Date Added to IEEE Xplore: 23 December 2018