Access & Terms of Use
open access
Embargoed until 2023-12-15
Copyright: Wang, Xudong
Abstract
Binary analysis seeks to comprehend the behavior and semantics of commercial-off-the-shelf (COTS) executable programs, which are fully stripped of both source code and debugging information. This form of analysis holds great significance across various contexts, including vulnerability detection, reverse engineering, memory forensics, binary hardening, and binary rewriting. Despite its critical relevance, binary analysis has predominantly relied on manual processes, demanding substantial expertise and human involvement.
Among the diverse goals of binary analysis, two pivotal challenges stand out: type recovery and pointer analysis. Both underpin a multitude of subsequent tasks. Existing research in these areas remains somewhat limited: most type recovery techniques address only basic C types, while binary pointer analysis tools often encounter scalability issues.
This thesis introduces approaches aimed at improving both aspects. For type recovery, it presents a comprehensive method capable of recovering intricate C++ container types. The method combines a novel type-relevant slicing algorithm with a graph convolutional network that learns and predicts types based on provided addresses. For binary pointer analysis, it extends the SVF framework, which is recognized as a cutting-edge pointer analysis tool for C/C++ programs. The extension introduces an additional layer of abstraction named SVF IR. Binary programs are converted to SVF IR, which allows the fast algorithms within the SVF framework to be applied to them.
The algorithms introduced in this thesis are rigorously evaluated on binary programs derived from real-world projects, and the results demonstrate substantial effectiveness. The contributions of this thesis are anticipated to significantly advance automation within the wider binary analysis community.