Abstract:
Analysis of binary code is a building block of computer security. Especially in malware or firmware analysis, where source code is often not available, techniques like decompilation are used to figure out the functionality of binaries. During the optimization phase in modern compilers, human-readable expressions are often transformed into instruction sequences (compiler idioms, or simply idioms) that may be more efficient in terms of speed or size than the direct translation. However, these transformed sequences are often considerably less readable for the analyst. Such compiler-specific sequences are not only significantly longer than the apparent translation of the original high-level language operation but also have no trivial correlation to the original expression's semantics. Modern decompilers address this issue by reverting idioms using static, manually crafted rules. In this paper, we introduce a novel approach to find and annotate arithmetic idioms with their corresponding high-level language expressions to significantly simplify manual analysis. In contrast to previous approaches, our method does not require manual work to create the patterns for matching idioms and requires significantly less manual labour to derive the transformation rules to calculate the original constants. In our evaluation, we compared the results of PIdARCI against the current academic and commercial state-of-the-art tools Ghidra, RetDec, and Hex-Rays / IDA Pro. We show that PIdARCI matches more than 99% of all considered idioms, exceeding the matching rate of the other approaches.
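As an illustration of what an arithmetic idiom can look like, the following minimal sketch emulates a widely known transformation: unsigned 32-bit division by the constant 3 replaced by a multiplication with a "magic" constant followed by a right shift. This example is not taken from the paper; the names `MAGIC_DIV3`, `div3_idiom`, and `matches_plain_division` are hypothetical, and the idioms actually handled by PIdARCI may differ.

```python
# Illustrative sketch only (not from the paper): a well-known compiler idiom
# for unsigned 32-bit division by 3, emulated with Python integers.

MAGIC_DIV3 = 0xAAAAAAAB  # ceil(2**33 / 3); the constant an analyst sees in the binary


def div3_idiom(x: int) -> int:
    """Multiply-and-shift sequence a compiler may emit instead of `x / 3`."""
    if not 0 <= x < 2**32:
        raise ValueError("x must fit in an unsigned 32-bit integer")
    # 64-bit product, then shift right by 33 bits
    return (x * MAGIC_DIV3) >> 33


def matches_plain_division(samples) -> bool:
    """Check that the idiom agrees with the readable expression x // 3."""
    return all(div3_idiom(x) == x // 3 for x in samples)


samples = [0, 1, 2, 3, 4, 100, 2**31, 2**32 - 1]
print(matches_plain_division(samples))
```

The original divisor 3 does not appear anywhere in the multiply-and-shift form, which is why recovering the high-level expression and its constant from such a sequence is non-trivial for an analyst.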
Date of Conference: 13-15 December 2021
Date Added to IEEE Xplore: 21 December 2021