Next-Generation Intermediate Representations for Binary Code Analysis

Programming and Computer Software

Abstract

Many binary code analysis tools rely on an intermediate representation (IR) derived from binary code instead of working directly with machine instructions. In this paper, we first consider binary code analysis problems that benefit from an IR and compile a list of requirements that an IR suitable for solving these problems should meet. Generally speaking, a universal binary analysis platform requires two principal components. The first component is a retargetable instruction decoder that uses external specifications to describe target instruction sets. External specifications facilitate maintenance and make it possible to quickly add support for new instruction sets. We analyze some of the most popular instruction set architectures (ISAs), including those used in microcontrollers, and from that analysis compile a list of requirements for the retargetable decoder. We then review existing multi-ISA decoders and propose our vision of a more generic approach, based on a multi-layer directed acyclic graph that describes the decoding process in universal terms. The second component of the analysis platform is the actual architecture-neutral IR. In this paper, we describe such IRs and propose Pivot 2, an IR that is low-level enough to be easily constructed from decoded machine instructions while remaining easy to analyze. The main features of Pivot 2 are explicit side effects, SSA variables, a simpler alternative to phi-functions, and an extensible elementary operation set at the core. This IR also supports machines that have multiple memory address spaces. Finally, we propose a way to tie the decoder and the IR together and fit them to most binary code analysis tasks through abstract interpretation on top of the IR.
The proposed scheme takes into account various aspects of target architectures that are overlooked in many other works, including pipeline specifics (handling of delay slots, hardware loop support, etc.), exception and interrupt management, and a generic address space model in which accesses may have arbitrary side effects due to memory-mapped devices or other non-trivial behavior of the memory system.
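
To make the combination of SSA variables, explicit memory side effects, multiple address spaces, and abstract interpretation more concrete, the following is a minimal sketch in Python. It is not Pivot 2: the abstract gives no concrete syntax, so the operation classes (`Const`, `Add`, `Load`, `Store`), the address-space names `"ram"` and `"io"`, the 32-bit word size, and the `interpret` function are all hypothetical and chosen only for illustration. The sketch runs a simple constant-propagation abstract interpretation over a straight-line block and treats any read from a memory-mapped space as unknown.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical toy IR for illustration only (NOT the Pivot 2 format).
# Every operation defines at most one SSA variable, and every memory
# access is an explicit Load/Store tagged with an address-space name.

@dataclass(frozen=True)
class Const:
    dst: str
    value: int

@dataclass(frozen=True)
class Add:
    dst: str
    lhs: str
    rhs: str

@dataclass(frozen=True)
class Load:
    dst: str
    space: str
    addr: str

@dataclass(frozen=True)
class Store:
    space: str
    addr: str
    src: str

TOP = None  # abstract value "unknown"; an int means "known constant"

def interpret(ops, volatile_spaces=frozenset({"io"})):
    """Constant propagation over one straight-line block of the toy IR."""
    env: Dict[str, Optional[int]] = {}               # SSA variable -> value
    mem: Dict[Tuple[str, int], Optional[int]] = {}   # (space, address) -> value
    for op in ops:
        if isinstance(op, Const):
            env[op.dst] = op.value
        elif isinstance(op, Add):
            a, b = env[op.lhs], env[op.rhs]
            # Assumed 32-bit wraparound arithmetic.
            env[op.dst] = TOP if a is TOP or b is TOP else (a + b) & 0xFFFFFFFF
        elif isinstance(op, Load):
            addr = env[op.addr]
            if op.space in volatile_spaces or addr is TOP:
                # Memory-mapped device or unknown address: result is unknown.
                env[op.dst] = TOP
            else:
                env[op.dst] = mem.get((op.space, addr), TOP)
        elif isinstance(op, Store):
            addr = env[op.addr]
            if addr is TOP:
                # Unknown address: forget everything known about that space.
                mem = {k: v for k, v in mem.items() if k[0] != op.space}
            elif op.space not in volatile_spaces:
                mem[(op.space, addr)] = env[op.src]
    return env

block = [
    Const("a", 0x1000),
    Const("b", 4),
    Add("p", "a", "b"),
    Const("v", 7),
    Store("ram", "p", "v"),
    Load("x", "ram", "p"),   # reads back the stored constant
    Store("io", "p", "v"),
    Load("y", "io", "p"),    # device read: value cannot be assumed
    Add("z", "x", "y"),
]
env = interpret(block)
```

In this sketch the value loaded from `"ram"` is recovered as a constant, while the value loaded from `"io"` stays unknown even though the same address was just written, which mirrors the idea that accesses to some address spaces may have arbitrary side effects.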

Funding

This work was supported by the Russian Foundation for Basic Research, project no. 18-07-01256 A.

Author information

Corresponding authors

Correspondence to M. A. Solovev, M. G. Bakulin, M. S. Gorbachev, D. V. Manushin, V. A. Padaryan or S. S. Panasenko.

Additional information

Translated by Yu. Kornienko

About this article

Cite this article

Solovev, M.A., Bakulin, M.G., Gorbachev, M.S. et al. Next-Generation Intermediate Representations for Binary Code Analysis. Program Comput Soft 45, 424–437 (2019). https://doi.org/10.1134/S0361768819070107

  • DOI: https://doi.org/10.1134/S0361768819070107
