SINOF: A dynamic-static combined framework for dynamic binary translation

https://doi.org/10.1016/j.sysarc.2012.05.002

Abstract

Dynamic binary translation (DBT) is an important technique in virtualization and in migrating legacy binaries to platforms based on new architectures. However, poor profile information limits runtime optimization, so a DBT system may suffer substantial overhead. In this paper, we design and implement a static-integrated optimization framework (SINOF) to improve the runtime performance of DBT. Combining static and dynamic approaches can greatly reduce the overhead of optimizing, profiling and translating for any program that runs repeatedly. Under this framework, once the source image has been executed, the profile information and target code are saved in a software cache and made available for future runs. In the static phase, the saved code is analyzed and optimized based on the information collected in the previous run; in particular, we reorganize the code layout of the software cache. Experimental results show that the proposed framework reduces run time by more than 30% on average compared with the original version of the DBT on which the framework is based.

Introduction

Dynamic binary translation is the process of translating and optimizing one binary executable into a different executable at runtime, with the target code executed immediately after conversion. It has proven increasingly useful in many fields. One common use is to avoid rewriting legacy programs when porting them across different generations or different kinds of architectures [1]. It is also used to reduce hardware complexity, increase power efficiency [2], optimize native binaries for performance [3], and provide runtime instrumentation of applications [4], [5]. Many dynamic binary translation systems have been developed. UQDBT [6], [23] is a machine-adaptable dynamic binary translator supporting different source and target ISAs. Daisy [7] translates PowerPC instructions to VLIW instructions with hardware support. Aries [8] migrates HP-PA binaries to the IA-64 architecture. DEC FX!32 [9], [10], [11] makes numerous IA-32 applications available on the DEC Alpha platform. However, dynamic binary translation systems usually suffer from significant overhead. Generally speaking, this overhead can be divided into two parts: the overhead of translating and the overhead of executing the target code. Many optimization systems have been proposed to reduce these overheads. In practice, most current DBT systems already keep translation costs within an acceptable range. Hence, the main focus is on reducing the execution time of the target code.

To reduce the execution time of the target code, many optimization methods have been proposed, such as translation block chaining, forming large translation blocks (superblocks), reordering translated instructions to improve pipeline performance, and borrowing optimization techniques from conventional compilers. Many binary-code-specific optimizations depend heavily on profile information gathered at runtime, and the richness and accuracy of that information directly determine which optimizations can be applied and how effective they are.
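Translation block chaining, the first technique in the list above, can be illustrated with a small simulation. This is a hypothetical sketch rather than Crossbit's implementation: `TranslatedBlock` and `Dispatcher` are invented names, translation itself is a placeholder, chained exits are keyed by target address instead of being patched branch instructions, and a trace of source-block addresses stands in for real execution. Without chaining, every block exit returns to the dispatcher for a code-cache lookup; with chaining, an exit is linked to its successor on first use, so later executions bypass the dispatcher.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class TranslatedBlock:
    """Stand-in for one translated block in the code cache."""
    src_addr: int
    # Exits already linked ("chained") directly to a successor block,
    # keyed by the successor's source address.
    chained: Dict[int, "TranslatedBlock"] = field(default_factory=dict)


class Dispatcher:
    def __init__(self, chaining: bool) -> None:
        self.chaining = chaining
        self.cache: Dict[int, TranslatedBlock] = {}
        self.lookups = 0       # transfers that go back through the dispatcher
        self.translations = 0  # each source block is translated at most once

    def _lookup_or_translate(self, addr: int) -> TranslatedBlock:
        self.lookups += 1
        blk = self.cache.get(addr)
        if blk is None:
            blk = TranslatedBlock(addr)  # placeholder for real translation work
            self.cache[addr] = blk
            self.translations += 1
        return blk

    def run(self, trace: Iterable[int]) -> None:
        """Replay a sequence of executed source-block addresses."""
        prev: Optional[TranslatedBlock] = None
        for addr in trace:
            if prev is not None and addr in prev.chained:
                blk = prev.chained[addr]  # chained exit: no dispatcher round trip
            else:
                blk = self._lookup_or_translate(addr)
                if self.chaining and prev is not None:
                    prev.chained[addr] = blk  # link prev's exit for next time
            prev = blk


loop = [0x10, 0x20, 0x30] * 100  # a three-block loop executed 100 times
plain, chained = Dispatcher(chaining=False), Dispatcher(chaining=True)
plain.run(loop)
chained.run(loop)
print(plain.lookups, chained.lookups)            # 300 4
print(plain.translations, chained.translations)  # 3 3
```

In this looping example, chaining removes almost all dispatcher round trips, while both variants translate each block exactly once.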

Profiling is the process of dynamically collecting program information (instruction and data statistics) that is used to guide optimization during translation. The usual way to obtain profile information is to inject a few instructions into the target code. This inevitably costs performance, so complex profiling mechanisms are unsuitable at runtime. Likewise, complex optimization algorithms are impractical at runtime, because their cost cannot be absorbed in a dynamic environment. Hence, to generate near-optimal target code using detailed profile information and complex optimization algorithms, a binary translation framework different from conventional ones is needed.
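The counter-based profiling described above can be sketched as follows. This is a hypothetical illustration, not SINOF's instrumentation: the `Profiler` class and its `on_block_entry` hook are invented names, and a Python method call stands in for the few instructions a DBT would inject into each translated block. Besides block and edge frequencies, it records two-edge paths, which is what the paper calls an inter-edge profile.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


class Profiler:
    """Counters that injected profiling code would update on each block entry."""

    def __init__(self) -> None:
        self.blocks: Counter = Counter()       # block -> execution count
        self.edges: Counter = Counter()        # (src, dst) -> transfer count
        self.inter_edges: Counter = Counter()  # (a, b, c): edge a->b followed by b->c
        self._prev: Optional[Hashable] = None
        self._prev2: Optional[Hashable] = None

    def on_block_entry(self, block: Hashable) -> None:
        # In a real DBT these updates would be a few instructions injected
        # into the translated block; a method call stands in for them here.
        self.blocks[block] += 1
        if self._prev is not None:
            self.edges[(self._prev, block)] += 1
            if self._prev2 is not None:
                self.inter_edges[(self._prev2, self._prev, block)] += 1
        self._prev2, self._prev = self._prev, block


def profile(trace: Iterable[Hashable]) -> Profiler:
    p = Profiler()
    for block in trace:
        p.on_block_entry(block)
    return p


# B branches to X or Y; which one depends on whether A1 or A2 ran just before.
run = profile(["A1", "B", "X", "A2", "B", "Y"] * 2)
print(run.edges[("B", "X")], run.edges[("B", "Y")])                          # 2 2
print(run.inter_edges[("A1", "B", "X")], run.inter_edges[("A1", "B", "Y")])  # 2 0
```

The edge counts alone make B look like a 50/50 branch, while the inter-edge counts show that its outcome is fully correlated with the preceding edge; this is the kind of extra information a two-edge path profile can provide.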

Generally speaking, combining dynamic translation with static analysis has the following merits:

  • (a)

    Reducing translation time: Translating source code to target code consumes a substantial amount of time. Once the source code has been translated in the first run, the target code is saved in a file that can be loaded directly in future runs;

  • (b)

    Eliminating profiling overhead: Profiling is eliminated in every run except the first. If all optimizations are performed statically, no further profiling is necessary;

  • (c)

    Performing complex optimizations: Because some optimizations can be carried out statically instead of at runtime, complex optimization algorithms that are unsuitable in a dynamic setting become usable without concern for runtime overhead.
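The merits listed above can be summarized with a rough cost model for a program that runs n times. The notation is our own illustration, not the paper's, and it ignores second-order effects such as cache warm-up: T_tr, T_prof and T_exec are the translation, profiling and execution times of one fully dynamic run, X is their sum, T_static is the one-off cost of the static phase, T_load is the cost of loading the saved target code, and T'_exec is the execution time of the statically optimized code.

```latex
\begin{aligned}
X &= T_{\mathrm{tr}} + T_{\mathrm{prof}} + T_{\mathrm{exec}},\\
T_{\mathrm{DBT}}(n) &= nX,\\
T_{\mathrm{comb}}(n) &= X + T_{\mathrm{static}} + (n-1)\,\bigl(T_{\mathrm{load}} + T'_{\mathrm{exec}}\bigr),\\
T_{\mathrm{DBT}}(n) - T_{\mathrm{comb}}(n) &= (n-1)\,\bigl(X - T_{\mathrm{load}} - T'_{\mathrm{exec}}\bigr) - T_{\mathrm{static}}.
\end{aligned}
```

Under this model, merits (a) and (b) appear as T_tr and T_prof dropping out of every run after the first, merit (c) as T'_exec < T_exec, and the combined scheme pays off once the accumulated per-run saving exceeds the one-off static cost T_static.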

The approach of using a profiling phase to aid dynamic binary translation assumes that the execution of each block in the profiling phase (initial run) is representative of the block throughout its lifetime. In particular, a region is selected for optimization under the assumption that it rarely takes its side exits. If the assumption above does not hold, and the optimized regions often take their side exits, the program performance will suffer [12].
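One way to test this representativeness assumption is to measure how often a selected region is left through a side exit, first on the profiling run and then on a later run. The sketch below is hypothetical: `side_exit_rate`, `still_representative`, the trace-based replay, and the 0.25 threshold are our own illustrative choices, not code or parameters from SINOF or from [12].

```python
from typing import Hashable, Sequence


def side_exit_rate(region: Sequence[Hashable], trace: Sequence[Hashable]) -> float:
    """Fraction of entries into `region` that leave before reaching its last block.

    `region` is a hot path chosen from the initial-run profile, e.g. [A, B, C].
    An entry is counted each time the trace reaches region[0]; a side exit is
    counted when the next executed block is not the region's expected next block.
    """
    entries = exits = 0
    i = 0
    while i < len(trace):
        if trace[i] != region[0]:
            i += 1
            continue
        entries += 1
        k = 1  # how far along the region execution has stayed
        while k < len(region) and i + k < len(trace) and trace[i + k] == region[k]:
            k += 1
        if k < len(region) and i + k < len(trace):
            exits += 1  # control left the region early: a side exit
        i += k
    return exits / entries if entries else 0.0


def still_representative(region: Sequence[Hashable], trace: Sequence[Hashable],
                         threshold: float = 0.25) -> bool:
    # Hypothetical policy: keep the optimized region only while side exits stay rare.
    return side_exit_rate(region, trace) <= threshold


hot = ["A", "B", "C"]
initial_run = ["A", "B", "C"] * 9 + ["A", "B", "D"]   # side exit in 1 of 10 entries
later_run = ["A", "B", "D"] * 6 + ["A", "B", "C"] * 4  # side exit in 6 of 10 entries
print(side_exit_rate(hot, initial_run), side_exit_rate(hot, later_run))  # 0.1 0.6
print(still_representative(hot, later_run))  # False
```

In this made-up example the region looks safe in the initial run (10% side exits) but not in the later run (60%), which is the situation the paragraph above warns about.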

To exploit new optimization opportunities and make use of detailed profile information, we design and implement a new DBT framework, named the static-integrated optimization framework for DBT (SINOF). The framework is based on Crossbit [1], a multi-source, multi-target DBT that aims to migrate existing executable code quickly and at low cost from one platform to a different target platform.

In this paper, we make the following contributions:

  • (a)

    A new dynamic-static combined DBT framework (SINOF): The framework can be divided into three phases: gathering profile information during the first run, performing optimization in the static phase, and loading the offline-optimized target code for subsequent executions.

  • (b)

    Inter-edge profiling: The inter-edge profile tracks the relationship between two adjacent edges. It is a special case of a path profile in which each path contains exactly two edges.

  • (c)

    A new approach to reorganizing the code layout of a software cache: Reorganizing the code layout in the software cache can greatly improve the performance of a binary translator, because after reorganization the execution stream more closely follows the program's control flow graph. The reorganization is performed in the static analysis and optimization phase.
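The excerpt does not spell out SINOF's own layout algorithm, so the sketch below swaps in a well-known alternative: greedy bottom-up chaining of basic blocks in the style of Pettis and Hansen, driven by edge counts such as those gathered in the first run. It illustrates the general idea under that assumption and is not the paper's method; `layout_blocks` is a hypothetical name, and keeping the entry block at the head of its chain and ordering leftover chains by incoming-edge weight are our own simplifications.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def layout_blocks(edge_count: Dict[Edge, int], entry: Hashable) -> List[Hashable]:
    """Order blocks so that hot edges become fall-throughs (greedy chaining)."""
    # Deterministic block order: entry first, then order of first appearance.
    order = list(dict.fromkeys([entry] + [b for edge in edge_count for b in edge]))
    chain_of: Dict[Hashable, List[Hashable]] = {b: [b] for b in order}

    # Visit edges from hottest to coldest; join two chains when src ends one
    # chain and dst starts another, making dst the fall-through of src.
    for (src, dst), _ in sorted(edge_count.items(), key=lambda kv: -kv[1]):
        src_chain, dst_chain = chain_of[src], chain_of[dst]
        if (src_chain is dst_chain or dst == entry
                or src_chain[-1] != src or dst_chain[0] != dst):
            continue  # would close a cycle, displace the entry, or split a chain
        src_chain.extend(dst_chain)
        for blk in dst_chain:
            chain_of[blk] = src_chain

    # Collect each distinct chain once (several blocks share one list object).
    chains: List[List[Hashable]] = []
    seen = set()
    for blk in order:
        c = chain_of[blk]
        if id(c) not in seen:
            seen.add(id(c))
            chains.append(c)

    # Entry chain first, remaining chains by descending incoming-edge weight.
    heat = dict.fromkeys(order, 0)
    for (_, dst), n in edge_count.items():
        heat[dst] += n
    first = chain_of[entry]
    rest = sorted((c for c in chains if c is not first),
                  key=lambda c: -sum(heat[b] for b in c))
    return [b for c in [first, *rest] for b in c]


# A loop whose branch in B mostly goes to C; D is the cold alternative.
edges = {("A", "B"): 100, ("B", "C"): 90, ("B", "D"): 10,
         ("C", "E"): 90, ("D", "E"): 10, ("E", "A"): 99}
print(layout_blocks(edges, "A"))  # ['A', 'B', 'C', 'E', 'D']
```

Hot edges become fall-throughs, so the frequently executed path A, B, C, E is laid out contiguously and the cold block D is placed after it.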

This paper continues the work in [13], which described the construction of the SINOF framework without static optimization. The rest of the paper is organized as follows: Section 2 introduces Crossbit, the DBT system on which our framework is based. Section 3 describes the implementation of SINOF in detail. Section 4 presents the performance evaluation. Section 5 discusses related work. The last section concludes and discusses future work.

Section snippets

Overview of the basic platform: Crossbit

This section gives an overview of the dynamic binary translator Crossbit, on which our framework is based. Crossbit is designed and implemented as a dynamic binary translator that aims to migrate existing executable code quickly from one platform to another at low cost. It supports multiple source architectures and multiple target architectures. So far, it fully or partially supports source platforms including SimpleScalar, IA32, MIPS, and SPARC, and target platforms such

Overview

As Fig. 4 shows, the overall workflow of SINOF can be divided into three phases: collecting profile information on the first run, performing profile-directed optimizations in the static analysis phase, and loading the target code that has been optimized offline for the subsequent executions.
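The three phases can be sketched around a persistent software cache. This is a hypothetical sketch: the JSON file format, the SHA-256 key over the source image, and the function names `save_first_run`, `optimize_offline` and `load_for_run` are our assumptions rather than SINOF's on-disk format, and target code is represented by placeholder strings instead of machine code.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, Optional

Code = Dict[str, str]     # block id -> target code (placeholder strings here)
Profile = Dict[str, int]  # "src->dst" -> edge count (JSON keys must be strings)


def cache_path(cache_dir: Path, source_image: bytes) -> Path:
    # Key the saved code by a hash of the source image, so a modified binary
    # never reuses target code translated from an older version.
    return cache_dir / (hashlib.sha256(source_image).hexdigest() + ".json")


def save_first_run(cache_dir: Path, source_image: bytes,
                   code: Code, profile: Profile) -> None:
    """Phase 1: persist the translated blocks and the profile from the first run."""
    path = cache_path(cache_dir, source_image)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"code": code, "profile": profile, "optimized": False}))


def optimize_offline(cache_dir: Path, source_image: bytes,
                     optimize: Callable[[Code, Profile], Code]) -> None:
    """Phase 2: rewrite the saved code using the saved profile, outside any run."""
    path = cache_path(cache_dir, source_image)
    data = json.loads(path.read_text())
    data["code"] = optimize(data["code"], data["profile"])
    data["optimized"] = True
    path.write_text(json.dumps(data))


def load_for_run(cache_dir: Path, source_image: bytes) -> Optional[Code]:
    """Phase 3: return saved target code, or None to fall back to full DBT."""
    path = cache_path(cache_dir, source_image)
    if not path.exists():
        return None
    return json.loads(path.read_text())["code"]


with tempfile.TemporaryDirectory() as tmp:
    cache, image = Path(tmp), b"demo source image"
    print(load_for_run(cache, image))  # None: first run, so translate and profile
    save_first_run(cache, image, {"A": "a", "D": "d", "B": "b"}, {"A->B": 9, "A->D": 1})
    # Hypothetical static pass: place B right after A because A->B is hotter.
    optimize_offline(cache, image, lambda code, prof: {b: code[b] for b in ["A", "B", "D"]})
    print(list(load_for_run(cache, image)))  # ['A', 'B', 'D']
```

Keying the cache on the content of the source image is one simple way to make sure that code optimized offline is only reused for the exact binary it was produced from.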

In the first phase, the DBT executes the source program dynamically while instrumentation collects profile information. The profile information includes the execution frequency of each basic

Performance evaluation

In this section, we first compare the overall performance of SINOF, Crossbit (the DBT system on which our framework is based), SINOF(WSO) (the initial version of SINOF, without static optimization), DynamoRIO [27], [28] (the IA-32 version of Dynamo [22]), and HDTrans [29] (a low-overhead dynamic translator) with Native (the source program running directly on the underlying platform). Then, we examine the reasons for the performance gained from the method of Ordered

Related work

For the purpose of gaining better performance, a static process has been adopted in several binary translation systems and conventional compilers.

Conclusion and future work

In this paper, we proposed a dynamic-static combined framework (SINOF) for dynamic binary translation and implemented it on Crossbit. The framework is divided into three phases: gathering profile information and target code in the initial phase, optimizing the target code according to the profile information in the static analysis phase, and loading the offline-optimized target code for subsequent executions.

The framework is superior to other DBT systems because we


References (29)

  • Yindong Yang, Haibing Guan, Erzhou Zhu, Hongbo Yang, Bo Liu, Crossbit: a multi-sources and multi-targets DBT, in: The...
  • J.C. Dehnert, B.K. Grant, J.P. Banning, et al., The Transmeta code morphing software: using speculation, recovery, and...
  • V. Bala, E. Duesterwald, S. Banerjia, Dynamo: a transparent dynamic optimization system, in: SIGPLAN PLDI, June 2000,...
  • Nicholas Nethercote et al., Valgrind: a program supervision framework, Electronic Notes in Theoretical Computer Science (2003).
  • C. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, Vijay Janapa Reddi, K. Hazelwood, Pin: building...
  • D. Ung et al., Optimizing hot paths in a dynamic binary translator, ACM SIGARCH Computer Architecture News (2001).
  • K. Ebcioglu, E.R. Altman, Daisy: dynamic compilation for 100% architectural compatibility, in: Proceedings of the 24th...
  • C. Zheng et al., PA-RISC to IA-64: transparent execution, no recompilation, IEEE Computer (2000).
  • A. Chernoff, R. Hookway, DIGITAL FX!32 – Running 32-Bit x86 Applications on Alpha NT, in: The Proceedings of the USENIX...
  • Raymond J. Hookway et al., DIGITAL FX!32: combining emulation and binary translation, Digital Technical Journal (1997).
  • Anton Chernoff et al., FX!32: a profile-directed binary translator, IEEE Micro (1998).
  • Youfeng Wu, Mauricio Breternitz, Justin Quek, Orna Etzion, Jesse Fang, The accuracy of initial prediction in two-phase...
  • Jinghui Gu, Chao Xu, Ling Lin, Juyu Zheng, Kai Chen, Haibing Guan, The implementation of static-integrated optimization...
  • Huihui Shi et al., An intermediate level optimization framework for DBT, ACM SIGPLAN Notices (2007).
Erzhou Zhu is currently a Ph.D. student at Shanghai Jiao Tong University, China. He received the M.S. and B.S. degrees in Computer Science and Technology from Anhui University (Anhui, China) in 2004 and 2008, respectively. His research interests include virtual machines, binary translation, and computer architecture.

Haibing Guan received his Ph.D. degree in computer science from TongJi University, China, in 1999. He is currently a professor with the Faculty of Computer Science, Shanghai Jiao Tong University, Shanghai, China. His current research interests include, but are not limited to, computer architecture, compiling, virtualization, and hardware/software co-design.

Hongxi Wang is currently a professor in the School of Natural Science, Anhui Agricultural University, Anhui, China. His current research interests include, but are not limited to, applied differential equations, system simulation, and artificial intelligence.

Ruhui Ma is currently a Ph.D. student at Shanghai Jiao Tong University, China. His research interests include virtual machines, binary translation, and computer architecture.

Yindong Yang is currently a Ph.D. student at Shanghai Jiao Tong University, China. He received the M.S. degree from the School of Computer, Electronics and Information, Guangxi University, China, in 2007, and the B.S. degree from the School of Information and Technology, Jiangnan University, China, in 2004. His main research interests are in virtual machines, computer architecture, and compiling.

Bin Wang is currently a Ph.D. student at Shanghai Jiao Tong University, China. His research interests include operating systems, system virtualization, system security, computer architecture, compilers, and embedded systems.

This work was supported by the National Natural Science Foundation of China (Grant Nos. 60970107, 60970108), the National High Technology Research and Development Program (863 Program) of China (Grant No. 2012AA010905), the National Basic Research Program (973 Program) of China (Grant No. 2012CB723401), the International Cooperation Program of China (Grant No. 2011DFA10850), the Ministry of Education and Intel joint research foundation (Grant No. MOE-INTEL-11-05), and the International Cooperation Program of Shanghai (Grant No. 11530700500).
