DOI: 10.1145/2451512.2451519
Research article

Improving dynamic binary optimization through early-exit guided code region formation

Published: 16 March 2013

Abstract

Most dynamic binary translators (DBTs) and optimizers (DBOs) target binary traces, i.e., frequently executed paths, as the code regions to be translated and optimized. Code region formation is therefore the critical first step in any DBT or DBO: the quality of the dynamically formed regions determines the extent and types of optimization opportunities that are exposed, and thus the quality of the final optimized code. The Next-Executing-Tail (NET) trace formation method used in HP Dynamo is an early example of such a technique, and many existing trace formation schemes are variants of NET. They work well for most binary traces, but they share a major problem: a formed trace may contain a large number of early exits through which execution can leave the trace before reaching its end. If such exits are taken frequently, the program spends more time in the slow binary interpreter or in unoptimized code regions than in the optimized traces in the code cache, and the benefit of trace optimization is lost. Traces/regions with frequently taken early exits are called delinquent traces/regions. Our empirical study shows that at least 8 of the 12 SPEC CPU2006 integer benchmarks have delinquent traces.
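The notion of a delinquent trace can be illustrated with a minimal sketch. It assumes hypothetical per-trace counters for entries and early exits, and an arbitrary 30% threshold; the names `TraceProfile`, `is_delinquent`, and `EARLY_EXIT_THRESHOLD` are illustrative only and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical threshold: fraction of trace entries that leave through an
# early exit before the trace is considered delinquent. Not from the paper.
EARLY_EXIT_THRESHOLD = 0.3


@dataclass
class TraceProfile:
    """Illustrative profile record for one trace in the code cache."""
    name: str
    entries: int = 0      # times execution entered the trace head
    early_exits: int = 0  # times execution left before the trace end

    def early_exit_ratio(self) -> float:
        # A trace that was never entered has no meaningful ratio; report 0.
        return self.early_exits / self.entries if self.entries else 0.0


def is_delinquent(trace: TraceProfile,
                  threshold: float = EARLY_EXIT_THRESHOLD) -> bool:
    """A trace is flagged when its early exits are taken too often."""
    return trace.early_exit_ratio() > threshold


def delinquent_traces(traces):
    """Return the names of all traces flagged as delinquent."""
    return [t.name for t in traces if is_delinquent(t)]
```

With these assumptions, a trace entered 1000 times that leaves early 600 times would be flagged, while one that leaves early 20 times would not.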
In this paper, we propose a lightweight region formation technique called Early-Exit Guided Region Formation (EEG) to improve the quality of the formed traces/regions. It iteratively identifies delinquent regions and merges them into larger code regions. We have implemented the EEG algorithm in two LLVM-based multi-threaded DBTs targeting the ARM and IA32 instruction set architectures (ISAs), respectively. Using the SPEC CPU2006 benchmark suite with reference inputs, our results show that, compared to a NET variant currently used in QEMU, a state-of-the-art retargetable DBT, EEG achieves a significant performance improvement of up to 72% (27% on average) for IA32 and up to 49% (23% on average) for ARM.
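The idea of iteratively merging regions along frequently taken early exits can be sketched as follows. This is not the paper's algorithm, whose details are not given in the abstract. It is a simplified illustration that assumes each region is a set of basic-block labels keyed by its head block, that early-exit counts between region heads are already available, and that the threshold is chosen arbitrarily. The function name `merge_delinquent_regions` and its parameters are hypothetical.

```python
def merge_delinquent_regions(regions, exit_counts, threshold):
    """Merge regions connected by frequently taken early exits.

    regions:     dict mapping a region's head block to its set of blocks.
    exit_counts: dict mapping (src_head, dst_head) to how often an early
                 exit of the region headed by src jumped to the head of dst.
                 Every head named here must be a key of `regions`.
    threshold:   hypothetical minimum count for an exit to trigger a merge.
    Returns a dict mapping each surviving head to its merged block set.
    """
    merged = {head: set(blocks) for head, blocks in regions.items()}
    # owner[h] is the head of the merged region that currently contains h.
    owner = {head: head for head in regions}

    changed = True
    while changed:  # repeat until no hot early exit crosses a region boundary
        changed = False
        for (src, dst), count in exit_counts.items():
            a, b = owner[src], owner[dst]
            if count >= threshold and a != b:
                # Absorb region b into region a and redirect its members.
                merged[a] |= merged.pop(b)
                for head in list(owner):
                    if owner[head] == b:
                        owner[head] = a
                changed = True
    return merged
```

In this sketch the exit counts are static, so the loop reaches its fixed point quickly. A real translator would presumably re-profile the merged regions between rounds, which is an assumption about how the iteration would be driven rather than something stated in the abstract.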



Published In

VEE '13: Proceedings of the 9th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
March 2013
210 pages
ISBN:9781450312660
DOI:10.1145/2451512
  • ACM SIGPLAN Notices, Volume 48, Issue 7
    VEE '13
    July 2013
    194 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/2517326
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dynamic binary translation
  2. hardware-based performance monitoring
  3. hot region formation
  4. trace-based jit compilation
  5. virtual machine

Qualifiers

  • Research-article

Conference

VEE '13

Acceptance Rates

Overall Acceptance Rate 80 of 235 submissions, 34%

Cited By

  • (2020) More with Less – Deriving More Translation Rules with Less Training Data for DBTs Using Parameterization. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 415-426. DOI: 10.1109/MICRO50266.2020.00043. Online publication date: Oct-2020
  • (2019) Unleashing the power of learning. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference, pages 77-89. DOI: 10.5555/3358807.3358815. Online publication date: 10-Jul-2019
  • (2018) Improving Dynamically-Generated Code Performance on Dynamic Binary Translators. ACM SIGPLAN Notices 53:3, pages 17-30. DOI: 10.1145/3296975.3186413. Online publication date: 25-Mar-2018
  • (2018) Enhancing Cross-ISA DBT Through Automatically Learned Translation Rules. ACM SIGPLAN Notices 53:2, pages 84-97. DOI: 10.1145/3296957.3177160. Online publication date: 19-Mar-2018
  • (2018) Processor-Tracing Guided Region Formation in Dynamic Binary Translation. ACM Transactions on Architecture and Code Optimization 15:4, pages 1-25. DOI: 10.1145/3281664. Online publication date: 16-Nov-2018
  • (2018) Improving Dynamically-Generated Code Performance on Dynamic Binary Translators. Proceedings of the 14th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pages 17-30. DOI: 10.1145/3186411.3186413. Online publication date: 25-Mar-2018
  • (2018) Enhancing Cross-ISA DBT Through Automatically Learned Translation Rules. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 84-97. DOI: 10.1145/3173162.3177160. Online publication date: 19-Mar-2018
  • (2015) Optimizing Control Transfer and Memory Virtualization in Full System Emulators. ACM Transactions on Architecture and Code Optimization 12:4, pages 1-24. DOI: 10.1145/2837027. Online publication date: 8-Dec-2015
  • (2014) Efficient code generation in a region-based dynamic binary translator. ACM SIGPLAN Notices 49:5, pages 3-12. DOI: 10.1145/2666357.2597810. Online publication date: 12-Jun-2014
  • (2014) SPTU. Proceedings of International Conference on Systems and Storage, pages 1-12. DOI: 10.1145/2611354.2611368. Online publication date: 30-Jun-2014
