skip to main content
10.1145/2451512.2451519acmconferencesArticle/Chapter ViewAbstractPublication PagesveeConference Proceedingsconference-collections
research-article

Improving dynamic binary optimization through early-exit guided code region formation

Published:16 March 2013Publication History

ABSTRACT

Most dynamic binary translators (DBT) and optimizers (DBO) target binary traces, i.e. frequently executed paths, as code regions to be translated and optimized. Code region formation is the most important first step in all DBTs and DBOs. The quality of the dynamically formed code regions determines the extent and the types of optimization opportunities that can be exposed to DBTs and DBOs, and thus, determines the ultimate quality of the final optimized code. The Next-Executing-Tail (NET) trace formation method used in HP Dynamo is an early example of such techniques. Many existing trace formation schemes are variants of NET. They work very well for most binary traces, but they also suffer a major problem: the formed traces may contain a large number of early exits that could be branched out during the execution. If this happens frequently, the program execution will spend more time in the slow binary interpreter or in the unoptimized code regions than in the optimized traces in code cache. The benefit of the trace optimization is thus lost. Traces/regions with frequently taken early-exits are called delinquent traces/regions. Our empirical study shows that at least 8 of the 12 SPEC CPU2006 integer benchmarks have delinquent traces.

In this paper, we propose a light-weight region formation technique called Early-Exit Guided Region Formation (EEG) to improve the quality of the formed traces/regions. It iteratively identifies and merges delinquent regions into larger code regions. We have implemented our EEG algorithm in two LLVM-based multi-threaded DBTs targeting ARM and IA32 instruction set architecture (ISA), respectively. Using SPEC CPU2006 benchmark suite with reference inputs, our results show that compared to an NET-variant currently used in QEMU, a state-of-the-art retargetable DBT, EEG can achieve a significant performance improvement of up to 72% (27% on average), and to 49% (23% on average) for IA32 and ARM, respectively.

References

  1. Low Level Virtual Machine (LLVM). http://llvm.org.Google ScholarGoogle Scholar
  2. QEMU. http://qemu.org.Google ScholarGoogle Scholar
  3. V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: a transparent dynamic optimization system. In PLDI '00, pages 1--12. ACM, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. L. Baraz, T. Devor, O. Etzion, S. Goldenberg, A. Skaletsky, Y. Wang, and Y. Zemach. Ia-32 execution layer: a two-phase dynamic translator designed to support ia-32 applications on itanium-based systems. In MICRO-36, pages 191--201, Dec. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Bebenita, F. Brandner, M. Fahndrich, F. Logozzo, W. Schulte, N. Tillmann, and H. Venter. Spur: a trace-based jit compiler for cil. SIGPLAN Not., 45:708--725, October 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. I. Bohm, T. E. von Koch, S. Kyle, B. Franke, and N. Topham. Generalized just-in-time trace compilation using a parallel task farm in a dynamic binary translator. In Proc. PLDI, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. D. Bruening. Efficient, Transparent, and Comprehensive Runtime Code Manipulation. Ph.d. thesis, Massachusetts Institute of Technology, Cambridge, MA, Sep 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. C. Dehnert, B. K. Grant, J. P. Banning, R. Johnson, T. Kistler, A. Klaiber, and J. Mattson. The transmeta code morphing#8482; software: using speculation, recovery, and adaptive retranslation to address real-life challenges. In CGO '03: Proceedings of the international symposium on Code generation and optimization, pages 15--24, Washington, DC, USA, 2003. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. K. Ebcioglu, E. Altman, M. Gschwind, and S. Sathaye. Dynamic binary translation and optimization. IEEE Trans. Comput., 50(6):529--548, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Gal, B. Eich, M. Shaver, D. Anderson, D. Mandelin, M. R. Haghighat, B. Kaplan, G. Hoare, B. Zbarsky, J. Orendorff, J. Ruderman, E. W. Smith, R. Reitmaier, M. Bebenita, M. Chang, and M. Franz. Trace-based just-in-time type specialization for dynamic languages. In PLDI, pages 465--478, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. H. Hayashizaki, P. Wu, H. Inoue, M. J. Serrano, and T. Nakatani. Improving the performance of trace-based systems by false loop filtering. In ASPLOS, pages 405--418, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. D. Hiniker, K. Hazelwood, and M. D. Smith. Improving region selection in dynamic optimization systems. In MICRO 38, pages 141--154, Washington, DC, USA, 2005. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D.-Y. Hong, C.-C. Hsu, P. Liu, C.-M. Wang, J.-J. Wu, , P.-C. Yew, and W.-C. Hsu. Hqemu: A multi-threaded and retargetable dynamic binary translator on multicores. In CGO '12: Proceedings of the 10th annual IEEE/ACM international symposium on Code generation and optimization, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. C.-C. Hsu, P. Liu, C.-M. Wang, J.-J. Wu, D.-Y. Hong, P.-C. Yew, and W.-C. Hsu. Lnq: Building high performance dynamic binary translators with existing compiler backends. In ICPP, pages 226--234, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. W.-M. W. Hwu, S. A. Mahlke, W. Y. Chen, P. P. Chang, N. J. Warter, R. A. Bringmann, R. G. Ouellette, R. E. Hank, T. Kiyohara, G. E. Haab, J. G. Holm, and D. M. Lavery. The superblock: an effective technique for vliw and superscalar compilation. J. Supercomput., 7(1-2):229--248, May 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. H. Inoue, H. Hayashizaki, P. Wu, and T. Nakatani. A trace-based java jit compiler retrofitted from a method-based compiler. In CGO'11, pages 246--256, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. H. Inoue, H. Hayashizaki, P. Wu, and T. Nakatani. Adaptive multi-level compilation in a trace-based java jit compiler. In Proceedings of the ACM international conference on Object oriented programming systems languages and applications, OOPSLA '12, pages 179--194, New York, NY, USA, 2012. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. Lu, H. Chen, P.-C. Yew, and W. chung Hsu. Design and implementation of a lightweight dynamic optimization system. Journal of Instruction-Level Parallelism, 6:2004, 2004.Google ScholarGoogle Scholar
  19. M. M. Michael and M. L. Scott. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In 15th Annual ACM Symposium on Principles of Distributed Computing, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. Paleczny, C. Vick, and C. Click. The java hotspot(tm) server compiler. In In USENIX Java Virtual Machine Research and Technology Symposium, pages 1--12, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. perfmon2. http://perfmon2.sourceforge.net.Google ScholarGoogle Scholar
  22. J. E. Smith and R. Nair. Virtual Machines: Versatile Platforms for Systems and Processes. Morgan Kaufman, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. T. Suganuma, T. Yasue, and T. Nakatani. A region-based compilation technique for a java just-in-time compiler. In PLDI '03, pages 312--323. ACM, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. V. Sundaresan, D. Maier, P. Ramarao, and M. Stoodley. Experiences with multi-threading and dynamic class loading in a java just-in-time compiler. In CGO '06, pages 87--97, Washington, DC, USA, 2006. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. C. Wang, S. Hu, H.-S. Kim, S. R. Nair, M. B. Jr., Z. Ying, and Y. Wu. Stardbt: An efficient multi-platform dynamic binary translation system. In ACSAC'07, pages 4--15, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. P. Wu, H. Hayashizaki, H. Inoue, and T. Nakatani. Reducing trace selection footprint for large-scale java applications without performance loss. In OOPSLA '11, pages 789--804, New York, NY, USA, 2011. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. C. Zhao, Y. Wu, J. G. Steffan, and C. Amza. Lengthening traces to improve opportunities for dynamic optimization. In Proceedings of the Workshop on Interaction between Compilers and Computer Architectures, 2008.Google ScholarGoogle Scholar

Index Terms

  1. Improving dynamic binary optimization through early-exit guided code region formation

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          VEE '13: Proceedings of the 9th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
          March 2013
          210 pages
          ISBN:9781450312660
          DOI:10.1145/2451512
          • cover image ACM SIGPLAN Notices
            ACM SIGPLAN Notices  Volume 48, Issue 7
            VEE '13
            July 2013
            194 pages
            ISSN:0362-1340
            EISSN:1558-1160
            DOI:10.1145/2517326
            Issue’s Table of Contents

          Copyright © 2013 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 16 March 2013

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate80of235submissions,34%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader