DOI: 10.1145/2259016.2259030
research-article

HQEMU: a multi-threaded and retargetable dynamic binary translator on multicores

Published: 31 March 2012

Abstract

Dynamic binary translation (DBT) is a core technology for many important applications, such as system virtualization, dynamic binary instrumentation, and security. However, several factors often impede its performance: (1) emulation overhead before translation; (2) translation and optimization overhead; and (3) translated code quality. A further issue for the dynamic binary translator itself is retargetability: supporting guest applications from different instruction-set architectures (ISAs) on host machines that also have different ISAs, an important feature for system virtualization. In this work, we take advantage of ubiquitous multicore platforms and use a multi-threaded approach to implement DBT. Running the translators and the dynamic binary optimizers on separate threads on different cores off-loads the DBT overhead from the target application, which makes more sophisticated optimization techniques affordable while still supporting retargetability. Using QEMU (a popular retargetable DBT for system virtualization) and LLVM (Low Level Virtual Machine) as building blocks, we demonstrate with a multi-threaded DBT prototype, called HQEMU, that this approach improves QEMU performance by factors of 2.4X and 4X on the SPEC 2006 integer and floating-point benchmarks, respectively, for x86 to x86-64 emulation. That is, HQEMU is only 2.5X and 2.1X slower than native execution of the same benchmarks on x86-64, as opposed to the 6X and 8.4X slowdowns of QEMU. For ARM to x86-64 emulation, HQEMU achieves a 2.4X speedup over QEMU on the SPEC 2006 integer benchmarks.
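
The off-loading idea in the abstract can be illustrated with a minimal Python sketch. It is not HQEMU's implementation: the names `CodeCache`, `baseline_translate`, `optimize`, `optimizer_worker`, `run_guest`, and the `HOT_THRESHOLD` value are all hypothetical, and plain Python callables stand in for translated code. The sketch only shows the structure: the emulation thread keeps running quickly produced translations and hands frequently executed blocks to a separate optimizer thread, which installs improved code into a shared cache when it is ready.

```python
import queue
import threading

HOT_THRESHOLD = 3  # hypothetical: executions before a block counts as hot


class CodeCache:
    """Thread-safe map from guest block address to (tier, callable)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def lookup(self, pc):
        with self._lock:
            return self._entries.get(pc)

    def install(self, pc, tier, fn):
        with self._lock:
            self._entries[pc] = (tier, fn)


def baseline_translate(pc):
    # Stand-in for a fast, low-quality translation of the block at pc.
    return lambda acc: acc + pc


def optimize(pc):
    # Stand-in for a slow, higher-quality translation with the same semantics.
    return lambda acc: acc + pc


def optimizer_worker(requests, cache):
    # Runs on its own thread, so optimization cost stays off the emulation thread.
    while True:
        pc = requests.get()
        if pc is None:  # sentinel: no more work
            break
        cache.install(pc, "optimized", optimize(pc))


def run_guest(trace, cache, requests):
    # Emulation thread: never waits for the optimizer.
    counts, submitted, acc = {}, set(), 0
    for pc in trace:
        entry = cache.lookup(pc)
        if entry is None:
            cache.install(pc, "baseline", baseline_translate(pc))
            entry = cache.lookup(pc)
        _tier, fn = entry
        acc = fn(acc)
        counts[pc] = counts.get(pc, 0) + 1
        if counts[pc] >= HOT_THRESHOLD and pc not in submitted:
            submitted.add(pc)
            requests.put(pc)  # hand the hot block to the optimizer thread
    return acc


def demo():
    cache, requests = CodeCache(), queue.Queue()
    worker = threading.Thread(target=optimizer_worker, args=(requests, cache))
    worker.start()
    result = run_guest([1, 2, 1, 1, 2, 1], cache, requests)
    requests.put(None)
    worker.join()
    return result, cache.lookup(1)[0], cache.lookup(2)[0]
```

In this sketch the emulation loop uses whichever version of a block is in the cache at lookup time, so a slow optimizer delays only the arrival of better code, not guest progress. Calling `demo()` leaves block 1 (executed four times) with an optimized entry and block 2 (executed twice) with its baseline entry.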

Published In

CGO '12: Proceedings of the Tenth International Symposium on Code Generation and Optimization
March 2012
285 pages
ISBN: 9781450312066
DOI: 10.1145/2259016

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. LLVM
  2. dynamic binary translation
  3. feedback-directed optimization
  4. hardware performance monitoring
  5. multi-threaded
  6. multicores
  7. traces

Qualifiers

  • Research-article

Conference

CGO '12

Acceptance Rates

CGO '12 Paper Acceptance Rate: 26 of 90 submissions, 29%
Overall Acceptance Rate: 312 of 1,061 submissions, 29%

Cited By

  • (2025) RVAM16: a low-cost multiple-ISA processor based on RISC-V and ARM Thumb. Frontiers of Computer Science: Selected Publications from Chinese Universities, 19:1. DOI: 10.1007/s11704-023-3239-x. Online publication date: 1-Jan-2025
  • (2024) CrossMapping. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, pages 1013-1028. DOI: 10.5555/3691992.3692054. Online publication date: 10-Jul-2024
  • (2024) NimbleNet: Serverless Computing for the Extreme Edge in Factory Environments. Proceedings of the 10th International Workshop on Serverless Computing, pages 19-24. DOI: 10.1145/3702634.3702953. Online publication date: 2-Dec-2024
  • (2024) LeanBin: Harnessing Lifting and Recompilation to Debloat Binaries. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pages 1434-1446. DOI: 10.1145/3691620.3695515. Online publication date: 27-Oct-2024
  • (2024) Accelerate RISC-V Instruction Set Simulation by Tiered JIT Compilation. Proceedings of the 16th ACM SIGPLAN International Workshop on Virtual Machines and Intermediate Languages, pages 12-22. DOI: 10.1145/3689490.3690399. Online publication date: 17-Oct-2024
  • (2024) BTBench: A Benchmark for Comprehensive Binary Translation Performance Evaluation. 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 36-47. DOI: 10.1109/ISPASS61541.2024.00014. Online publication date: 5-May-2024
  • (2024) A System-Level Dynamic Binary Translator using Automatically-Learned Translation Rules. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization, pages 423-434. DOI: 10.1109/CGO57630.2024.10444850. Online publication date: 2-Mar-2024
  • (2023) Crosys: Cross Architectural Dynamic Analysis. Proceedings of the 12th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis, pages 55-62. DOI: 10.1145/3589250.3596147. Online publication date: 6-Jun-2023
  • (2023) AtoMig: Automatically Migrating Millions Lines of Code from TSO to WMM. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pages 61-73. DOI: 10.1145/3575693.3579849. Online publication date: 27-Jan-2023
  • (2023) Risotto: A Dynamic Binary Translator for Weak Memory Model Architectures. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pages 107-122. DOI: 10.1145/3567955.3567962. Online publication date: 25-Mar-2023