DOI: 10.1145/3337821.3337844
research-article

Exploiting Vector Processing in Dynamic Binary Translation

Published: 05 August 2019

Abstract

Compilers have used auto-vectorization techniques for decades to exploit data-level parallelism. However, as processor architectures have continued to add features that improve vector/SIMD performance, legacy application binaries fail to fully exploit the new vector/SIMD capabilities of modern architectures. For example, legacy ARMv7 binaries cannot benefit from the double-precision SIMD capability of ARMv8, and legacy x86 binaries cannot take advantage of the AVX-512 extensions.
In this paper, we study the fundamental issues involved in using cross-ISA Dynamic Binary Translation (DBT) to convert non-vectorized loops into vector/SIMD form, so that they achieve the higher computational throughput available in newer processor architectures. The key idea is to recover critical loop information from application binaries so that vectorization can be carried out at runtime. Experimental results show that our approach achieves an average speedup of 1.42x over native ARMv7 execution across various benchmarks in an ARMv7-to-ARMv8 dynamic binary translation system.
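The abstract does not describe the translator's internals, so the following is only an illustrative sketch in C of the kind of loop transformation involved, not the authors' method. It contrasts a scalar double-precision loop (one element per iteration, as a legacy ARMv7 binary must execute it, since ARMv7 NEON has no double-precision lanes) with a 2-lane version of the same loop. The vector loop is guarded by a runtime alias check driven by loop information a DBT would have to recover from the binary (base addresses and trip count). The `loop_info` struct, the function names, and the conservative overlap test are hypothetical. The vector code uses the GNU C `vector_size` extension (GCC/Clang), whose 16-byte vector of two doubles has the same lane shape as an ARMv8 AdvSIMD register used with the `.2D` arrangement. A real DBT rewrites guest machine code, not C source.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* GNU C vector extension (GCC/Clang): 16 bytes holding two doubles,
   the same lane shape as an ARMv8 AdvSIMD register used as .2D. */
typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical summary of loop facts a DBT would need to recover
   from the guest binary before it can vectorize the loop. */
typedef struct {
    double *dst;          /* base address of the array being stored */
    const double *src1;   /* base addresses of the arrays being loaded */
    const double *src2;
    size_t trip_count;    /* derived from the induction variable's bounds */
} loop_info;

/* Scalar form: one double per iteration, as the legacy ARMv7 code does. */
void add_scalar(const loop_info *li) {
    for (size_t i = 0; i < li->trip_count; i++)
        li->dst[i] = li->src1[i] + li->src2[i];
}

/* Conservative check: nonzero only if the n-element ranges starting
   at a and b do not overlap at all. */
int disjoint(const double *a, const double *b, size_t n) {
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)(n * sizeof(double));
    return pa + bytes <= pb || pb + bytes <= pa;
}

/* Vectorized form: two doubles per iteration plus a scalar epilogue.
   Falls back to the scalar loop if the runtime alias check fails. */
void add_vectorized(const loop_info *li) {
    size_t n = li->trip_count;
    if (!disjoint(li->dst, li->src1, n) || !disjoint(li->dst, li->src2, n)) {
        add_scalar(li);
        return;
    }
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        v2df a, b, c;
        memcpy(&a, li->src1 + i, sizeof a);  /* memcpy avoids alignment assumptions */
        memcpy(&b, li->src2 + i, sizeof b);
        c = a + b;                           /* one 2-lane double-precision add */
        memcpy(li->dst + i, &c, sizeof c);
    }
    for (; i < n; i++)                       /* remainder when n is odd */
        li->dst[i] = li->src1[i] + li->src2[i];
}
```

With `trip_count = 5`, the vector loop covers elements 0 to 3 in two iterations and the epilogue handles element 4. Because a binary carries no source-level aliasing information, some guard of this kind (or an equivalent static proof) has to precede the vector loop. The check here is deliberately conservative: it also rejects exact aliasing such as `dst == src1`, which would in fact be safe to vectorize.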


Cited By

  • (2023) Towards Efficient Dynamic Binary Translation Optimizations Based on RISC Architectural Features. Journal of Circuits, Systems and Computers 33:06. DOI: 10.1142/S0218126624501044. Online publication date: 26-Oct-2023.

Information

Published In

ACM Other conferences
ICPP '19: Proceedings of the 48th International Conference on Parallel Processing
August 2019
1107 pages
ISBN:9781450362955
DOI:10.1145/3337821
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Auto Vectorization
  2. Dynamic Binary Translation
  3. SIMD/Vector
  4. Virtual Register Promotion

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2019

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 27
  • Downloads (last 6 weeks): 1
Reflects downloads up to 01 Mar 2025
