Exploiting Vector Processing in Dynamic Binary Translation

Authors: Ding-Yong Hong, Wei-Chung Hsu

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing

Article No.: 93, Pages 1 - 10

https://doi.org/10.1145/3337821.3337844

Published: 05 August 2019

Abstract

Compilers have used auto-vectorization techniques to exploit data-level parallelism for decades. However, because processor architectures keep adding new features to improve vector/SIMD performance, legacy application binaries fail to fully exploit the vector/SIMD capabilities of modern architectures. For example, legacy ARMv7 binaries cannot benefit from the ARMv8 SIMD double-precision capability, and legacy x86 binaries cannot take advantage of the AVX-512 extensions.

In this paper, we study the fundamental issues involved in using cross-ISA Dynamic Binary Translation (DBT) to convert non-vectorized loops to vector/SIMD forms, in order to achieve the greater computation throughput available in newer processor architectures. The key idea is to recover critical loop information from the application binaries so that vectorization can be carried out at runtime. Experimental results show that our approach achieves an average speedup of 1.42x over native ARMv7 execution across various benchmarks in an ARMv7-to-ARMv8 dynamic binary translation system.
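
To make the idea of converting a scalar loop into vector form more concrete, the following is a minimal sketch in Python; it is an illustration, not the translator described in the paper. It assumes the loop information has already been recovered (a known trip count, unit stride, and no loop-carried dependence) and that the loop body is the element-wise sum `c[i] = a[i] + b[i]`. The lane count of 2 models two double-precision values in a 128-bit SIMD register, and all function and variable names are hypothetical.

```python
VECTOR_LANES = 2  # assumption: two 64-bit lanes, as in a 128-bit SIMD register


def scalar_add(a, b):
    """Reference loop: one element per iteration, as in a non-vectorized binary."""
    c = [0.0] * len(a)
    for i in range(len(a)):
        c[i] = a[i] + b[i]
    return c


def strip_mined_add(a, b, lanes=VECTOR_LANES):
    """Same loop restructured into a vector body plus a scalar remainder."""
    n = len(a)
    c = [0.0] * n
    main_end = n - (n % lanes)  # largest multiple of `lanes` not exceeding n
    # Vector body: each iteration stands in for one SIMD add over `lanes` elements.
    for i in range(0, main_end, lanes):
        c[i:i + lanes] = [x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes])]
    # Scalar epilogue: leftover iterations when n is not a multiple of `lanes`.
    for i in range(main_end, n):
        c[i] = a[i] + b[i]
    return c
```

The vector body and the epilogue together compute the same element-wise result as the scalar loop, so the two functions should agree for any input length. Recovering the trip count and stride from a binary, and checking that iterations are independent, is the harder part that this sketch takes as given.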

Published In

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing

August 2019

1107 pages

ISBN: 9781450362955

DOI: 10.1145/3337821

Copyright © 2019 ACM.

In-Cooperation

University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Ministry of Science and Technology, Taiwan

Conference

ICPP 2019: 48th International Conference on Parallel Processing

August 5 - 8, 2019

Kyoto, Japan

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
