Article

Generation of permutations for SIMD processors

Authors:
Alexei Kudriavtsev, University of Notre Dame
Peter Kogge, University of Notre Dame

LCTES '05: Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems, June 2005, Pages 147–156. https://doi.org/10.1145/1065910.1065931

Published: 15 June 2005

ABSTRACT

Short vector (SIMD) instructions are useful in signal processing, multimedia, and scientific applications. They offer higher performance, lower energy consumption, and better resource utilization. However, compilers still do not have good support for SIMD instructions, and often the code has to be written manually in assembly language or using compiler built-in functions. Also, in some applications, higher parallelism could be achieved if compilers inserted permutation instructions that reorder the data in registers. In this paper we describe how we create SIMD instructions from regular code and determine the ordering of individual operations within the SIMD instructions so as to minimize the number of permutation instructions. Individual memory operations are grouped into SIMD operations based on their effective addresses. The SIMD data flow graph is then constructed by following data dependences from the SIMD memory operations. Then, the orderings of operations are propagated from the SIMD memory operations into the graph. We also describe our approach to computing the decomposition of a given permutation into the permutation instructions of the target architecture. Experiments with our prototype compiler show that this approach scales well with the number of operations in SIMD instructions (SIMD width) and can be used to compile a number of important kernels, achieving up to 35% speedup.
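
The abstract does not say how the decomposition of a permutation is computed, so the following is only an illustrative sketch of the problem, not the authors' method. It assumes a hypothetical 4-lane target whose permutation instructions are the three entries in `PRIMITIVES` (names and lane patterns invented for this example) and uses a plain breadth-first search to find a shortest instruction sequence that produces a requested lane ordering.

```python
from collections import deque

# Hypothetical permutation instructions of an imaginary 4-lane target.
# A tuple p means: output lane i receives the value of input lane p[i].
PRIMITIVES = {
    "rotate_left_1": (1, 2, 3, 0),
    "swap_pairs": (1, 0, 3, 2),
    "swap_halves": (2, 3, 0, 1),
}


def apply_perm(perm, lanes):
    """Apply one permutation instruction to the current lane contents."""
    return tuple(lanes[i] for i in perm)


def decompose(target, primitives=PRIMITIVES):
    """Breadth-first search for a shortest sequence of primitive names
    that turns the identity ordering into `target`.

    Returns a list of names, or None if `target` cannot be produced
    from the given primitives.
    """
    target = tuple(target)
    start = tuple(range(len(target)))
    paths = {start: []}          # lane ordering -> instruction sequence
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == target:
            return paths[state]
        for name, perm in primitives.items():
            nxt = apply_perm(perm, state)
            if nxt not in paths:  # first visit is the shortest path
                paths[nxt] = paths[state] + [name]
                queue.append(nxt)
    return None


if __name__ == "__main__":
    # Reverse the four lanes using only the hypothetical primitives.
    print(decompose((3, 2, 1, 0)))
```

Exhaustive search like this is only practical for small SIMD widths, because the number of lane orderings grows factorially with the width; it is meant to make the problem statement concrete rather than to suggest a scalable algorithm.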

Index Terms

- Software and its engineering
  - Software notations and tools
    - Compilers

Published in
LCTES '05: Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
June 2005
248 pages
ISBN: 1595930183
DOI: 10.1145/1065910
General Chair: Yunheung Paek, Seoul National University, Seoul, Korea
Program Chair: Rajiv Gupta, University of Arizona, Tucson, USA

ACM SIGPLAN Notices Volume 40, Issue 7
Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
July 2005
238 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1070891
Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
SIMD
permutations
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 116 of 438 submissions, 26%