Article

Improving superword level parallelism support in modern compilers

Authors:

F. Catthoor

CODES+ISSS '05: Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis

Pages 303-308

https://doi.org/10.1145/1084834.1084909

Published: 19 September 2005

Abstract

Multimedia vector instruction sets are becoming ubiquitous in most embedded systems used for multimedia, networking and communications. However, current compiler technology does not allow efficient exploitation of the data parallelism inherent in many signal processing and multimedia applications. In this paper, we explore the automatic vectorization of embedded applications. In particular, we focus on algorithms in which the same computations are applied to a set of signals that are processed simultaneously. This set of signals is usually represented as a 2D array in which each row is an input signal that has to be filtered in some way. A motivating example, inspired by VoIP processing, illustrates that state-of-the-art vectorizing compilers exploit the data parallelism inherent in this kind of application inefficiently. One of the main reasons is that such applications have inner loops that carry all the dependencies and outer loops with strided memory accesses. We propose a modification of the Superword Level Parallelism (SLP) compiler proposed in [9] that tries to overcome these problems. Experimental results show that our approach clearly outperforms commercial compilers.
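The loop structure described in the abstract can be illustrated with a minimal C sketch. It assumes a first-order recursive filter, y[n] = b*x[n] + a*y[n-1], as a stand-in for the per-signal processing; the names `filter_rows`, `filter_interleaved`, `max_layout_difference`, `NSIG` and `LEN` are hypothetical. In the row-major version the inner loop over samples carries the dependence, while the outer loop over signals walks memory with stride `LEN`. Storing the signals interleaved and interchanging the loops makes the innermost loop run over independent signals with unit stride, which is the shape a SIMD unit can exploit. This only illustrates the problem setting; it is not the authors' modified SLP algorithm.

```c
#define NSIG 4 /* hypothetical: number of signals processed together */
#define LEN  8 /* hypothetical: number of samples per signal */

/* Row-major layout: x[s][n] is sample n of signal s.
   The inner loop over n is serial (y[s][n] needs y[s][n-1]),
   and consecutive outer iterations are LEN floats apart in memory. */
void filter_rows(float x[NSIG][LEN], float y[NSIG][LEN], float a, float b)
{
    for (int s = 0; s < NSIG; s++) {
        y[s][0] = b * x[s][0];
        for (int n = 1; n < LEN; n++)
            y[s][n] = b * x[s][n] + a * y[s][n - 1]; /* loop-carried dependence */
    }
}

/* Interleaved layout: xt[n][s] is sample n of signal s.
   With the loops interchanged, the innermost loop runs across
   independent signals with unit stride, so its NSIG iterations
   could be packed into a single SIMD operation. */
void filter_interleaved(float xt[LEN][NSIG], float yt[LEN][NSIG], float a, float b)
{
    for (int s = 0; s < NSIG; s++)
        yt[0][s] = b * xt[0][s];
    for (int n = 1; n < LEN; n++)
        for (int s = 0; s < NSIG; s++) /* no dependence between signals */
            yt[n][s] = b * xt[n][s] + a * yt[n - 1][s];
}

/* Runs both versions on the same data (transposed for the interleaved
   one) and returns the largest absolute difference between outputs. */
float max_layout_difference(float a, float b)
{
    float x[NSIG][LEN], y[NSIG][LEN];
    float xt[LEN][NSIG], yt[LEN][NSIG];
    float worst = 0.0f;

    for (int s = 0; s < NSIG; s++)
        for (int n = 0; n < LEN; n++) {
            x[s][n] = (float)(s + 1) * (float)(n % 3); /* arbitrary test data */
            xt[n][s] = x[s][n];                        /* same data, interleaved */
        }

    filter_rows(x, y, a, b);
    filter_interleaved(xt, yt, a, b);

    for (int s = 0; s < NSIG; s++)
        for (int n = 0; n < LEN; n++) {
            float d = y[s][n] - yt[n][s];
            if (d < 0.0f)
                d = -d;
            if (d > worst)
                worst = d;
        }
    return worst;
}
```

Calling `max_layout_difference(0.5f, 1.0f)` is expected to return 0, since both versions perform the same arithmetic on each element in the same order; only the loop order and the memory layout differ.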

References

[1] ARM11 family. http://www.arm.com/products/CPUs/families/ARM11Family.html.

[2] A. Bik, M. Girkar, P. Grey, and X. Tian. Efficient exploitation of parallelism on Pentium III and Pentium 4 processor-based systems. Intel Technology Journal, 2001.

[3] Intel Corporation. Intel C/C++ and Intel Fortran compilers for Linux. Available at http://www.intel.com/software/products/compilers.

[4] S. Fuller. Motorola's AltiVec technology. Technical Report ALTIVECWP/D, Motorola, 1998.

[5] H. P. Hofstee. Power efficient processor architecture and the Cell processor. In HPCA, pages 258-262, 2005.

[6] M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J. Ramanujam. A linear algebra framework for automatic determination of optimal data layouts. IEEE Transactions on Parallel and Distributed Systems, 10(2):115-135, February 1999.

[7] A. Krall and S. Lelait. Compilation techniques for multimedia processors. Int. Journal of Parallel Programming, 28(4), 2000.

[8] K. Krewell. Cell moves into the limelight. Microprocessor Report, (2/14/05-01), February 2005.

[9] S. Larsen and S. Amarasinghe. Exploiting superword level parallelism with multimedia instruction sets. ACM SIGPLAN Notices, 35(5):145-156, 2000.

[10] S. Larsen, E. Witchel, and S. Amarasinghe. Techniques for increasing and detecting memory alignment. Technical Report MIT-LCS-TM-621, MIT, USA, 2001.

[11] J. Shin, J. Chame, and M. W. Hall. Compiler-controlled caching in superword register files for multimedia extension architectures. In Int. Conf. on Parallel Architectures and Compilation Techniques, pages 45-55, 2002.

[12] S. T. Thakkar and T. Huff. Internet Streaming SIMD Extensions. Computer, 32(12):26-34, 1999.

[13] H. Zima and B. Chapman. Supercompilers for Parallel and Vector Computers. Addison-Wesley, Massachusetts, USA, 1991.

Index Terms

Software and its engineering → Software notations and tools → Compilers

Recommendations

A compiler framework for extracting superword level parallelism
PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation

SIMD (single-instruction multiple-data) instruction set extensions are quite common today in both high performance and embedded microprocessors, and enable the exploitation of a specific type of data parallelism called SLP (Superword Level Parallelism). ...
Improving the effectiveness of searching for isomorphic chains in superword level parallelism
MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture

Most high-performance microprocessors come equipped with general-purpose Single Instruction Multiple Data (SIMD) execution engines to enhance performance. Compilers use auto-vectorization techniques to identify vector parallelism and generate SIMD code ...
goSLP: globally optimized superword level parallelism framework

Modern microprocessors are equipped with single instruction multiple data (SIMD) or vector instruction sets which allow compilers to exploit superword level parallelism (SLP), a type of fine-grained parallelism. Current SLP auto-vectorization techniques ...

Information

Published In

CODES+ISSS '05: Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis

September 2005

356 pages

ISBN: 1595931619

DOI: 10.1145/1084834

General Chairs: Petru Eles (Linköping University, Sweden), Axel Jantsch (Royal Institute of Technology, Sweden)

Program Chair: Reinaldo Bergamaschi (IBM T. J. Watson Research Center)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CODES/ISSS05

Sponsor:

CODES/ISSS05: Third IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis

September 19-21, 2005

Jersey City, NJ, USA

Acceptance Rates

CODES+ISSS '05 paper acceptance rate: 50 of 200 submissions, 25%

Overall acceptance rate: 280 of 864 submissions, 32%

Bibliometrics

Total citations: 10

Total downloads: 335

Downloads (last 12 months): 3

Downloads (last 6 weeks): 0

Reflects downloads up to 08 Mar 2025
Cited By

S. Manilov, B. Franke, A. Magrath, and C. Andrieu. Free Rider. ACM Transactions on Embedded Computing Systems, 16(2):1-24, 2016 (online 12 December 2016). https://dl.acm.org/doi/10.1145/2990194

S. Manilov, B. Franke, A. Magrath, and C. Andrieu. Free Rider. ACM SIGPLAN Notices, 50(5):1-10, 2015 (online 4 June 2015). https://dl.acm.org/doi/10.1145/2808704.2754962

S. Manilov, B. Franke, A. Magrath, and C. Andrieu. Free Rider. In S. Noh, S. Fischmeister, and J. Xue (eds.), Proceedings of the 16th ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems 2015 CD-ROM, pages 1-10, 2015 (online 4 June 2015). https://dl.acm.org/doi/10.1145/2670529.2754962

S. Xu and D. Gregg. Exploiting Hyper-Loop Parallelism in Vectorization to Improve Memory Performance on CUDA GPGPU. In Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA, Volume 03, pages 53-60, 2015 (online 20 August 2015). https://dl.acm.org/doi/10.1109/Trustcom.2015.612

N. Rotem and Y. Ben Asher. Block Unification IF-conversion for High Performance Architectures. IEEE Computer Architecture Letters, 13(1):17-20, 2014 (online 1 January 2014). https://dl.acm.org/doi/10.1109/L-CA.2012.28

Y. Asher and N. Rotem. Hybrid type legalization for a sparse SIMD instruction set. ACM Transactions on Architecture and Code Optimization, 10(3):1-14, 2013 (online 16 September 2013). https://dl.acm.org/doi/10.1145/2509420.2509422

J. Liu, Y. Zhang, O. Jang, W. Ding, and M. Kandemir. A compiler framework for extracting superword level parallelism. ACM SIGPLAN Notices, 47(6):347-358, 2012 (online 11 June 2012). https://dl.acm.org/doi/10.1145/2345156.2254106

J. Liu, Y. Zhang, O. Jang, W. Ding, and M. Kandemir. A compiler framework for extracting superword level parallelism. In J. Vitek, H. Lin, and F. Tip (eds.), Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 347-358, 2012 (online 11 June 2012). https://dl.acm.org/doi/10.1145/2254064.2254106

D. Nuzman and A. Zaks. Outer-loop vectorization. In A. Moshovos, D. Tarditi, and K. Olukotun (eds.), Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, pages 2-11, 2008 (online 25 October 2008). https://dl.acm.org/doi/10.1145/1454115.1454119

D. Nuzman, M. Namolaru, A. Zaks, and J. Derby. Compiling for an indirect vector register architecture. In A. Ramirez, G. Biliardi, and M. Gschwind (eds.), Proceedings of the 5th Conference on Computing Frontiers, pages 199-208, 2008 (online 5 May 2008). https://dl.acm.org/doi/10.1145/1366230.1366266
