Improving superword level parallelism support in modern compilers

Published: 19 September 2005
DOI: 10.1145/1084834.1084909

Abstract

Multimedia vector instruction sets are becoming ubiquitous in embedded systems used for multimedia, networking, and communications. However, current compiler technology does not allow efficient exploitation of the inherent data parallelism available in many signal processing and multimedia applications. In this paper, we explore the automatic vectorization of embedded applications. In particular, we focus on algorithms in which the same computations are applied to a set of signals that are processed simultaneously. Usually this set of signals is represented as a 2D array in which each row is an input signal that has to be filtered in some way. A motivating example, inspired by VoIP processing, illustrates that state-of-the-art vectorizing compilers exploit the data parallelism inherent to this kind of application inefficiently. One of the main reasons is that these applications have inner loops that carry all the dependencies and outer loops with strided memory accesses. We propose a modification of the Superword Level Parallelism (SLP) compiler proposed in [9] that tries to overcome these problems. Experimental results show that our approach clearly outperforms commercial compilers.
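The loop structure described in the abstract can be illustrated with a minimal C sketch. This is not the authors' VoIP example or their SLP modification: the first-order recursive filter, the array sizes `NSIG` and `LEN`, and the function names are all hypothetical, chosen only to show an inner loop that carries the dependence and an outer loop whose accesses are strided, next to a transposed layout in which the independent signals are contiguous.

```c
#define NSIG 4  /* hypothetical number of signals (rows of the 2D array) */
#define LEN  8  /* hypothetical number of samples per signal */

/* Row-per-signal layout, as in the abstract. The inner loop over n carries
   the dependence through y[s][n - 1], so it cannot be vectorized directly.
   The outer loop over s is independent, but consecutive signals are LEN
   floats apart in memory (strided access). */
void filter_rows(float x[NSIG][LEN], float y[NSIG][LEN], float a)
{
    for (int s = 0; s < NSIG; s++) {
        y[s][0] = x[s][0];
        for (int n = 1; n < LEN; n++)
            y[s][n] = a * y[s][n - 1] + x[s][n];
    }
}

/* Assumed alternative: transposed layout with the loops interchanged.
   The dependence is now carried by the outer loop over n, and the inner
   loop over s touches contiguous, independent elements. */
void filter_cols(float xt[LEN][NSIG], float yt[LEN][NSIG], float a)
{
    for (int s = 0; s < NSIG; s++)
        yt[0][s] = xt[0][s];
    for (int n = 1; n < LEN; n++)
        for (int s = 0; s < NSIG; s++)
            yt[n][s] = a * yt[n - 1][s] + xt[n][s];
}

/* Runs both versions on the same data and returns 1 if every output
   sample matches, 0 otherwise. */
int layouts_agree(void)
{
    float x[NSIG][LEN], y[NSIG][LEN];
    float xt[LEN][NSIG], yt[LEN][NSIG];

    for (int s = 0; s < NSIG; s++)
        for (int n = 0; n < LEN; n++) {
            x[s][n] = (float)(s + n);
            xt[n][s] = x[s][n];
        }

    filter_rows(x, y, 0.5f);
    filter_cols(xt, yt, 0.5f);

    for (int s = 0; s < NSIG; s++)
        for (int n = 0; n < LEN; n++)
            if (y[s][n] != yt[n][s])
                return 0;
    return 1;
}
```

In the transposed version, the four `float` values of one inner iteration space are adjacent in memory and would fit a single 128-bit superword, which is the kind of packing opportunity a vectorizer looks for. Whether a given compiler finds it automatically from the row-per-signal form is exactly the question the paper raises.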

References

[1]
ARM11 family. http://www.arm.com/products/CPUs/families/ARM11Family.html.
[2]
A. Bik, M. Girkar, P. Grey, and X. Tian. Efficient exploitation of parallelism on Pentium III and Pentium 4 processor-based systems. Intel Technology Journal, 2001.
[3]
Intel Corporation. Intel C/C++ and Intel Fortran compilers for Linux. Available at http://www.intel.com/software/products/compilers.
[4]
S. Fuller. Motorola's AltiVec technology. Technical Report ALTIVECWP/D, MOTOROLA, 1998.
[5]
H. P. Hofstee. Power efficient processor architecture and the Cell processor. In HPCA, pages 258--262, 2005.
[6]
M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J. Ramanujam. A linear algebra framework for automatic determination of optimal data layouts. IEEE Transactions on Parallel and Distributed Systems, 10(2):115--135, February 1999.
[7]
A. Krall and S. Lelait. Compilation techniques for multimedia processors. Int. Journal on Parallel Programming, 28(4), 2000.
[8]
K. Krewell. Cell moves into the limelight. Microprocessor Report, (2/14/05-01), February 2005.
[9]
S. Larsen and S. Amarasinghe. Exploiting superword level parallelism with multimedia instruction sets. ACM SIGPLAN Notices, 35(5):145--156, 2000.
[10]
S. Larsen, E. Witchel, and S. Amarasinghe. Techniques for increasing and detecting memory alignment. Technical Report MIT-LCS-TM-621, MIT, USA, 2001.
[11]
J. Shin, J. Chame, and M. W. Hall. Compiler-controlled caching in superword register files for multimedia extension architectures. In Int. Conf. on Parallel Architectures and Compiler Techniques, pages 45--55, 2002.
[12]
S. T. Thakkar and T. Huff. Internet streaming SIMD extensions. Computer, 32(12):26--34, 1999.
[13]
H. Zima and B. Chapman. Supercompilers for Parallel and Vector Computers. Addison-Wesley, Massachusetts, USA, 1991.

    Published In

    CODES+ISSS '05: Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
    September 2005
    356 pages
    ISBN: 1595931619
    DOI: 10.1145/1084834

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. FIR
    2. automatic vectorization
    3. superword level parallelism

    Qualifiers

    • Article

    Conference

    CODES/ISSS05

    Acceptance Rates

    CODES+ISSS '05 Paper Acceptance Rate 50 of 200 submissions, 25%;
    Overall Acceptance Rate 280 of 864 submissions, 32%

    Cited By

    • (2016) Free Rider. ACM Transactions on Embedded Computing Systems 16:2 (1-24). DOI: 10.1145/2990194. Online publication date: 12-Dec-2016
    • (2015) Free Rider. ACM SIGPLAN Notices 50:5 (1-10). DOI: 10.1145/2808704.2754962. Online publication date: 4-Jun-2015
    • (2015) Free Rider. Proceedings of the 16th ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems 2015 CD-ROM (1-10). DOI: 10.1145/2670529.2754962. Online publication date: 4-Jun-2015
    • (2015) Exploiting Hyper-Loop Parallelism in Vectorization to Improve Memory Performance on CUDA GPGPU. Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA - Volume 03 (53-60). DOI: 10.1109/Trustcom.2015.612. Online publication date: 20-Aug-2015
    • (2014) Block Unification IF-conversion for High Performance Architectures. IEEE Computer Architecture Letters 13:1 (17-20). DOI: 10.1109/L-CA.2012.28. Online publication date: 1-Jan-2014
    • (2013) Hybrid type legalization for a sparse SIMD instruction set. ACM Transactions on Architecture and Code Optimization 10:3 (1-14). DOI: 10.1145/2509420.2509422. Online publication date: 16-Sep-2013
    • (2012) A compiler framework for extracting superword level parallelism. ACM SIGPLAN Notices 47:6 (347-358). DOI: 10.1145/2345156.2254106. Online publication date: 11-Jun-2012
    • (2012) A compiler framework for extracting superword level parallelism. Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (347-358). DOI: 10.1145/2254064.2254106. Online publication date: 11-Jun-2012
    • (2008) Outer-loop vectorization. Proceedings of the 17th international conference on Parallel architectures and compilation techniques (2-11). DOI: 10.1145/1454115.1454119. Online publication date: 25-Oct-2008
    • (2008) Compiling for an indirect vector register architecture. Proceedings of the 5th conference on Computing frontiers (199-208). DOI: 10.1145/1366230.1366266. Online publication date: 5-May-2008
