Automatic Intra-Register Vectorization for the Intel® Architecture

Bik, Aart J. C.; Girkar, Milind; Grey, Paul M.; Tian, Xinmin

doi:10.1023/A:1014230429447

Published: April 2002

Volume 30, pages 65–98, (2002)

International Journal of Parallel Programming

Abstract

Recent extensions to the Intel® Architecture feature the SIMD technique to enhance the performance of computationally intensive applications that perform the same operation on different elements in a data set. To date, much of the code that exploits these extensions has been hand-coded. The task of the programmer is substantially simplified, however, if a compiler does this exploitation automatically. The high-performance Intel® C++/Fortran compiler supports automatic translation of serial loops into code that uses the SIMD extensions to the Intel® Architecture. This paper provides a detailed overview of the automatic vectorization methods used by this compiler together with an experimental validation of their effectiveness.

Author information

Authors and Affiliations

Intel Corporation, 2200 Mission College Blvd. SC12-301, Santa Clara, California, 95052
Aart J. C. Bik, Milind Girkar, Paul M. Grey & Xinmin Tian

About this article

Cite this article

Bik, A.J.C., Girkar, M., Grey, P.M. et al. Automatic Intra-Register Vectorization for the Intel® Architecture. International Journal of Parallel Programming 30, 65–98 (2002). https://doi.org/10.1023/A:1014230429447

Issue Date: April 2002
DOI: https://doi.org/10.1023/A:1014230429447
