
Automatic Intra-Register Vectorization for the Intel® Architecture

International Journal of Parallel Programming

Abstract

Recent extensions to the Intel® Architecture use the SIMD (single instruction, multiple data) technique to enhance the performance of computationally intensive applications that perform the same operation on different elements of a data set. To date, much of the code that exploits these extensions has been written by hand. The programmer's task is substantially simplified, however, if the compiler exploits these extensions automatically. The high-performance Intel® C++/Fortran compiler supports automatic translation of serial loops into code that uses the SIMD extensions to the Intel® Architecture. This paper provides a detailed overview of the automatic vectorization methods used by this compiler, together with an experimental validation of their effectiveness.
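To make "translating a serial loop into SIMD code" concrete, here is a minimal hand-written sketch, not the compiler's actual output. It assumes single-precision floats, 128-bit SSE registers (four floats each), and arrays that do not overlap; the function names `add_serial` and `add_vectorized` are hypothetical. The serial loop is strip-mined into chunks of four iterations, each executed by one SIMD addition, and a scalar cleanup loop handles the leftover iterations. When SSE is not available, only the scalar loop is compiled.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Serial loop as a programmer would write it (hypothetical example). */
void add_serial(float *a, const float *b, const float *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Hand-vectorized sketch of the same loop. Assumes a, b and c do not overlap,
   so executing four iterations at once preserves the serial semantics. */
void add_vectorized(float *a, const float *b, const float *c, size_t n) {
    size_t i = 0;
#if defined(__SSE__)
    /* Vector loop: each pass processes four floats held in one 128-bit register. */
    for (; i + 4 <= n; i += 4) {
        __m128 vb = _mm_loadu_ps(b + i);           /* unaligned load of b[i..i+3] */
        __m128 vc = _mm_loadu_ps(c + i);           /* unaligned load of c[i..i+3] */
        _mm_storeu_ps(a + i, _mm_add_ps(vb, vc));  /* four additions, one store */
    }
#endif
    /* Cleanup loop: the remaining n % 4 iterations (or all of them without SSE). */
    for (; i < n; i++)
        a[i] = b[i] + c[i];
}
```

Calling both functions on the same inputs should produce identical results. With `n = 7`, for example, the vector loop covers iterations 0 to 3 and the cleanup loop covers iterations 4 to 6. Unaligned loads and stores keep the sketch simple; a vectorizing compiler may instead choose aligned accesses when it can prove or enforce alignment.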




About this article

Cite this article

Bik, A.J.C., Girkar, M., Grey, P.M. et al. Automatic Intra-Register Vectorization for the Intel® Architecture. International Journal of Parallel Programming 30, 65–98 (2002). https://doi.org/10.1023/A:1014230429447


