Scalarization Using Loop Alignment and Loop Skewing

Zhao, Yuan; Kennedy, Ken

doi:10.1023/B:SUPE.0000049323.47732.02

Published: January 2005

Volume 31, pages 5–46, (2005)

The Journal of Supercomputing



Abstract

Array syntax, which is supported in many technical programming languages, adds expressive power by allowing operations on and assignments to whole arrays and array sections. To compile an array assignment statement for a uniprocessor, the language processor must convert the statement into a loop that has the same meaning. This process is called scalarization.
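As a minimal illustration of this conversion (not taken from the paper), the sketch below writes one whole-array assignment and the element-by-element loop a compiler might generate for it. Python lists stand in for Fortran 90 arrays, and the function names are hypothetical.

```python
def array_statement(b, c):
    """Whole-array form, analogous to the Fortran 90 statement A = B + C."""
    return [x + y for x, y in zip(b, c)]


def scalarized_loop(b, c):
    """Scalarized form: one element is loaded, added, and stored per iteration."""
    a = [0] * len(b)
    for i in range(len(b)):
        a[i] = b[i] + c[i]
    return a
```

Here the left-hand side does not appear on the right-hand side, so the loop has the same meaning as the array statement without any extra storage.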

Scalarization presents a significant technical problem because an array assignment needs to be implemented as if all inputs are fetched before any outputs are stored. Since a loop intermixes loads and stores, the compiler typically allocates a temporary array to hold the intermediate result. Because these extra temporary arrays can cause performance problems in cache, many techniques have been developed to avoid their use or minimize their size.
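The difficulty described above can be seen on a small three-point stencil. The sketch below is an illustrative assumption, not code from the paper: it models the array statement `A(2:N-1) = A(1:N-2) + A(3:N)` with Python lists, shows that a naive loop reads values it has already overwritten, and shows the conventional fix with a full-size temporary array. All function names are hypothetical.

```python
def array_semantics(a):
    """Reference meaning: every right-hand-side value is read before any store."""
    n = len(a)
    rhs = [a[i - 1] + a[i + 1] for i in range(1, n - 1)]
    out = list(a)
    out[1:n - 1] = rhs
    return out


def naive_loop(a):
    """Incorrect scalarization: out[i - 1] was overwritten one iteration earlier."""
    out = list(a)
    for i in range(1, len(out) - 1):
        out[i] = out[i - 1] + out[i + 1]
    return out


def loop_with_temporary(a):
    """Conventional scalarization: compute into a temporary array, then copy back."""
    out = list(a)
    n = len(out)
    temp = [0] * n  # extra storage proportional to the array size
    for i in range(1, n - 1):
        temp[i] = out[i - 1] + out[i + 1]
    for i in range(1, n - 1):
        out[i] = temp[i]
    return out
```

For the input `[1, 2, 3, 4, 5]`, the reference and the temporary-array version both give `[1, 4, 6, 8, 5]`, while the naive loop gives `[1, 4, 8, 13, 5]`.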

In this paper, we present a novel application of two compiler strategies, loop alignment and loop skewing, to address this problem. We show that these strategies can achieve the asymptotically minimal memory allocation for stencil computations. Our experiments with loop alignment and loop skewing demonstrate that they are extremely effective in improving memory hierarchy performance of Fortran 90 array code on standard uniprocessors. The result should be applicable to other array languages, such as MATLAB.
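The abstract does not give the transformation itself, so the following is only a sketch of the general idea behind loop alignment, under the assumption that it applies to the one-dimensional stencil `A(2:N-1) = A(1:N-2) + A(3:N)`. Each store is delayed by one iteration so that it happens after the last read of the element it overwrites, and a single scalar replaces the temporary array. The function name and the variable `pending` are hypothetical, and loop skewing for multi-dimensional stencils is not shown.

```python
def aligned_loop(a):
    """Stencil update with the store shifted one iteration behind the computation."""
    out = list(a)
    n = len(out)
    if n < 3:
        return out  # no interior elements to update
    pending = out[0] + out[2]  # prologue: value destined for out[1]
    for i in range(2, n - 1):
        current = out[i - 1] + out[i + 1]  # out[i - 1] still holds its old value
        out[i - 1] = pending               # delayed store; old out[i - 1] is no longer needed
        pending = current
    out[n - 2] = pending  # epilogue: final delayed store
    return out
```

In this sketch the extra storage is one scalar regardless of the array length, compared with a temporary of the same size as the array in the conventional scheme.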



Author information

Authors and Affiliations

Computer Science Department, Rice University, 6100 Main St, Houston, Texas, 77005, USA
Yuan Zhao & Ken Kennedy


About this article

Cite this article

Zhao, Y., Kennedy, K. Scalarization Using Loop Alignment and Loop Skewing. The Journal of Supercomputing 31, 5–46 (2005). https://doi.org/10.1023/B:SUPE.0000049323.47732.02


Issue Date: January 2005
DOI: https://doi.org/10.1023/B:SUPE.0000049323.47732.02
