Abstract
Array syntax, which is supported in many technical programming languages, adds expressive power by allowing operations on and assignments to whole arrays and array sections. To compile an array assignment statement for a uniprocessor, the language processor must convert the statement into a loop that has the same meaning. This process is called scalarization.
Scalarization presents a significant technical problem because an array assignment must be implemented as if all inputs are fetched before any outputs are stored. Since a loop intermixes loads and stores, the compiler typically allocates a temporary array to hold the intermediate result. Because these extra temporary arrays can degrade cache performance, many techniques have been developed to avoid their use or minimize their size.
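To make the fetch-before-store requirement concrete, the following sketch (Python lists standing in for Fortran 90 arrays, with 0-based indices; all function names are illustrative, not from the paper) scalarizes the 1-based shift `A(2:n) = A(1:n-1)` three ways. The naive loop reads elements it has already overwritten; the version with a full-size temporary is correct but doubles the storage; running the loop backwards is one classic way to avoid the temporary for this particular statement, though it does not work when dependences run in both directions.

```python
# Illustrative sketch: scalarizing the 1-based statement A(2:n) = A(1:n-1).
# Function names are hypothetical; lists are copied so inputs are not mutated.

def array_semantics(a):
    """Reference result: every input is read before any output is written."""
    return a[:1] + a[:-1]

def scalarize_naive(a):
    """Direct translation into a forward loop (incorrect)."""
    a = list(a)
    for i in range(1, len(a)):
        a[i] = a[i - 1]  # a[i-1] was already overwritten in the previous iteration
    return a

def scalarize_with_temp(a):
    """Correct: fetch all inputs into a temporary array, then store."""
    a = list(a)
    tmp = [a[i - 1] for i in range(1, len(a))]  # extra storage of size n-1
    for i in range(1, len(a)):
        a[i] = tmp[i - 1]
    return a

def scalarize_reversed(a):
    """Correct without a temporary: run the loop backwards (loop reversal)."""
    a = list(a)
    for i in range(len(a) - 1, 0, -1):
        a[i] = a[i - 1]  # a[i-1] has not been written yet when it is read
    return a
```

For `[1, 2, 3, 4]` the naive loop yields `[1, 1, 1, 1]`, whereas array semantics and both corrected loops yield `[1, 1, 2, 3]`.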
In this paper, we present a novel application of two compiler strategies, loop alignment and loop skewing, to this problem. We show that these strategies can achieve asymptotically minimal memory allocation for stencil computations. Our experiments demonstrate that loop alignment and loop skewing are highly effective in improving the memory hierarchy performance of Fortran 90 array code on standard uniprocessors. The results should also apply to other array languages, such as MATLAB.
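As a rough illustration of why alignment helps with stencils (a hand-written sketch, not the paper's algorithm), consider the 1-based three-point stencil `A(2:n-1) = A(1:n-2) + A(3:n)`. Reversing the loop no longer works, because each element is read by both neighbours. Splitting the statement into a compute step and a store step, and aligning the store one iteration behind the compute, means only the most recently computed value is waiting to be stored, so the full-size temporary contracts to a single scalar. The function names are hypothetical. Loop skewing operates on nested loops and is not shown here.

```python
# Illustrative sketch of loop alignment for a 1-D three-point stencil,
# A(2:n-1) = A(1:n-2) + A(3:n) in 1-based Fortran notation.
# Function names are hypothetical; lists are copied so inputs are not mutated.

def stencil_reference(a):
    """Array semantics, computed out of place."""
    a = list(a)
    if len(a) < 3:
        return a
    return [a[0]] + [a[i - 1] + a[i + 1] for i in range(1, len(a) - 1)] + [a[-1]]

def stencil_naive(a):
    """Direct forward loop (incorrect): reads a[i-1] after overwriting it."""
    a = list(a)
    for i in range(1, len(a) - 1):
        a[i] = a[i - 1] + a[i + 1]
    return a

def stencil_aligned(a):
    """Compute for iteration i, then store the value computed in iteration i-1.

    Before iteration i runs, only a[1..i-2] have been written, so a[i-1] and
    a[i+1] still hold their original values when they are read.
    """
    a = list(a)
    n = len(a)
    if n < 3:
        return a
    pending = None  # value computed in the previous iteration, not yet stored
    for i in range(1, n - 1):
        t = a[i - 1] + a[i + 1]  # both operands are still original values
        if pending is not None:
            a[i - 1] = pending   # aligned store, one iteration behind
        pending = t
    a[n - 2] = pending           # epilogue: store the last computed value
    return a
```

For `[1, 2, 3, 4, 5]`, `stencil_reference` and `stencil_aligned` both return `[1, 4, 6, 8, 5]`, while `stencil_naive` returns `[1, 4, 8, 13, 5]`.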
Zhao, Y., Kennedy, K. Scalarization Using Loop Alignment and Loop Skewing. The Journal of Supercomputing 31, 5–46 (2005). https://doi.org/10.1023/B:SUPE.0000049323.47732.02