Scalarization Using Loop Alignment and Loop Skewing

The Journal of Supercomputing

Abstract

Array syntax, which is supported in many technical programming languages, adds expressive power by allowing operations on and assignments to whole arrays and array sections. To compile an array assignment statement for a uniprocessor, the language processor must convert the statement into a loop that has the same meaning. This process is called scalarization.

Scalarization presents a significant technical problem because an array assignment needs to be implemented as if all inputs are fetched before any outputs are stored. Since a loop intermixes loads and stores, the compiler typically allocates a temporary array to hold the intermediate result. Because these extra temporary arrays can cause performance problems in cache, many techniques have been developed to avoid their use or minimize their size.
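To make the fetch-before-store requirement concrete, here is a minimal Python sketch. It is an illustration only, not code from the paper. It models the Fortran 90 statement `A(2:N) = A(1:N-1) + A(2:N)` on a plain list with 0-based indices. The function names are hypothetical.

```python
def array_semantics(a):
    """Reference meaning: every right-hand-side value is read before any store."""
    rhs = [a[i - 1] + a[i] for i in range(1, len(a))]
    return a[:1] + rhs


def naive_scalarization(a):
    """Direct loop translation: loads and stores are intermixed (incorrect here)."""
    a = list(a)
    for i in range(1, len(a)):
        # a[i - 1] was overwritten by the previous iteration, so this reads
        # a new value where the array statement requires the old one.
        a[i] = a[i - 1] + a[i]
    return a


def scalarization_with_temp(a):
    """Safe translation: a full-size temporary array holds the intermediate result."""
    a = list(a)
    n = len(a)
    temp = [0] * n
    for i in range(1, n):
        temp[i] = a[i - 1] + a[i]  # all loads happen in this loop
    for i in range(1, n):
        a[i] = temp[i]  # all stores happen in this loop
    return a
```

For the input `[1, 2, 3, 4]`, the reference and the temporary-array version both give `[1, 3, 5, 7]`, while the naive loop gives `[1, 3, 6, 10]`. The temporary costs as much memory as the array section itself, which is the overhead the techniques discussed here try to reduce.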

In this paper, we present a novel application of two compiler strategies, loop alignment and loop skewing, to address this problem. We show that these strategies can achieve the asymptotically minimal memory allocation for stencil computations. Our experiments demonstrate that loop alignment and loop skewing are extremely effective in improving the memory hierarchy performance of Fortran 90 array code on standard uniprocessors. The result should be applicable to other array languages, such as MATLAB.
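As a rough illustration of how shifting stores relative to loads can shrink the temporary, the Python sketch below scalarizes a one-dimensional three-point stencil, `A(2:N-1) = A(1:N-2) + A(2:N-1) + A(3:N)`, with a single scalar in place of a temporary array. It is a simplified sketch in the spirit of loop alignment; that it matches the paper's actual transformation is an assumption, and loop skewing for multi-dimensional stencils is not shown. The function names and the `pending` variable are hypothetical.

```python
def stencil_reference(a):
    """Array semantics: every result is computed from the original values."""
    n = len(a)
    out = list(a)
    for i in range(1, n - 1):
        out[i] = a[i - 1] + a[i] + a[i + 1]
    return out


def stencil_delayed_store(a):
    """In-place loop whose store trails the loads by one iteration.

    Only one scalar is live between iterations, so the extra storage is
    O(1) rather than O(N).
    """
    a = list(a)
    n = len(a)
    pending = None  # result for index i - 1, computed but not yet stored
    for i in range(1, n - 1):
        # a[i - 1] still holds its old value because its store is pending.
        new_value = a[i - 1] + a[i] + a[i + 1]
        if pending is not None:
            a[i - 1] = pending  # safe: no later iteration reads a[i - 1]
        pending = new_value
    if pending is not None:
        a[n - 2] = pending  # flush the final delayed store
    return a
```

For the input `[1, 2, 3, 4, 5]`, both functions return `[1, 6, 9, 12, 5]`. The delayed store works here because iteration `i` is the last one to read `a[i - 1]`; a stencil with a wider reach to the left would need a correspondingly longer delay and more scalars.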




Cite this article

Zhao, Y., Kennedy, K. Scalarization Using Loop Alignment and Loop Skewing. The Journal of Supercomputing 31, 5–46 (2005). https://doi.org/10.1023/B:SUPE.0000049323.47732.02
