Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests

Ahmed, Nawaaz; Mateev, Nikolay; Pingali, Keshav

doi:10.1023/A:1012293814832

Published: October 2001

International Journal of Parallel Programming, Volume 29, pages 493–544 (2001)

Abstract

Linear loop transformations and tiling are known to be very effective for enhancing locality of reference in perfectly-nested loops. However, they cannot be applied directly to imperfectly-nested loops. Some compilers attempt to convert imperfectly-nested loops into perfectly-nested loops by using statement sinking, loop fusion, etc., and then apply locality-enhancing transformations to the resulting perfectly-nested loops, but the approaches used are fairly ad hoc and may fail even for simple programs. In this paper, we present a systematic approach for synthesizing transformations to enhance locality in imperfectly-nested loops. The key idea is to embed the iteration space of each statement into a special iteration space called the product space. The product space can be viewed as a perfectly-nested loop nest, so embedding generalizes techniques like statement sinking and loop fusion, which are used in ad hoc ways in current compilers to produce perfectly-nested loops from imperfectly-nested ones. In contrast to these ad hoc techniques, however, our embeddings are chosen carefully to enhance locality. The product space can itself be transformed to increase locality further, after which fully permutable loops can be tiled. The final code generation step may produce imperfectly-nested loops as output if that is desirable. We present experimental evidence for the effectiveness of this approach, using dense numerical linear algebra benchmarks, relaxation codes, and the tomcatv code from the SPEC benchmarks.
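To make the terminology in the abstract concrete, the following is a minimal sketch of statement sinking followed by tiling on a small matrix-vector product. It is an illustration only, not the embedding-synthesis algorithm of the paper: the function names, the tile size, and the choice of example are all assumptions made here. Statement S1 has a one-dimensional iteration space (i), while S2 has a two-dimensional one (i, j). Sinking S1 into the inner loop under a guard is one simple way to place both statements in a common two-dimensional space, which can then be tiled.

```python
def matvec_imperfect(a, x):
    """Imperfectly-nested form: S1 sits outside the inner loop."""
    n, m = len(a), len(x)
    y = [None] * n
    for i in range(n):
        y[i] = 0.0                      # S1: iteration space is (i)
        for j in range(m):
            y[i] += a[i][j] * x[j]      # S2: iteration space is (i, j)
    return y


def matvec_sunk(a, x):
    """Statement sinking: S1 is placed at the point (i, 0) of the
    (i, j) space, giving a perfectly-nested loop.
    Assumes m >= 1; with an empty inner loop S1 would never run."""
    n, m = len(a), len(x)
    y = [None] * n
    for i in range(n):
        for j in range(m):
            if j == 0:
                y[i] = 0.0              # S1, guarded
            y[i] += a[i][j] * x[j]      # S2
    return y


def matvec_sunk_tiled(a, x, tile=2):
    """The perfectly-nested loop above, tiled in both dimensions.
    For each fixed i, j is still visited in increasing order, so the
    guarded S1 at j == 0 runs before every S2 that reads y[i].
    The tile size is a hypothetical parameter chosen for illustration."""
    n, m = len(a), len(x)
    y = [None] * n
    for jj in range(0, m, tile):
        for ii in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, m)):
                    if j == 0:
                        y[i] = 0.0
                    y[i] += a[i][j] * x[j]
    return y


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
x = [1.0, 0.0, -1.0]
print(matvec_imperfect(a, x), matvec_sunk(a, x), matvec_sunk_tiled(a, x))
```

All three versions compute the same vector for a non-empty x. The guard is the price of sinking; the abstract notes that the final code generation step may produce imperfectly-nested loops again when that is preferable.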

Author information

Authors and Affiliations

Department of Computer Science, Cornell University, Ithaca, New York, 14853
Nawaaz Ahmed, Nikolay Mateev & Keshav Pingali

Corresponding author

Correspondence to Keshav Pingali.

About this article

Cite this article

Ahmed, N., Mateev, N. & Pingali, K. Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests. International Journal of Parallel Programming 29, 493–544 (2001). https://doi.org/10.1023/A:1012293814832

Issue Date: October 2001
DOI: https://doi.org/10.1023/A:1012293814832
