Abstract
Tiling is a loop transformation that compilers use to create blocked algorithms automatically, in order to better exploit the memory hierarchy and to reduce the communication overhead between processors. Motivated by existing results, this paper presents a conceptually simple approach to finding tilings with a minimal amount of communication between tiles. All results are developed primarily from the inequality of arithmetic and geometric means, except for Lemma 8, whose proof relies on the concept of extremal rays of convex cones. The key insight is that a communication-minimal tiling must induce the same amount of communication through all faces of a tile. This restricts the search space for optimal tilings to those tiling matrices whose rows are all extremal rays of a cone. For nested loops with several special forms of dependences, closed-form optimal tilings are derived. In the general case, a procedure is given that always returns optimal tilings. A detailed comparison of this work with existing results is provided.
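The arithmetic-geometric mean argument behind the key insight can be illustrated in its simplest setting: rectangular tiles and uniform dependence vectors with nonnegative components. The sketch below is not the paper's general procedure, which searches over tiling matrices whose rows are extremal rays of a cone. It assumes a simplified cost model in which the communication through the face normal to axis i equals c_i times that face's area V/s_i. Here V is the fixed tile volume, s_i is the side length along axis i, and c_i is the sum of the i-th components of all dependence vectors. The dependence set and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def face_weights(deps: Sequence[Sequence[int]]) -> List[int]:
    """c_i = sum of the i-th components of all dependence vectors.

    Assumption: every component is nonnegative, so each face of a
    rectangular tile is crossed in one direction only.
    """
    n = len(deps[0])
    if any(len(d) != n or min(d) < 0 for d in deps):
        raise ValueError("expected nonnegative dependence vectors of equal length")
    return [sum(d[i] for d in deps) for i in range(n)]


def communication(sides: Sequence[float], weights: Sequence[float]) -> float:
    """Simplified model: sum over axes of c_i * (area of the face normal to axis i)."""
    volume = math.prod(sides)
    return sum(c * volume / s for c, s in zip(weights, sides))


def optimal_rectangular_tile(deps: Sequence[Sequence[int]], volume: float) -> List[float]:
    """Return sides s_i proportional to c_i, scaled so that prod(s_i) == volume.

    The n face terms c_i * V / s_i have the constant product
    prod(c_i) * V**(n-1), so by AM-GM their sum is smallest exactly
    when all terms are equal, i.e. when every face carries the same
    amount of communication.
    """
    c = face_weights(deps)
    if 0 in c:
        # A zero weight makes the minimum degenerate; not handled here.
        raise ValueError("an axis with no dependence component is not supported")
    n = len(c)
    t = (volume / math.prod(c)) ** (1.0 / n)
    return [ci * t for ci in c]


if __name__ == "__main__":
    deps = [(1, 1), (0, 3)]                 # hypothetical uniform dependences
    c = face_weights(deps)                  # [1, 4]
    best = optimal_rectangular_tile(deps, 16.0)
    print(best, communication(best, c))     # prints [2.0, 8.0] 16.0
    print(communication([4.0, 4.0], c))     # square tile of the same volume: 20.0
```

With the optimal sides, both faces carry 8 units of communication, and the total of 16 meets the AM-GM lower bound 2 * sqrt(1 * 4 * 16). A square tile of the same volume leaves the faces unbalanced (4 versus 16) and costs 20.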
Supported by an Australian Research Council Grant A49600987.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xue, J. (1997). Communication-minimal tiling of uniform dependence loops. In: Sehr, D., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1996. Lecture Notes in Computer Science, vol 1239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0017262
DOI: https://doi.org/10.1007/BFb0017262
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63091-3
Online ISBN: 978-3-540-69128-0
eBook Packages: Springer Book Archive