Loop parallelization in the polytope model

  • Invited Talk
  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 715)

Abstract

During the course of the last decade, a mathematical model for the parallelization of FOR-loops has become increasingly popular. In this model, a (perfect) nest of r FOR-loops is represented by a convex polytope in ℤ^r. The boundaries of each loop specify the extent of the polytope in a distinct dimension. Various ways of slicing and segmenting the polytope yield a multitude of guaranteed correct mappings of the loops' operations in space-time. These transformations have a very intuitive interpretation and can be easily quantified and automated due to their mathematical foundation in linear programming and linear algebra. With the recent availability of massively parallel computers, the idea of loop parallelization is gaining significance, since it promises execution speed-ups of orders of magnitude. The polytope model for loop parallelization has its origin in systolic design, but it applies in more general settings, and methods based on it will become a part of future parallelizing compilers. This paper provides an overview and future perspective of the polytope model and the methods based on it.
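The core idea of the abstract can be sketched concretely. The following Python fragment (an illustration, not the paper's method; the skewing matrix and dependence vector are assumed for the example) models a 2-deep loop nest as the integer points of a square polytope in ℤ^2 and applies a unimodular space-time transformation T = [[1, 1], [0, 1]], so that t = i + j becomes the time step and p = j the processor coordinate:

```python
# Illustrative sketch: a 2-deep loop nest
#   for i in 0..N-1: for j in 0..N-1: body(i, j)
# modeled as the set of integer points in the polytope [0, N-1]^2.

N = 4

def iteration_points(n):
    """Integer points of the square iteration polytope [0, n-1]^2."""
    return [(i, j) for i in range(n) for j in range(n)]

def spacetime(i, j):
    """Unimodular transformation T = [[1, 1], [0, 1]]: (i, j) -> (t, p)."""
    return (i + j, j)  # (time step, processor)

# Group iterations by time step.  Under an assumed dependence (1, 0)
# (iteration (i, j) depends on (i-1, j)), all iterations sharing a time
# step are independent, because t = i + j strictly increases along the
# dependence -- this is what makes the mapping a legal schedule.
schedule = {}
for (i, j) in iteration_points(N):
    t, p = spacetime(i, j)
    schedule.setdefault(t, []).append((i, j))

for t in sorted(schedule):
    print(t, sorted(schedule[t]))
```

Each printed line is one parallel time step; the diagonal "wavefronts" it shows are exactly the slices of the polytope that the abstract refers to.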




Editor information

Eike Best


Copyright information

© 1993 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lengauer, C. (1993). Loop parallelization in the polytope model. In: Best, E. (eds) CONCUR'93. CONCUR 1993. Lecture Notes in Computer Science, vol 715. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57208-2_28


  • Print ISBN: 978-3-540-57208-4

  • Online ISBN: 978-3-540-47968-0

