Abstract
Over the last decade, a mathematical model for the parallelization of FOR-loops has become increasingly popular. In this model, a (perfect) nest of r FOR-loops is represented by a convex polytope in ℤ^r. The bounds of each loop specify the extent of the polytope in a distinct dimension. Various ways of slicing and segmenting the polytope yield a multitude of guaranteed correct mappings of the loops' operations in space-time. These transformations have a very intuitive interpretation and, because they are founded on linear programming and linear algebra, can be easily quantified and automated. With the recent availability of massively parallel computers, loop parallelization is gaining significance, since it promises execution speed-ups of orders of magnitude. The polytope model for loop parallelization has its origin in systolic design, but it applies in more general settings, and methods based on it will become a part of future parallelizing compilers. This paper provides an overview and future perspective of the polytope model and methods based on it.
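To make the idea of a space-time mapping concrete, the following is a minimal Python sketch, not taken from the paper. It assumes a nest of r = 2 loops over a square index space, two uniform dependences, the schedule t = i + j and the allocation p = i; all function names and the choice of dependences are hypothetical and serve only as illustration.

```python
from itertools import product

# Assumed uniform dependences: operation (i, j) uses the results of
# (i - 1, j) and (i, j - 1). These are illustrative, not from the paper.
DEPENDENCES = [(1, 0), (0, 1)]


def index_space(n):
    """Integer points of the polytope {(i, j) | 0 <= i <= n, 0 <= j <= n}."""
    return list(product(range(n + 1), repeat=2))


def schedule(point):
    """Time step of an operation: slices the polytope by hyperplanes i + j = t."""
    i, j = point
    return i + j


def allocation(point):
    """Processor of an operation: projects the polytope along the j axis."""
    i, j = point
    return i


def is_valid_schedule(points, deps):
    """Check that every source operation runs strictly before its target."""
    pts = set(points)
    for (i, j) in points:
        for (di, dj) in deps:
            src = (i - di, j - dj)
            if src in pts and schedule(src) >= schedule((i, j)):
                return False
    return True


def space_time_map(n):
    """Group operations by time step as lists of (processor, point) pairs."""
    steps = {}
    for p in index_space(n):
        steps.setdefault(schedule(p), []).append((allocation(p), p))
    return steps


if __name__ == "__main__":
    for t, ops in sorted(space_time_map(2).items()):
        print(t, ops)
```

Under these assumptions, the (n + 1)^2 operations of the sequential nest are spread over 2n + 1 time steps, and all operations on one hyperplane can execute in parallel on distinct processors.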
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lengauer, C. (1993). Loop parallelization in the polytope model. In: Best, E. (ed.) CONCUR'93. CONCUR 1993. Lecture Notes in Computer Science, vol 715. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57208-2_28
DOI: https://doi.org/10.1007/3-540-57208-2_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57208-4
Online ISBN: 978-3-540-47968-0
eBook Packages: Springer Book Archive