Loop parallelization in the polytope model

  • Invited Talk
  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 715)

Abstract

During the course of the last decade, a mathematical model for the parallelization of FOR-loops has become increasingly popular. In this model, a (perfect) nest of r FOR-loops is represented by a convex polytope in ℤ^r. The boundaries of each loop specify the extent of the polytope in a distinct dimension. Various ways of slicing and segmenting the polytope yield a multitude of guaranteed correct mappings of the loops' operations in space-time. These transformations have a very intuitive interpretation and can be easily quantified and automated due to their mathematical foundation in linear programming and linear algebra. With the recent availability of massively parallel computers, the idea of loop parallelization is gaining significance, since it promises execution speed-ups of orders of magnitude. The polytope model for loop parallelization has its origin in systolic design, but it applies in more general settings, and methods based on it will become a part of future parallelizing compilers. This paper provides an overview and future perspective of the polytope model and the methods based on it.
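The core idea of the abstract can be sketched concretely. The following Python fragment (an illustration, not the paper's method; the skewing matrix and dependence vector are assumed for the example) models a 2-deep loop nest as the integer points of a square polytope in ℤ^2 and applies a unimodular space-time transformation T = [[1, 1], [0, 1]], so that t = i + j becomes the time step and p = j the processor coordinate:

```python
# Illustrative sketch: a 2-deep loop nest
#   for i in 0..N-1: for j in 0..N-1: body(i, j)
# modeled as the set of integer points in the polytope [0, N-1]^2.

N = 4

def iteration_points(n):
    """Integer points of the square iteration polytope [0, n-1]^2."""
    return [(i, j) for i in range(n) for j in range(n)]

def spacetime(i, j):
    """Unimodular transformation T = [[1, 1], [0, 1]]: (i, j) -> (t, p)."""
    return (i + j, j)  # (time step, processor)

# Group iterations by time step.  Under an assumed dependence (1, 0)
# (iteration (i, j) depends on (i-1, j)), all iterations sharing a time
# step are independent, because t = i + j strictly increases along the
# dependence -- this is what makes the mapping a legal schedule.
schedule = {}
for (i, j) in iteration_points(N):
    t, p = spacetime(i, j)
    schedule.setdefault(t, []).append((i, j))

for t in sorted(schedule):
    print(t, sorted(schedule[t]))
```

Each printed line is one parallel time step; the diagonal "wavefronts" it shows are exactly the slices of the polytope that the abstract refers to.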




Editor information

Eike Best


Copyright information

© 1993 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lengauer, C. (1993). Loop parallelization in the polytope model. In: Best, E. (eds) CONCUR'93. CONCUR 1993. Lecture Notes in Computer Science, vol 715. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57208-2_28


  • Print ISBN: 978-3-540-57208-4

  • Online ISBN: 978-3-540-47968-0

