Skip to main content

The Static Parallelization of Loops and Recursions

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

We demonstrate approaches to the static parallelization of loops and recursions on the example of the polynomial product. Phrased as a loop nest, the polynomial product can be parallelized automatically by applying a space-time mapping technique based on linear algebra and linear programming. One can choose a parallel program that is optimal with respect to some objective function like the number of execution steps, processors, channels, etc. However,at best,linear execution time complexity can be atained. Through phrasing the polynomial product as a divide-and-conquer recursion, one can obtain a parallel program with sublinear execution time. In this case, the target program is not derived by an automatic search but given as a program skeleton, which can be deduced by a sequence of equational program transformations. We discuss the use of such skeletons, compare and assess the models in which loops and divide-and-conquer resursions are parallelized and comment on the performance properties of the resulting parallel implementations.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The Design and Analysis of Computer Algorithms. Series in Computer Science and Information Processing. Addison-Wesley, Reading, MA, 1974.

    Google Scholar 

  2. C. Ancourt, F. Coelho, B. Creusillet, F. Irigoin, P. Jouvelot, and R. Keryell. PIPS: A framework for building interprocedural compilers, parallelizers and optimizers. Technical Report A/289. Centre de Recherche en Informatique, Ecole des Mines de Paris, April, 1996. WWW: http://www.cri.ensmp.fr/~pips/.

    Google Scholar 

  3. D.G. Baltus and J. Allen. Efficient exploration of nonuniform space-time transformations for optimal systolic array synthesis. In L. Dadda and B. Wah, eds, Proc. Int. Conf. Application Specific Array Processors (ASAP'93), pp. 428–441. IEEE Computer Society Press. Los Alamitos, CA, 1993.

    Google Scholar 

  4. U. Banerjee. Loop Transformations for Restructuring Compilers: The Foundations. Series on Loop Transformations for Restructuring Compilers, Kluwer Academic Publishers, 1993.

  5. U. Banerjee. Loop Parallelization. Series on Loop Transformations for Restructuring Compilers, Kluwer Academic Publishers, 1994.

  6. U. Banerjee. Dependence Analysis. Series on Loop Transformations for Restructuring Compilers, Kluwer Academic Publishers, 1997.

  7. R.S. Bird. Lectures on constructive functional programming. In M. Broy, ed., Constructive Methods in Computing Science, NATO ASI Series F: Computer and Systems Sciences, Vol. 55, pp. 151–216, Springer-Verlag, 1988.

  8. G. Blelloch. Scans as primitive parallel operations. IEEE Trans. on Computers, 38:1526–1538, 1989.

    Google Scholar 

  9. P. Boulet, M. Dijon, E. Lequiniou, and T. Risset. Reference manual of the Bouclettes parallelizer. Technical Report 94-04, Laboratoire de l'Informatique du Parallélisme. Ecole Normale Supérieure de Lyon, 1994. WWW: http://www.prism.uvsq.fr/public/bop/base/bclt/ bouclettes.html.

  10. M.I. Cole. Algorithmic Skeletons: Structured Management of Parallel Computation, Research Monographs in Parallel and Distributed Computing, Pitman, 1989.

  11. M.I. Cole. Parallel programming with list homomorphisms. Parallel Processing Letters, 5:191–204, 1995.

    Google Scholar 

  12. M.I. Cole, S. Gorlatch, C. Lengauer, and D.B. Skillicorn, eds. Theory and practice of higher-order parallel programming. Technical Report 169. Schloβ Dagstuhl, 1997.

  13. J.-F. Collard. Automatic parallelization of while-loops using speculative execution. Int. J. Parallel Programming, 23:191–219, 1995.

    Google Scholar 

  14. A. Darte. Regular partitioning for synthesizing fixed-size systolic arrays. INTEGRATION, 12:293–304, 1991.

    Google Scholar 

  15. E.W. Dijkstra and C.S. Scholten. Predicate Calculus and Program Semantics. Texts and Monographs in Computer Science, Springer-Verlag, 1990.

  16. P. Feautrier. Array expansion. In Proc. Int. Conf. on Supercomputing, pp. 429–441, ACM Press, 1988.

  17. P. Feautrier. Automatic parallelization in the polytope model. In G.-R. Perrin and A. Darte, eds., The Data Parallel Programming Model. Lecture Notes in Computer Science 1132, pp. 79–103, Springer-Verlag, 1996.

  18. J. Gibbons. Upwards and downwards accumulations on trees. In R. Bird, C. Morgan, and J. Woodcock, eds., Mathematics of Program Construction, Lecture Notes in Computer Science 669, pp. 122–138, Springer-Verlag, 1992.

  19. S. Gorlatch. From transformations to methodology in parallel program development: a case study. Microprocessing and Microprogramming, 41:571–588, 1996.

    Google Scholar 

  20. S. Gorlatch. Systematic efficient parallelization of scan and other list homomorphisms. In L. Bougé, P. Fraigniaud, A. Mignotte, and Y. Robert, eds., Euro-Par'96, Lecture Notes in Computer Science 1124, pp. 401–408, Springer-Verlag, 1996.

  21. S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In H. Kuchen and D. Swierstra, eds., Programming Languages: Implementation, Logics and Programs, Lecture Notes in Computer Science 1140, pp. 274–288, Springer-Verlag, 1996c.

  22. S. Gorlatch and H. Bischof. Formal derivation of divide-and-conquer programs: A case study in the multidimensional FFT's. In D. Mery, ed., Proc. 2nd Int. Workshop on Formal Methods for Parallel Programming: Theory and Applications (FMPPTA'97), pp. 80–94, IEEE Computer Society, 1997.

  23. M. Griebl. The Mechanical Parallelization of Loop Nests Containing while Loops. PhD thesis, Fakultät für Mathematik und Informatik, Universität Passau. Technical Report MIP-9701, 1996.

  24. M. Griebl and C. Lengauer. Classifying loops for space-time mapping. In L. Bougé, P. Fraigniaud, A. Mignotte, and Y. Robert, eds., Euro-Par'96, Vol. I, Lecture Notes in Computer Science 1123, pp. 467–474, Springer-Verlag, 1996.

  25. M. Griebl and C. Lengauer. The loop parallelizer LooPo. In M. Gerndt, ed., Proc. Sixth Workshop on Compilers for Parallel Computers, volume 21 of Konferenzen des Forschungszentrums Jülich, pp. 311–320, Forschungszentrum Jülich, 1996. WWW: http://www.uni-passau.de/~loopo/.

  26. C.A. Herrmann and C. Lengauer. On the space-time mapping of a class of divide-and-conquer recursions. Parallel Processing Letters, 6:525–537, 1996.

    Google Scholar 

  27. C.A. Herrmann and C. Lengauer. Transformation of divide -- conquer to nested parallel loops. In H. Glaser, P. Hartel, and H. Kuchen, eds., Programming Languages: Implementation, Logics, and Programs (PLILP'97), Lecture Notes in Computer Science 1292, pp. 95–109, Springer-Verlag, 1997.

  28. V. Kumar, A. Grama, A. Gupta, and G. Karypis. Introduction to Parallel Computing: Design and Analysis of Algorithms, Benjamin/Cummings, 1994.

  29. C. Lengauer. Aview of systolic design. In N.N. Mirenkov, ed., Parallel Computing Technologies (PaCT-91), pp. 32–46, World Scientific, 1991.

  30. C. Lengauer. Loop parallelization in the polytope model. In E. Best, ed., CONCUR'93, Lecture Notes in Computer Science 715, pp. 398–416, Springer-Verlag, 1993.

  31. C. Lengauer and M. Griebl. On the parallelization of loop nests containing while loops. In N.N. Mirenkov, Q.-P. Gu, S. Peng, and S. Sedukhin, eds., Proc. 1st Aizu Int. Symp. on Parallel Algorithm/Architecture Synthesis (pAs'95), pp. 10–18, IEEE Computer Society Press, 1995.

  32. C. Lengauer, L. Thiele, M. Wolfe, and H. Zima, eds. Loop parallelization. Technical Report 142, Schloβ Dagstuhl, 1996.

  33. V. Loechner and C. Mongenet. OPERA:Atoolbox for loop parallelization. In I. Jelly, I. Gorton, and P. Croll, eds., Proc. 1st Int. Workshop on Software Engineering for Parallel and Distributed Systems, pp. 134–145. Chapman -- Hall, 1996. WWW: http://icps.u-strasbg.fr/opera/.

  34. R. Miller. A Constructive Theory of Multidimensional Arrays. PhD thesis, Programming Research Group, Oxford University, 1993.

  35. Z.G. Mou and P. Hudak. An algebraic model for divide-and-conquer algorithms and its parallelism. Journal of Supercomputing, 2:257–278, 1988.

    Google Scholar 

  36. M.J. Quinn. Parallel Computing, McGraw-Hill, 1994.

  37. P. Quinton and Y. Robert. Systolic Algorithms and Architectures, Prentice-Hall, 1990.

  38. R. Sedgewick. Algorithms, Addison-Wesley, 1988.

  39. D.B. Skillicorn. Foundations of Parallel Programming, Cambridge International Series on Parallel Computation, Cambridge University Press, 1994.

  40. D. Smith. Applications of a strategy for designing divide-and-conquer algorithms. Science of Computer Programming, 8:213–229, 1987.

    Google Scholar 

  41. J. Teich and L. Thiele. Control generation in the design of processor arrays. J. VLSI Signal Processing, 3:77–92, 1991.

    Google Scholar 

  42. S. Thompson. Haskell --The Craft of Functional Programming, International Computer Science Series, Addison-Wesley, 1996.

  43. C. Wedler and C. Lengauer. Parallel implementations of combinations of broadcast, reduction and scan. In Proc. 2nd Int. Workshop on Software Engineering for Parallel and Distributed Systems (PDSE'97), pp. 108–119, IEEE Computer Society Press, 1997.

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Lengauer, C., Gorlatch, S. & Herrmann, C. The Static Parallelization of Loops and Recursions. The Journal of Supercomputing 11, 333–353 (1997). https://doi.org/10.1023/A:1007904422322

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1023/A:1007904422322