Optimal Fine and Medium Grain Parallelism Detection in Polyhedral Reduced Dependence Graphs

International Journal of Parallel Programming

Abstract

This paper presents an optimal algorithm for detecting fine or medium grain parallelism in nested loops whose dependences are described by an approximation of distance vectors by polyhedra. In particular, this algorithm is optimal for the classical approximation by direction vectors. This result generalizes, to the case of several statements, Wolf and Lam's algorithm, which is optimal for a single statement. Our algorithm relies on a dependence uniformization process and on parallelization techniques related to systems of uniform recurrence equations. It can also be viewed as a combination of both Allen and Kennedy's algorithm and Wolf and Lam's algorithm.
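As a rough illustration of what describing direction vectors by polyhedra can look like, the sketch below encodes each component of a direction vector as a vertex coordinate plus an optional ray or line. The function name `direction_to_polyhedron` and the encoding of components as integers or the strings `"+"`, `"-"`, `"*"` are assumptions made for this example, not notation taken from the paper.

```python
from typing import List, Tuple, Union

Component = Union[int, str]   # exact distance, or "+", "-", "*" (assumed encoding)
Vector = Tuple[int, ...]


def direction_to_polyhedron(
    direction: List[Component],
) -> Tuple[Vector, List[Vector], List[Vector]]:
    """Return (vertex, rays, lines) for a direction vector.

    The distance vectors matching `direction` are the integer points of
    vertex + (non-negative combinations of rays) + (any combinations of lines).
    """
    n = len(direction)
    vertex = [0] * n
    rays: List[Vector] = []
    lines: List[Vector] = []

    def unit(i: int, sign: int = 1) -> Vector:
        # Signed unit vector along dimension i.
        return tuple(sign if j == i else 0 for j in range(n))

    for i, c in enumerate(direction):
        if isinstance(c, int):      # exact distance: fixed coordinate
            vertex[i] = c
        elif c == "+":              # distance >= 1
            vertex[i] = 1
            rays.append(unit(i))
        elif c == "-":              # distance <= -1
            vertex[i] = -1
            rays.append(unit(i, -1))
        elif c == "*":              # any distance: unconstrained direction
            lines.append(unit(i))
        else:
            raise ValueError(f"unknown direction component: {c!r}")
    return tuple(vertex), rays, lines
```

For instance, `direction_to_polyhedron([0, "+"])` yields the vertex `(0, 1)` with the single ray `(0, 1)` and no lines, which covers every distance vector `(0, d)` with `d >= 1`.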

REFERENCES

  1. David F. Bacon, Susan L. Graham, and Oliver J. Sharp, Compiler Transformations for High-Performance Computing, ACM Computing Surveys 26(4):345–420 (1994).

  2. John R. Allen and Ken Kennedy, Automatic Translation of Fortran Programs to Vector Form, ACM Trans. Program. Lang. Sys. 9(4): 491–542 (October 1987).

  3. Utpal Banerjee, A Theory of Loop Permutations, in D. Gelernter, A. Nicolau, and D. Padua, (eds.), Languages and Compilers for Parallel Computing, MIT Press, (1990).

  4. Michael E. Wolf and Monica S. Lam, A Loop Transformation Theory and an Algorithm to Maximize Parallelism, IEEE Trans. Parallel Distribut. Syst. 2(4):452–471 (October 1991).

  5. Wayne Kelly and William Pugh, A Framework for Unifying Reordering Transformations, Technical Report CS-TR-3193, University of Maryland (April 1993).

  6. Paul Feautrier, Some Efficient Solutions to the Affine Scheduling Problem, Part II: Multi-Dimensional Time, IJPP 21(6): 389–420 (December 1992).

  7. R. M. Karp, R. E. Miller, and S. Winograd, The Organization of Computations for Uniform Recurrence Equations, J. ACM 14(3): 563–590 (July 1967).

  8. Alain Darte and Frédéric Vivien, A Classification of Nested Loops Parallelization Algorithms. INRIA-IEEE Symp. on Emerging Technologies and Factory Automation, IEEE Computer Society Press, pp. 217–224 (1995). Will also appear in PPL, Special issue (1997).

  9. Pierre-Yves Calland, Alain Darte, Yves Robert, and Frédéric Vivien, Plugging Anti and Output Dependence Removal Techniques into Loop Parallelization Algorithms. Parallel Computing 23(1, 2):251–266 (1997).

  10. Alain Darte, Georges-André Silber, and Frédéric Vivien, Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling, Parallel Processing Letters (1997). Special issue, to appear. Also available as Technical Report LIP, ENS-Lyon, RR96-34.

  11. Wolfgang Meisl, Practical Methods for Scheduling and Allocation in the Polytope Model, World Wide Web document, URL: http://brahms.fmi.uni-passau.de/cl/loopo/doc.

  12. Leslie Lamport, The Parallel Execution of DO Loops. Commun. ACM 17(2):83–93 (February 1974).

  13. Alain Darte and Yves Robert, Constructive Methods for Scheduling Uniform Loop Nests. IEEE Trans. Parallel Distribut. Syst. 5(8):814–822 (1994).

  14. Alain Darte and Yves Robert, Affine-by-Statement Scheduling of Uniform and Affine Loop Nests over Parametric Domains. J. Parallel and Distributed Computing 29:43–59 (1995).

  15. Paul Feautrier, Some Efficient Solutions to the Affine Scheduling Problem. Part I: One-Dimensional Time. IJPP 21(5): 313–348 (October 1992).

  16. Amy W. Lim and Monica S. Lam, Maximizing Parallelism and Minimizing Synchronization with Affine Transforms, Proc. 24th Ann. ACM SIGPLAN-SIGACT Symp. Principles of Progr. Lang. (January 1997).

  17. Alain Darte, Leonid Khachiyan, and Yves Robert, Linear Scheduling is Nearly Optimal. Parallel Processing Letters 1(2): 73–81 (1991).

  18. Patrick Le Gouëslier d'Argence, An Asymptotically Optimal Affine Schedule on Bounded Convex Polyhedric Domains. Proc. Euro-Par '96 Parallel Processing. Vol. 1124 of LNCS. Springer-Verlag (August 1996).

  19. Paul Feautrier, Dataflow Analysis of Array and Scalar References. IJPP 20(1):23–51 (1991).

  20. Jean-François Collard, Denis Barthou, and Paul Feautrier, Fuzzy Array Dataflow Analysis. Proc. 5th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming. Santa Barbara, California (July 1995).

  21. Alain Darte and Frédéric Vivien, On the Optimality of Allen and Kennedy's Algorithm for Parallelism Extraction in Nested Loops. Journal of Parallel Algorithms and Applications 12(1–3):83–112 (1997). Special issue on Optimizing Compilers for Parallel Languages.

  22. Alain Darte and Frédéric Vivien, Revisiting the Decomposition of Karp, Miller, and Winograd. Parallel Processing Letters 5(4):551–562 (December 1995).

  23. Gene H. Golub and Charles F. Van Loan, Matrix Computations. Johns Hopkins, Second Edition (1989).

  24. Jack J. Dongarra and Stanley C. Eisenstat, lud. World Wide Web document. URL: http://netlib.bell-labs.com/netlib/benchmark/index.html.

  25. W. Kelly, V. Maslov, W. Pugh, E. Rosser, T. Shpeisman, and D. Wonnacott, New User Interface for Petit and Other Interfaces: User Guide. University of Maryland (June 1995).

  26. Arthur J. Bernstein. Analysis of Programs for Parallel Processing. IEEE Trans. Electronic Computers 15:757–762 (October 1966).

  27. John R. Allen and Ken Kennedy, PFC: A Program to Convert Fortran to Parallel Form. Technical Report MASC-TR82-6, Rice University, Houston, Texas (1982).

  28. Michael Wolfe, Optimizing Supercompilers for Supercomputers, Ph.D. Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign (October 1982).

  29. Michael Wolfe, Optimizing Supercompilers for Supercomputers, MIT Press, Cambridge, Massachusetts (1989).

  30. François Irigoin and Rémy Triolet, Computing Dependence Direction Vectors and Dependence Cones with Linear Systems, Technical Report ENSMP-CAI-87-E94, École des Mines de Paris, Fontainebleau, France (1987).

  31. François Irigoin and Rémy Triolet, Supernode Partitioning, Proc 15th Ann. ACM Symp. Principles of Progr. Lang., San Diego, California, pp. 319–329 (January 1988).

  32. François Irigoin, Pierre Jouvelot, and Rémy Triolet, Semantical Interprocedural Parallelization: An overview of the PIPS Project, Proc. ACM Int. Conf. Supercomputing, Cologne, Germany (June 1991).

  33. Alexander Schrijver, Theory of Linear and Integer Programming, John Wiley and Sons, New York (1986).

  34. François Irigoin and Rémy Triolet, Dependence Approximation and Global Parallel Code Generation for Nested Loops, Proc. Int. Workshop on Parallel and Distributed Algorithms (October 1988).

  35. Michael Wolfe, TINY, a Loop Restructuring Research Tool, Oregon Graduate Institute of Science and Technology (December 1990).

  36. Alain Darte and Frédéric Vivien, Optimal Fine and Medium Grain Parallelism Detection in Polyhedral Reduced Dependence Graphs, Technical Report 96-06, LIP, ENS-Lyon, France (April 1996).

  37. Sailesh K. Rao, Regular Iterative Algorithms and their Implementations on Processor Arrays, Ph.D. Thesis, Stanford University (October 1985).

  38. Vwani P. Roychowdhury, Derivation, Extensions and Parallel Implementation of Regular Iterative Algorithms, Ph.D. Thesis, Stanford University (December 1988).

  39. S. Rao Kosaraju and Gregory F. Sullivan, Detecting Cycles in Dynamic Graphs in Polynomial Time (preliminary version), Proc. 20th Ann. ACM Sympos. Theory of Computing, pp. 398–406 (May 1988).

  40. Alain Darte and Frédéric Vivien, Automatic Parallelization Based on Multi-Dimensional Scheduling. Technical Report 94-24, LIP, ENS-Lyon, France (September 1994).

  41. M. Gondran and M. Minoux, Graphs and Algorithms. John Wiley and Sons (1984).

  42. Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest, Introduction to Algorithms, MIT Press (1990).

Cite this article

Darte, A., Vivien, F. Optimal Fine and Medium Grain Parallelism Detection in Polyhedral Reduced Dependence Graphs. International Journal of Parallel Programming 25, 447–496 (1997). https://doi.org/10.1023/A:1025168022993

  • DOI: https://doi.org/10.1023/A:1025168022993
