Optimal Fine and Medium Grain Parallelism Detection in Polyhedral Reduced Dependence Graphs

Darte, Alain; Vivien, Frédéric

doi:10.1023/A:1025168022993

Published: December 1997

Volume 25, pages 447–496, (1997)

International Journal of Parallel Programming

Abstract

This paper presents an optimal algorithm for detecting fine or medium grain parallelism in nested loops whose dependences are described by an approximation of distance vectors by polyhedra. In particular, this algorithm is optimal for the classical approximation by direction vectors. This result generalizes, to the case of several statements, Wolf and Lam's algorithm, which is optimal for a single statement. Our algorithm relies on a dependence uniformization process and on parallelization techniques related to systems of uniform recurrence equations. It can also be viewed as a combination of both Allen and Kennedy's algorithm and Wolf and Lam's algorithm.
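
To make "an approximation of distance vectors by polyhedra" concrete, the following is a minimal sketch of how a direction vector can be read as a polyhedron given by one vertex, a list of rays, and a list of lines. It is an illustration only, not the paper's algorithm: the function name `direction_to_generators` and the component notation (`'+'` for a distance of at least 1, `'-'` for a distance of at most -1, `'*'` for an unknown distance, or an exact integer) are assumptions made here.

```python
from typing import List, Tuple, Union

# A component is an exact integer distance or one of '+', '-', '*'
# (assumed notation for this sketch, not taken from the paper).
Component = Union[int, str]
Vector = List[int]


def direction_to_generators(
    direction: List[Component],
) -> Tuple[Vector, List[Vector], List[Vector]]:
    """Return (vertex, rays, lines) of the polyhedron
    { vertex + sum(a_j * ray_j) + sum(b_k * line_k) : a_j >= 0, b_k real }
    that contains every distance vector matching the direction vector."""
    n = len(direction)
    vertex = [0] * n
    rays: List[Vector] = []
    lines: List[Vector] = []
    for i, comp in enumerate(direction):
        unit = [0] * n
        unit[i] = 1
        if isinstance(comp, int):
            vertex[i] = comp                 # exact distance: a fixed coordinate
        elif comp == "+":
            vertex[i] = 1                    # distance >= 1: start at 1 ...
            rays.append(unit)                # ... and extend along +e_i
        elif comp == "-":
            vertex[i] = -1                   # distance <= -1: start at -1 ...
            rays.append([-u for u in unit])  # ... and extend along -e_i
        elif comp == "*":
            lines.append(unit)               # unknown distance: free along e_i
        else:
            raise ValueError(f"unknown component: {comp!r}")
    return vertex, rays, lines


if __name__ == "__main__":
    print(direction_to_generators([0, "+", "*"]))
```

For the direction vector `(0, +, *)` this yields the vertex `(0, 1, 0)`, one ray `(0, 1, 0)`, and one line `(0, 0, 1)`. An exact (uniform) distance vector is the special case with no rays and no lines.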

REFERENCES

David F. Bacon, Susan L. Graham, and Oliver J. Sharp, Compiler Transformations for High-Performance Computing, ACM Computing Surveys 26(4):345–420 (1994).
John R. Allen and Ken Kennedy, Automatic Translation of Fortran Programs to Vector Form, ACM Trans. Program. Lang. Sys. 9(4): 491–542 (October 1987).
Utpal Banerjee, A Theory of Loop Permutations, in D. Gelernter, A. Nicolau, and D. Padua, (eds.), Languages and Compilers for Parallel Computing, MIT Press, (1990).
Michael E. Wolf and Monica S. Lam, A Loop Transformation Theory and an Algorithm to Maximize Parallelism, IEEE Trans. Parallel Distribut. Syst. 2(4):452–471 (October 1991).
Wayne Kelly and William Pugh, A Framework for Unifying Reordering Transformations, Technical Report CS-TR-3193, University of Maryland (April 1993).
Paul Feautrier, Some Efficient Solutions to the Affine Scheduling Problem, Part II: Multi-Dimensional Time, IJPP 21(6): 389–420 (December 1992).
R. M. Karp, R. E. Miller, and S. Winograd, The Organization of Computations for Uniform Recurrence Equations, J. ACM 14(3): 563–590 (July 1967).
Alain Darte and Frédéric Vivien, A Classification of Nested Loops Parallelization Algorithms. INRIA-IEEE Symp. on Emerging Technologies and Factory Automation IEEE Computer Society Press, pp. 217–224 (1995). Will also appear in PPL, Special issue (1997).
Pierre-Yves Calland, Alain Darte, Yves Robert, and Frédéric Vivien, Plugging Anti and Output Dependence Removal Techniques into Loop Parallelization Algorithms. Parallel Computing 23(1, 2):251–266 (1997).
Alain Darte, Georges-André Silber, and Frédéric Vivien, Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling, Parallel Processing Letters (1997). Special issue, to appear. Also available as Technical Report LIP, ENS-Lyon, RR96-34.
Wolfgang Meisl, Practical Methods for Scheduling and Allocation in the Polytope Model, World Wide Web document, URL: http://brahms.fmi.uni-passau.de/cl/loopo/doc.
Leslie Lamport, The Parallel Execution of DO Loops. Commun. ACM 17(2):83–93. (February 1974).
Alain Darte and Yves Robert, Constructive Methods for Scheduling Uniform Loop Nests. IEEE Trans. Parallel Distribut. Syst. 5(8):814–822 (1994).
Alain Darte and Yves Robert, Affine-by-Statement Scheduling of Uniform and Affine Loop Nests over Parametric Domains. J. Parallel and Distributed Computing 29:43–59 (1995).
Paul Feautrier, Some Efficient Solutions to the Affine Scheduling Problem. Part I: One-Dimensional Time. IJPP 21(5): 313–348 (October 1992).
Amy W. Lim and Monica S. Lam, Maximizing Parallelism and Minimizing Synchronization with Affine Transforms, Proc. 24th Ann. ACM SIGPLAN-SIGACT Symp. Principles of Progr. Lang. (January 1997).
Alain Darte, Leonid Khachiyan, and Yves Robert, Linear Scheduling is Nearly Optimal. Parallel Processing Letters 1(2): 73–81 (1991).
Patrick Le Gouëslier d'Argence, An Asymptotically Optimal Affine Schedule on Bounded Convex Polyhedric Domains. Proc. Euro-Par '96 Parallel Processing. Vol. 1124 of LNCS. Springer-Verlag (August 1996).
Paul Feautrier, Dataflow Analysis of Array and Scalar References. Int. JPP 20(1):23–51 (1991).
Jean-François Collard, Denis Barthou, and Paul Feautrier, Fuzzy Array Dataflow Analysis. Proc. 5th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming. Santa Barbara, California (July 1995).
Alain Darte and Frédéric Vivien, On the Optimality of Allen and Kennedy's Algorithm for Parallelism Extraction in Nested Loops. Journal of Parallel Algorithms and Applications 12(1–3):83–112 (1997). Special issue on Optimizing Compilers for Parallel Languages.
Alain Darte and Frédéric Vivien, Revisiting the Decomposition of Karp, Miller, and Winograd. Parallel Processing Letters 5(4):551–562 (December 1995).
Gene H. Golub and Charles F. Van Loan, Matrix Computations. Johns Hopkins, Second Edition (1989).
Jack J. Dongarra and Stanley C. Eisenstat, lud. World Wide Web document. URL: http://netlib.bell-labs.com/netlib/benchmark/index.html.
W. Kelly, V. Maslov, W. Pugh, E. Rosser, T. Shpeisman, and D. Wonnacott, New User Interface for Petit and Other Interfaces: User Guide. University of Maryland (June 1995).
Arthur J. Bernstein. Analysis of Programs for Parallel Processing. IEEE Trans. Electronic Computers 15:757–762 (October 1966).
John R. Allen and Ken Kennedy, PFC: A program to convert Fortran to Parallel Form. Technical Report MASC-TR82-6, Rice University, Houston, Texas. (1982).
Michael Wolfe, Optimizing Supercompilers for Supercomputers, Ph.D. Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign (October 1982).
Michael Wolfe, Optimizing Supercompilers for Supercomputers, MIT Press, Cambridge Massachusetts (1989).
François Irigoin and Rémy Triolet, Computing Dependence Direction Vectors and Dependence Cones with Linear Systems, Technical Report ENSMP-CAI-87-E94, École des Mines de Paris, Fontainebleau, France (1987).
François Irigoin and Rémy Triolet, Supernode Partitioning, Proc 15th Ann. ACM Symp. Principles of Progr. Lang., San Diego, California, pp. 319–329 (January 1988).
François Irigoin, Pierre Jouvelot, and Rémy Triolet, Semantical Interprocedural Parallelization: An overview of the PIPS Project, Proc. ACM Int. Conf. Supercomputing, Cologne, Germany (June 1991).
Alexander Schrijver, Theory of Linear and Integer Programming, John Wiley and Sons, New York (1986).
François Irigoin and Rémy Triolet, Dependence Approximation and Global Parallel Code Generation for Nested Loops, Proc. Int. Workshop on Parallel and Distributed Algorithms (October 1988).
Michael Wolfe, TINY, a Loop Restructuring Research Tool, Oregon Graduate Institute of Science and Technology (December 1990).
Alain Darte and Frédéric Vivien, Optimal Fine and Medium Grain Parallelism Detection in Polyhedral Reduced Dependence Graphs, Technical Report 96-06, LIP. ENS-Lyon, France (April 1996).
Sailesh K. Rao, Regular Iterative Algorithms and their Implementations on Processor Arrays, Ph.D. Thesis, Stanford University (October 1985).
Vwani P. Roychowdhury, Derivation, Extensions and Parallel Implementation of Regular Iterative Algorithms, Ph.D. Thesis, Stanford University, December 1988.
S. Rao Kosaraju and Gregory F. Sullivan, Detecting Cycles in Dynamic Graphs in Polynomial Time (preliminary version), Proc. 20th Ann. ACM Sympos. Theory of Computing, pp. 398–406 (May 1988).
Alain Darte and Frédéric Vivien, Automatic Parallelization based on Multi-Dimensional Scheduling. Technical Report 94-24, LIP. ENS-Lyon, France (September 1994).
M. Gondran and M. Minoux, Graphs and Algorithms. John Wiley and Sons (1984).
Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest, Introduction to Algorithms, MIT Press (1990).

Author information

Authors and Affiliations

Laboratoire LIP, URA CNRS 1398, École Normale Supérieure de Lyon, F-69364, Lyon Cedex 07
Alain Darte & Frédéric Vivien

About this article

Cite this article

Darte, A., Vivien, F. Optimal Fine and Medium Grain Parallelism Detection in Polyhedral Reduced Dependence Graphs. International Journal of Parallel Programming 25, 447–496 (1997). https://doi.org/10.1023/A:1025168022993

Issue Date: December 1997
DOI: https://doi.org/10.1023/A:1025168022993
