Automatic generation of nested, fork-join parallelism

Burke, Michael; Cytron, Ron; Ferrante, Jeanne; Hsieh, Wilson

doi:10.1007/BF00129843

Published: July 1989

Volume 3, pages 71–88 (1989)

The Journal of Supercomputing

Michael Burke¹, Ron Cytron¹, Jeanne Ferrante¹ & Wilson Hsieh²

Abstract

This paper presents an efficient algorithm that automatically generates a parallel program from a dependence-based representation of a sequential program. The resulting parallel program consists of nested fork-join constructs, composed from the loops and statements of the sequential program. Data dependences are handled by two techniques. One technique implicitly satisfies them by sequencing, thereby reducing parallelism. Where increased parallelism results, the other technique eliminates them by privatization: the introduction of process-specific private instances of variables. Additionally, the algorithm determines when copying values of such instances in and out of nested parallel constructs results in greater parallelism. This is the first algorithm for automatically generating parallelism for such a general model. The algorithm generates as much parallelism as is possible in our model while minimizing privatization.
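The abstract's terms can be made concrete with a small illustration. The sketch below is not the paper's algorithm, which works on a dependence-based program representation and is not reproduced here. It only shows, under assumed names (`sequential`, `body`, `fork_join`, all hypothetical), what privatization and copy-out mean for one loop. In the sequential version every iteration reuses the scalar `t`, so running iterations concurrently on one shared `t` would be unsafe. Giving each forked task its own private `t` removes those dependences, and copying the last iteration's private value out after the join preserves the value that `t` holds after the sequential loop.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential(a, b):
    """Sequential loop in which every iteration reuses the single scalar t."""
    out = [0] * len(a)
    t = 0
    for i in range(len(a)):
        t = a[i] + b[i]      # writes the one shared instance of t
        out[i] = t * t       # reads it back within the same iteration
    return out, t            # t is live after the loop


def body(i, a, b, out):
    """Loop body with t privatized: each forked task has its own instance."""
    t = a[i] + b[i]          # private to this call
    out[i] = t * t           # each task writes a distinct element
    return t                 # hand the private value back for copy-out


def fork_join(a, b):
    out = [0] * len(a)
    t = 0
    with ThreadPoolExecutor() as pool:
        # Fork: one task per iteration.
        futures = [pool.submit(body, i, a, b, out) for i in range(len(a))]
        # Join: wait for every task and collect its private t.
        privates = [f.result() for f in futures]
    if privates:
        t = privates[-1]     # copy-out: value from the last iteration in sequential order
    return out, t
```

Calling `fork_join([1, 2, 3], [4, 5, 6])` should give the same pair as `sequential` on the same inputs. The alternative technique named in the abstract, sequencing, corresponds to keeping the single shared `t` and running the iterations one after another, which is safe but gives up the parallelism. Python threads are used here only to show the fork-join shape, not to demonstrate speedup.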

Author information

Authors and Affiliations

IBM Research Division, T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
Michael Burke, Ron Cytron & Jeanne Ferrante
Massachusetts Institute of Technology, 545 Technology Square, Cambridge, MA 02139, USA
Wilson Hsieh

Additional information

An earlier version of this paper was presented at the First Workshop on Languages and Compilers for Vector and Parallel Machines, held at Cornell University in August 1988. That same year, a selection of the workshop papers was published in two special issues of the journal: volume 2, numbers 2 and 3.

About this article

Cite this article

Burke, M., Cytron, R., Ferrante, J. et al. Automatic generation of nested, fork-join parallelism. J Supercomput 3, 71–88 (1989). https://doi.org/10.1007/BF00129843

Issue Date: July 1989
DOI: https://doi.org/10.1007/BF00129843
