Automatic generation of nested, fork-join parallelism

The Journal of Supercomputing

Abstract

This paper presents an efficient algorithm that automatically generates a parallel program from a dependence-based representation of a sequential program. The resulting parallel program consists of nested fork-join constructs, composed from the loops and statements of the sequential program. Data dependences are handled by two techniques. One technique implicitly satisfies them by sequencing, thereby reducing parallelism. Where increased parallelism results, the other technique eliminates them by privatization: the introduction of process-specific private instances of variables. Additionally, the algorithm determines when copying values of such instances in and out of nested parallel constructs results in greater parallelism. This is the first algorithm for automatically generating parallelism for such a general model. The algorithm generates as much parallelism as is possible in our model while minimizing privatization.
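As a rough illustration of the two techniques named in the abstract, the sketch below contrasts a loop that shares one scratch variable `t` across iterations (so the iterations must be sequenced) with a fork-join version in which each iteration gets a private instance of `t`, whose final value is copied out after the join. This is not the paper's algorithm, only a hand-written example of the kind of program it aims to produce; the names `sequential`, `body`, and `fork_join` are hypothetical, and Python threads stand in for the paper's parallel processes.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential(a):
    """Original loop: every iteration reuses the single shared scratch variable t."""
    out = [0] * len(a)
    t = 0  # shared storage: each iteration overwrites it, forcing sequencing
    for i in range(len(a)):
        t = a[i] * 2
        out[i] = t + 1
    return out, t  # t is still live after the loop


def body(x):
    """Loop body with a private instance of t (a local of this call)."""
    t = x * 2
    return t + 1, t


def fork_join(a, workers=4):
    """Fork one task per iteration, join on all of them, then copy out the last t."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map forks the tasks; list() waits for every result (the join)
        # and preserves iteration order.
        results = list(pool.map(body, a))
    out = [r[0] for r in results]
    # Copy-out: the value of t that the final iteration would have left behind.
    t = results[-1][1] if results else 0
    return out, t
```

Both functions return the same pair for the same input, which is the property privatization with copy-out has to preserve: the parallel version may reorder the work, but not the values visible after the construct.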

Additional information

An earlier version of this paper was presented at the First Workshop on Languages and Compilers for Vector and Parallel Machines, held at Cornell University in August 1988. That same year, a select group of the workshop papers was published in two special issues of the journal: volume 2, numbers 2 and 3.

About this article

Cite this article

Burke, M., Cytron, R., Ferrante, J. et al. Automatic generation of nested, fork-join parallelism. J Supercomput 3, 71–88 (1989). https://doi.org/10.1007/BF00129843

  • DOI: https://doi.org/10.1007/BF00129843
