
Partitioning and mapping of nested loops for linear array multicomputers

  • Published in: The Journal of Supercomputing

Abstract

In distributed-memory multicomputers, minimizing interprocessor communication is the key to the efficient execution of parallel programs. To reduce communication overhead, parallel programs on multicomputers must be carefully scheduled by parallelizing compilers. This paper proposes compilation techniques for partitioning and mapping nested loops with constant data dependences onto linear array multicomputers. First, a systematic partition strategy is proposed to project an n-dimensional computational structure, representing an n-nested loop, onto a line to form a one-dimensional projected structure with low communication overhead. Then, a mapping algorithm is proposed for mapping the partitioned loops onto linear arrays in a way that balances the workload and minimizes the communication cost among processors. Finally, parallel execution code can be automatically generated for such linear array multicomputers.
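
The projection-and-mapping idea summarized above can be pictured with a small, hypothetical sketch; it is not the authors' published algorithm, whose details are not reproduced here. The sketch assumes a projection-vector formulation: each iteration point of an n-nested loop is collapsed onto a line by a dot product with an integer vector, and the resulting one-dimensional indices are block-assigned to the processors of a linear array so that the workload stays roughly balanced. The function names, the greedy balancing rule, and the example projection vector (1, 1) are all assumptions chosen for illustration.

```python
# Hypothetical sketch (not the paper's algorithm): project an n-D iteration
# space onto a line with an integer vector, then block-map the projected
# indices onto the processors of a linear array.

import itertools

def project_iterations(loop_bounds, proj):
    """Map each n-D iteration point to a 1-D index via a dot product.

    loop_bounds : list of (lower, upper) inclusive bounds, one per loop level
    proj        : integer projection vector of the same length
    Returns a dict mapping projected index -> list of iteration points.
    """
    ranges = [range(lo, hi + 1) for lo, hi in loop_bounds]
    groups = {}
    for point in itertools.product(*ranges):
        key = sum(p * i for p, i in zip(proj, point))
        groups.setdefault(key, []).append(point)
    return groups

def map_to_processors(groups, num_procs):
    """Greedily assign contiguous blocks of projected indices to processors
    so each processor receives roughly the same number of iteration points."""
    keys = sorted(groups)
    total = sum(len(groups[k]) for k in keys)
    mapping, proc, load = {}, 0, 0
    for k in keys:
        mapping[k] = proc
        load += len(groups[k])
        # Advance to the next processor once its fair share is reached.
        if proc < num_procs - 1 and load >= total * (proc + 1) / num_procs:
            proc += 1
    return mapping

if __name__ == "__main__":
    # Two nested loops of 8 iterations each, projected with vector (1, 1):
    # iterations on the same anti-diagonal share a projected index.
    groups = project_iterations([(1, 8), (1, 8)], (1, 1))
    mapping = map_to_processors(groups, num_procs=4)
    for k in sorted(groups):
        print(f"line index {k:2d} -> P{mapping[k]} ({len(groups[k])} iterations)")
```

Under this sketch, choosing the projection so that the constant dependence vectors map to small projected distances keeps most communication between neighboring processors of the array, which is the low-communication property the abstract targets.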

About this article

Cite this article

Sheu, J.P., Chen, T.S. Partitioning and mapping of nested loops for linear array multicomputers. J Supercomput 9, 183–202 (1995). https://doi.org/10.1007/BF01245404
