
Minimizing Communication Penalty of Triangular Solvers by Runtime Mesh Configuration and Workload Redistribution

The Journal of Supercomputing

Abstract

In this article, we study how network topology and load balancing affect the performance of a new parallel algorithm for solving triangular systems of linear equations on distributed-memory message-passing multiprocessors. The proposed algorithm applies novel runtime data mapping and workload redistribution methods on a communication network configured as a toroidal mesh. A fully parameterized theoretical model predicts the communication behavior of the algorithm as it relates to load balancing, and the analytical performance results correctly determine the optimal dimensions of the toroidal mesh, which vary with the problem size, the number of available processors, and the hardware parameters of the machine. The algorithm is then further enhanced by redistributing the arithmetic workload at runtime. Our FORTRAN implementations of the proposed algorithm and of its enhanced version have been tested on an Intel iPSC/2 hypercube, and the same code is also suitable for running the algorithm on the iPSC/860 hypercube and the Intel Paragon mesh multiprocessor. The measured timing results support our theoretical findings, and together they confirm that a network topology chosen at runtime can have a very significant impact on the computational load distribution, the communication behavior, and the overall performance of parallel algorithms.
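The idea of choosing mesh dimensions at runtime can be illustrated with a minimal sketch: enumerate every way to arrange P processors as a rows-by-columns mesh and keep the shape that a cost model predicts to be cheapest. The function names (`mesh_shapes`, `best_mesh`, `toy_cost`) and the cost formula are assumptions made for this illustration. The abstract does not give the article's parameterized model, so `toy_cost` is only a stand-in showing that the preferred shape can shift with the hardware parameters.

```python
from typing import Callable, List, Tuple

Shape = Tuple[int, int]


def mesh_shapes(p: int) -> List[Shape]:
    """List every (rows, cols) factorization of p processors."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    return [(r, p // r) for r in range(1, p + 1) if p % r == 0]


def best_mesh(p: int, cost: Callable[[int, int], float]) -> Shape:
    """Return the shape with the lowest predicted cost (first shape wins ties)."""
    return min(mesh_shapes(p), key=lambda shape: cost(*shape))


def toy_cost(n: int, alpha: float, beta: float) -> Callable[[int, int], float]:
    """Build a placeholder cost function. This is NOT the article's model.

    Illustrative assumption only: a startup term (alpha) that grows with
    the number of rows, and a volume term (beta) for a problem of size n
    that shrinks as the number of rows grows.
    """
    def cost(rows: int, cols: int) -> float:
        # cols is implied by rows for a fixed processor count, so this toy
        # formula depends on rows only.
        return alpha * rows + beta * n / rows

    return cost


if __name__ == "__main__":
    # Hypothetical hardware parameters: a high versus a low startup cost.
    for alpha in (100.0, 10.0):
        shape = best_mesh(16, toy_cost(n=1024, alpha=alpha, beta=1.0))
        print(f"alpha={alpha}: best mesh {shape}")
```

Passing the cost model in as a callable keeps the search independent of any particular formula, so a model fitted to a specific machine could be substituted without changing `best_mesh`.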




Cite this article

Wang, D., Chu, E. Minimizing Communication Penalty of Triangular Solvers by Runtime Mesh Configuration and Workload Redistribution. The Journal of Supercomputing 14, 77–95 (1999). https://doi.org/10.1023/A:1008151330872
