Parallel Computing

Volume 22, Issue 3, March 1996, Pages 383-393
Paper
Cyclic block-algorithms for solving triangular systems on distributed-memory multiprocessors with mesh topology

https://doi.org/10.1016/0167-8191(96)00073-7

Abstract

Parallel blocked versions of short-cut algorithms are presented for solving triangular systems, possibly with multiple right-hand sides. Their implementation relies mainly on Level 3 BLAS routines and requires the data to be distributed through a square-block torus-wrap mapping. Numerical experiments on an Intel Paragon XP/S show an efficiency of 50–75% for a wide range of block sizes and mesh shapes.
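As a rough illustration of the data distribution the abstract names, the sketch below assigns square blocks of a lower-triangular matrix to the processors of a mesh in a wrapped (cyclic) fashion. It assumes the common convention that block (I, J) goes to processor (I mod P_r, J mod P_c); the abstract does not spell out the mapping, so this convention, along with the function names `block_owner` and `local_blocks` and the parameters `nb`, `p_rows`, and `p_cols`, is an assumption made for illustration and not the paper's code.

```python
def block_owner(i, j, nb, p_rows, p_cols):
    """Mesh coordinates of the processor holding matrix entry (i, j).

    Assumed convention: the matrix is cut into nb-by-nb square blocks and
    block (I, J) is wrapped onto processor (I mod p_rows, J mod p_cols).
    """
    return (i // nb) % p_rows, (j // nb) % p_cols


def local_blocks(n, nb, p_rows, p_cols, proc):
    """Block indices (I, J) of the lower triangle stored on processor `proc`.

    Only blocks with J <= I are listed, since a lower-triangular system
    has no nonzero blocks above the block diagonal.
    """
    n_blocks = (n + nb - 1) // nb  # number of block rows, rounding up
    return [
        (I, J)
        for I in range(n_blocks)
        for J in range(I + 1)
        if (I % p_rows, J % p_cols) == proc
    ]


if __name__ == "__main__":
    # An 8x8 matrix in 2x2 blocks on a 2x2 mesh: list what each processor holds.
    for r in range(2):
        for c in range(2):
            print((r, c), local_blocks(8, 2, 2, 2, (r, c)))
```

Because consecutive blocks cycle over the mesh, every processor keeps receiving work as the solve advances down the triangle, which is the usual motivation for a wrapped layout over a contiguous one.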
