Parallel Computing

Volume 21, Issue 3, March 1995, Pages 353-372

Basic routines for the Rank-2k update: 2D torus vs reconfigurable network

https://doi.org/10.1016/0167-8191(94)00094-Q

Abstract

Our aim is to provide the Rank-2k update on different parallel machines. In this paper, we compare the performance obtained on a fixed 2D torus topology with that obtained on a reconfigurable system. This comparison leads to the development of two basic communication subroutines, namely scattering and matrix transposition, and two basic computation subroutines, namely matrix product and Rank-2k update (both of which belong to the level 3 BLAS).
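
For reference, the Rank-2k update is the level 3 BLAS operation C := alpha*A*B^T + alpha*B*A^T + beta*C, where C is a symmetric matrix of order n and A, B are n-by-k. The serial sketch below (in C, row-major storage; the function name and loop ordering are illustrative and not the paper's parallel routine) spells out the operation that the parallel algorithms must carry out on distributed blocks.

    /* Serial reference sketch of the Rank-2k update (level 3 BLAS xSYR2K):
     *   C := alpha*A*B^T + alpha*B*A^T + beta*C
     * with C symmetric of order n (lower triangle updated) and A, B of size n x k.
     * Row-major storage; names and loop order are illustrative only. */
    void syr2k_ref(int n, int k, double alpha, const double *A, const double *B,
                   double beta, double *C)
    {
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {            /* lower triangle of C */
                double s = 0.0;
                for (int p = 0; p < k; p++)
                    s += A[i*k + p] * B[j*k + p] + B[i*k + p] * A[j*k + p];
                C[i*n + j] = beta * C[i*n + j] + alpha * s;
            }
        }
    }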

The preceding generation of distributed-memory machines used fixed networks such as grids, multidimensional tori or hypercubes. Today, vendors offer machines whose networks can be reconfigured during program execution. A large number of possibilities are therefore available to the programmer, who can adapt the configuration at runtime to best suit both the algorithm and the data distribution. This dynamic reconfiguration obviously introduces an overhead through the setting of the network switch(es), which must be taken into account in the cost of the whole computation.
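
As an illustration only (the paper derives its own complexity expressions for each subroutine), with a linear communication model the total cost of a routine run on a reconfigurable network can be written as

    T_{total} = T_{comp} + T_{comm} + n_{reconf} \, t_{switch},
    \qquad t_{msg}(L) = \beta + L \, \tau,

where n_{reconf} is the number of switch settings, t_{switch} the cost of one setting, \beta the message start-up time and \tau the per-element transfer time. The reconfigurable variant pays the extra n_{reconf} t_{switch} term in exchange for cheaper or fewer communication steps.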

Using complexity analysis and experiments on a machine resulting from the SuperNode Esprit project, we compare, for each subroutine studied, one method using a 2D torus topology with another using a dynamically reconfigurable network. For each solution, performance evaluations and experiments exhibit interesting speed-ups for the algorithms on the reconfigurable network.

On leave from LIP, CNRS URA 1398, ENS Lyon, 46 allée d'Italie, 69364 Lyon Cedex 07, France.

This work was supported in part by the National Science Foundation Science and Technology Center Cooperative Agreement No. CCR-8809615, CNRS-NSF grant 950.223/07, PRC C3, Archipel SA and MRE under grant 974, and DRET.
