Research Note
Fast Runtime Block Cyclic Data Redistribution on Multiprocessors

https://doi.org/10.1006/jpdc.1997.1351

Abstract

Block cyclic distribution seems to be well suited for most linear algebra algorithms, and this type of data distribution was chosen for the ScaLAPACK library as well as for the HPF language. However, one must choose a good compromise for the block size (to achieve good computation and communication efficiency and good load balancing). This choice depends heavily on each operation, so it is essential to be able to go from one block cyclic distribution to another very quickly. Moreover, it is also essential to be able to choose the right number of processors and the best grid shape for a given operation. We present here the data redistribution algorithms we implemented in the ScaLAPACK library in order to go from one block cyclic distribution on one grid to another block cyclic distribution on another grid. A complexity study is made that shows the efficiency of our solution. Timing results on the Intel Paragon and the Cray T3D corroborate our results.
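To make the notion of block cyclic redistribution concrete, the following is a minimal one-dimensional sketch in Python. It is not the algorithm implemented in ScaLAPACK and described in the paper; it is a brute-force illustration, under the usual definition of a block cyclic layout, of which elements must travel between which processes when both the block size and the number of processes change. The function names `owner` and `redistribution_plan` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def owner(i, block, nprocs):
    """Return (process, local index) of global index i in a 1-D block cyclic layout.

    Blocks of `block` consecutive elements are dealt round-robin to `nprocs` processes.
    """
    blk = i // block                      # global block number containing i
    proc = blk % nprocs                   # process that receives this block
    local = (blk // nprocs) * block + i % block
    return proc, local


def redistribution_plan(n, src_block, src_procs, dst_block, dst_procs):
    """Map (source process, destination process) -> global indices to transfer.

    Brute force over all n elements (illustrative assumption, not the paper's method).
    """
    plan = defaultdict(list)
    for i in range(n):
        src, _ = owner(i, src_block, src_procs)
        dst, _ = owner(i, dst_block, dst_procs)
        plan[(src, dst)].append(i)
    return dict(plan)


if __name__ == "__main__":
    # 12 elements: block size 2 on 2 processes -> block size 3 on 3 processes
    plan = redistribution_plan(n=12, src_block=2, src_procs=2, dst_block=3, dst_procs=3)
    for (src, dst), indices in sorted(plan.items()):
        print(f"P{src} -> Q{dst}: {indices}")
```

Even in this small case every source process sends to every destination process, and the message sizes are irregular, which suggests why a dedicated, fast redistribution routine matters. A practical implementation would compute these index sets per process without scanning every element.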

* This work has been supported by CNRS Contract PICS, and CEE-EUREKA Contract EUROTOPS. E-mail: {loic.prylli, bernard.tourancheau}@lip.ens-lyon.fr.
