Research Note
Fast Runtime Block Cyclic Data Redistribution on Multiprocessors☆
☆ J. J. Dongarra, B. Tourancheau, Eds.
* This work has been supported by CNRS Contract PICS and CEE-EUREKA Contract EUROTOPS. E-mail: {loic.prylli, bernard.tourancheau}@lip.ens-lyon.fr.
Copyright © 1997 Academic Press. All rights reserved.