Abstract
This paper presents optimizations of the ScaLAPACK LU factorization based on communication/computation overlap. First, a theoretical computation of the optimal block size is given for the block scattered decomposition of the matrix. Two optimizations of the routine are then presented; both use asynchronous communications to hide the communication overhead and to obtain optimal speed-ups.
This work has been supported by INRIA Rhône-Alpes and the EUREKA-EUROTOPS project.
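As an illustration of the block scattered decomposition named in the abstract (commonly called the block-cyclic distribution), the following minimal sketch maps a global matrix index to its owning process and local index. The function names, the 0-based indexing, and the assumption that the first block lives on process 0 are illustrative choices, not taken from the paper or from the ScaLAPACK API.

```python
def block_cyclic_owner(i, nb, nprocs):
    """Map a 0-based global index to (owning process, local index).

    Assumes blocks of size nb are dealt out round-robin over nprocs
    processes, starting at process 0 (an assumption of this sketch).
    """
    block = i // nb                 # global block that contains index i
    proc = block % nprocs           # blocks are assigned cyclically
    local_block = block // nprocs   # position of that block on its owner
    return proc, local_block * nb + i % nb


def owner_grid(n, nb, prow, pcol):
    """Owner (process row, process column) of every entry of an n x n matrix."""
    return [
        [
            (block_cyclic_owner(r, nb, prow)[0],
             block_cyclic_owner(c, nb, pcol)[0])
            for c in range(n)
        ]
        for r in range(n)
    ]


if __name__ == "__main__":
    # 8 x 8 matrix, 2 x 2 blocks, 2 x 2 process grid
    for row in owner_grid(8, 2, 2, 2):
        print(" ".join(f"{p}{q}" for p, q in row))
```

The block size nb is the tunable parameter here: small blocks spread the work more evenly across processes, while large blocks reduce the number of messages and favor Level 3 BLAS efficiency, which is the trade-off an optimal block size has to balance.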
Keywords
- Asynchronous Communication
- Block Column
- Optimal Block Size
- Basic Linear Algebra Subprogram
- Linear Algebra Library
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Desprez, F., Domas, S., Tourancheau, B. (1996). Optimization of the ScaLAPACK LU factorization routine using communication/computation overlap. In: Bougé, L., Fraigniaud, P., Mignotte, A., Robert, Y. (eds) Euro-Par'96 Parallel Processing. Euro-Par 1996. Lecture Notes in Computer Science, vol 1124. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0024678
DOI: https://doi.org/10.1007/BFb0024678
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61627-6
Online ISBN: 978-3-540-70636-6
eBook Packages: Springer Book Archive