Abstract
We present a high-performance Cholesky factorization algorithm called BPC (Blocked Packed Cholesky). BPC performs as well as or better than the LAPACK DPOTRF subroutine while needing about the same amount of memory as the LAPACK DPPTRF subroutine, which runs at level 2 BLAS speed. BPC calls only DGEMM and level 3 kernel routines. It combines a recursive algorithm with blocking and a recursive packed data format. We give a full analysis of how to overcome the nonlinear addressing overhead imposed by recursion. Finally, because BPC relies heavily on GEMM, an SMP GEMM readily yields a considerable amount of SMP parallelism.
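The general idea of a recursive Cholesky factorization on packed storage can be illustrated with a minimal Python sketch. It is not the BPC algorithm itself. It assumes the standard LAPACK-style column-major lower packed layout rather than the recursive packed format of the paper, and plain loops stand in for the triangular-solve and rank-k update steps that an optimized code would hand to level 3 kernels and DGEMM. The names `packed_index`, `pack_lower`, `cholesky_packed` and the cutoff `nb` are hypothetical, chosen for illustration only.

```python
import math


def packed_index(n, i, j):
    """Offset of A[i][j] (i >= j) in column-major lower packed storage."""
    return i + j * (2 * n - j - 1) // 2


def pack_lower(a):
    """Copy the lower triangle of a square matrix into n(n+1)/2 entries."""
    n = len(a)
    ap = [0.0] * (n * (n + 1) // 2)
    for j in range(n):
        for i in range(j, n):
            ap[packed_index(n, i, j)] = float(a[i][j])
    return ap


def _kernel(ap, n, lo, hi):
    """Unblocked Cholesky of the diagonal block with rows/columns [lo, hi)."""
    for j in range(lo, hi):
        d = ap[packed_index(n, j, j)]
        for k in range(lo, j):
            d -= ap[packed_index(n, j, k)] ** 2
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        d = math.sqrt(d)
        ap[packed_index(n, j, j)] = d
        for i in range(j + 1, hi):
            s = ap[packed_index(n, i, j)]
            for k in range(lo, j):
                s -= ap[packed_index(n, i, k)] * ap[packed_index(n, j, k)]
            ap[packed_index(n, i, j)] = s / d


def _recurse(ap, n, lo, hi, nb):
    if hi - lo <= nb:                      # small block: stop recursing
        _kernel(ap, n, lo, hi)
        return
    mid = lo + (hi - lo) // 2
    _recurse(ap, n, lo, mid, nb)           # A11 = L11 * L11^T
    for i in range(mid, hi):               # solve L21 * L11^T = A21
        for j in range(lo, mid):
            s = ap[packed_index(n, i, j)]
            for k in range(lo, j):
                s -= ap[packed_index(n, i, k)] * ap[packed_index(n, j, k)]
            ap[packed_index(n, i, j)] = s / ap[packed_index(n, j, j)]
    for i in range(mid, hi):               # A22 -= L21 * L21^T (lower part)
        for j in range(mid, i + 1):
            s = 0.0
            for k in range(lo, mid):
                s += ap[packed_index(n, i, k)] * ap[packed_index(n, j, k)]
            ap[packed_index(n, i, j)] -= s
    _recurse(ap, n, mid, hi, nb)           # A22 = L22 * L22^T


def cholesky_packed(ap, n, nb=2):
    """Overwrite packed SPD matrix ap with its lower Cholesky factor."""
    _recurse(ap, n, 0, n, nb)
    return ap
```

For example, `cholesky_packed(pack_lower(a), len(a))` returns the factor in the same n(n+1)/2 entries that held the input, which is the storage saving relative to a full n-by-n array. In this sketch the index arithmetic in `packed_index` is paid on every element access; handling that kind of addressing cost efficiently is the concern the abstract raises.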
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gustavson, F., Jonsson, I. (2001). High Performance Cholesky Factorization via Blocking and Recursion That Uses Minimal Storage. In: Sørevik, T., Manne, F., Gebremedhin, A.H., Moe, R. (eds) Applied Parallel Computing. New Paradigms for HPC in Industry and Academia. PARA 2000. Lecture Notes in Computer Science, vol 1947. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-70734-4_12
DOI: https://doi.org/10.1007/3-540-70734-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41729-3
Online ISBN: 978-3-540-70734-9
eBook Packages: Springer Book Archive