Abstract
A parameterized ordering of Givens rotations, together with guidelines for choosing the parameter values, is presented in the context of QR decomposition. Although a standard selection of parameter values yields an ordering that corresponds to a well-known algorithm, we show that non-standard values decrease the execution time. We implement the new ordering on an Intel Pentium Pro system, a single thin POWER2 processor of the IBM SP2, and a single R8000 processor of the SGI POWER Challenge XL. On each machine, we observe performance that is more than twice that of the original ordering.
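For readers unfamiliar with the setting, the following is a minimal sketch of Givens QR in which the order of eliminations is a replaceable component. It is not the parameterized ordering of the paper, whose definition is not given in the abstract. The names `givens`, `column_order`, and `givens_qr` are hypothetical, and the ordering shown is only the textbook column-by-column one, assumed here for illustration.

```python
import math


def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


def column_order(m, n):
    """Textbook ordering (an assumption, not the paper's): column by column,
    bottom to top within each column. Yields (i, j), meaning "zero entry (i, j)
    by rotating rows i-1 and i"."""
    for j in range(n):
        for i in range(m - 1, j, -1):
            yield i, j


def givens_qr(A, order=column_order):
    """Reduce a copy of A (a list of rows) to upper-triangular R.

    `order` is any generator of (i, j) pairs; swapping it changes the sequence
    of rotations without changing the arithmetic of each rotation.
    """
    m, n = len(A), len(A[0])
    R = [[float(x) for x in row] for row in A]
    for i, j in order(m, n):
        c, s = givens(R[i - 1][j], R[i][j])
        # Apply the rotation to the full rows i-1 and i.
        for k in range(n):
            top, bot = R[i - 1][k], R[i][k]
            R[i - 1][k] = c * top + s * bot
            R[i][k] = -s * top + c * bot
    return R


R = givens_qr([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Because each rotation touches only two rows, the choice of ordering determines which rows are in use at any moment, which is the degree of freedom the abstract refers to when it discusses cache, register, and pipeline efficiency.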
Cite this article
Carrig, J.J., Meyer, G.G. A parameterized ordering for cache-, register- and pipeline-efficient Givens QR decomposition. Advances in Computational Mathematics 10, 97–113 (1999). https://doi.org/10.1023/A:1018970413988