A parameterized ordering for cache-, register- and pipeline-efficient Givens QR decomposition

Carrig, James J.; Meyer, Gerard G.L.

doi:10.1023/A:1018970413988

Published: January 1999

Volume 10, pages 97–113, (1999)

Advances in Computational Mathematics

James J. Carrig Jr.¹ &
Gerard G.L. Meyer²

Abstract

A parameterized ordering of Givens rotations and guidelines for choosing parameter values are presented in the context of QR decomposition. Although a standard selection of parameter values retrieves an ordering that corresponds to a well-known algorithm, we show that non-standard values decrease the execution time. We implement the new ordering on an Intel Pentium Pro system, a single thin POWER2 processor of the IBM SP2, and a single R8000 processor of the SGI POWER Challenge XL. On each machine, we observe performance that is more than twice that of the original ordering.
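For readers unfamiliar with the underlying operation, the following is a minimal sketch of Givens QR using one conventional rotation order (column by column, zeroing from the bottom row upward with adjacent-row rotations). It does not reproduce the parameterized ordering of the paper, whose details are not given in this abstract; the function names `givens` and `givens_qr` are illustrative assumptions.

```python
import math

def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r

def givens_qr(A):
    """Reduce a copy of A (a list of rows) to upper-triangular form.

    Assumed ordering (not the paper's): for each column j, zero the
    subdiagonal entries from the bottom row up, rotating rows i-1 and i.
    Returns (R, rotations), where rotations lists (i-1, i, c, s) in the
    order the rotations were applied.
    """
    m, n = len(A), len(A[0])
    R = [[float(x) for x in row] for row in A]
    rotations = []
    for j in range(min(n, m - 1)):
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1][j], R[i][j])
            # Columns left of j are already zero in both rows, so start at j.
            for k in range(j, n):
                upper, lower = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * upper + s * lower
                R[i][k] = -s * upper + c * lower
            R[i][j] = 0.0  # remove rounding residue in the eliminated entry
            rotations.append((i - 1, i, c, s))
    return R, rotations

R, rotations = givens_qr([[1, 2], [3, 4], [5, 6]])
```

The order in which the two loops visit the entries to be eliminated is the part an ordering scheme changes. Any valid reordering reaches an upper-triangular result, but the memory access pattern, and hence cache and register reuse, can differ.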

Author information

Authors and Affiliations

Sony Electronics, Inc., Santa Clara, CA, 95054, USA
James J. Carrig Jr.
Johns Hopkins University, Baltimore, MD, 21218, USA
Gerard G.L. Meyer

About this article

Cite this article

Carrig, J.J., Meyer, G.G. A parameterized ordering for cache-, register- and pipeline-efficient Givens QR decomposition. Advances in Computational Mathematics 10, 97–113 (1999). https://doi.org/10.1023/A:1018970413988

Issue Date: January 1999
DOI: https://doi.org/10.1023/A:1018970413988
