A parameterized ordering for cache-, register- and pipeline-efficient Givens QR decomposition

Advances in Computational Mathematics

Abstract

A parameterized ordering of Givens rotations and guidelines for choosing parameter values are presented in the context of QR decomposition. Although a standard selection of parameter values retrieves an ordering that corresponds to a well-known algorithm, we show that non-standard values decrease the execution time. We implement the new ordering on an Intel Pentium Pro system, a single thin POWER2 processor of the IBM SP2, and a single R8000 processor of the SGI POWER Challenge XL. On each machine, we observe performance that is more than twice that of the original ordering.
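
For orientation, a Givens QR decomposition with one fixed textbook ordering can be sketched as follows. This is only a baseline illustration: the function names `givens` and `givens_qr` are hypothetical, the column-by-column, bottom-up ordering is an assumption chosen for simplicity, and the sketch does not reproduce the parameterized ordering of the paper, whose details are not given in this abstract.

```python
import math


def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


def givens_qr(A):
    """Reduce a copy of A (a list of rows) to upper-triangular form R.

    Assumed ordering (illustrative only): columns are processed left to
    right; within a column, entries are zeroed from the bottom row upward,
    and each rotation combines the adjacent rows i-1 and i.
    """
    R = [[float(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    for j in range(n):
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1][j], R[i][j])
            # Apply the rotation to rows i-1 and i; columns left of j are already zero.
            for k in range(j, n):
                top, bot = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * top + s * bot
                R[i][k] = -s * top + c * bot
            R[i][j] = 0.0  # remove rounding residue in the eliminated entry
    return R


if __name__ == "__main__":
    for row in givens_qr([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]):
        print(row)
```

Any ordering that zeroes the same subdiagonal entries yields a valid triangular factor; what changes between orderings is the pattern of memory accesses and the independence of consecutive rotations, which is the aspect the abstract says the parameters control.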

Cite this article

Carrig, J.J., Meyer, G.G. A parameterized ordering for cache-, register- and pipeline-efficient Givens QR decomposition. Advances in Computational Mathematics 10, 97–113 (1999). https://doi.org/10.1023/A:1018970413988

  • DOI: https://doi.org/10.1023/A:1018970413988
