Skip to main content

Three Algorithms for Cholesky Factorization on Distributed Memory Using Packed Storage

  • Conference paper
Book cover Applied Parallel Computing. State of the Art in Scientific Computing (PARA 2006)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4699))

Included in the following conference series:

Abstract

We present three algorithms for Cholesky factorization using minimum block storage for a distributed memory (DM) environment. One of the distributed square block packed (SBP) format algorithms performs similar to ScaLAPACK PDPOTRF, and our algorithm with iteration overlapping typically outperforms it by 15–50% for small and medium sized matrices. By storing the blocks contiguously, we get better performing BLAS operations. Our DM algorithms are not sensitive to cache conflicts and thus give smooth and predictable performance. We also investigate the intricacies of using rectangular full packed (RFP) format with ScaLAPACK routines and point out some advantages and drawbacks.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Agarwal, R.C., Gustavson, F.G.: A parallel implementation of matrix multiplication and LU factorization on the IBM 3090. In: Wright, M. (ed.) Aspects of Computation on Asynchronous and Parallel Processors, pp. 217–221. IFIP, North-Holland, Amsterdam (1989)

    Google Scholar 

  2. Baboulin, M., Giraud, L., Gratton, S., Langou, J.: A distributed packed storage for large parallel calculations. Technical Report TR/PA/05/30, CERFACS, Toulouse, France (2005)

    Google Scholar 

  3. Blackford, L.S., et al.: ScaLAPACK user’s guide. SIAM Publications (1997)

    Google Scholar 

  4. Choi, J., Dongarra, J.J., Ostrouchov, S., Petitet, A.P., Walker, D.W., Whaley, R.C.: Design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines. Scientific Programming 5(3), 173–184 (1996)

    Google Scholar 

  5. Dackland, K., Elmroth, E., Kågström, B.: A ring–oriented approach for block matrix factorizations on shared and distributed memory architectures. In: Sincovec, R.F., et al. (eds.) SIAM Conference on Parallel Processing for Scientific Computing, pp. 330–338. SIAM Publications (1993)

    Google Scholar 

  6. D’Azevedo, E., Dongarra, J.: Packed storage extension for ScaLAPACK. Technical Report UT-CS-98-385 (1998)

    Google Scholar 

  7. Gustavson, F.: Algorithm compiler architecture interaction relative to dense linear algebra. Technical Report RC 23715, IBM Thomas J. Watson Research Center (September 2005)

    Google Scholar 

  8. Gustavson, F.: New generalized data structures for matrices lead to a variety of high performance dense linear algebra algorithms. In: Dongarra, J.J., Madsen, K., Waśniewski, J. (eds.) PARA 2004. LNCS, vol. 3732, pp. 11–20. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  9. Gustavson, F., Wasniewski, J.: LAPACK Cholesky routines in rectangular full packed format. In: Rectangular Full Packed Format for LAPACK Algorithms Timings on Several Computers. Workshop on State-of-the-Art in Scientific and Parallel Computing. LNCS, pp. 570–579. Springer, Heidelberg, 2006 (to appear)

    Google Scholar 

  10. Kurzak, J., Dongarra, J.J.: Pipelined shared memory implementation of linear algebra routines with arbitrary lookahead – LU, Cholesky, QR. In: Implementing Linear Algebra Routines on Multi-core Processors with Pipelining and a Look Ahead. Workshop on State-of-the-Art in Scientific and Parallel Computing. LNCS, pp. 147–156. Springer, Heidelberg, 2006 (to appear)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Bo Kågström Erik Elmroth Jack Dongarra Jerzy Waśniewski

Rights and permissions

Reprints and permissions

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gustavson, F.G., Karlsson, L., Kågström, B. (2007). Three Algorithms for Cholesky Factorization on Distributed Memory Using Packed Storage. In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2006. Lecture Notes in Computer Science, vol 4699. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75755-9_67

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-75755-9_67

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75754-2

  • Online ISBN: 978-3-540-75755-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics