A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators

  • Conference paper
High Performance Computing for Computational Science – VECPAR 2010 (VECPAR 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6449)

Abstract

We present a Cholesky factorization for multicore systems with GPU accelerators. The challenges in developing scalable high-performance algorithms for these emerging systems stem from their heterogeneity, their massive parallelism, and the huge gap between the GPUs’ compute power and the CPU-GPU communication speed. Our approach is largely based on software infrastructures that have already been developed for homogeneous multicores and for hybrid GPU-based computing. The result is a scalable hybrid Cholesky factorization of unprecedented performance. In particular, using NVIDIA’s Tesla S1070 (four C1060 GPUs, each with 30 cores @1.44 GHz) connected to two dual-core AMD Opteron @1.8 GHz processors, we reach up to 1.163 TFlop/s in single and up to 275 GFlop/s in double precision arithmetic. Compared with the performance of the embarrassingly parallel xGEMM over four GPUs, where no communication between GPUs is involved, our algorithm still runs at 73% and 84% of that rate for single and double precision arithmetic, respectively.
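The reported rates and percentages can be combined to estimate the xGEMM baseline they are measured against. The sketch below is only back-of-the-envelope arithmetic on the figures quoted in the abstract. It assumes that each percentage refers to the same run as the corresponding peak rate, which the abstract does not state, and the names `reported`, `fraction_of_gemm`, and `implied_gemm_rate` are hypothetical helpers introduced here for illustration.

```python
# Peak factorization rates quoted in the abstract, in GFlop/s.
reported = {"single": 1163.0, "double": 275.0}

# Quoted fraction of the four-GPU xGEMM rate reached by the factorization.
fraction_of_gemm = {"single": 0.73, "double": 0.84}

NUM_GPUS = 4  # the Tesla S1070 holds four C1060 GPUs


def implied_gemm_rate(precision):
    """Back out the aggregate xGEMM rate implied by the quoted figures.

    Assumption: the percentage and the peak rate describe the same run.
    """
    return reported[precision] / fraction_of_gemm[precision]


if __name__ == "__main__":
    for precision in ("single", "double"):
        aggregate = implied_gemm_rate(precision)
        per_gpu = aggregate / NUM_GPUS
        print(f"{precision}: about {aggregate:.0f} GFlop/s over four GPUs, "
              f"about {per_gpu:.0f} GFlop/s per GPU")
```

Under that assumption, the implied four-GPU xGEMM rate is roughly 1.59 TFlop/s in single precision and roughly 327 GFlop/s in double precision. These are derived estimates, not numbers reported by the authors.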

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ltaief, H., Tomov, S., Nath, R., Du, P., Dongarra, J. (2011). A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators. In: Palma, J.M.L.M., Daydé, M., Marques, O., Lopes, J.C. (eds) High Performance Computing for Computational Science – VECPAR 2010. VECPAR 2010. Lecture Notes in Computer Science, vol 6449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19328-6_11

  • DOI: https://doi.org/10.1007/978-3-642-19328-6_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19327-9

  • Online ISBN: 978-3-642-19328-6

  • eBook Packages: Computer Science, Computer Science (R0)
