Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9573)

Abstract

We study the performance of dense symmetric indefinite factorizations (the Bunch-Kaufman and Aasen algorithms) on multicore CPUs with a Graphics Processing Unit (GPU). Though such algorithms are needed in many scientific and engineering simulations, obtaining high performance for the factorization on the GPU is difficult because the pivoting required to ensure numerical stability leads to frequent synchronizations and irregular data accesses. As a result, until recently, there was no implementation of these algorithms on hybrid CPU/GPU architectures. To improve their performance on the hybrid architecture, we explore different techniques to reduce the expensive communication and synchronization between the CPU and the GPU, or on the GPU itself. We also study the performance of an \(LDL^T\) factorization with no pivoting combined with a preprocessing technique based on Random Butterfly Transformations. Although such transformations offer only probabilistic guarantees of numerical stability, they avoid pivoting and achieve high performance on the GPU.
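To make the last idea concrete, the following is a minimal CPU-only sketch of randomization followed by an unpivoted \(LDL^T\) solve. It is not the authors' GPU implementation. It assumes a depth-1 butterfly \(U = \frac{1}{\sqrt{2}}\begin{pmatrix} R_0 & R_1 \\ R_0 & -R_1 \end{pmatrix}\) with random diagonal blocks, and it assumes diagonal entries of the form \(e^{\rho/10}\) with \(\rho\) uniform in \([-1/2, 1/2]\), a choice taken from the RBT literature rather than from this abstract. All function names (`butterfly`, `ldlt_nopiv`, `rbt_solve`, and the small helpers) are hypothetical. The system \(Ax = b\) is replaced by \((U^T A U)\,y = U^T b\), which is factored without pivoting, and the solution is recovered as \(x = U y\).

```python
import math
import random

def butterfly(n, rng):
    """Depth-1 random butterfly U = (1/sqrt 2) * [[R0, R1], [R0, -R1]], n even."""
    h = n // 2
    # Assumption: random diagonal entries exp(rho / 10), rho uniform in [-1/2, 1/2].
    r = [math.exp(rng.uniform(-0.5, 0.5) / 10.0) for _ in range(n)]
    s = 1.0 / math.sqrt(2.0)
    U = [[0.0] * n for _ in range(n)]
    for i in range(h):
        U[i][i] = s * r[i]
        U[i][h + i] = s * r[h + i]
        U[h + i][i] = s * r[i]
        U[h + i][h + i] = -s * r[h + i]
    return U

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matvec(X, v):
    return [sum(x * y for x, y in zip(row, v)) for row in X]

def ldlt_nopiv(A):
    """Factor A = L D L^T (unit lower-triangular L, diagonal D) with no pivoting.

    Only the lower triangle of A is read. A zero pivot raises ZeroDivisionError.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))) / d[j]
    return L, d

def ldlt_solve(L, d, c):
    """Solve (L D L^T) y = c by forward, diagonal, and backward substitution."""
    n = len(c)
    z = list(c)
    for i in range(n):                        # forward: L z = c
        z[i] -= sum(L[i][k] * z[k] for k in range(i))
    w = [z[i] / d[i] for i in range(n)]       # diagonal: D w = z
    for i in reversed(range(n)):              # backward: L^T y = w
        w[i] -= sum(L[k][i] * w[k] for k in range(i + 1, n))
    return w

def rbt_solve(A, b, rng):
    """Solve A x = b via (U^T A U) y = U^T b and x = U y, with no pivoting."""
    U = butterfly(len(A), rng)
    Ut = transpose(U)
    Ar = matmul(matmul(Ut, A), U)             # randomized symmetric matrix
    L, d = ldlt_nopiv(Ar)
    y = ldlt_solve(L, d, matvec(Ut, b))
    return matvec(U, y)

# Symmetric indefinite test matrix with a zero diagonal: unpivoted LDL^T
# applied directly to A would divide by zero at the first pivot.
A = [[0.0, 1.0, 2.0, 3.0],
     [1.0, 0.0, 1.0, 2.0],
     [2.0, 1.0, 0.0, 1.0],
     [3.0, 2.0, 1.0, 0.0]]
b = [1.0, 2.0, 3.0, 4.0]
x = rbt_solve(A, b, random.Random(0))
residual = max(abs(ri - bi) for ri, bi in zip(matvec(A, x), b))
```

The sketch forms \(U\) as a dense matrix for readability. A practical implementation would exploit the butterfly structure to apply \(U\) in \(O(n)\) operations per vector, and would typically use a deeper recursion and iterative refinement, since the stability of the unpivoted factorization is only probabilistic.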

Notes

  1. A Bunch-Kaufman implementation recently became available in the cuSolver library as part of the CUDA Toolkit v7.5 from NVIDIA.

Acknowledgments

The authors would like to thank the NSF (grant #ACI-1339822), NVIDIA, and MathWorks for supporting this research effort. The authors are also grateful to Nicolas Zerbib (ESI Group) for his help in using test matrices from acoustics.

Corresponding author

Correspondence to Marc Baboulin.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Baboulin, M., Dongarra, J., Rémy, A., Tomov, S., Yamazaki, I. (2016). Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2015. Lecture Notes in Computer Science, vol 9573. Springer, Cham. https://doi.org/10.1007/978-3-319-32149-3_9

  • DOI: https://doi.org/10.1007/978-3-319-32149-3_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32148-6

  • Online ISBN: 978-3-319-32149-3

  • eBook Packages: Computer Science, Computer Science (R0)