Abstract
We study the performance of dense symmetric indefinite factorizations (Bunch-Kaufman and Aasen’s algorithms) on multicore CPUs with a Graphics Processing Unit (GPU). Although these algorithms are needed in many scientific and engineering simulations, achieving high performance on the GPU is difficult because the pivoting required to ensure numerical stability leads to frequent synchronizations and irregular data accesses. As a result, until recently, there was no implementation of these algorithms on hybrid CPU/GPU architectures. To improve their performance on such architectures, we explore different techniques to reduce the expensive communication and synchronization between the CPU and the GPU, or within the GPU. We also study the performance of an \(LDL^T\) factorization with no pivoting combined with a preprocessing technique based on Random Butterfly Transformations. Although such transformations offer only probabilistic guarantees of numerical stability, they avoid pivoting and achieve high performance on the GPU.
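To make the last idea concrete, the following is a minimal CPU-only sketch of solving a symmetric indefinite system by randomizing it with a butterfly matrix and then applying an unpivoted \(LDL^T\) factorization. It is an illustration under stated assumptions, not the implementation studied in the chapter: it uses a single-level (depth-1) butterfly stored as a dense matrix, assumes an even matrix order, draws the random diagonal entries as \(\exp(r/10)\) with \(r\) uniform in \([-1/2, 1/2]\), and the names `butterfly`, `ldlt_nopiv`, and `rbt_solve` are hypothetical.

```python
import numpy as np

def butterfly(n, rng):
    """Depth-1 random butterfly matrix of even order n (dense, for clarity).

    Assumption: diagonal entries are exp(r/10), r uniform in [-1/2, 1/2].
    """
    h = n // 2
    r0 = np.exp(rng.uniform(-0.5, 0.5, h) / 10.0)
    r1 = np.exp(rng.uniform(-0.5, 0.5, h) / 10.0)
    R0, R1 = np.diag(r0), np.diag(r1)
    return np.block([[R0, R1], [R0, -R1]]) / np.sqrt(2.0)

def ldlt_nopiv(A):
    """Unpivoted LDL^T: A = L diag(d) L^T with unit lower-triangular L."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        # Pivot j: diagonal entry minus contributions of earlier columns.
        d[j] = A[j, j] - (L[j, :j] ** 2) @ d[:j]
        # Column j of L below the diagonal (divides by the pivot, no swaps).
        L[j + 1:, j] = (A[j + 1:, j] - (L[j + 1:, :j] * L[j, :j]) @ d[:j]) / d[j]
    return L, d

def rbt_solve(A, b, rng):
    """Solve A x = b via A_r = U^T A U, unpivoted LDL^T, and x = U y."""
    U = butterfly(A.shape[0], rng)
    Ar = U.T @ A @ U                  # randomized symmetric matrix
    L, d = ldlt_nopiv(Ar)
    y = np.linalg.solve(L, U.T @ b)   # forward substitution step
    y = y / d                         # diagonal solve
    y = np.linalg.solve(L.T, y)       # backward substitution step
    return U @ y                      # map back to the original unknowns

# Hypothetical test matrix with a zero diagonal: the first pivot of A itself
# is zero, so an unpivoted LDL^T of A would divide by zero.
A = np.array([[0., 1., 2., 3.],
              [1., 0., 4., 5.],
              [2., 4., 0., 6.],
              [3., 5., 6., 0.]])
b = np.array([1., 2., 3., 4.])
x = rbt_solve(A, b, np.random.default_rng(0))
print(np.linalg.norm(A @ x - b))      # residual norm
```

The transformation is applied on both sides so that the randomized matrix stays symmetric. Because the stability result is only probabilistic, a practical solver would normally check the residual and apply a few steps of iterative refinement, which this sketch omits.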
Notes
1. A Bunch-Kaufman implementation recently became available in the cuSolver library as part of the CUDA Toolkit v7.5 from NVIDIA.
Acknowledgments
The authors would like to thank the NSF (grant #ACI-1339822), NVIDIA, and MathWorks for supporting this research effort. The authors are also grateful to Nicolas Zerbib (ESI Group) for his help in using test matrices from acoustics.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Baboulin, M., Dongarra, J., Rémy, A., Tomov, S., Yamazaki, I. (2016). Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2015. Lecture Notes in Computer Science, vol 9573. Springer, Cham. https://doi.org/10.1007/978-3-319-32149-3_9
DOI: https://doi.org/10.1007/978-3-319-32149-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32148-6
Online ISBN: 978-3-319-32149-3
eBook Packages: Computer Science (R0)