Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures

Baboulin, Marc; Dongarra, Jack; Rémy, Adrien; Tomov, Stanimire; Yamazaki, Ichitaro

doi:10.1007/978-3-319-32149-3_9

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9573)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

We study the performance of dense symmetric indefinite factorizations (the Bunch-Kaufman and Aasen algorithms) on multicore CPUs equipped with a Graphics Processing Unit (GPU). Although these algorithms are needed in many scientific and engineering simulations, achieving high performance on the GPU is difficult: the pivoting required for numerical stability causes frequent synchronizations and irregular data accesses. As a result, until recently, no implementation of these algorithms existed for hybrid CPU/GPU architectures. To improve their performance on such architectures, we explore different techniques for reducing the expensive communication and synchronization between the CPU and the GPU, or within the GPU. We also study the performance of an \(LDL^T\) factorization without pivoting, combined with a preprocessing technique based on Random Butterfly Transformations. Although these transformations offer only probabilistic guarantees of numerical stability, they avoid pivoting and achieve high performance on the GPU.
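To make the last idea in the abstract concrete, the following is a minimal CPU-only sketch in plain Python, not the authors' GPU implementation. It shows an unpivoted \(LDL^T\) factorization breaking down on a symmetric indefinite matrix with a zero diagonal, and the same factorization succeeding after a two-sided random butterfly transformation \(A_r = U^T A U\). All function names are hypothetical. The sketch assumes a depth-1 butterfly \(U = \frac{1}{\sqrt{2}}\begin{pmatrix} R_0 & R_1 \\ R_0 & -R_1 \end{pmatrix}\) with random diagonal blocks, and the scaling \(\exp(r/10)\) with \(r\) uniform in \([-1/2, 1/2]\) is an assumption rather than something stated in the abstract.

```python
import math
import random


def ldlt_nopiv(A):
    """Unpivoted LDL^T of a symmetric matrix (only the lower triangle is read).

    Returns (L, d) with A = L * diag(d) * L^T and L unit lower triangular.
    Raises ZeroDivisionError as soon as a pivot is exactly zero.
    """
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        if d[j] == 0.0:
            raise ZeroDivisionError(f"zero pivot at step {j}")
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = s / d[j]
    return L, d


def ldlt_solve(L, d, b):
    """Solve L * diag(d) * L^T x = b by forward, diagonal, and backward sweeps."""
    n = len(b)
    x = list(b)
    for i in range(n):                      # forward substitution with L
        x[i] -= sum(L[i][k] * x[k] for k in range(i))
    x = [x[i] / d[i] for i in range(n)]     # diagonal scaling
    for i in reversed(range(n)):            # backward substitution with L^T
        x[i] -= sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x


def matmul(X, Y):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(X, v):
    return [sum(p * q for p, q in zip(row, v)) for row in X]


def random_butterfly(n, rng):
    """Depth-1 butterfly U = (1/sqrt(2)) [[R0, R1], [R0, -R1]] for even n."""
    assert n % 2 == 0, "this sketch only handles even n"
    h = n // 2
    # Assumed scaling: diagonal entries exp(r / 10), r uniform in [-0.5, 0.5].
    r0 = [math.exp(rng.uniform(-0.5, 0.5) / 10) for _ in range(h)]
    r1 = [math.exp(rng.uniform(-0.5, 0.5) / 10) for _ in range(h)]
    s = 1.0 / math.sqrt(2.0)
    U = [[0.0] * n for _ in range(n)]
    for i in range(h):
        U[i][i] = s * r0[i]
        U[i][h + i] = s * r1[i]
        U[h + i][i] = s * r0[i]
        U[h + i][h + i] = -s * r1[i]
    return U


def rbt_solve(A, b, rng):
    """Solve A x = b via A_r = U^T A U, unpivoted LDL^T of A_r, then x = U y."""
    U = random_butterfly(len(A), rng)
    Ut = [list(col) for col in zip(*U)]
    Ar = matmul(matmul(Ut, A), U)
    L, d = ldlt_nopiv(Ar)
    y = ldlt_solve(L, d, matvec(Ut, b))
    return matvec(U, y)


# Symmetric indefinite test matrix with a zero diagonal: ldlt_nopiv(A) fails
# at the very first pivot, while the randomized variant goes through.
A = [[0, 1, 2, 3],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [3, 2, 1, 0]]
b = [1.0, 2.0, 3.0, 4.0]
x = rbt_solve(A, b, random.Random(0))
residual = max(abs(r - bi) for r, bi in zip(matvec(A, x), b))
print("max residual:", residual)
```

The sketch forms the dense product \(U^T A U\) explicitly for readability. A practical implementation would exploit the sparse structure of the butterfly, and would likely use recursive butterflies of small depth followed by iterative refinement; none of that is shown here.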

Notes

1. A Bunch-Kaufman implementation recently became available in the cuSolver library, which is part of NVIDIA's CUDA Toolkit v7.5.

Acknowledgments

The authors would like to thank the NSF (grant #ACI-1339822), NVIDIA, and MathWorks for supporting this research effort. The authors are also grateful to Nicolas Zerbib (ESI Group) for his help in using test matrices from acoustics.

Author information

Authors and Affiliations

University of Paris-Sud and Inria, Orsay, France
Marc Baboulin & Adrien Rémy
University of Tennessee, Knoxville, USA
Jack Dongarra, Stanimire Tomov & Ichitaro Yamazaki

Corresponding author

Correspondence to Marc Baboulin.

Editor information

Editors and Affiliations

Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski
Department of Computer Science, University of Southern California, Marina Del Rey, California, USA
Ewa Deelman
Electrical Engineering & Comput. Science, University of Tennessee, Knoxville, Tennessee, USA
Jack Dongarra
Czestochowa University of Technology, Institute of Computer & Information Sci., Czestochowa, Poland
Konrad Karczewski
Department of Computer Science, AGH University of Science and Technology, Krakow, Poland
Jacek Kitowski
Systèmes d’informations, Big Data et Rec, AGH University of Science and Technology, Krakow, Poland
Kazimierz Wiatr

About this paper

Cite this paper

Baboulin, M., Dongarra, J., Rémy, A., Tomov, S., Yamazaki, I. (2016). Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2015. Lecture Notes in Computer Science, vol 9573. Springer, Cham. https://doi.org/10.1007/978-3-319-32149-3_9

DOI: https://doi.org/10.1007/978-3-319-32149-3_9
Published: 02 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32148-6
Online ISBN: 978-3-319-32149-3
eBook Packages: Computer Science, Computer Science (R0)
