Structured Orthogonal Inversion of Block p-Cyclic Matrices on Multicores with GPU Accelerators

Gogolenko, Sergiy; Bai, Zhaojun; Scalettar, Richard

doi:10.1007/978-3-319-09873-9_44


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8632)

Included in the following conference series:

European Conference on Parallel Processing


Abstract

We present a block structured orthogonal factorization (BSOF) algorithm, together with its parallelization, for inverting block p-cyclic matrices. We target high performance on multicore systems with GPU accelerators. We provide a quantitative performance model for optimal host-device load balance and validate the model through numerical tests. Benchmarking results show that the parallel BSOF-based inversion algorithm attains up to 90% of DGEMM performance on hybrid CPU+GPU systems.
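To make the object of study concrete, the following is a minimal NumPy sketch of inverting a block p-cyclic matrix through an orthogonal (QR) factorization. It is not the BSOF algorithm from the paper: it uses a dense, unstructured QR as a reference baseline and runs on the CPU only. The block layout (diagonal blocks, sub-diagonal blocks, and a single block in the top-right corner) is an assumed, commonly used form, and the function names `block_p_cyclic` and `invert_via_qr` are hypothetical.

```python
import numpy as np


def block_p_cyclic(diag_blocks, sub_blocks, corner_block):
    """Assemble a dense block p-cyclic matrix (assumed layout).

    diag_blocks  : p blocks of size n x n placed on the block diagonal
    sub_blocks   : p - 1 blocks placed on the block sub-diagonal
    corner_block : one block placed in the top-right corner
    """
    p = len(diag_blocks)
    n = diag_blocks[0].shape[0]
    H = np.zeros((p * n, p * n))
    for i in range(p):
        H[i * n:(i + 1) * n, i * n:(i + 1) * n] = diag_blocks[i]
    for i in range(p - 1):
        H[(i + 1) * n:(i + 2) * n, i * n:(i + 1) * n] = sub_blocks[i]
    H[0:n, (p - 1) * n:p * n] = corner_block
    return H


def invert_via_qr(H):
    """Invert H using a dense QR factorization (unstructured baseline).

    With H = Q R, the inverse is R^{-1} Q^T. The linear system
    R X = Q^T is solved instead of forming R^{-1} explicitly.
    """
    Q, R = np.linalg.qr(H)
    return np.linalg.solve(R, Q.T)
```

A structured method such as the one the abstract describes would exploit the zero blocks rather than treating the matrix as dense; the sketch above only gives a result to compare against on small problems.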

This work was supported by the National Science Foundation under grant NSF-PHY-1005503. SG would like to thank the Fulbright Program Office in Ukraine and the Institute of International Education for financial support during this study. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.



Author information

Authors and Affiliations

Donetsk National Technical University, Donetsk, 83001, Ukraine
Sergiy Gogolenko
University of California, Davis, CA, 95616, USA
Zhaojun Bai & Richard Scalettar


Editor information

Editors and Affiliations

CRACS/INESC-TEC and FCUP, Universidade do Porto, Rua do Campo Alegre, 1021, 4169-007, Porto, Portugal
Fernando Silva, Inês Dutra & Vítor Santos Costa


Cite this paper

Gogolenko, S., Bai, Z., Scalettar, R. (2014). Structured Orthogonal Inversion of Block p-Cyclic Matrices on Multicores with GPU Accelerators. In: Silva, F., Dutra, I., Santos Costa, V. (eds) Euro-Par 2014 Parallel Processing. Euro-Par 2014. Lecture Notes in Computer Science, vol 8632. Springer, Cham. https://doi.org/10.1007/978-3-319-09873-9_44


DOI: https://doi.org/10.1007/978-3-319-09873-9_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09872-2
Online ISBN: 978-3-319-09873-9
eBook Packages: Computer Science, Computer Science (R0)
