Abstract
We present a block structured orthogonal factorization (BSOF) algorithm and its parallelization for inverting block p-cyclic matrices. We target high performance on multicore systems with GPU accelerators. We provide a quantitative performance model for optimal host-device load balancing and validate the model through numerical tests. Benchmarking results show that the parallel BSOF-based inversion algorithm attains up to 90% of DGEMM performance on hybrid CPU+GPU systems.
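For readers unfamiliar with the matrix class, the following is a minimal sketch of what a block p-cyclic matrix looks like and how an orthogonal (QR) factorization yields its inverse. It assumes the common normalized layout with blocks on the diagonal, on the first subdiagonal, and in the top-right corner. The helper names `block_p_cyclic` and `invert_via_qr` are hypothetical, and the dense NumPy QR used here ignores the block structure, so this is not the BSOF algorithm itself, only an illustration of the underlying identity M^{-1} = R^{-1} Q^T.

```python
import numpy as np

def block_p_cyclic(diag_blocks, sub_blocks, corner_block):
    """Assemble a dense block p-cyclic matrix (hypothetical helper).

    diag_blocks  : p blocks (n x n) placed on the block diagonal
    sub_blocks   : p-1 blocks placed on the first block subdiagonal
    corner_block : one block placed in the top-right block position
    """
    p = len(diag_blocks)
    n = diag_blocks[0].shape[0]
    M = np.zeros((p * n, p * n))
    for i, A in enumerate(diag_blocks):
        M[i * n:(i + 1) * n, i * n:(i + 1) * n] = A
    for i, B in enumerate(sub_blocks):
        M[(i + 1) * n:(i + 2) * n, i * n:(i + 1) * n] = B
    M[0:n, (p - 1) * n:p * n] = corner_block
    return M

def invert_via_qr(M):
    """Invert M through a dense QR factorization: M = Q R, so M^{-1} = R^{-1} Q^T."""
    Q, R = np.linalg.qr(M)
    # A general solver is used for simplicity; it does not exploit that R is triangular.
    return np.linalg.solve(R, Q.T)

# Small illustrative instance: p = 4 blocks of size n = 3 (arbitrary choices).
rng = np.random.default_rng(0)
p, n = 4, 3
diag = [np.eye(n) for _ in range(p)]
subs = [0.1 * rng.standard_normal((n, n)) for _ in range(p - 1)]
corner = 0.1 * rng.standard_normal((n, n))

M = block_p_cyclic(diag, subs, corner)
M_inv = invert_via_qr(M)
print(np.allclose(M @ M_inv, np.eye(p * n)))
```

A structured algorithm would instead factor the matrix block by block, which is where the performance modeling described in the abstract comes in; the dense approach above costs O((pn)^3) and serves only as a reference for checking results on small cases.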
This work was supported by the National Science Foundation under grant NSF-PHY-1005503. SG would like to thank the Fulbright Program Office in Ukraine and the Institute of International Education for financial support during this study. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
© 2014 Springer International Publishing Switzerland
Cite this paper
Gogolenko, S., Bai, Z., Scalettar, R. (2014). Structured Orthogonal Inversion of Block p-Cyclic Matrices on Multicores with GPU Accelerators. In: Silva, F., Dutra, I., Santos Costa, V. (eds) Euro-Par 2014 Parallel Processing. Euro-Par 2014. Lecture Notes in Computer Science, vol 8632. Springer, Cham. https://doi.org/10.1007/978-3-319-09873-9_44
DOI: https://doi.org/10.1007/978-3-319-09873-9_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09872-2
Online ISBN: 978-3-319-09873-9
eBook Packages: Computer Science, Computer Science (R0)