Abstract
The tile QR factorization provides an efficient and scalable way to factor a dense matrix in parallel on multicore processors. This article presents an efficient implementation of the algorithm on a system with a powerful GPU and many multicore CPUs.
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Kurzak, J., Nath, R., Du, P., Dongarra, J. (2012). An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs. In: Jónasson, K. (eds) Applied Parallel and Scientific Computing. PARA 2010. Lecture Notes in Computer Science, vol 7134. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28145-7_25
DOI: https://doi.org/10.1007/978-3-642-28145-7_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28144-0
Online ISBN: 978-3-642-28145-7
eBook Packages: Computer Science, Computer Science (R0)