Abstract
PLASMA is a numerical library intended as a successor to LAPACK for solving problems in dense linear algebra on multicore processors. PLASMA relies on the QUARK scheduler for efficient multithreading of algorithms expressed in a serial fashion. QUARK is a superscalar scheduler and implements automatic parallelization by tracking data dependencies and resolving data hazards at runtime. Recently, this type of scheduling has been incorporated in the OpenMP standard, which allows to transition PLASMA from the proprietary solution offered by QUARK to the standard solution offered by OpenMP. This article studies the feasibility of such transition.
Similar content being viewed by others
References
Agullo, E., Bouwmeester, H., Dongarra, J., Kurzak, J., Langou, J., Rosenberg, L.: Towards an efficient tile matrix inversion of symmetric positive definite matrices on multicore architectures. In: High Performance Computing for Computational Science—VECPAR 2010, pp. 129–138. Springer (2011)
Agullo, E., Demmel, J., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Ltaief, H., Luszczek, P., Tomov, S.: Numerical linear algebra on emerging architectures: the PLASMA and MAGMA projects. In: Journal of Physics: Conference Series, vol. 180, p. 012037. IOP Publishing (2009)
Agullo, E., Hadri, B., Ltaief, H., Dongarrra, J.: Comparative study of one-sided factorizations with multiple software packages on multi-core hardware. In: SC ’09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp. 1–12. New York (2009)
Amdahl, G.M.: Validity of the single-processor approach to achieving large scale computing capabilities. In: AFIPS Conference Proceedings, vol. 30, pp. 483–485, Atlantic City, N.J., APR 18–20 1967. AFIPS Press, Reston (1967)
Anderson, E., Dongarra, J.: Implementation guide for LAPACK. Technical Report UT-CS-90-101, University of Tennessee, Computer Science Department, LAPACK Working Note 18 (1990)
Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammerling, S., McKenney, A., et al.: LAPACK Users’ Guide, vol. 9. SIAM, Philadelphia (1999)
Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.-A.: StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput.: Pract. Exp. 23(2), 187–198 (2011)
Badia, R.M., Herrero, J.R., Labarta, J., Pérez, J.M., Quintana-Ortí, E.S., Quintana-Ortí, G.: Parallelizing dense and banded linear algebra libraries using SMPSs. Concurr. Comput.: Pract. Exp. 21(18), 2438–2456 (2009)
Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Hérault, T., Dongarra, J.J.: PaRSEC: exploiting heterogeneity to enhance scalability. Comput. Sci. Eng. 15(6), 36–45 (2013)
Bouwmeester, H.: Tiled algorithms for matrix computations on multicore architectures. arXiv preprint arXiv:1303.3182 (2013)
Buttari, A., Langou, J., Kurzak, J., Dongarra, J.: A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Comput. 35(1), 38–53 (2009)
Castaldo, A.M., Whaley, R.: Clint: acaling lapack panel operations using parallel cache assignment. In: ACM Sigplan Notices, vol. 45, pp. 223–232. ACM (2010)
Castaldo, A.M., Whaley, R.: Clint: scaling LAPACK panel operations using parallel cache assignment. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 223–232 (2010)
Donfack, S., Dongarra, J., Faverge, M., Gates, M., Kurzak, J., Luszczek, P., Yamazaki, I.: A survey of recent developments in parallel implementations of Gaussian elimination. Concurr. Comput.: Pract. Exp. 27(5), 1292–1309 (2015)
Dongarra, J., Kurzak, J., Luszczek, P., Yamazaki, I.: PULSAR Users’ Guide: Parallel Ultra-Light Systolic Array Runtime. Technical Report UT-EECS-14-733, EECS Department, University of Tennessee (2014)
Dongarra, J., Faverge, M., Ltaief, H., Luszczek, P.: Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting. Concurr. Comput.: Pract. Exp. 26(7), 1408–1431 (2014)
Dongarra, J.J., Du Croz, J., Hammarling, S., Duff, I.S.: A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. (TOMS) 16(1), 1–17 (1990)
Duran, A., Ayguadé, E., Badia, R.M., Labarta, J., Martinell, L., Martorell, X., Planas, J.: OMPSS: a proposal for programming heterogeneous multi-core architectures. Parallel Process. Lett. 21(02), 173–193 (2011)
Gao, G.R., Sterling, T., Stevens, R., Hereld, M., Weirong Z.: Parallex: a study of a new parallel computation model. In: Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International, pp. 1–6. IEEE (2007)
Gustafson, J.L.: Reevaluating Amdahl’s Law. Commun. ACM 31(5), 532–533 (1988)
Gustavson, F., Karlsson, L., Kågström, B.: Parallel and cache-efficient in-place matrix storage format conversion. ACM Trans. Math. Softw. (TOMS) 38(3), 17 (2012)
Gustavson, F.G.: Recursion leads to automatic variable blocking for dense linear-algebra algorithms. IBM J. Res. Dev. 41(6), 737–755 (1997)
Haidar, A., Kurzak, J., Luszczek, P.: An improved parallel singular value algorithm and its implementation for multicore hardware. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 90. ACM (2013)
Haidar, A., Ltaief, H., YarKhan, A., Dongarra, J.: Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures. Concurr. Comput.: Pract. Exp. 24(3), 305–321 (2012)
Kaiser, H., Brodowicz, M., Sterling, T.: Parallex an advanced parallel execution model for scaling-impaired applications. In: International Conference on Parallel Processing Workshops, 2009. ICPPW’09, pp. 394–401. IEEE (2009)
Kale, L.V., Krishnan, S.: CHARM++: a portable concurrent object oriented system based on C++. In: Proceedings of the Eighth Annual Conference on Object-Oriented Programming Systems, Languages, and Applications, vol. 28, pp. 91–108. ACM (1993)
Kurzak, J., Buttari, A., Dongarra, J.: Solving systems of linear equations on the Cell processor using Cholesky factorization. IEEE Trans. Parallel Distrib. Syst. 19(9), 1175–1186 (2008)
Kurzak, J., Ltaief, H., Dongarra, J., Badia, R.M.: Scheduling dense linear algebra operations on multicore processors. Concurr. Comput.: Pract. Exp. 22(1), 15–44 (2010)
OpenMP Architecture Review Board: OpenMP Application Program Interface, version 4.5 edition (2015)
Pérez, J.M., Bellens, P., Badia, R.M., Labarta, J.: CellSs: making it easier to program the Cell Broadband Engine processor. IBM J. Res. Dev. 51(5), 593–604 (2007)
Pichon, G., Haidar, A., Faverge, M., Kurzak, J.: Divide and conquer symmetric tridiagonal eigensolver for multicore architectures. In: Proceedings of the International Parallel and Distributed Processing Symposium, pp. 51–60. IEEE (2015)
Quintana, E.S., Quintana, G., Sun, X., van de Geijn, R.: A note on parallel matrix inversion. SIAM J. Sci. Comput. 22(5), 1762–1771 (2001)
Quintana-Ortí, G., Quintana-Ortí, E.S., Geijn, R.A., Van Zee, F.G., Chan, E.: Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Trans. Math. Softw. (TOMS) 36(3), 14 (2009)
Tillenius, M.: Superglue: a shared memory framework using data versioning for dependency-aware task-based parallelization. SIAM J. Sci. Comput. 37(6), C617–C642 (2015)
Wilde, M., Hategan, M., Wozniak, J.M., Clifford, B., Katz, D.S., Foster, I.: Swift: a language for distributed parallel scripting. Parallel Comput. 37(9), 633–652 (2011)
YarKhan, A.: Dynamic Task Execution on Shared and Distributed Memory Architectures. PhD thesis, University of Tennessee (2012)
Zhao, Y., Hategan, M., Clifford, B., Foster, I., Von Laszewski, G., Nefedova, V., Raicu, I., Stef-Praun, T., Wilde, M.: Swift: fast, reliable, loosely coupled parallel computation. In: Services, 2007 IEEE Congress on, pp. 199–206. IEEE (2007)
Author information
Authors and Affiliations
Corresponding author
Additional information
This work has been supported in part by the National Science Foundation Grants Numbers: 1339822 and 1527706.
Rights and permissions
About this article
Cite this article
YarKhan, A., Kurzak, J., Luszczek, P. et al. Porting the PLASMA Numerical Library to the OpenMP Standard. Int J Parallel Prog 45, 612–633 (2017). https://doi.org/10.1007/s10766-016-0441-6
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10766-016-0441-6