Porting the PLASMA Numerical Library to the OpenMP Standard

YarKhan, Asim; Kurzak, Jakub; Luszczek, Piotr; Dongarra, Jack

doi:10.1007/s10766-016-0441-6

Published: 14 June 2016

International Journal of Parallel Programming, Volume 45, pages 612–633 (2017)

Abstract

PLASMA is a numerical library intended as a successor to LAPACK for solving problems in dense linear algebra on multicore processors. PLASMA relies on the QUARK scheduler for efficient multithreading of algorithms expressed in a serial fashion. QUARK is a superscalar scheduler: it implements automatic parallelization by tracking data dependencies and resolving data hazards at runtime. Recently, this type of scheduling has been incorporated into the OpenMP standard, which makes it possible to transition PLASMA from the proprietary solution offered by QUARK to the standard solution offered by OpenMP. This article studies the feasibility of such a transition.

Author information

Authors and Affiliations

Electrical Engineering and Computer Science, University of Tennessee, 1122 Volunteer Blvd, Ste 203 Claxton, Knoxville, TN, 37996, USA
Asim YarKhan, Jakub Kurzak, Piotr Luszczek & Jack Dongarra

Corresponding author

Correspondence to Jakub Kurzak.

Additional information

This work has been supported in part by the National Science Foundation under grants 1339822 and 1527706.

About this article

Cite this article

YarKhan, A., Kurzak, J., Luszczek, P. et al. Porting the PLASMA Numerical Library to the OpenMP Standard. Int J Parallel Prog 45, 612–633 (2017). https://doi.org/10.1007/s10766-016-0441-6

Received: 04 January 2016
Accepted: 31 May 2016
Published: 14 June 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10766-016-0441-6
