Abstract
Automatic synthesis of efficient scientific parallel programs for supercomputers is in general a complex problem of system parallel programming. Therefore various specialized synthesis algorithms and heuristics are of use. LuNA system for automatic construction of distributed parallel programs provides a basis for accumulation of such algorithms to provide high-quality parallel programs generation in particular subject domains. If no specialized support is available in LuNA for given input, then the general synthesis algorithm is used, which does construct the required program, but its efficiency may be unsatisfactory. In the paper a specialized run-time system for LuNA is presented, which provides runtime support for dense linear algebra operations implementation on distributed memory multicomputers. Experimental results demonstrate, that automatically generated parallel programs of the class outperform corresponding ScaLAPACK library subroutines, which makes LuNA system practically applicable for generating high performance distributed parallel programs for supercomputers in the dense linear algebra application class.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Sterling, T., Anderson, M., Brodowicz, M.: A survey: runtime software systems for high performance computing. Supercomput. Front. Innov. 4(1), 48–68 (2017). https://doi.org/10.14529/jsfi170103
Thoman, P., et al.: A taxonomy of task-based parallel programming technologies for high-performance computing. J. Supercomput. 74(4), 1422–1434 (2018). https://doi.org/10.1007/s11227-018-2238-4
Kale, L.V., Krishnan, S.: Charm++ a portable concurrent object oriented system based on C++. In: Proceedings of the Eighth Annual Conference on Object-Oriented Programming Systems, Languages, and Applications, pp. 91–108, October 1993
Acun, B., et al.: Parallel programming with migratable objects: Charm++ in practice. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2014, pp. 647–658. IEEE, November 2014
Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Hérault, T., Dongarra, J.J.: PaRSEC: exploiting heterogeneity to enhance scalability. Comput. Sci. Eng. 15(6), 36–45 (2013)
Bauer, M., Treichler, S., Slaughter, E., Aiken, A.: Legion: expressing locality and independence with logical regions. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC 2012, pp. 1–11. IEEE, November 2012.
Slaughter, E., Lee, W., Treichler, S., Bauer, M., Aiken, A.: Regent: a high-productivity programming language for HPC with logical regions. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12, November 2015
Slaughter, E.: Regent: a high-productivity programming language for implicit parallelism with logical regions. Doctoral dissertation, Stanford University (2017)
Torres, H., Papadakis, M., Jofre Cruanyes, L.: Soleil-X: turbulence, particles, and radiation in the Regent programming language. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019, pp. 1–4 (2019)
Malyshkin, V.E., Perepelkin, V.A.: LuNA fragmented programming system, main functions and peculiarities of run-time subsystem. In: Malyshkin, V. (ed.) PaCT 2011. LNCS, vol. 6873, pp. 53–61. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23178-0_5
Valkovsky, V.A., Malyshkin, V.E.: Synthesis of Parallel Programs and Systems on the Basis of Computational Models. Nauka, Novosibirsk (1988). (in Russian)
Malyshkin, V.: Active knowledge, LuNA and literacy for oncoming centuries. In: Bodei, C., Ferrari, G.-L., Priami, C. (eds.) Programming Languages with Applications to Biology and Security. LNCS, vol. 9465, pp. 292–303. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25527-9_19
Hiranandani, S., Kennedy, K., Mellor-Crummey, J., Sethi, A.: Compilation techniques for block-cyclic distributions. In: Proceedings of the 8th International Conference on Supercomputing, pp. 392–403, July 1994
Choi, J., Dongarra, J.J., Pozo, R., Walker, D.W.: ScaLAPACK: a scalable linear algebra library for distributed memory concurrent computers. In: The Fourth Symposium on the Frontiers of Massively Parallel Computation, pp. 120–121. IEEE Computer Society, January 1992
Goto, K., Van De Geijn, R.: High-performance implementation of the level-3 BLAS. ACM Trans. Math. Softw. 35(1), 1–14 (2008). Article 4. https://doi.org/10.1145/1377603.1377607
Kurzak, J., Ltaief, H., Dongarra, J., Badia, R.M.: Scheduling dense linear algebra operations on multicore processors. Concurr. Comput. Practice Exp. 22(1), 15–44 (2010)
Acknowledgments
The work was supported by the budget project of the ICMMG SB RAS No. 0251-2021-0005.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Belyaev, N., Perepelkin, V. (2021). High-Efficiency Specialized Support for Dense Linear Algebra Arithmetic in LuNA System. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2021. Lecture Notes in Computer Science(), vol 12942. Springer, Cham. https://doi.org/10.1007/978-3-030-86359-3_11
Download citation
DOI: https://doi.org/10.1007/978-3-030-86359-3_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86358-6
Online ISBN: 978-3-030-86359-3
eBook Packages: Computer ScienceComputer Science (R0)