Abstract
This work presents several self-optimization strategies to improve the performance of task-based linear algebra software on heterogeneous systems. The study focuses on Chameleon, a task-based dense linear algebra library whose routines follow a tile-based algorithmic scheme and are executed on the available computing resources of the system by a runtime scheduler that dynamically handles the data dependencies among the basic computational kernels of each routine. The proposed strategies select the best values for the parameters that affect the performance of the routines, such as the tile size and the scheduling policy. In addition, optimized parallel implementations provided by existing linear algebra libraries, such as Intel MKL (on multicore CPU) and cuBLAS (on GPU), are used to execute each computational kernel. Results obtained on a heterogeneous system composed of multicore CPUs and several GPUs are satisfactory, with performance close to the experimental optimum.
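The core idea described above — empirically selecting the tile size that minimizes measured execution time — can be illustrated with a minimal sketch. This is not Chameleon's actual API; `blocked_matmul` and `autotune_tile_size` are hypothetical names, and a pure-Python tiled matrix multiplication stands in for the library's optimized kernels, under the assumption that an exhaustive timed search over a small candidate set is used.

```python
import time


def blocked_matmul(A, B, n, nb):
    """Multiply two n x n matrices (lists of lists) using a tile size nb."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, nb):            # iterate over tiles of C rows
        for kk in range(0, n, nb):        # tiles along the shared dimension
            for jj in range(0, n, nb):    # tiles of C columns
                for i in range(ii, min(ii + nb, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, min(kk + nb, n)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + nb, n)):
                            Ci[j] += a * Bk[j]
    return C


def autotune_tile_size(n, candidates):
    """Empirically select the tile size with the lowest measured runtime."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    best_nb, best_time = None, float("inf")
    for nb in candidates:
        t0 = time.perf_counter()
        blocked_matmul(A, B, n, nb)
        elapsed = time.perf_counter() - t0
        if elapsed < best_time:
            best_nb, best_time = nb, elapsed
    return best_nb, best_time


if __name__ == "__main__":
    nb, t = autotune_tile_size(96, [8, 16, 32, 48])
    print(f"best tile size: {nb} ({t:.4f} s)")
```

In a real task-based setting the same search loop would also sweep the scheduling policy, and each timed run would submit the tiled tasks to the runtime rather than execute them sequentially.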
Acknowledgements
Grant RTI2018-098156-B-C53 funded by MCIN/AEI and by “ERDF A way of making Europe”.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Cámara, J., Cuenca, J., Boratto, M. (2023). Improving the Performance of Task-Based Linear Algebra Software with Autotuning Techniques on Heterogeneous Architectures. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2023. ICCS 2023. Lecture Notes in Computer Science, vol 14073. Springer, Cham. https://doi.org/10.1007/978-3-031-35995-8_47
Print ISBN: 978-3-031-35994-1
Online ISBN: 978-3-031-35995-8