Improving the Performance of Task-Based Linear Algebra Software with Autotuning Techniques on Heterogeneous Architectures

Conference paper. In: Computational Science – ICCS 2023 (ICCS 2023)

Abstract

This work presents several self-optimization strategies for improving the performance of task-based linear algebra software on heterogeneous systems. The study focuses on Chameleon, a task-based dense linear algebra library whose routines follow a tile-based algorithmic scheme and are executed on the available computing resources of the system by a scheduler that dynamically handles data dependencies among the basic computational kernels of each routine. The proposed strategies select the best values for the parameters that affect the performance of the routines, such as the tile size or the scheduling policy. In addition, optimized parallel implementations provided by existing linear algebra libraries, such as Intel MKL (on multicore CPUs) or cuBLAS (on GPUs), are used to execute each computational kernel. Results obtained on a heterogeneous system composed of several multicore CPUs and multiple GPUs are satisfactory, with performance close to the experimental optimum.
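As a rough illustration of the empirical autotuning strategy the abstract describes (not Chameleon's actual implementation), the sketch below times a tiled matrix multiplication over a small set of candidate tile sizes and keeps the fastest. All names, sizes, and candidate values are illustrative assumptions; in practice each tile product would be dispatched as a task to an optimized kernel (e.g. MKL or cuBLAS) by a runtime scheduler rather than executed sequentially in pure Python.

```python
import time
import numpy as np

def tiled_gemm(A, B, tile):
    """Compute C = A @ B tile by tile. Each per-tile product stands in
    for a basic computational kernel that a task-based runtime would
    schedule on a CPU core or a GPU."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            for k in range(0, n, tile):
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
                )
    return C

def autotune_tile(n=512, candidates=(64, 128, 256)):
    """Empirical search: run the routine once per candidate tile size
    and return the size that gave the lowest wall-clock time."""
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))
    best_tile, best_time = None, float("inf")
    for tile in candidates:
        t0 = time.perf_counter()
        tiled_gemm(A, B, tile)
        elapsed = time.perf_counter() - t0
        if elapsed < best_time:
            best_tile, best_time = tile, elapsed
    return best_tile
```

The same exhaustive-timing loop extends naturally to other tunable parameters (e.g. the scheduling policy), at the cost of one execution per parameter combination.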



Acknowledgements

Grant RTI2018-098156-B-C53 funded by MCIN/AEI and by “ERDF A way of making Europe”.

Author information


Corresponding author

Correspondence to Jesús Cámara.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cámara, J., Cuenca, J., Boratto, M. (2023). Improving the Performance of Task-Based Linear Algebra Software with Autotuning Techniques on Heterogeneous Architectures. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2023. ICCS 2023. Lecture Notes in Computer Science, vol 14073. Springer, Cham. https://doi.org/10.1007/978-3-031-35995-8_47

  • DOI: https://doi.org/10.1007/978-3-031-35995-8_47

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-35994-1

  • Online ISBN: 978-3-031-35995-8

  • eBook Packages: Computer Science; Computer Science (R0)
