
A Portable and Heterogeneous LU Factorization on IRIS

  • Conference paper
  • Euro-Par 2022: Parallel Processing Workshops (Euro-Par 2022)

Abstract

This paper evaluates the IRIS programming model as a means of improving the performance portability of LU matrix factorization on heterogeneous systems. LU (lower-upper) factorization is one of the most important numerical linear algebra operations and is used in many high-performance computing and scientific applications. IRIS separates the definition of an algorithm from its tuning by expressing the computation as tasks and the dependencies between them. This considerably reduces the effort required to achieve performance portability on heterogeneous systems, because a single IRIS code can use different configurations depending on the features of the underlying hardware. Different configurations are evaluated on two heterogeneous systems, achieving significant speedups over the reference code with minimal changes to the source code.
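The idea of describing the factorization once as tasks plus dependencies, and leaving device placement to a separately tunable setting, can be illustrated with a small sketch. The Python below is not the IRIS API and not the authors' implementation. It assumes a right-looking tiled LU without pivoting, hypothetical task names (`getrf`, `trsm_l`, `trsm_u`, `gemm`), pure-Python tile kernels standing in for BLAS/LAPACK calls, and a hypothetical `policy` function that only labels each task with a device name instead of dispatching to real hardware.

```python
from collections import defaultdict, deque

# --- Tile kernels (pure-Python stand-ins for BLAS/LAPACK calls) ---
def getrf_nopiv(a):
    """In-place unpivoted LU of a tile: unit-lower L below the diagonal, U on/above it."""
    for p in range(len(a)):
        for r in range(p + 1, len(a)):
            a[r][p] /= a[p][p]
            for c in range(p + 1, len(a)):
                a[r][c] -= a[r][p] * a[p][c]

def trsm_lower_unit(lu, x):
    """x <- L^{-1} x, where L is the unit-lower part of lu (forward substitution)."""
    for c in range(len(x[0])):
        for r in range(len(lu)):
            for p in range(r):
                x[r][c] -= lu[r][p] * x[p][c]

def trsm_upper_right(lu, x):
    """x <- x U^{-1}, where U is the upper-triangular part of lu."""
    for r in range(len(x)):
        for c in range(len(lu)):
            for p in range(c):
                x[r][c] -= x[r][p] * lu[p][c]
            x[r][c] /= lu[c][c]

def gemm_sub(c, a, b):
    """c <- c - a b (trailing-matrix update)."""
    for i in range(len(c)):
        for j in range(len(c[0])):
            c[i][j] -= sum(a[i][p] * b[p][j] for p in range(len(b)))

KERNELS = {  # hypothetical task kinds mapped to tile kernels
    "getrf": lambda T, k: getrf_nopiv(T[k][k]),
    "trsm_l": lambda T, k, j: trsm_lower_unit(T[k][k], T[k][j]),
    "trsm_u": lambda T, i, k: trsm_upper_right(T[k][k], T[i][k]),
    "gemm": lambda T, i, j, k: gemm_sub(T[i][j], T[i][k], T[k][j]),
}

# --- Algorithm definition: each task lists the tiles it reads and the tile it writes ---
def build_tiled_lu_tasks(nt):
    tasks = []  # entries: (kind, args, tiles_read, tile_written)
    for k in range(nt):
        tasks.append(("getrf", (k,), [(k, k)], (k, k)))
        for j in range(k + 1, nt):
            tasks.append(("trsm_l", (k, j), [(k, k), (k, j)], (k, j)))
        for i in range(k + 1, nt):
            tasks.append(("trsm_u", (i, k), [(k, k), (i, k)], (i, k)))
            for j in range(k + 1, nt):
                tasks.append(("gemm", (i, j, k), [(i, k), (k, j), (i, j)], (i, j)))
    return tasks

def add_dependencies(tasks):
    """Derive edges from read/write sets in submission order (RAW, WAR and WAW hazards)."""
    last_writer, readers = {}, defaultdict(list)
    deps = [set() for _ in tasks]
    for t, (_, _, reads, write) in enumerate(tasks):
        deps[t].update(last_writer[tile] for tile in reads if tile in last_writer)  # RAW
        deps[t].update(readers[write])  # WAR
        if write in last_writer:
            deps[t].add(last_writer[write])  # WAW
        for tile in reads:
            readers[tile].append(t)
        readers[write], last_writer[write] = [], t
    return deps

# --- Runtime side: dependency-driven execution plus a device-selection policy ---
def run(tasks, deps, tiles, policy):
    succ, indeg = defaultdict(list), [len(d) for d in deps]
    for t, d in enumerate(deps):
        for p in d:
            succ[p].append(t)
    ready, trace = deque(t for t, n in enumerate(indeg) if n == 0), []
    while ready:  # Kahn's algorithm: run a task once all its predecessors have finished
        t = ready.popleft()
        kind, args, _, _ = tasks[t]
        trace.append((kind, args, policy(kind)))  # a real runtime would dispatch here
        KERNELS[kind](tiles, *args)  # this sketch runs every kernel on the host
        for s in succ[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return trace

def tiled_lu(a, b, policy):
    """Unpivoted LU of an n x n list of lists (n divisible by b); returns (packed LU, trace)."""
    nt = len(a) // b
    tiles = [[[row[j * b:(j + 1) * b] for row in a[i * b:(i + 1) * b]]
              for j in range(nt)] for i in range(nt)]  # slicing copies, so a is untouched
    tasks = build_tiled_lu_tasks(nt)
    trace = run(tasks, add_dependencies(tasks), tiles, policy)
    n = nt * b
    return [[tiles[i // b][j // b][i % b][j % b] for j in range(n)] for i in range(n)], trace

if __name__ == "__main__":
    A = [[(20.0 if i == j else 0.0) + (i + 2 * j) % 5 for j in range(6)] for i in range(6)]
    LU, trace = tiled_lu(A, 2, lambda kind: "gpu" if kind == "gemm" else "cpu")  # hypothetical policy
    print(len(trace), trace[0])
```

Changing only the `policy` argument (for example, labeling every task `"cpu"`) alters placement decisions while `build_tiled_lu_tasks` and the derived dependencies stay the same, which mirrors in miniature the separation of algorithm and tuning described above. A real heterogeneous runtime would also execute independent tasks concurrently (for instance the TRSMs of one step) and manage data movement between devices; the test matrix is strictly diagonally dominant so that factorization without pivoting is safe.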

J. Kim—Now at NVIDIA.

Notice: This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a non-exclusive, paid up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for U.S. Government purposes. The DOE will provide public access to these results in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).



Author information

Correspondence to Pedro Valero-Lara.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Valero-Lara, P., Kim, J., Vetter, J.S. (2023). A Portable and Heterogeneous LU Factorization on IRIS. In: Singer, J., Elkhatib, Y., Blanco Heras, D., Diehl, P., Brown, N., Ilic, A. (eds) Euro-Par 2022: Parallel Processing Workshops. Euro-Par 2022. Lecture Notes in Computer Science, vol 13835. Springer, Cham. https://doi.org/10.1007/978-3-031-31209-0_2


  • DOI: https://doi.org/10.1007/978-3-031-31209-0_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31208-3

  • Online ISBN: 978-3-031-31209-0

  • eBook Packages: Computer Science, Computer Science (R0)
