Time and energy modeling of a high-performance multi-threaded Cholesky factorization

Catalán, Sandra; Igual, Francisco D.; Mayo, Rafael; Rodríguez-Sánchez, Rafael; Quintana-Ortí, Enrique S.

doi:10.1007/s11227-016-1654-6

Published: 05 February 2016

Volume 73, pages 139–151, (2017)

The Journal of Supercomputing

Sandra Catalán ORCID: orcid.org/0000-0002-9321-2728¹,
Francisco D. Igual²,
Rafael Mayo¹,
Rafael Rodríguez-Sánchez¹ &
…
Enrique S. Quintana-Ortí¹


Abstract

We present accurate piece-wise time and energy models of high-performance multi-threaded implementations of the general matrix multiplication, the triangular system solve with multiple right-hand sides, and the symmetric rank-k update. These models are then assembled into accurate models of the Cholesky factorization built on top of these Level-3 BLAS operations. Our models consider the costs, in terms of time and energy, of the floating-point operations involved in the routines as well as the overhead due to data movements across the levels of the memory hierarchy. The accuracy of the multi-threaded models is tested on an Intel Xeon E5-2620 processor, where the relative errors for the Cholesky factorization average around 2.4% for time and 2.9% for energy.
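As a rough illustration of how per-kernel models can be assembled into a model of the factorization, the sketch below accumulates the cost of a blocked Cholesky factorization from the costs of its building blocks. Several things here are assumptions rather than details taken from the article: the left-looking variant (chosen only because it uses all three Level-3 BLAS kernels), the textbook flop counts, the crude count of matrix elements touched as a stand-in for data movement, and the `per_flop` and `per_word` coefficients, which are hypothetical placeholders and not measured values. The sketch uses a single linear cost per kernel and does not attempt to reproduce the piece-wise structure of the models.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    flops: float = 0.0  # floating-point operations
    words: float = 0.0  # matrix elements read or written (crude traffic proxy)

    def __add__(self, other):
        return Cost(self.flops + other.flops, self.words + other.words)


def gemm(m, n, k):
    # C (m x n) -= A (m x k) * B^T, with B of size n x k
    return Cost(2.0 * m * n * k, m * k + n * k + 2.0 * m * n)


def syrk(n, k):
    # Lower triangle of C (n x n) -= A (n x k) * A^T
    return Cost(float(n) * (n + 1) * k, n * k + float(n) * (n + 1))


def trsm(m, b):
    # B (m x b) := B * L^{-T}, with L (b x b) lower triangular
    return Cost(float(m) * b * b, b * (b + 1) / 2.0 + 2.0 * m * b)


def potf2(b):
    # Unblocked Cholesky factorization of a b x b diagonal block
    return Cost(b ** 3 / 3.0, float(b) * (b + 1))


def cholesky_cost(n, b):
    """Cost of a left-looking blocked Cholesky; assumes b divides n."""
    total = Cost()
    for k in range(0, n, b):
        m = n - k - b  # rows below the current diagonal block
        if k > 0:
            total = total + syrk(b, k)  # update the diagonal block
            if m > 0:
                total = total + gemm(m, b, k)  # update the panel below it
        total = total + potf2(b)  # factorize the diagonal block
        if m > 0:
            total = total + trsm(m, b)  # solve for the panel
    return total


def predict(cost, per_flop, per_word):
    # Linear model: arithmetic cost plus data-movement overhead.
    # The same form can stand for time (s) or energy (J),
    # depending on the units of the two coefficients.
    return cost.flops * per_flop + cost.words * per_word


if __name__ == "__main__":
    c = cholesky_cost(4096, 256)
    # Hypothetical coefficients, for illustration only.
    print("flops:", c.flops, "words:", c.words)
    print("modeled time (s):", predict(c, per_flop=1e-10, per_word=1e-9))
```

Calling `predict` twice on the same `Cost`, once with time coefficients and once with energy coefficients, gives the two estimates side by side. In practice the coefficients would have to be calibrated per kernel and per operand-size range on the target machine.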

Acknowledgments

This work was supported by the CICYT projects TIN2014-53495-R and CICYT-TIN 2012-32180 of the MINECO and FEDER, the EU FET Project FP7 318793 “EXA2GREEN”, and the FPU program of MECD.

Author information

Authors and Affiliations

Depto. Ingeniería y Ciencia de Computadores, Universidad Jaume I, Castellón, Spain
Sandra Catalán, Rafael Mayo, Rafael Rodríguez-Sánchez & Enrique S. Quintana-Ortí
Depto. de Arquitectura de Computadores y Automática, Universidad Complutense de Madrid, Madrid, Spain
Francisco D. Igual

Corresponding author

Correspondence to Sandra Catalán.

About this article

Cite this article

Catalán, S., Igual, F.D., Mayo, R. et al. Time and energy modeling of a high-performance multi-threaded Cholesky factorization. J Supercomput 73, 139–151 (2017). https://doi.org/10.1007/s11227-016-1654-6

Published: 05 February 2016
Issue Date: January 2017
DOI: https://doi.org/10.1007/s11227-016-1654-6
