Time and energy modeling of a high-performance multi-threaded Cholesky factorization

The Journal of Supercomputing

Abstract

We present accurate piece-wise time and energy models of high-performance multi-threaded implementations of three operations: general matrix multiplication, triangular system solve with multiple right-hand sides, and symmetric rank-k update. These models are then assembled into accurate models of the Cholesky factorization, which is built on top of these Level-3 BLAS operations. Our models account for the time and energy costs of the floating-point operations in each routine, as well as the overhead of moving data across the levels of the memory hierarchy. We test the accuracy of the multi-threaded models on an Intel Xeon E5-2620 processor. For the Cholesky factorization, the average relative errors are around 2.4 % for time and 2.9 % for energy.
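To illustrate how a Cholesky factorization decomposes into the three Level-3 BLAS operations named above, the following is a minimal NumPy sketch. It is not the authors' implementation or model. It assumes a left-looking blocked variant, uses plain NumPy calls as stand-ins for the tuned multi-threaded kernels, and tallies approximate flop counts per kernel. The function names `blocked_cholesky` and `estimate_cost` and the per-flop rates are hypothetical, and the cost estimate ignores the data-movement overhead that the abstract says the real models include.

```python
import numpy as np


def blocked_cholesky(A, b):
    """Left-looking blocked Cholesky of an SPD matrix A with block size b.

    Returns the lower-triangular factor L and approximate flop counts
    per kernel type (illustrative accounting, not the paper's model).
    """
    n = A.shape[0]
    L = np.tril(A).astype(float)  # work on a copy of the lower triangle
    flops = {"syrk": 0.0, "gemm": 0.0, "trsm": 0.0, "potf2": 0.0}
    for k in range(0, n, b):
        e = min(k + b, n)
        nb = e - k
        # SYRK: diagonal block A11 -= A10 * A10^T (lower triangle only)
        L[k:e, k:e] -= np.tril(L[k:e, :k] @ L[k:e, :k].T)
        flops["syrk"] += nb * nb * k
        # Unblocked factorization of the small diagonal block
        D = L[k:e, k:e]
        L[k:e, k:e] = np.linalg.cholesky(D + np.tril(D, -1).T)
        flops["potf2"] += nb ** 3 / 3.0
        if e < n:
            # GEMM: panel A21 -= A20 * A10^T
            L[e:, k:e] -= L[e:, :k] @ L[k:e, :k].T
            flops["gemm"] += 2.0 * (n - e) * nb * k
            # TRSM: A21 = A21 * L11^{-T}; a BLAS trsm would exploit the
            # triangular structure, a general solve is used here for brevity
            L[e:, k:e] = np.linalg.solve(L[k:e, k:e], L[e:, k:e].T).T
            flops["trsm"] += (n - e) * nb * nb
    return L, flops


def estimate_cost(flops, cost_per_flop):
    """Sum per-kernel costs; cost_per_flop maps kernel name -> cost per flop."""
    return sum(flops[name] * cost_per_flop[name] for name in flops)


# Usage with a random SPD matrix and made-up per-flop times (seconds)
rng = np.random.default_rng(0)
M = rng.standard_normal((64, 64))
A = M @ M.T + 64 * np.eye(64)
L, flops = blocked_cholesky(A, 16)
hypothetical_rates = {"syrk": 1e-10, "gemm": 8e-11, "trsm": 1.2e-10, "potf2": 5e-10}
estimated_time = estimate_cost(flops, hypothetical_rates)
```

The same structure would apply to an energy estimate by swapping the per-flop times for per-flop energies. A piece-wise model in the sense of the abstract would additionally vary these costs with operand size and add terms for memory traffic.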


Acknowledgments

This work was supported by the CICYT projects TIN2014-53495-R and CICYT-TIN 2012-32180 of the MINECO and FEDER, the EU FET Project FP7 318793 “EXA2GREEN”, and the FPU program of MECD.

Author information

Corresponding author

Correspondence to Sandra Catalán.

About this article

Cite this article

Catalán, S., Igual, F.D., Mayo, R. et al. Time and energy modeling of a high-performance multi-threaded Cholesky factorization. J Supercomput 73, 139–151 (2017). https://doi.org/10.1007/s11227-016-1654-6

  • DOI: https://doi.org/10.1007/s11227-016-1654-6
