Adapting concurrency throttling and voltage–frequency scaling for dense eigensolvers

The Journal of Supercomputing

Abstract

We analyze power dissipation and energy consumption during the execution of high-performance dense linear algebra kernels on multi-core processors. Building on this analysis, we propose and evaluate several strategies that adapt concurrency throttling and the voltage–frequency setting to obtain an energy-efficient execution of LAPACK’s routine dsytrd. Our strategies take into account the differences between the memory-bound and CPU-bound kernels that govern this routine, as well as whether the problem data fits into the processor’s last-level cache.

Notes

  1. The ondemand governor sets the frequency dynamically according to the current workload. When idle, the CPU remains at the lowest frequency. If the load surpasses a specified threshold (95 % by default), the ondemand governor switches the CPU to the highest frequency. When the load falls below that threshold, the governor switches to the next lower frequency and, as long as the load stays below the threshold, keeps stepping down until the lowest frequency is reached. In contrast, the performance governor always keeps the CPU at the highest frequency.
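
The policy described in this note can be illustrated with a small simulation. This is only a sketch of the behaviour as stated above, not the Linux kernel implementation: the function names, the frequency table, and the load samples are hypothetical, and holding the current frequency when the load equals the threshold exactly is an assumption, since the note does not cover that case.

```python
from typing import List, Sequence


def ondemand_step(idx: int, load: float, n_freqs: int,
                  up_threshold: float = 0.95) -> int:
    """Return the next frequency index (frequencies sorted ascending)."""
    if load > up_threshold:
        # Load surpasses the threshold: jump straight to the highest frequency.
        return n_freqs - 1
    if load < up_threshold:
        # Load below the threshold: step down one level, never past the lowest.
        return max(idx - 1, 0)
    # Load exactly at the threshold: hold (assumption, not stated in the note).
    return idx


def simulate_ondemand(loads: Sequence[float], freqs_mhz: Sequence[int],
                      up_threshold: float = 0.95) -> List[int]:
    """Frequency chosen after each load sample, starting idle at the lowest."""
    idx = 0
    trace = []
    for load in loads:
        idx = ondemand_step(idx, load, len(freqs_mhz), up_threshold)
        trace.append(freqs_mhz[idx])
    return trace


def simulate_performance(loads: Sequence[float],
                         freqs_mhz: Sequence[int]) -> List[int]:
    """The performance governor keeps the highest frequency regardless of load."""
    return [freqs_mhz[-1]] * len(loads)


# Hypothetical frequency table (MHz) and load samples, for illustration only.
FREQS_MHZ = [1200, 1600, 2000, 2400]
LOADS = [0.10, 0.99, 0.50, 0.50, 0.50, 0.50]
```

With these illustrative inputs, `simulate_ondemand(LOADS, FREQS_MHZ)` jumps to 2400 MHz on the single high-load sample and then steps down one level per sample until it reaches 1200 MHz, whereas `simulate_performance(LOADS, FREQS_MHZ)` stays at 2400 MHz throughout.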

Acknowledgments

This work was supported by the CICYT Project TIN2011-23283 of MINECO and FEDER, the EU Project FP7 318793 “EXA2GREEN”, and the FPU program of the Ministerio de Educación, Cultura y Deporte.

Author information

Corresponding author

Correspondence to María Barreda.

About this article

Cite this article

Aliaga, J.I., Barreda, M., Castaño, M.A. et al. Adapting concurrency throttling and voltage–frequency scaling for dense eigensolvers. J Supercomput 73, 29–43 (2017). https://doi.org/10.1007/s11227-015-1600-z

  • DOI: https://doi.org/10.1007/s11227-015-1600-z
