Abstract
We analyze power dissipation and energy consumption during the execution of high-performance dense linear algebra kernels on multi-core processors. Building on this analysis, we propose and evaluate several strategies that adapt concurrency throttling and the voltage–frequency setting to obtain an energy-efficient execution of LAPACK’s routine dsytrd. Our strategies take into account the differences between the memory-bound and CPU-bound kernels that govern this routine, and whether the problem data fit into the processor’s last-level cache.
Notes
The ondemand governor sets the frequency dynamically according to the current workload. When idle, the CPU remains at the lowest frequency. If the load surpasses a specified threshold (95% by default), the ondemand governor switches the CPU to the highest frequency. When the load falls below that threshold, the governor steps down to the next lower frequency, and keeps stepping down until the lowest frequency is reached, provided the load stays below the threshold. In contrast, the performance governor always keeps the CPU at the highest frequency.
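The policy described in this note can be sketched as a small simulation. This is a toy model of the behavior as stated here, not the Linux kernel implementation: the frequency table `FREQS_GHZ`, the load trace, and the function names are hypothetical, and holding the frequency when the load equals the threshold exactly is an assumption, since the note does not cover that case.

```python
def ondemand_step(freqs, current, load, up_threshold=0.95):
    """Pick the frequency for the next sampling interval (ondemand-like).

    freqs is the list of available frequencies in ascending order.
    """
    if load > up_threshold:
        # Load surpasses the threshold: jump straight to the highest frequency.
        return freqs[-1]
    if load < up_threshold:
        # Load below the threshold: step down one level, stopping at the lowest.
        idx = freqs.index(current)
        return freqs[max(idx - 1, 0)]
    # Assumption: at exactly the threshold, keep the current frequency.
    return current


def performance_step(freqs, current, load):
    """The performance governor ignores the load and stays at the maximum."""
    return freqs[-1]


def simulate(step, freqs, loads):
    """Apply a governor policy to a sequence of load samples."""
    current = freqs[0]  # an idle CPU starts at the lowest frequency
    trace = []
    for load in loads:
        current = step(freqs, current, load)
        trace.append(current)
    return trace


# Hypothetical frequency table (GHz) and load trace (fraction of busy time).
FREQS_GHZ = [1.2, 1.6, 2.0, 2.4]
LOADS = [0.10, 0.99, 0.50, 0.50, 0.50, 0.50]

print(simulate(ondemand_step, FREQS_GHZ, LOADS))
print(simulate(performance_step, FREQS_GHZ, LOADS))
```

In this trace, the single high-load sample raises the frequency to the maximum in one step, while the return to the lowest frequency takes one sampling interval per level.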
Acknowledgments
This work was supported by the CICYT Project TIN2011-23283 of MINECO and FEDER, the EU Project FP7 318793 “EXA2GREEN”, and the FPU program of the Ministerio de Educación, Cultura y Deporte.
Cite this article
Aliaga, J.I., Barreda, M., Castaño, M.A. et al. Adapting concurrency throttling and voltage–frequency scaling for dense eigensolvers. J Supercomput 73, 29–43 (2017). https://doi.org/10.1007/s11227-015-1600-z
DOI: https://doi.org/10.1007/s11227-015-1600-z