Adapting concurrency throttling and voltage–frequency scaling for dense eigensolvers

Aliaga, José I.; Barreda, María; Castaño, M. Asunción; Dolz, Manuel F.; Quintana-Ortí, Enrique S.

doi:10.1007/s11227-015-1600-z

The Journal of Supercomputing, Volume 73, pages 29–43 (2017)

Published: 28 December 2015


Abstract

We analyze power dissipation and energy consumption during the execution of high-performance dense linear algebra kernels on multi-core processors. Building on this analysis, we propose and evaluate several strategies that adapt concurrency throttling and the voltage–frequency setting to obtain an energy-efficient execution of the LAPACK routine dsytrd. Our strategies take into account the differences between the memory-bound and CPU-bound kernels that govern this routine, and whether the problem data fits into the processor's last-level cache.
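
In dsytrd, which reduces a symmetric matrix to tridiagonal form, roughly half of the floating-point operations are symmetric matrix-vector products (dsymv, level-2 BLAS, memory-bound) and half are the symmetric rank-2k update (dsyr2k, level-3 BLAS, CPU-bound). The Python sketch below illustrates how a per-kernel choice of thread count and frequency could depend on both the kernel type and whether the trailing matrix fits in the last-level cache. The cache size, core count, frequency ladder, and decision rule are hypothetical placeholders, not the strategies or platform evaluated in the paper, and the loop is only a simplified model of the blocked reduction.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple

# Hypothetical platform parameters: placeholders, not the paper's testbed.
LLC_BYTES = 20 * 1024 * 1024      # assumed last-level cache capacity
MAX_THREADS = 8                   # assumed number of cores
FREQS_GHZ = (1.2, 1.8, 2.4)       # assumed available frequencies, ascending


@dataclass(frozen=True)
class Setting:
    threads: int      # concurrency: number of threads used by the kernel
    freq_ghz: float   # voltage-frequency setting expressed as a core frequency


def matrix_bytes(m: int) -> int:
    """Bytes of an m x m double-precision array (LAPACK stores the full array)."""
    return 8 * m * m


def choose_setting(kernel: str, m: int) -> Setting:
    """Hypothetical rule for one kernel call on an m x m trailing submatrix."""
    fits_in_llc = matrix_bytes(m) <= LLC_BYTES
    if kernel == "dsymv":                  # level-2 BLAS, memory-bound
        if fits_in_llc:
            # Assumption: operands come from cache, so keep full concurrency and frequency.
            return Setting(MAX_THREADS, FREQS_GHZ[-1])
        # Assumption: DRAM bandwidth is the bottleneck, so throttle threads and frequency.
        return Setting(MAX_THREADS // 2, FREQS_GHZ[0])
    if kernel == "dsyr2k":                 # level-3 BLAS, CPU-bound
        return Setting(MAX_THREADS, FREQS_GHZ[-1])
    raise ValueError(f"unknown kernel: {kernel}")


def schedule(n: int, nb: int) -> Iterator[Tuple[int, str, Setting]]:
    """Simplified model of the blocked loop: one dsymv phase and one dsyr2k
    update per block of nb columns, each acting on the shrinking trailing matrix."""
    for k in range(0, n, nb):
        m = n - k                          # order of the trailing submatrix
        for kernel in ("dsymv", "dsyr2k"):
            yield m, kernel, choose_setting(kernel, m)


if __name__ == "__main__":
    for m, kernel, s in schedule(4096, 1024):
        print(m, kernel, s.threads, s.freq_ghz)
```

With these placeholder values, `schedule(4096, 1024)` throttles the dsymv phase while the trailing order is 4096, 3072, or 2048 (at least 32 MiB of data) and restores full concurrency at order 1024 (8 MiB). Because frequency transitions are not instantaneous, switching the setting per kernel call only pays off when each call runs long enough to amortize the transition.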


Notes

The ondemand governor dynamically sets the frequency based on the current workload. When idle, the CPU remains at the lowest frequency. If the load exceeds a specified threshold (95 % by default), the ondemand governor switches the CPU to the highest frequency. When the load falls below that threshold, the governor switches to the next lower frequency, and keeps stepping down until the lowest frequency is reached (as long as the load stays below the threshold). In contrast, the performance governor always keeps the CPU at the highest frequency.
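
The two policies in this note can be simulated in a few lines. The Python sketch below models the governors exactly as the note states them, assuming one decision per fixed sampling period and a hypothetical four-step frequency ladder. The in-kernel ondemand governor has further tunables (for example `sampling_rate` and `up_threshold` in the cpufreq sysfs interface) and its down-scaling behaviour has varied across kernel versions, so this illustrates the note, not Linux's implementation.

```python
from typing import Callable, List, Sequence

Policy = Callable[[Sequence[float], float, float], float]


def ondemand_next(freqs: Sequence[float], current: float, load: float,
                  threshold: float = 0.95) -> float:
    """One decision step of the ondemand policy as the note describes it.

    freqs is sorted ascending; load is the CPU utilization in [0, 1].
    """
    if load > threshold:
        return freqs[-1]                  # load above threshold: jump to the highest frequency
    i = freqs.index(current)
    return freqs[max(i - 1, 0)]           # otherwise step down one level, stopping at the lowest


def performance_next(freqs: Sequence[float], current: float, load: float) -> float:
    """The performance governor ignores the load and stays at the highest frequency."""
    return freqs[-1]


def simulate(policy: Policy, freqs: Sequence[float], loads: Sequence[float]) -> List[float]:
    """Apply the policy once per sampling period, starting idle at the lowest frequency."""
    f = freqs[0]
    trace = []
    for load in loads:
        f = policy(freqs, f, load)
        trace.append(f)
    return trace


if __name__ == "__main__":
    freqs = [1.2, 1.6, 2.0, 2.4]          # hypothetical frequency ladder in GHz
    loads = [0.1, 1.0, 0.5, 0.5, 0.5, 0.5, 0.99]
    print(simulate(ondemand_next, freqs, loads))
    print(simulate(performance_next, freqs, loads))
```

For this load sequence the ondemand trace is `[1.2, 2.4, 2.0, 1.6, 1.2, 1.2, 2.4]`: one busy period jumps straight to the top frequency, and the CPU then needs several quiet periods to step back down, whereas the performance trace stays at 2.4 throughout.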


Acknowledgments

This work was supported by the CICYT Project TIN2011-23283 of MINECO and FEDER, the EU Project FP7 318793 “EXA2GREEN”, and the FPU program of the Ministerio de Educación, Cultura y Deporte.

Author information

Authors and Affiliations

Depto. de Ingeniería y Ciencia de Computadores, Universidad Jaime I, 12071, Castellón, Spain
José I. Aliaga, María Barreda, M. Asunción Castaño & Enrique S. Quintana-Ortí
Dept. de Informática, Universidad Carlos III de Madrid, 28911, Leganés, Madrid, Spain
Manuel F. Dolz


Corresponding author

Correspondence to María Barreda.

About this article

Cite this article

Aliaga, J.I., Barreda, M., Castaño, M.A. et al. Adapting concurrency throttling and voltage–frequency scaling for dense eigensolvers. J Supercomput 73, 29–43 (2017). https://doi.org/10.1007/s11227-015-1600-z

Issue Date: January 2017
