Automatic detection of power bottlenecks in parallel scientific applications

Barreda, María; Catalán, Sandra; Dolz, Manuel F.; Mayo, Rafael; Quintana-Ortí, Enrique S.

doi:10.1007/s00450-013-0242-8

Special Issue Paper
Published: 25 July 2013

Volume 29, pages 221–229, (2014)
Computer Science - Research and Development

Abstract

In this paper we present an extension of the pmlib framework for power-performance analysis that rapidly and automatically detects power sinks during the execution of concurrent scientific workloads. The extension takes the form of a multithreaded Python module that offers high reliability and flexibility, and the resulting inspection process introduces low overhead. Additionally, we investigate the advantages and drawbacks of the RAPL power model, introduced in the Intel Xeon “Sandy-Bridge” CPU, versus a data acquisition system from National Instruments.
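
As a rough illustration of how an energy-counter source such as RAPL can feed a power trace, the sketch below converts cumulative energy readings into average power per sampling interval. It is not taken from pmlib. The function name `average_power_watts`, the sample layout, and the `wrap_uj` parameter are assumptions made for this example, and the readings are assumed to have been collected elsewhere.

```python
from typing import List, Sequence, Tuple


def average_power_watts(
    samples: Sequence[Tuple[float, int]],
    wrap_uj: int,
) -> List[float]:
    """Convert cumulative energy readings into average power per interval.

    samples: (timestamp in seconds, cumulative energy in microjoules) pairs,
             ordered by strictly increasing timestamp.
    wrap_uj: value at which the counter is assumed to wrap back to zero
             (hypothetical parameter; the real range depends on the platform).
    """
    powers: List[float] = []
    for (t0, e0), (t1, e1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        delta_uj = e1 - e0
        if delta_uj < 0:
            # Assume the counter wrapped exactly once between two readings.
            delta_uj += wrap_uj
        # Microjoules -> joules, then joules per second = watts.
        powers.append(delta_uj / 1e6 / dt)
    return powers
```

For instance, with readings of 1,000,000, 3,000,000 and 500,000 µJ taken one second apart and an assumed wrap value of 4,000,000 µJ, the function returns `[2.0, 1.5]`. The second interval is recovered across the counter wrap. Sampling must be frequent enough that the counter cannot wrap more than once between readings, otherwise the computed power is too low.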

Notes

http://ilupack.tu-bs.de.

Acknowledgements

This work was supported by the CICYT project TIN2011-23283 of the Ministerio de Economía y Competitividad and FEDER, and the EU Project FP7 318793 “EXA2GREEN”.

We thank Rosa M. Badia, from the Barcelona Supercomputing Center (BSC) and the Spanish National Research Council (CSIC), for providing several of the OmpSs applications used for the calibration of pmlib.

Author information

Authors and Affiliations

Depto. de Ingeniería y Ciencia de Computadores, Univ. Jaume I, 12071, Castellón, Spain
María Barreda, Sandra Catalán, Rafael Mayo & Enrique S. Quintana-Ortí
Department of Informatics, University of Hamburg, 22527, Hamburg, Germany
Manuel F. Dolz

Corresponding author

Correspondence to María Barreda.

About this article

Cite this article

Barreda, M., Catalán, S., Dolz, M.F. et al. Automatic detection of power bottlenecks in parallel scientific applications. Comput Sci Res Dev 29, 221–229 (2014). https://doi.org/10.1007/s00450-013-0242-8

Issue Date: August 2014