Abstract
In this paper we present an extension of the pmlib framework for power-performance analysis that enables rapid, automatic detection of power sinks during the execution of concurrent scientific workloads. The extension is implemented as a multithreaded Python module that offers high reliability and flexibility, and the resulting inspection process introduces low overhead. Additionally, we compare the advantages and drawbacks of the RAPL power model, introduced with the Intel Xeon “Sandy Bridge” CPU, against those of a data acquisition system from National Instruments.
Acknowledgements
This work was supported by the CICYT project TIN2011-23283 of the Ministerio de Economía y Competitividad and FEDER, and the EU Project FP7 318793 “EXA2GREEN”.
We thank Rosa M. Badia, from the Barcelona Supercomputing Center (BSC) and the Spanish National Research Council (CSIC), for providing several of the OmpSs applications used for the calibration of pmlib.
Cite this article
Barreda, M., Catalán, S., Dolz, M.F. et al. Automatic detection of power bottlenecks in parallel scientific applications. Comput Sci Res Dev 29, 221–229 (2014). https://doi.org/10.1007/s00450-013-0242-8
DOI: https://doi.org/10.1007/s00450-013-0242-8